Situation-dependent blending of multiple forecasting models based on machine learning

Significant improvements in renewable energy forecasting have been demonstrated via situation-dependent error correction and blending of models using machine learning.
18 November 2015
Hendrik F. Hamann and Siyuan Lu

Although the cost of renewable energy technologies has steadily decreased, the economic value of generating renewable energy has also decreased as it has become a larger fraction of the total energy mix.1 This trend undoubtedly poses a serious challenge for the further adoption of renewable energy. There are many underlying reasons for this, including the deterministic rules used today for power systems, which require highly predictable energy sources. In contrast, renewable energy is inherently intermittent, and there is effectively no viable method for large-scale energy storage to mitigate the intermittent output of renewable sources. Improved forecasting of the output of renewable energy is considered the best alternative to enable cost-effective grid integration, but this has so far been challenging because of the complexity of this multi-physics problem, which involves multiple time and length scales.2, 3

A common approach to such complex problems involves a combination of physical and statistical modeling techniques, such as model output statistics and ensemble averaging.4 Recently, we showed that significant improvements in accuracy can be achieved if we combine forecasts from individual models using an approach based on machine learning, which takes into account appropriate additional state parameters, as well as those that need to be explicitly forecasted.5

The details of blending multiple models—in the context of renewable energy—can be best understood from Figure 1, which summarizes the accuracy of various forecasting approaches for different time horizons.6 Long-term forecasts use averaged climatological values or data from climate models. The model approach corresponds to the use of numerical weather prediction models, which rarely achieve the best accuracy at short time horizons due to the period of time they require to achieve numerical stability. A Lagrangian approach provides good accuracy for a few hours ahead by, for example, extrapolating the most recent upwind observations in space and time to estimate when clouds might reach a forecasting site. Finally, the Eulerian approach is based on persistence of the last observation, which has the highest accuracy at very short forecasting horizons. There are multiple sources of information and models for each approach, examples of which are listed in Table 1.
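The two short-horizon approaches can be illustrated with a minimal sketch. It assumes a single scalar observation at the site and a single cloud feature moving at constant speed; the function names and the example numbers are hypothetical and are not taken from the article.

```python
import numpy as np


def persistence_forecast(last_observation, n_steps):
    """Eulerian persistence: hold the latest on-site observation constant."""
    return np.full(n_steps, last_observation, dtype=float)


def advection_arrival_time(distance_upwind_km, wind_speed_km_per_h):
    """Lagrangian estimate: hours until a feature seen upwind reaches the site.

    Assumes straight-line motion at constant speed (a strong simplification).
    """
    if wind_speed_km_per_h <= 0:
        raise ValueError("wind speed must be positive for advection")
    return distance_upwind_km / wind_speed_km_per_h


# Hypothetical numbers: irradiance of 500 W/m^2 held for three time steps,
# and a cloud edge 30 km upwind moving at 20 km/h.
held = persistence_forecast(500.0, 3)
arrival_h = advection_arrival_time(30.0, 20.0)
```

Persistence needs no model at all, which is why it is hard to beat at very short horizons, while the advection estimate only stays useful for as long as the upwind feature keeps its shape and velocity.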

Figure 1. The accuracy of forecasts using various approaches and models is summarized as a function of the time horizon of the forecast.6
Table 1. Examples of different forecast systems and models.
Eulerian         | Lagrangian models      | Numerical weather models             | Climate models
Weather stations | Advection-based models | NAM (North American mesoscale)       | GCM (general circulation model)
Radar            | Time-series analysis   | HRRR (high-resolution rapid refresh) | CGCM (coupled general circulation model)
                 |                        | RAP (rapid refresh)                  |

At a given forecast horizon $\tau$ and location $x$, the individual models independently provide a set of forecasts $C_m(\tau, x)$ for a given parameter such as wind and irradiance, where the index $m$ corresponds to each different model, including Eulerian, Lagrangian, numerical, and climatological models. A blended forecast $C_{\mathrm{blend}}$ with enhanced accuracy may be generated as the weighted sum $C_{\mathrm{blend}} = \sum_m w_m C_m$, where $w_m$ is the respective weighting coefficient. It is important to recognize that conventional approaches using a constant value of $w_m$ are not likely to work well in all scenarios, as the errors in the individual models often vary with the forecast horizon, location, and weather situation. Optimal forecasting accuracy may be obtained by using machine learning to derive a set of situation-dependent weighting coefficients $w_m(\tau, x; s(E))$, where $s$ is the weather situation defined by a set of appropriately selected environmental parameters $E$ (including forecasted values).
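The difference between constant and situation-dependent weights can be sketched as follows. The function `blend`, the two-model setup, and all numbers are illustrative assumptions; in practice the weights would come from a trained model rather than being written by hand.

```python
import numpy as np


def blend(forecasts, weights):
    """Weighted sum over models, evaluated separately for every sample.

    forecasts: shape (n_samples, n_models)
    weights:   shape (n_models,) for constant weights, or
               shape (n_samples, n_models) for situation-dependent weights
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.broadcast_to(np.asarray(weights, dtype=float), forecasts.shape)
    return np.sum(weights * forecasts, axis=1)


# Two hypothetical models forecasting irradiance (W/m^2) in two situations.
forecasts = [[800.0, 700.0],   # sample 0, e.g. a clear-sky situation
             [300.0, 200.0]]   # sample 1, e.g. an overcast situation

# Constant weights: every situation uses the same mix.
constant = blend(forecasts, [0.5, 0.5])                # [750., 250.]

# Situation-dependent weights: trust model 0 when clear, model 1 when overcast.
situational = blend(forecasts, [[1.0, 0.0],
                                [0.0, 1.0]])           # [800., 200.]
```

With constant weights a model that is poor in one situation contaminates every blended value, whereas per-sample weights allow it to be down-weighted only where it tends to fail.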

Figure 2 provides a systematic view of this technology as we apply it to renewable energy forecasting. A ‘big’ data bus extracts atmospheric data (such as temperature, wind, and cloud properties) from various forecasting models. For solar energy, a radiative transfer model converts this atmospheric data into irradiance values. We blend the different forecasts and then convert them into power using irradiance-to-power (for solar) or wind-to-power (for wind) models. A machine-learning module provides the blending coefficients. Initially, we train the system using historical data, but as new measurements become available it constantly improves. The training starts by analyzing how the errors of the individual forecast models depend on a large set of atmospheric state parameters using functional analysis of variance. From this analysis, we select a reduced set of state parameters E that have a significant impact on the forecasting errors of the models. The parameters E and the forecasted and measured parameter of interest are then fed into a machine-learning module to derive situation-dependent weighting coefficients.5
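As a rough illustration of how situation-dependent weights might be learned from historical data, the sketch below replaces the machine-learning module with a much simpler stand-in: the weather situation is reduced to bins of one scalar state parameter, and a separate least-squares weight vector is fitted in each bin. The data are synthetic, the names `fit_binned_weights` and `predict_binned` are hypothetical, and the parameter-selection step based on functional analysis of variance is not reproduced here.

```python
import numpy as np


def fit_binned_weights(forecasts, observed, state, bin_edges):
    """Fit one least-squares weight vector per bin of a scalar state parameter."""
    forecasts = np.asarray(forecasts, dtype=float)
    observed = np.asarray(observed, dtype=float)
    bins = np.digitize(state, bin_edges)       # bin index for every sample
    weights = {}
    for b in np.unique(bins):
        mask = bins == b
        w, *_ = np.linalg.lstsq(forecasts[mask], observed[mask], rcond=None)
        weights[int(b)] = w
    return weights


def predict_binned(forecasts, state, bin_edges, weights):
    """Blend each sample with the weights of its bin (bin must have been seen in training)."""
    forecasts = np.asarray(forecasts, dtype=float)
    bins = np.digitize(state, bin_edges)
    return np.array([forecasts[i] @ weights[int(b)] for i, b in enumerate(bins)])


# Synthetic data: model A is exact when the state is low, model B when it is high.
rng = np.random.default_rng(0)
n = 400
state = rng.uniform(0.0, 1.0, n)               # hypothetical state parameter
truth = rng.uniform(0.0, 1000.0, n)            # synthetic "measured" values
low = state < 0.5
model_a = np.where(low, truth, rng.uniform(0.0, 1000.0, n))
model_b = np.where(low, rng.uniform(0.0, 1000.0, n), truth)
forecasts = np.column_stack([model_a, model_b])

edges = [0.5]
weights = fit_binned_weights(forecasts, truth, state, edges)
blended = predict_binned(forecasts, state, edges, weights)

# Baseline: a single constant weight vector for all situations.
w_const, *_ = np.linalg.lstsq(forecasts, truth, rcond=None)

# Errors are computed on the training data only to keep the sketch short;
# a real evaluation would use a separate validation period.
rmse_binned = float(np.sqrt(np.mean((blended - truth) ** 2)))
rmse_const = float(np.sqrt(np.mean((forecasts @ w_const - truth) ** 2)))
```

In this contrived setup the binned fit recovers which model to trust in each regime, while the constant-weight fit cannot. A real system would use several state parameters and a more flexible learner than fixed bins.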

Figure 2. Systematic view of situation-dependent blending of multiple models based on machine learning.

Figure 3 shows two test cases of model blending, where we blended three state-of-the-art numerical weather models (see Table 1 for details). From these models, we obtained day-ahead forecasts for two parameters, global horizontal solar irradiance (GHI) and wind speed (10 m above ground), as well as candidate parameters for distinguishing between weather situations, including ground pressure, temperature, column-integrated cloud water, and so forth. We carried out training to obtain the blending coefficients using measured GHI and wind values at the seven SurfRad weather stations, which are located in different parts of the United States.7 The root mean square errors of forecasts obtained using individual models, simple blending not accounting for the weather situation, and situation-dependent blending are summarized in Figure 3. The error in GHI using situation-dependent blending is 107 W/m²: see the blue bar in Figure 3(A). This is a 34% improvement with respect to the best individual model. The error in blended wind speed shows a 23% improvement: see Figure 3(B).
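The error metric and the quoted percentages can be made concrete with a short sketch. It assumes that "improvement" means the relative reduction in root mean square error; the article does not state the error of the best individual model, so the value derived below is only what that assumption would imply, not a reported result.

```python
import math


def rmse(forecast, observed):
    """Root mean square error between two equal-length sequences."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(rmse_reference, rmse_new):
    """Fractional RMSE reduction of the new forecast relative to a reference."""
    return (rmse_reference - rmse_new) / rmse_reference


def implied_reference_rmse(rmse_new, improvement):
    """Reference RMSE implied by a new RMSE and a fractional improvement."""
    return rmse_new / (1.0 - improvement)


# Under the stated assumption, 107 W/m^2 at a 34% improvement would imply a
# best single-model GHI error of roughly 162 W/m^2.
implied_best_model = implied_reference_rmse(107.0, 0.34)
```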

Figure 3. Comparison of the root mean square errors from different methods of forecasting (A) global horizontal irradiance and (B) wind speed at 10 m above ground at the seven SurfRad weather stations. The validation period is from 5 July 2013 to 21 January 2014.

In addition to solar and wind, we have applied these techniques to several other forecasting problems, such as temperature, precipitation, and particulate matter. In all cases, we observed substantial improvements in accuracy, demonstrating that situation-dependent blending of multiple models can serve as a general framework for enhancing modeling of complex systems. In the future, we aim to carry out research with deeper networks, larger data sets, and more sophisticated blending algorithms, with the goal of obtaining even higher accuracy. This will apply not only to renewable energy forecasting but also to modeling and predicting the behavior of a wider range of complex physical, chemical, and biological systems.

We acknowledge contributions from Edwin Campos (from Argonne), Jon Lenchner and Gerald Tesauro (from IBM), Brad Lehman (from Northeastern University), Joseph Simmons (from Florida Gulf Coast University), Bri-Mathias Hodge (from National Renewable Energy Laboratory), Glenn Higgins (from Northrop Grumman), and many others. This report was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or any agency thereof.

Hendrik F. Hamann, Siyuan Lu
IBM T. J. Watson Research Center
Yorktown Heights, NY

Hendrik F. Hamann received his PhD from the University of Göttingen. He is currently a principal research staff member and manager for physical analytics in the Physical Sciences Department.

Siyuan Lu received his PhD from the University of Southern California. He is currently a research staff member for physical analytics in the Physical Sciences Department.

1. A. Mills, R. Wiser, Strategies for mitigating the reduction in economic value of variable generation with increasing penetration levels, tech. rep., LBNL-6590E, Ernest Orlando Lawrence Berkeley National Laboratory, 2014.
2. W. Glassley, J. Kleissl, C. P. van Dam, H. Shiu, J. Huang, G. Braun, R. Holland, California renewable energy forecasting, resource data, and mapping, tech. rep., California Institute for Energy and Environment, 2010.
3. K. D. Orwig, M. L. Ahlstrom, V. Banunarayanan, J. Sharp, J. M. Wilczak, J. Freedman, S. E. Haupt, et al., Recent trends in variable generation forecasting and its value to the power system, IEEE Trans. Sustain. Energy 6(3), p. 924-933, 2014.
4. N. Schuhen, T. L. Thorarinsdottir, T. Gneiting, Ensemble model output statistics for wind vectors, Mon. Weather Rev. 140(10), p. 3204-3219, 2012.
5. S. Lu, Y. Hwang, I. Khabibrakhmanov, F. J. Marianno, X. Shao, J. Zhang, B.-M. Hodge, H. F. Hamann, Machine learning based multi-physical-model blending for enhancing renewable energy forecast—improvement via situation dependent error correction, Proc. Euro. Control Conf., 2015.
6. U. Germann, I. Zawadzki, B. Turner, Predictability of precipitation from continental radar images. Part IV: Limits to prediction, J. Atmos. Sci. 63(8), p. 2092-2108, 2006.
7. J. A. Augustine, J. J. DeLuisi, C. N. Long, Surfrad—a national surface radiation budget network for atmospheric research, Bull. Am. Meteorol. Soc. 81(10), p. 2341-2357, 2000.