At Bitbloom we have several years’ experience in using machine learning to model temperature signals in the drive train of a wind turbine, applying what is usually called normal behaviour modelling – training a model on a period of time when the drive train was believed to be operating normally, then using that model to predict new and current temperature signals. If the measured temperatures stray consistently above the predictions, this is often an early indication of component failure or other issues.
This approach has worked great for us so far and generally has a good industry track record of correctly identifying issues before critical failures occur. However, machine learning is actually a pretty blunt tool. Yes, it can do amazingly powerful things, partly thanks to the very clever algorithms that have been developed, but also thanks to the large amount of computing power that is thrown at them. Have we been missing a trick here by applying practically zero physics to the problem? Sure, we lack a lot of detailed information about the drive train (unless we’re an OEM), but can we improve our models by applying some basic high-level physics?
This summer we were joined by Ben Wade, who has just started the final year of his MEng in Mechanical Engineering at Loughborough, to help us apply a physics-based alternative approach to modelling drive train temperatures. To be honest, this was a little speculative – we didn’t know whether such approaches could really compete with the sledgehammer that is machine learning. But we have been blown away by the success of this approach and we are already using it to complement our existing machine learning models. To cut to the chase, we have shown that these physical models predict drive train temperatures as accurately as, or even more accurately than, their machine learning counterparts, with a fraction of the training time and computational effort (and are therefore also greener!). For the rest of this article, I will briefly describe our physics-based approach.
Let’s consider a scale of models from the physics-heavy to physics-absent (as in the figure): at one end we have potentially complex physics models where all input data is known in advance of the modelling – these models need no operational data apart from the input conditions in order to run; at the other end we have machine learning models where no physical knowledge goes into the model but a large amount of operational data is used to train it. The approach we have developed could be classed as a semi-empirical model and sits in the middle of the scale – we have a high-level understanding of the basic physics going on but a lot of unknown coefficients. We can then learn these coefficients from real operational data.
Take for example a model of a single main bearing. It gains and loses heat in a number of ways:
- It gains heat from frictional heating of the bearings against the rotating main shaft, dependent on contact pressures, lubrication and other factors.
- It loses heat to the surrounding nacelle according to its (unknown) heat capacity and the thermal resistance between the bearing and nacelle.
- There is also further heat transfer with the gearbox via conduction along the main shaft, and with ambient air via the shaft and hub.
These are known thermodynamic relationships and can be expressed as differential equations of the temperatures in question, albeit with some unknown coefficients. These differential terms can be accumulated to create a master differential equation to model the temperature of the main bearing. Furthermore, in a system where we want to model multiple drive train components, we can chain these together in order to form a set of simultaneous differential equations.
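The article does not spell out the exact equations, so here is a minimal sketch of what one component's term of such a master differential equation might look like for the main bearing. The coefficient structure (e.g. frictional heating proportional to rotor speed) and all names are illustrative assumptions, not Bitbloom's actual model:

```python
def main_bearing_dTdt(T_bearing, T_nacelle, T_shaft, rotor_speed, k):
    """Illustrative rate of change of main-bearing temperature (deg/s).

    k holds the unknown coefficients to be learnt from operational data:
      k[0] - frictional heating per unit rotor speed (lumping contact
             pressure, lubrication and other factors into one constant)
      k[1] - heat-loss coefficient between the bearing and nacelle air
      k[2] - conduction coefficient along the main shaft
    """
    heating = k[0] * rotor_speed                  # frictional heat gain
    to_nacelle = k[1] * (T_bearing - T_nacelle)   # loss to nacelle air
    to_shaft = k[2] * (T_bearing - T_shaft)       # conduction via shaft
    return heating - to_nacelle - to_shaft
```

Chaining several such components together means each component's temperature also appears in its neighbours' equations, which is what produces the set of simultaneous differential equations.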
These differential models can be represented in matrix form and numerically integrated using input SCADA data with the standard logging timestep (usually 10 minutes). The unknown coefficients are “learnt” by running this integration inside an optimisation loop that attempts to minimise the residual (i.e. the difference) between the real and predicted temperature signal or signals.
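As a hedged sketch of this learn-by-optimisation step, the following uses a simplified two-coefficient bearing model, forward-Euler integration at the 10-minute SCADA timestep, and SciPy's `least_squares` to minimise the residuals; the real model structure and optimiser may well differ:

```python
import numpy as np
from scipy.optimize import least_squares

DT = 600.0  # 10-minute SCADA logging timestep, in seconds

def simulate(k, T0, T_nacelle, rotor_speed):
    """Forward-Euler integration of a simplified bearing model:
    dT/dt = k[0]*rotor_speed - k[1]*(T - T_nacelle)."""
    T = np.empty_like(T_nacelle)
    T[0] = T0
    for i in range(1, len(T)):
        dTdt = k[0] * rotor_speed[i - 1] - k[1] * (T[i - 1] - T_nacelle[i - 1])
        T[i] = T[i - 1] + DT * dTdt
    return T

def fit_coefficients(T_measured, T_nacelle, rotor_speed):
    """Learn the unknown coefficients by minimising the residuals
    between the measured and simulated temperature signals."""
    def residuals(k):
        return simulate(k, T_measured[0], T_nacelle, rotor_speed) - T_measured
    result = least_squares(residuals, x0=[1e-4, 1e-4], bounds=(0, np.inf))
    return result.x
```

Each optimiser iteration re-runs the integration over the whole training period, which is why training cost grows with the length of that period.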
One of the benefits of this approach is that the unknown coefficients can be learnt from a relatively short training period and still produce a predicted signal that fits reality well. The physical models can use their built-in understanding of the system to generalise, whereas machine learning models need to see a comprehensive cross-section of representative scenarios in order to generalise well. This means that the latter need to be trained on several months of data, if not years, covering a wide seasonal range of temperatures, to get anything approaching a good fit. The figure below compares the median training times per turbine and the accuracy (evaluated on a separate 4-month period) in predicting the main bearing temperature signal for 2MW turbines in a European wind farm. The machine learning (ML) model in this case was a random forest regressor with 40 trees and hyperparameters tuned with a cross-validation search.
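For reference, an ML baseline along these lines could be set up as follows with scikit-learn; the feature set, synthetic stand-in data and search grid here are purely illustrative assumptions, not Bitbloom's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Stand-in SCADA features (e.g. ambient temperature, rotor speed, power)
X = rng.uniform(size=(300, 3))
# Stand-in bearing temperature target with some measurement noise
y = 20.0 + 15.0 * X[:, 1] + 5.0 * X[:, 0] + rng.normal(0.0, 0.5, 300)

# Random forest with 40 trees, hyperparameters tuned by cross-validation
search = GridSearchCV(
    RandomForestRegressor(n_estimators=40, random_state=0),
    param_grid={"max_depth": [5, None], "min_samples_leaf": [1, 5]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
```

In practice the features would be the SCADA input channels and the target the measured main bearing temperature over the training period.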
What is noticeable is that the physics models already fit pretty well with a single day of training (as long as it’s a somewhat typical day!), and increasing the training time to a multi-season period reduces the error, but not by all that much. This is because, with a good physical representation, the coefficients are in theory constants, and a few cycles of ambient temperature and rotor speed variation are enough to learn them.
Furthermore, the training time of the physics models does not scale as rapidly as that of the machine learning model, with the latter needing 5 core hours to train on a year’s data! Although each run of the physical model inside the optimisation loop scales linearly with the training period, a longer training period with a larger number of residuals allows the optimiser to find the best coefficients in fewer iterations, so the overall training time grows less than linearly.
There is no doubt that machine learning is going to play a huge role in the operation and optimisation of wind power plants in the future, and most other aspects of life for that matter. By and large this should bring about positive change, but nature loves the path of least resistance and the temptation to default to simply throwing cheap computing power around is strong. In doing so we may overlook alternative, more accurate and more efficient methods that allow us to reach better conclusions faster.
ML might be the coolest kid on the block, but let’s not forget about physics!