When Failure is Not an Option: The Problem with Focusing on Machine Failure

Written by Brian Turnquist, Boon Logic

Posted on: September 8, 2023

Predictive Maintenance Value Pillars

When we talk with customers about what they are looking for in their predictive maintenance strategy, we usually hear some version of “we want to know in advance if our asset is going to fail.” This seems like a fair statement of value, but a predictive maintenance strategy needs much more than failure prediction to provide value to your reliability teams. It must include three pillars of value: usability, scalability, and accuracy. If it lacks even one of these pillars, it will topple under its own weight.

Value Pillar 1: Usability

One of the most important transformations that should occur when implementing your predictive maintenance strategy is a shift from a reactive and scheduled maintenance approach to a predictive asset health approach based on live sensor telemetry. The outputs of your predictive maintenance system must be usable in the following sense:

Concise: The predictive analytics platform you choose must distill the thousands of live sensor values collected from hundreds of assets into a few simple, usable asset health measurements.

Prescriptive: Usability also means that asset health measurements should point maintenance teams toward possible root causes of non-compliant assets.

The Amber Warning Level
For each asset being monitored, Amber (Boon Logic’s predictive analytics platform) distills the complex relationships between the sensors monitoring that asset into a single value called the Amber Warning Level, which ranges from 0.0 to 2.0.

  • Equal to 0.0 (Compliant): The asset is fully compliant, showing no variation or only minor variation from its trained model.
  • Close to 1.0 (Changing): The asset is changing, but not yet critical. There is enough variation from the trained model to pay attention to this asset.
  • Close to 2.0 (Critical): The asset is significantly outside of its model of compliant behavior. Maintenance should be scheduled immediately.
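
As a rough illustration of how the three levels above could drive a dashboard or alerting rule, here is a minimal Python sketch. The function name `warning_status` and the cutoffs 0.5 and 1.5 are assumptions made for this example; the article only describes values as equal to 0.0, close to 1.0, and close to 2.0.

```python
def warning_status(level: float) -> str:
    """Map an Amber Warning Level in [0.0, 2.0] to a status label.

    The cutoffs 0.5 and 1.5 are illustrative assumptions, not
    documented Amber behavior.
    """
    if not 0.0 <= level <= 2.0:
        raise ValueError(f"warning level out of range: {level}")
    if level < 0.5:
        return "Compliant"
    if level < 1.5:
        return "Changing"
    return "Critical"


for level in (0.0, 1.0, 2.0):
    print(level, warning_status(level))
```

A team would likely tune the cutoffs to match how aggressively it wants to schedule inspections.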

Root Cause Analysis
When the Amber Warning Level is 1.0 or greater, the asset’s health is changing and there is a need for reliability teams to begin a discovery process and to plan maintenance. For this purpose, Amber provides a root cause analysis vector of values corresponding to each of the sensor telemetry values collected from the asset. The largest values in the vector indicate the sensors most implicated in the asset non-compliance, as reported by the Amber Warning Level. For example, Amber’s root cause analysis for the cobot in the figure below shows that joints 3 and 4 are implicated in a non-compliant Amber Warning Level. This may indicate a larger than expected load on the cobot or that joints 3 and 4 have significant mechanical wear. In the particular case below, no load was present when one should have been (nothing picked up in a pick-and-place motion) and that deviation from the compliant model was most strongly detected on joints 3 and 4.
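
To make the idea of "the largest values in the vector" concrete, the following Python sketch ranks sensors by their root cause value. The function name `top_implicated`, the six joint names, and the numeric values are all hypothetical and chosen only to mirror the cobot example; they are not real Amber output.

```python
def top_implicated(sensor_names, rca_vector, k=2):
    """Return the k (sensor, value) pairs with the largest root cause values."""
    if len(sensor_names) != len(rca_vector):
        raise ValueError("one root cause value is required per sensor")
    ranked = sorted(zip(sensor_names, rca_vector),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical values for a six-joint cobot (illustration only).
joints = ["joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"]
rca = [0.05, 0.08, 0.41, 0.35, 0.07, 0.04]

print(top_implicated(joints, rca))
# [('joint_3', 0.41), ('joint_4', 0.35)]
```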





In the diesel engine figure below, root cause analysis shows that the Fuel Rate, Boost Pressure, and Accelerator Pedal Position are the implicated sensors in the asset non-compliance. Maintenance engineers know from experience that this points to the fuel system. Indeed, in this case, a fuel injector failed several weeks later.





Value Pillar 2: Scalability

One of the common “gotchas” of a predictive maintenance strategy happens when a team of data scientists and reliability engineers builds a predictive model for a commonly used high-value asset in their production process. This typically takes several months. They then try to apply the model to the many other similar assets in their process and find out it does not work. The reasons for this transfer learning failure are described in our blog about universal models. This “data science + lots of time” approach is not scalable across an entire production process.

Amber is self-configuring and builds a model of compliant operation customized for each asset, using Boon Logic’s proprietary unsupervised machine learning approach. Data scientists are not needed. Instead, the reliability engineer who knows the asset simply chooses the sensor telemetry streams that are relevant to asset health. Amber takes it from there and builds a high-dimensional, customized model for the asset, transitioning automatically from buffering, to learning, to monitoring mode.
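
The progression from buffering to learning to monitoring can be pictured as a simple state machine. The Python sketch below is a toy model only: it assumes the transitions are driven by how many telemetry samples have been received, and the names `Mode`, `mode_for_sample_count`, `buffer_size`, and `learning_size` are hypothetical. The article does not describe how Amber actually decides when to change modes.

```python
from enum import Enum


class Mode(Enum):
    BUFFERING = "buffering"
    LEARNING = "learning"
    MONITORING = "monitoring"


def mode_for_sample_count(n_samples, buffer_size=1000, learning_size=10000):
    """Toy rule: pick a mode from the number of samples seen so far.

    Both sizes are made-up defaults for illustration, not Amber settings.
    """
    if n_samples < 0:
        raise ValueError("sample count cannot be negative")
    if n_samples < buffer_size:
        return Mode.BUFFERING
    if n_samples < buffer_size + learning_size:
        return Mode.LEARNING
    return Mode.MONITORING


for n in (0, 5000, 20000):
    print(n, mode_for_sample_count(n).value)
```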

Value Pillar 3: Accuracy

Defining accuracy in a predictive maintenance strategy seems straightforward. A common shorthand is: Accuracy = Sensitivity + Specificity. In other words, an accurate system must deliver both.

In the case of asset predictive analytics, we have

  • Sensitivity: Non-compliant asset behavior is detected when it occurs. Customers will say something like “I need to get early warning of an asset failure”.
  • Specificity: Compliant asset behavior is not flagged. Customers will say something like “I don’t want a bunch of false alerts wasting our time.”
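
For readers who want the standard definitions behind these two terms, the Python sketch below computes them from counts of correct and incorrect alerts. The function names and the example counts are made up for illustration and do not come from any Amber deployment.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of non-compliant events that were flagged."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no non-compliant events to evaluate")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Fraction of compliant periods that were correctly left unflagged."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no compliant periods to evaluate")
    return true_negatives / total


# Made-up counts: 9 of 10 real problems flagged, 5 false alerts in 100 normal periods.
print(sensitivity(true_positives=9, false_negatives=1))    # 0.9
print(specificity(true_negatives=95, false_positives=5))   # 0.95
```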

Sensitivity: Here’s where this common definition gets tricky. Assets valuable enough to apply a predictive maintenance model cannot usually be run to failure. That is why they are serviced well before the end of their remaining useful life. As a result, “sensitivity” typically must be interpreted as detecting abnormal conditions in the asset that can lead to asset failure. These conditions may include abnormal vibration of rotating assets, abnormal current draw of servo motors or transformers, or abnormal relationships between telemetry values like RPM, current, vibration, temperature, and pressure.

When it comes to detecting these non-compliant sensor relationships, there is no technology more sensitive than Amber, which trains high-dimensional unsupervised machine learning to characterize the compliant behavior of each individual asset. (See our blog on the challenges of using statistics for predictive analytics.)

Specificity: In practice, this criterion means there should be no false alerts to waste the time of maintenance teams. Here is a story we have seen repeated several times. One of our customers had a high-value compressor essential to their manufacturing process. As with many OEM systems, it came with a built-in condition monitoring system. In addition, the customer had installed their own vibration sensors and developed their own threshold-based system as part of their predictive maintenance initiative. Finally, they connected Amber to the same vibration sensors to test its unsupervised machine learning approach.

So…three monitoring systems: OEM-based, customer home-grown, and Amber. Around the end of December, Amber started flagging anomalies. The other two monitoring systems showed no change. In early February, Amber was flagging even more anomalies. The other systems showed no alerts. To the customer, Amber’s warnings may have appeared to be false alerts. Finally, in March, after a third elevation of Amber’s warning level and still no alerts from the other two systems, maintenance teams were dispatched, and they found a cracked bracket and broken welds around it. As I am writing this blog, I just heard from another customer who had this same experience, although this time it was the OEM comparing its built-in monitoring system to Amber and to a third system built by one of their customers. Once again, Amber’s apparent “false alerts” were in fact early warnings not detected by the other monitoring systems.

In detection systems, there is an ironclad rule that increasing sensitivity decreases specificity and vice versa. Amber is no exception to this rule. However, Amber’s high-dimensional unsupervised machine learning models allow it to push this trade-off back further than any other technology. Amber provides more than 10 machine-learning-based measurements for each asset telemetry vector sent into it. These ML measurements account for changing degrees of non-compliance, the constant ping-ping of background anomalies common in complex assets, and the upward and downward trends of asset health. Amber gives maintenance teams the most comprehensive, sensitive, and specific measurements possible on which to base a data-driven reliability strategy.
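
The trade-off between sensitivity and specificity is easy to see with a single alert threshold. The Python sketch below uses made-up anomaly scores and a hypothetical helper, `rates_at_threshold`; it illustrates the general rule for any threshold-based detector and is not a description of how Amber scores telemetry.

```python
def rates_at_threshold(compliant_scores, noncompliant_scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold raise an alert."""
    flagged_bad = sum(s >= threshold for s in noncompliant_scores)
    passed_good = sum(s < threshold for s in compliant_scores)
    return (flagged_bad / len(noncompliant_scores),
            passed_good / len(compliant_scores))


# Made-up anomaly scores for illustration only.
compliant = [0.1, 0.2, 0.3, 0.4, 0.6]
noncompliant = [0.5, 0.7, 0.8, 0.9, 1.2]

print(rates_at_threshold(compliant, noncompliant, 0.45))  # (1.0, 0.8)
print(rates_at_threshold(compliant, noncompliant, 0.65))  # (0.8, 1.0)
```

With the lower threshold every real problem is caught but one compliant sample triggers a false alert; with the higher threshold the false alert disappears but one real problem is missed.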

Dr. Brian Turnquist is the CTO of Boon Logic. Brian has worked in academia and industry for the past 25 years, applying both traditional analytic techniques and machine learning. His academic research is focused on biosignals in neuroscience, where he has 15 publications and has collaborated with major universities in the US, Europe, and Asia. In 2016, Turnquist came to Boon Logic to apply these same techniques to industrial applications, especially those focused on anomaly detection in asset telemetry signals and video streams.
