Efficient and effective maintenance is critical to the success of all types of enterprises. In the manufacturing, extractive, utilities, and logistics industries, any unnecessary downtime directly affects the bottom line. Failures of building systems, both commercial and residential, lead to extra expense and unhappy occupants.

Maintenance systems can be looked at through the lens of five different practices as shown in the Maintenance Maturity Model (Figure 1):

  • Reactive maintenance. This isn't really a system at all; it's basically run to failure: when something breaks, you fix it. A surprise breakdown can occur at any moment. It could be a critical machine in a processing plant, most likely at a time of maximum usage, which is exactly the worst time. It could be an HVAC system at times of maximum stress because of extreme heat or cold, which is again exactly when it's most needed. And you'd better hope you've thought of having spare parts on hand.
  • Preventive maintenance. You have a fixed maintenance schedule, perhaps based on manufacturers' recommendations, say for lubrication or replacing gaskets. This can mean that you're shutting down too often, fixing what's not going to be broken anytime soon. Or, even if you're following one manufacturer's schedule, you might be overlooking something else that needs more frequent attention, and you're faced with an unexpected breakdown.
  • Condition-based maintenance. You open the door to much more efficient maintenance by taking advantage of continuous monitoring using the Internet of Things (IoT). The IoT connects devices so that they can share information for lots of different purposes. It could be closing the loop in complex interconnected systems of production or distribution, or providing real-time data for managers. The IoT runs on a network of sensors residing on all of the connected devices. At the same time as they are providing functional data for systems, those sensors could be providing data on the operating condition of the connected devices. That data can bring maintenance to a new level by basing it on actual real-time conditions. The data can be used in different ways. It can improve reactive maintenance by quickly identifying the point of failure. Or, it can provide data for a more dynamic system of preventive maintenance.
  • Predictive maintenance. You feed real-time sensor data, along with historical context such as past maintenance records, into machine-learning-based analytics software, which predicts the probable time to failure. The many advantages of this type of maintenance system include being able to avoid unplanned shutdowns by scheduling maintenance for convenient times, such as overnight or on weekends.
  • Prescriptive maintenance. Based on the predictive maintenance data, you know approximately when something's going to happen, and how to fix it. Your software can take steps to make sure the right parts are on hand, and the right information is available for the maintenance technician. That way when you reach the optimum time for the maintenance shutdown, you'll be ready to go without delay.

An IoT Application for Predictive Maintenance

I spoke to Jai Suri, Vice President, Product Management, IoT and Blockchain Applications, Oracle Corporation, about their maintenance offering. “As far as we see, many of the companies we talk to are in the reactive phase. However, my team has built an IoT application that supports condition-based, predictive, and will eventually support prescriptive maintenance systems,” said Suri.

Their Software as a Service (SaaS) application enables you to connect to any kind of physical asset, which could be a motor, pump, air compressor, a forklift — the list goes on — and collect detailed real-time information about each of them. The Oracle software uses this information to build a “digital twin,” which is a virtual representation of the physical asset. For example, if the asset is used in an HVAC system, the data could include fan speed, motor temperature, and vibration, as well as flow, pressure, and temperature of the inlet and output air.

Figure 2. Digital twins provide virtual representation of physical assets. (Graphic courtesy of Oracle)

The digital twin can be queried for this data at any moment in time. It can be used to trigger alarms on individual data points. It can also provide data for identifying trends and making predictions (Figure 2).
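As a rough illustration of the idea, a twin can be thought of as a record that holds the latest value of each sensor, keeps a history for trend analysis, and checks each new reading against alarm limits. The sketch below is not Oracle's data model; the class name, field names, sensor names, and limits are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """Hypothetical in-memory twin: latest readings plus per-sensor alarm limits."""
    asset_id: str
    limits: dict                                  # sensor name -> (low, high)
    state: dict = field(default_factory=dict)     # latest value per sensor
    history: list = field(default_factory=list)   # (timestamp, sensor, value)

    def update(self, timestamp, sensor, value):
        """Record a reading; return an alarm message, or None if it is in range."""
        self.state[sensor] = value
        self.history.append((timestamp, sensor, value))
        low, high = self.limits.get(sensor, (float("-inf"), float("inf")))
        if not low <= value <= high:
            return f"{self.asset_id}: {sensor}={value} outside [{low}, {high}]"
        return None


# Illustrative asset and limits (assumed values, not from the article)
twin = DigitalTwin("HVAC1", limits={"motor_temp_c": (0, 90), "fan_rpm": (300, 1800)})
ok = twin.update(0, "fan_rpm", 1200)          # in range, no alarm
alarm = twin.update(1, "motor_temp_c", 97.5)  # out of range, alarm text returned
```

The `state` dictionary answers "what is the asset doing right now," while `history` is what trend detection and prediction would draw on.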

Analytics

I asked Suri to explain how the Oracle software uses the data to determine when conditions are trending towards failure. He detailed a few different methods.

Figure 3. Oracle automatic anomaly detection.

The first is what they call trends, which is based on statistical process control algorithms that have been developed over many years by industry. For example, an algorithm could be: if six (or more) points in a row are continually increasing (or decreasing), that indicates that there might be a potential failure. Oracle has packaged together a catalogue of eight statistical algorithms that have been developed over the years for making failure predictions (Figure 3).
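The six-points-in-a-row rule is simple enough to sketch directly. The function below is an illustration of that one rule only, not Oracle's implementation; the function name and the choice to require strictly increasing or decreasing values are assumptions.

```python
def has_trend(values, run_length=6):
    """Return True if `run_length` consecutive points are strictly
    increasing or strictly decreasing."""
    up = down = 1  # number of points in the current rising / falling run
    for prev, cur in zip(values, values[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run_length or down >= run_length:
            return True
    return False


rising = has_trend([1, 2, 3, 4, 5, 6])          # six rising points
broken = has_trend([1, 2, 3, 4, 5, 5, 6])       # run interrupted by a repeat
falling = has_trend([9, 8, 7, 6, 5, 4, 3])      # seven falling points
```

Because the rule looks only at direction, not magnitude, it can flag a slow drift long before any single reading crosses an alarm limit.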

There are a number of predictive algorithms that Oracle uses internally in its IoT application. A couple of them are:

Symbolic Aggregate Approximation: An operator can visualize the sensor data on a user interface. The advantage is that experienced operators can recognize patterns. Someone who's been using a machine for the last two or three years will know that if the temperature and pressure of a particular device both spike up at the same time, that indicates a failure is about to happen.

“We want to digitize the analog knowledge that exists within an organization. There are people who have knowledge in their heads out of sheer experience that cannot be easily replicated by mathematical modelling,” Suri explained. The idea is to incorporate in the software what a skilled operator would do in the field: look at the data and recognize that a particular waveform means something is wrong. Oracle's software constructs a machine learning model using that pattern, which can then be continuously applied to streaming data in real time. “This was a challenge requiring a fair amount of time-series analytics research,” said Suri. “Although pattern-detection on data-at-rest is well understood, pattern detection on streaming time-series data is still a very evolving technology,” he added. The patterns don't always match the model because machine behavior is not always the same. However, on the plus side, the spikes the operator is looking for do follow a certain kind of behavior. “So, the machine learning model is not looking for a precise match, it's looking for a certain type of behavior in the sensor data that allows it to identify anomalies,” said Suri.
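The article does not say how Oracle applies Symbolic Aggregate Approximation internally. In its textbook form, the technique normalizes a window of readings, averages it over a few equal segments, and maps each average to a letter, so that a waveform becomes a short word that can be compared loosely rather than point by point. A minimal sketch of that textbook form, with a four-letter alphabet and a hypothetical function name, might look like this:

```python
from statistics import mean, pstdev

# Breakpoints that cut a standard normal distribution into four
# equal-probability regions, one per letter.
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def sax(values, segments):
    """Convert a numeric series into a SAX word of `segments` letters.
    Assumes len(values) >= segments."""
    mu, sigma = mean(values), pstdev(values)
    # z-normalize so the word describes shape, not absolute level
    z = [(v - mu) / sigma if sigma else 0.0 for v in values]
    size = len(z) / segments
    word = []
    for i in range(segments):
        chunk = z[int(i * size):int((i + 1) * size)]
        avg = mean(chunk)  # piecewise aggregate approximation
        word.append(ALPHABET[sum(avg > b for b in BREAKPOINTS)])
    return "".join(word)


# Illustrative window: flat readings followed by a spike
word = sax([1, 1, 1, 1, 1, 1, 9, 9], segments=4)
```

A spike at the end of the window shows up as a high letter at the end of the word, which is the kind of "type of behavior" match, rather than an exact numeric match, that Suri describes.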

The Kernel Density Estimation (KDE) predictive maintenance algorithm: With this algorithm, you input historical data, typically collected over a week or two. Based on that, the algorithm models the behavior of the machine, even taking into account seasonalities, such as time of day or day of the week. Any deviation from what could be deemed the “golden” dataset can then be detected in near-real time, say every five minutes, and would be considered anomalous behavior during that period.
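To make the idea concrete, the sketch below estimates how likely a new reading is under a baseline of historical readings, using a Gaussian kernel, and flags the reading when that likelihood is very low. It is only an illustration under stated assumptions: the bandwidth, the density threshold, the hour-of-day bucketing used to mimic seasonality, and the sample temperatures are all invented for the example.

```python
import math


def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from historical samples."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def is_anomalous(x, samples, bandwidth=1.0, min_density=0.01):
    """Flag a reading whose estimated density under the baseline is too low."""
    return kde_density(x, samples, bandwidth) < min_density


# Hypothetical "golden" dataset: motor temperatures (deg C) keyed by hour of day
baseline = {
    3: [40.1, 39.8, 40.5, 40.0, 39.6],    # overnight, light load
    15: [55.2, 54.8, 56.0, 55.5, 54.9],   # mid-afternoon, heavy load
}

normal_at_3pm = is_anomalous(55.0, baseline[15])
odd_at_3am = is_anomalous(55.0, baseline[3])
```

The same 55 °C reading is unremarkable in the afternoon but anomalous at 3 a.m., which is why the baseline is keyed by time rather than pooled.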

KPIs and Predictions

The challenge, after identifying normal conditions and trending problems, is to make predictions for specific devices and systems. The first step is to construct a Key Performance Indicator (KPI) because sensor data by itself does not provide enough performance context. It's important to decide which are the most important measures of system performance. A KPI could be Mean Time Between Failures (MTBF), utilization, or power consumption per hour, for example. The goal is then to predict what this KPI could be a day from now or a week from now. Using a historical dataset, you construct a machine learning model for that particular KPI and then find an algorithm that is a best fit for making the prediction; Oracle has a catalogue of algorithms to start with. So, based on the KPI and the calculations from the dataset, Oracle's AutoML technology, which is part of their IoT application, automatically runs multiple algorithms to find one that will provide the best accuracy. The AutoML technology also retrains the machine learning models periodically, typically over the course of a week, in order to find better-fit algorithms as the data evolves.
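The selection step can be pictured as a small contest: hold back the most recent KPI values, let each candidate algorithm forecast them from the earlier data, and keep the one with the lowest error. The sketch below shows that contest with three deliberately simple forecasters. It is not Oracle's AutoML; the function names, the candidates, and the use of mean absolute error are assumptions made for illustration.

```python
def naive_last(history, horizon):
    """Forecast: the KPI stays at its last observed value."""
    return [history[-1]] * horizon


def mean_level(history, horizon):
    """Forecast: the KPI returns to its historical average."""
    return [sum(history) / len(history)] * horizon


def linear_trend(history, horizon):
    """Forecast: extend a least-squares straight line (needs >= 2 points)."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history)) / sxx
    return [y_mean + slope * (n + h - x_mean) for h in range(horizon)]


def pick_best(series, holdout, candidates):
    """Return the candidate with the lowest mean absolute error on the
    last `holdout` points, training only on the points before them."""
    train, test = series[:-holdout], series[-holdout:]

    def mae(model):
        forecast = model(train, holdout)
        return sum(abs(f - t) for f, t in zip(forecast, test)) / holdout

    return min(candidates, key=mae)


# Illustrative KPI: power consumption per hour, rising steadily
kpi = [10, 12, 14, 16, 18, 20, 22, 24]
best = pick_best(kpi, holdout=2, candidates=[naive_last, mean_level, linear_trend])
```

Periodic retraining then amounts to rerunning `pick_best` on a newer slice of data, so a different candidate can win if the KPI's behavior changes.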

Actions

All of these results are then evaluated by a rules engine. And this is another point where manual input comes in. The operator decides: “If this anomaly has occurred, what action do I need to take?” The actions could be to send out notifications, create a maintenance work order, or send an email.
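In outline, a rules engine of this kind pairs a condition on an incoming event with a list of actions chosen by the operator. The sketch below shows the shape of that pairing; the rule names, event fields, and action names are hypothetical and are not taken from Oracle's product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # receives one event as a dict
    actions: list                      # action names to run when the condition holds


def evaluate(rules, event):
    """Return the (rule name, action) pairs triggered by one event."""
    return [(r.name, a) for r in rules if r.condition(event) for a in r.actions]


# Operator-defined rules (illustrative)
rules = [
    Rule("overheat",
         lambda e: e["type"] == "anomaly" and e["sensor"] == "motor_temp",
         ["notify", "create_work_order"]),
    Rule("short_ttf",
         lambda e: e["type"] == "prediction" and e["days_to_failure"] < 14,
         ["send_email"]),
]

triggered = evaluate(rules, {"type": "anomaly", "sensor": "motor_temp", "asset": "HVAC1"})
```

Keeping the actions as data rather than code is what lets an operator, not a programmer, decide what happens when a given anomaly occurs.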

One action, particularly with condition-based maintenance, could be to send a command back to the machine to make a change, like “lower the speed” or “turn yourself off.” The IoT is not only a way to consume data; it can also carry commands back to the machine.

Figure 4. Machine learning and AI based predictive maintenance in Oracle IoT.

With edge analytics, many of these algorithms can be pushed down to the sensor itself, so that the data doesn't have to make the round trip to the cloud. Evaluating the algorithms at the machine enables low-latency responses. Many of the rules can also be pushed down to the device itself. Let's say the temperature's gone up: you don't want to wait until a rule is evaluated in the cloud and then something comes back. You want an action in seconds or less. Only after the rule gets evaluated and an action occurs does the cloud get notified.
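The ordering described here, act locally first and report to the cloud afterwards, can be sketched in a few lines. The function, the command string, and the event fields below are hypothetical; the two callbacks stand in for whatever device control and messaging interfaces a real deployment would use.

```python
def edge_step(reading, limit, actuate, notify_cloud):
    """Evaluate one threshold rule on the device: act first, report afterwards.
    Returns True if the rule fired."""
    if reading <= limit:
        return False
    actuate("lower_speed")  # immediate local response, no cloud round trip
    notify_cloud({"event": "over_temp", "value": reading, "action": "lower_speed"})
    return True


# Record the order of calls to show that the device acts before the cloud hears about it
log = []
fired = edge_step(
    95.0, 90.0,
    actuate=lambda cmd: log.append(("device", cmd)),
    notify_cloud=lambda msg: log.append(("cloud", msg["event"])),
)
```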

Recommendations

How can all of the information be made actionable for the maintenance manager? One way to do that is to optimize the preventive maintenance schedule. One of the key inputs maintenance managers provide to the system is the set of maintenance programs they construct. These are based on a preventive maintenance schedule that follows manufacturer input or the managers' own experience.

With information on anomalies, and with predictions, the system can make recommendations for optimal time windows for the maintenance. You don't want to perform the maintenance too early or too late. The preventive maintenance schedule should be merely a guideline. Ideally, it should vary significantly based on how much you use the asset, how it is used, who is using it, and so on. A recommendation might show up in the UI that could say: “Your current maintenance window for HVAC1 is three months. We recommend that you increase it to four months because your time to failure is five months.” Maintenance managers can then decide whether or not to use the recommendations. That way, the preventive maintenance schedules don't stay static; they can evolve with time and with the actual machine data.
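The article does not say how the recommended window is derived from the predicted time to failure. One simple possibility, shown purely as an assumption, is to subtract a fixed safety margin from the prediction; with a one-month margin this reproduces the HVAC1 example (five months to failure, four-month window). The function name and message wording are hypothetical.

```python
def recommend_window(current_months, predicted_ttf_months, margin_months=1):
    """Suggest a maintenance window that ends `margin_months` before the
    predicted failure. Returns None when no change is needed.
    The fixed-margin rule is an assumption, not Oracle's method."""
    recommended = max(predicted_ttf_months - margin_months, 0)
    if recommended == current_months:
        return None
    verb = "increase" if recommended > current_months else "decrease"
    return (f"Current maintenance window is {current_months} months. "
            f"We recommend that you {verb} it to {recommended} months because "
            f"predicted time to failure is {predicted_ttf_months} months.")


message = recommend_window(current_months=3, predicted_ttf_months=5)
```

The function returns only a suggestion; as in the article, the decision to adopt it stays with the maintenance manager.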

Looking Ahead

The costs of hardware and software are now falling significantly while their sophistication is increasing just as fast: you can get a really high-powered computing system onto your machine for literally dollars.

And communication costs are down. With 3G/4G cellular, it'll cost around $5 or $10 a month to connect every asset. That's a lot if you have 10,000 or 100,000 assets. With new technologies such as LoRa, 5G, and NB-IoT, however, the costs are around $2 to $5 a year per asset. Also, protocols are becoming much more standardized. You used to have to invest in huge amounts of software to run this kind of data pipeline within your own enterprise. Now, you can just buy a subscription and run everything in your cloud.
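To put those per-asset figures in fleet terms, the arithmetic for 10,000 assets works out as follows. The prices are the ones quoted above; the helper function and variable names are just for illustration.

```python
def annual_cost(assets, low, high, periods_per_year):
    """Return the (low, high) yearly connectivity cost in dollars for a fleet."""
    return assets * low * periods_per_year, assets * high * periods_per_year


cellular = annual_cost(10_000, 5, 10, periods_per_year=12)  # $5-$10 per month
newer = annual_cost(10_000, 2, 5, periods_per_year=1)       # $2-$5 per year

print(f"3G/4G cellular:    ${cellular[0]:,} to ${cellular[1]:,} per year")
print(f"LoRa, 5G, NB-IoT:  ${newer[0]:,} to ${newer[1]:,} per year")
```

For this fleet, that is $600,000 to $1,200,000 a year on 3G/4G against $20,000 to $50,000 a year on the newer technologies.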

Predictive maintenance is poised to become much more widely adopted, and that will help to boost productivity in a time when we really need it.

This article was written by Ed Brown, Editor of Sensor Technology.