- By Fredrik Wartenberg
- August 24, 2020
- Feature
Machine learning (ML) is present in many aspects of our lives, to the point that it is difficult to get through a day without coming into contact with it. The most common example is a simple Google search, whose ranking is trained to show you the most relevant results. ML can also be found in our smartphones, in voice assistants such as Siri or Alexa. The technology is also starting to reach safety-critical domains such as autonomous driving and surveillance powered by facial recognition.
All these applications have been made possible by a combination of research, commercial factors, and the availability of data for generating and training the underlying models. In the industrial context, there is also the promise that machine learning will help predict when to perform maintenance on machinery, identify anomalies in machine operations, and help process engineers identify the factors that make the difference between a good and a bad product batch.
Machine learning techniques have nevertheless been taken up relatively slowly in the industrial domain compared with other areas. The remarks below are by no means a deep analysis of why, but they highlight several factors that make applying machine learning in industry fundamentally more difficult than in products aimed at the end consumer.
1. Data readiness
In a plant with highly specialized processes, a lot of data is available. It is collected from the machines' condition monitoring systems, from process control and monitoring, from administrative systems, and from documentation sources. Some of it is stored continuously as time-series data in historian databases.
Although this data store is both vast and long-term and should therefore be a perfect basis for machine learning, some fundamental hurdles must be overcome to make the data useful. Some may seem trivial, such as associating the data channels ('tags') in the historian database with relevant metadata, which is needed for basic tasks like selecting the right channels and the interesting time periods, or simply getting the correct units of measurement. Beyond that, data integrity must be ensured by identifying non-functional sensors, missing or out-of-range values, and measurement points that have been reallocated.
While these tasks seem easy, they can become difficult because of poorly integrated data sources, organizational structures, and missing documentation, among other factors. In addition, there is a large amount of side information, such as maintenance logs, alarm logs, and visual inspection logs, that must be taken into account alongside the sensor measurements when trying to understand and analyze the sensor data. In short, the road to data readiness in industrial applications is much harder than in many other areas.
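To make the integrity checks described above more concrete, here is a minimal sketch in Python. It assumes a single channel has already been pulled from the historian as a list of floats, with None or NaN marking missing samples, and that each tag has been associated with metadata giving its unit and a plausible value range. The `TagMeta` class, the tag name "TI-101", the range limits, and the `max_flat_run` threshold for detecting a stuck sensor are all hypothetical choices for illustration, not part of any particular historian product.

```python
import math
from dataclasses import dataclass


@dataclass
class TagMeta:
    """Hypothetical metadata for one historian channel ('tag')."""
    name: str
    unit: str
    lo: float  # lowest physically plausible value (assumed)
    hi: float  # highest physically plausible value (assumed)


def check_channel(values, meta, max_flat_run=5):
    """Flag common integrity problems in one channel's samples.

    Missing samples are represented as None or NaN. Returns the tag's
    metadata together with lists of offending sample indices.
    """
    missing, out_of_range, flatline = [], [], []
    run_start = 0  # index where the current run of identical readings began
    for i, v in enumerate(values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            missing.append(i)
            run_start = i + 1  # a gap breaks any run of identical readings
            continue
        if not meta.lo <= v <= meta.hi:
            out_of_range.append(i)
        # A sensor stuck at one value for many samples has often stopped working.
        if i > run_start and values[i - 1] != v:
            run_start = i
        run_length = i - run_start + 1
        if run_length == max_flat_run:
            flatline.extend(range(run_start, i + 1))
        elif run_length > max_flat_run:
            flatline.append(i)
    return {"tag": meta.name, "unit": meta.unit, "missing": missing,
            "out_of_range": out_of_range, "flatline": flatline}


meta = TagMeta(name="TI-101", unit="degC", lo=-40.0, hi=120.0)
samples = [20.0, 20.5, None, 150.0, 21.0, 21.0, 21.0, 21.0, 21.0, 22.0]
print(check_channel(samples, meta))
# → {'tag': 'TI-101', 'unit': 'degC', 'missing': [2], 'out_of_range': [3], 'flatline': [4, 5, 6, 7, 8]}
```

Checks like these only surface candidates; whether five identical readings really mean a broken sensor, or a reallocated measurement point explains a sudden shift, is exactly the kind of question that still needs the documentation and side information mentioned above.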
2. Data annotation
Using data for machine learning often requires a connection between the observed data and the 'ground truth'. In other words, the observed data needs to be made interpretable so that actual decisions or conclusions can be drawn from it.
In many cases, an application requires an annotated data set to train the models used for prediction. This is a challenge because the annotation can only be performed by a small, exclusive group: the experts who work with the specific industrial processes or assets. Because of the complexity of the processes and data, this is where the real bottleneck lies.
By comparison, when training machine learning models for speech and language processing, almost anyone is expert enough to write a transcript of recorded speech. This is the second fundamental difference between ML in industrial applications and the more established areas.
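As an illustration of how expert annotations connect observed data to the ground truth, the following minimal sketch turns labeled time intervals into training examples. It assumes, purely for illustration, that an expert's annotations are stored as hypothetical `(start, end, label)` tuples in sample indices (end exclusive) and that the model is trained on fixed-length, non-overlapping windows. The function name `labeled_windows` and the labels "normal" and "fault" are invented for the example.

```python
def labeled_windows(values, annotations, width):
    """Cut `values` into non-overlapping windows of `width` samples and
    label each window from expert annotations.

    `annotations` is a list of (start, end, label) tuples in sample indices,
    end exclusive. A window is kept only if exactly one annotation covers it
    completely; unannotated windows and windows straddling an annotation
    boundary are skipped, since their ground truth is unclear.
    """
    examples = []
    for start in range(0, len(values) - width + 1, width):
        end = start + width
        labels = [label for a_start, a_end, label in annotations
                  if a_start <= start and end <= a_end]
        if len(labels) == 1:
            examples.append((values[start:end], labels[0]))
    return examples


signal = list(range(12))
annotations = [(0, 6, "normal"), (8, 12, "fault")]
print(labeled_windows(signal, annotations, width=4))
# → [([0, 1, 2, 3], 'normal'), ([8, 9, 10, 11], 'fault')]
```

Note how the window covering samples 4 to 7 is dropped: every stretch of data that no expert has looked at is lost to supervised training, which is why annotation capacity becomes the bottleneck.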
3. Application scale
Industrial processes are by their nature highly specialized, which means there are no economies of scale in this area. A solution developed for a car manufacturer cannot, for example, be applied directly in the food industry. Even processes in the same plant will require different approaches. As a result, there may be less research relevant to machine learning in these areas than in others.
Another consequence is that projects become more expensive and complex, because solutions already available commercially or in the public domain require some degree of customization. Profitability can still be reached, however, if there is a solid business case behind the ML project.
How to solve it?
There is no quick path to building machine learning applications in the industrial domain. Going straight to implementation without first preparing the data will most probably fail, as reflected in reports that most predictive maintenance projects fail. The journey therefore needs to start from the basics: ensuring data readiness, getting experts to annotate the existing data, and then building and optimizing models.
One area to focus on specifically is helping experts integrate, visualize, and annotate the data more efficiently. Much can be achieved here with unsupervised machine learning approaches, in which algorithms find interesting or relevant events, patterns, or time periods in the high-dimensional, complex data characteristic of the industrial domain. It can be a tough path, but unsupervised and active learning approaches make life much easier for the experts (maintenance and reliability engineers) and let them take the necessary step towards machine learning for prediction with far less effort.
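As a deliberately simple stand-in for the unsupervised approaches mentioned above, the sketch below uses a rolling z-score to propose time periods worth an expert's attention, without needing any labels. It is not the method of any particular product; the function name `candidate_periods`, the `window` and `threshold` defaults, and the synthetic signal are assumptions chosen for illustration. Real industrial data is multivariate, so this single-channel version only shows the idea.

```python
import statistics


def candidate_periods(values, window=20, threshold=3.0):
    """Flag samples that deviate strongly from the preceding `window`
    samples and merge consecutive flags into (start, end) periods,
    end exclusive.

    This is a simple rolling z-score detector: it needs no labels, so it
    can propose time periods for an expert to inspect and annotate.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = statistics.fmean(history)
        sigma = statistics.pstdev(history)
        if sigma > 0:
            deviates = abs(values[i] - mu) > threshold * sigma
        else:
            # A perfectly constant history: any change at all is notable.
            deviates = values[i] != mu
        if deviates:
            flagged.append(i)

    # Merge runs of consecutive flagged indices into periods.
    periods = []
    for i in flagged:
        if periods and periods[-1][1] == i:
            periods[-1] = (periods[-1][0], i + 1)
        else:
            periods.append((i, i + 1))
    return periods


# Synthetic signal: alternating 10/11 baseline with a short two-sample excursion.
data = [10.0 if i % 2 == 0 else 11.0 for i in range(40)]
data[25] = data[26] = 50.0
print(candidate_periods(data, window=10, threshold=2.5))  # → [(25, 27)]
```

Because the history window includes earlier outliers, one large excursion inflates the standard deviation for the following samples; a more robust variant might use the median and median absolute deviation instead. Presenting merged periods rather than single points keeps the expert's review focused on a short list of events.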
About The Author
Fredrik Wartenberg is a data scientist at Viking Analytics, a Swedish start-up that offers self-service analytics software that lets domain experts prepare, analyze, and organize large volumes of sensor data without advanced data-analytics skills.