In a recent installment of the International Society of Automation’s “Ask the Automation Pros” series, Erik Cornelsen, automation and process control engineer at DPS Group in Scotland, asked: “In a continuous-process manufacturing plant in which data is cyclically recorded, for example at every 0.5, 1 or 2 seconds, what interesting insights can we get from this large amount of historical data, and which are the main challenges to extract value from it?” What follows are highlights of the responses from ISA Fellows, senior members and other automation professionals.
George Buckbee, PE, ISA Fellow and president of Sage Feedback LLC
What a great question! Historical data is a tremendous asset and is often highly underutilized. Some studies have shown that less than 0.5% of data is analyzed. A typical oil refinery might have 3,000 to 5,000 control loops, each updating PV [process variable], setpoint [SP], output [OP] and mode every 0.5 to 2 seconds. Uncompressed, that could be as much as 15 MB per hour.
Insights can be developed using software tools and analysis. Control loop performance monitoring (CLPM) software should be an absolute must in today’s world. This will help you identify which loops are performing poorly and help direct you to corrective actions. Most plants do not have enough staff to address all the issues, and CLPM software can help you prioritize those areas that need your attention the most.
With CLPM software, you can identify poorly tuned controllers, instrumentation issues, valve problems, operator overload and even changes in the process. CLPM tools are not just for control engineers, but can also deliver insights to operations, process engineers and management. When CLPM results are benchmarked against peer operations, tremendous insights can be gained.
There are many challenges in extracting value. First, many historians are not gathering data at the frequencies you suggest. Many plants are still defaulting to once/minute data, which is too slow to be used for analysis of process dynamics and control. Furthermore, data compression settings in historians can eliminate even more information. In today’s world, data storage is extremely cheap, and you should not risk losing insights by compressing data.
Second, historical data must be appropriately organized to allow you to extract related information as a group. Each controller’s PV, SP, OP and MODE should be grouped and/or named similarly.
Third, data historians do not always capture the relevant context of the process. For example, the historian should contain flags or methods to determine if the process is in startup, shutdown, product transition or some other abnormal state of operation. You should always analyze process and controller data in the context of the overall operation.
Fourth, the historian and CLPM system should be monitored to ensure they are functioning properly. Too many things can go wrong: server updates, communication losses and permissions issues can cause a loss of continuity between the control system, the historian and analytical systems. The data is useless if it is not communicated.
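As an illustration of the second point above, the sketch below groups tags by controller so that PV, SP, OP and MODE can be pulled together. It assumes a hypothetical `<loop>.<attribute>` naming convention and made-up tag names. Real historians and plants use many different schemes, so treat it as one possible arrangement rather than a standard.

```python
from collections import defaultdict

# Hypothetical tag names in an assumed "<loop>.<attribute>" convention.
TAGS = [
    "FIC101.PV", "FIC101.SP", "FIC101.OP", "FIC101.MODE",
    "TIC205.PV", "TIC205.SP", "TIC205.OP", "TIC205.MODE",
    "PI300.PV",  # an indicator, not a controller
]

REQUIRED = ("PV", "SP", "OP", "MODE")


def group_by_loop(tags):
    """Map each loop name to a dict of attribute -> full tag name."""
    loops = defaultdict(dict)
    for tag in tags:
        loop, _, attr = tag.rpartition(".")
        if loop:  # skip names that do not follow the convention
            loops[loop][attr.upper()] = tag
    return dict(loops)


def complete_loops(loops):
    """Keep only loops that have every attribute needed for loop analysis."""
    return {
        name: attrs
        for name, attrs in loops.items()
        if all(a in attrs for a in REQUIRED)
    }


loops = group_by_loop(TAGS)
print(sorted(complete_loops(loops)))  # ['FIC101', 'TIC205']
```

A check like `complete_loops` is also a cheap way to spot controllers whose setpoint, output or mode was never historized.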
Michel Ruel, retired engineer, ISA Fellow and recognized expert in process control
I often explained to my clients that they are "data-rich but information-poor." While they may have a wealth of stored data, it’s crucial to recognize that this data can serve as a goldmine of valuable insights.
First, it’s important to distinguish between raw data and actionable insights. Raw data can be extensive and overwhelming, but effectively transforming that data into meaningful information can significantly enhance decision-making processes. Additionally, I emphasize the power of algorithms in identifying critical issues such as oscillations, valve stiction and performance problems within control loops. By deploying these advanced tools, organizations can proactively address inefficiencies and improve overall system performance.
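To make the idea of an oscillation-detection algorithm concrete, here is a minimal sketch that flags a loop when the control error (SP minus PV) crosses zero often and at regular intervals. The function names and the thresholds `min_cycles` and `regularity` are assumptions chosen for illustration. Commercial tools use more robust methods, and this version assumes the error signal has already been filtered enough that noise does not cause chatter around zero.

```python
import math


def zero_crossings(error):
    """Indices where the control error (SP - PV) changes sign."""
    idx = []
    prev = 0.0
    for i, e in enumerate(error):
        if e == 0.0:
            continue  # ignore exact zeros; wait for the next signed sample
        if prev != 0.0 and (e > 0) != (prev > 0):
            idx.append(i)
        prev = e
    return idx


def looks_oscillatory(error, min_cycles=4, regularity=0.3):
    """Flag an error signal whose zero crossings are frequent and evenly spaced.

    min_cycles and regularity are illustrative thresholds, not industry values.
    """
    zc = zero_crossings(error)
    if len(zc) < 2 * min_cycles:  # two crossings per full cycle
        return False
    gaps = [b - a for a, b in zip(zc, zc[1:])]
    mean = sum(gaps) / len(gaps)
    std = math.sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps))
    return std / mean < regularity


# Synthetic data: a sustained sine-wave error versus a decaying error.
sine_error = [math.sin(2 * math.pi * i / 40 + 0.3) for i in range(400)]
decaying_error = [math.exp(-i / 50) for i in range(400)]
print(looks_oscillatory(sine_error))      # True
print(looks_oscillatory(decaying_error))  # False
```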
Continuous real-time monitoring of key variables (such as PV, SP, controller output (CO), mode and alarms) is essential. This ongoing vigilance not only ensures optimal operations but also leads to quicker response times and more efficient management of control processes.
While automation plays a crucial role in data analysis, it is equally important to have skilled personnel overseeing this process; these persons should have an excellent understanding of the process. Human oversight helps capture nuances that might otherwise be overlooked, ensuring that the insights generated are contextually relevant and actionable.
By focusing on these aspects, organizations can better leverage their data to derive meaningful insights and drive improved performance.
Julie Smith, retired global automation and process control leader, DuPont
I agree with everyone that there is a lot of data out there, and it’s a challenge to turn it into usable information. Performance metrics are one way to mine the data, but they require training to interpret properly. Sometimes, a simpler tool works better. I’ve found graphical methods can be helpful in many cases: not only the standard variable versus time or metric over time, but also unconventional graphs like histograms or PV/OP scatter plots. Scatter plots are particularly useful in finding hidden [oscillations] caused by over-tuned loops. I recall one process had an overly aggressive loop that, over a 24-hour period, generated a beautiful Fibonacci spiral. That made it easy to convince people that a change was needed.
Mark Darby, independent consultant with CMiD Solutions
A key function of a typical historian is its trending capability. Imagine tuning or troubleshooting a control loop with just a few trend recorders and the error indicators on board-mounted PID controllers. Time-series trends are useful for cause-and-effect analysis and assessing dynamic behavior. XY plots help assess static relationships and nonlinearities. A function I believe is not used enough is histograms (or frequency plots), which are useful for assessing variance and the shape of the distribution (e.g., similar to Gaussian, or multiple peaks, non-symmetric, etc.), as opposed to just reporting the mean and standard deviation.
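The point about histograms can be shown with a small sketch. The two synthetic data sets below share the same mean, but only the binned counts reveal that one of them has two peaks. The helper names, bin edges and data are assumptions for illustration. The naive peak counter would report spurious peaks on noisy plant data unless the bins are wide enough.

```python
import statistics


def histogram(values, lo, hi, bins):
    """Count values falling in `bins` equal-width bins between lo and hi."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            i = min(int((v - lo) / width), bins - 1)  # hi goes in the last bin
            counts[i] += 1
    return counts


def count_peaks(counts):
    """Count local maxima in the bin counts (naive; no smoothing)."""
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
    )


# Synthetic PV samples: one operating point versus two distinct ones.
unimodal = [49.0] * 25 + [50.0] * 50 + [51.0] * 25
bimodal = [49.0] * 50 + [51.0] * 50

for name, data in (("unimodal", unimodal), ("bimodal", bimodal)):
    counts = histogram(data, 48.0, 52.0, 4)
    print(name, statistics.mean(data), counts, count_peaks(counts))
```

Both sets report a mean of 50, yet the second has no samples at 50 at all, which is the kind of detail a mean and standard deviation alone would hide.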
Others have reported on the problem of using compressed data. Non-compressed data is preferred, but if the compression parameters are correctly set, it will not be a problem. If the compression is overdone, it can lead to problems of causality, such as output responses appearing before input changes.
Historians often include calculation capability and some may include access to programming languages like Python for performing more sophisticated calculations and analytics. Historians are often accessed to bring data into third-party tools for fitting models. While normal operating data is often not informative (think of noisy data imposed on a nearly constant signal), careful "slicing" out of "bad" data can yield good results. Slicing criteria include data that represents loops not in the proper mode, saturation of valves (at 0% or 100%), upsets resulting in large deviations and sections in which there is little movement in setpoints and measured disturbance variables.
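A minimal sketch of such slicing might look like the following. It covers three of the criteria (mode, valve saturation and large deviations) and returns the index ranges that survive. The `Sample` structure, mode strings and thresholds are hypothetical. The fourth criterion, too little movement in setpoints and disturbances, is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    pv: float
    sp: float
    op: float   # controller output, percent
    mode: str   # assumed mode strings such as "AUTO", "MAN", "CAS"


def is_usable(s, good_modes=("AUTO", "CAS"), op_lo=1.0, op_hi=99.0, max_dev=10.0):
    """True when a sample passes the mode, saturation and upset checks."""
    if s.mode not in good_modes:
        return False                      # loop not in the proper mode
    if s.op <= op_lo or s.op >= op_hi:
        return False                      # valve effectively saturated
    if abs(s.sp - s.pv) > max_dev:
        return False                      # large deviation, likely an upset
    return True


def usable_segments(samples, min_len=3):
    """Return (start, end) index pairs of consecutive usable samples.

    `end` is exclusive; runs shorter than min_len are discarded.
    """
    segments, start = [], None
    for i, s in enumerate(samples):
        if is_usable(s):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        segments.append((start, len(samples)))
    return segments


good = Sample(pv=50.0, sp=50.0, op=40.0, mode="AUTO")
history = (
    [good] * 4
    + [Sample(50.0, 50.0, 100.0, "AUTO")]   # saturated output
    + [Sample(50.0, 50.0, 40.0, "MAN")] * 2  # wrong mode
    + [good] * 4
    + [Sample(80.0, 50.0, 40.0, "AUTO")]    # upset
)
print(usable_segments(history))  # [(0, 4), (7, 11)]
```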
For loop analysis, especially flow and pressure loops, one-minute data is not sufficient. Fortunately, modern historians support faster data collection. If faster data cannot be achieved, one needs an external data collection routine to support faster collection, which can be applied to a limited number of user-specified variables temporarily.
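A small synthetic example shows why one-minute data can fail for fast loops. It assumes a loop oscillating with a 20-second period, a value picked deliberately as a worst case: sampled once per minute, every sample lands at the same point in the cycle and the oscillation disappears from the record.

```python
import math

PERIOD_S = 20      # assumed oscillation period of a fast loop, in seconds
DURATION_S = 600   # ten minutes of synthetic history


def half_range(values):
    """Half of the peak-to-peak range, a rough amplitude estimate."""
    return (max(values) - min(values)) / 2


# The same signal as seen at one-second and at one-minute sampling.
one_second = [math.sin(2 * math.pi * t / PERIOD_S) for t in range(DURATION_S)]
one_minute = one_second[::60]

print(f"1 s data amplitude:   {half_range(one_second):.3f}")
print(f"1 min data amplitude: {half_range(one_minute):.3f}")
```

With other periods the one-minute record would not be flat, but it would show a false, slower oscillation instead, which is just as misleading for tuning.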
Historians are often the source of data from an executed plant test, not just analyzing longer-term history. One must be concerned about extrapolation from old data.
Pat Dixon, PE, PMP, president of system integrator www.DPAS-INC.com
Historians are an essential part of any industrial operation. Those who do not learn from history are doomed to repeat it. Our historians are full of data, yet much of it is rarely seen. That effectively turns historians into digital landfills. Data can decompose if it isn’t used when produced. While history can inform present-day decisions, aging data is not as pertinent as recent data.
Data sampled faster than once per second can be useful for analyzing machine health, but is not typically stored in a historian. Process data sampled once per second or slower can be useful for process analysis, but faster is better if you want to ensure accurate dynamic correlation and identification.
It is not necessary to store every piece of data in a historian. Data that rarely changes, such as configuration or tuning changes, does not need to fill historians with one-second samples. However, failure to include an important process value means the data is lost and cannot be retrieved.
Accumulation of data in a historian can cause storage demands to grow while much of the information becomes stale.
To manage this load, historians rely on exceptions, compression and separation of short- and long-term data so that fidelity is preserved without overwhelming memory. Exceptions filter out unimportant samples near the source, although these settings must be updated as instruments age. Compression removes low-value or repetitive data using algorithms such as swinging-door compression. Long-term averages further reduce storage requirements. Today, storage cost is less of a constraint than in the past, but historians must still support backfilling and future forecasting. Most facilities have far too many tags for humans to monitor, so contextualization tools like Asset Framework organize signals into equipment- or process-based structures that make information easier to use. Data also gains value when combined across multiple facilities, making cloud connectivity essential.
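The exception filtering mentioned above can be sketched as a simple deadband test: a sample is kept only if it differs enough from the last kept value, or if too much time has passed. This is a simplified illustration, not the swinging-door algorithm and not any vendor's implementation. The function name and the `deadband` and `max_interval` parameters are assumptions.

```python
def exception_filter(samples, deadband, max_interval):
    """Keep a (time, value) sample when it moves by more than `deadband`
    from the last kept value, or when `max_interval` seconds have elapsed
    since the last kept sample."""
    kept = []
    for t, v in samples:
        if not kept:
            kept.append((t, v))  # always keep the first sample
            continue
        last_t, last_v = kept[-1]
        if abs(v - last_v) > deadband or t - last_t >= max_interval:
            kept.append((t, v))
    return kept


# Synthetic one-second samples: small noise, one real step, then flat.
raw = [
    (0, 10.0), (1, 10.05), (2, 10.1), (3, 10.6), (4, 10.62),
    (5, 10.61), (6, 10.6), (7, 10.6), (8, 10.6),
]
print(exception_filter(raw, deadband=0.2, max_interval=5))
# [(0, 10.0), (3, 10.6), (8, 10.6)]
```

The sketch also shows the risk raised elsewhere in this discussion: with a deadband wider than a loop's real movement, the small changes needed for dynamic analysis never reach the historian.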
Artificial intelligence [AI] is becoming a critical tool in Industry 4.0 for forecasting, maintenance, diagnostics and control. Achieving these capabilities requires overcoming challenges such as overfitting, limited datasets and black-box models.
Techniques like physics-informed neural networks improve accuracy and interpretability, while sensitivity analysis helps expose model behavior—though correlated inputs must be considered. High-quality datasets remain a major challenge, as industrial data contains noise, sparse lab samples and limited process variation.
Ultimately, the value of data depends on having a complete data analytics ecosystem. This includes connectivity, visualization, preprocessing, analysis and collaboration. A robust solution must connect to diverse data sources (process data, lab data, analyzers and databases), visualize large datasets effectively, provide powerful tools for cleansing and preprocessing, support a wide range of analytical methods and enable collaboration across teams and sites. With these capabilities, organizations can turn their "digital landfills" into strategic assets. In the 4th industrial era, those who understand how to use data—not just collect it—will lead.
Greg McMillan, ISA Fellow and retired Senior Fellow from Solutia Inc.
Data historians often have too much compression and too slow an update rate, which hide the intricacies of the process response and prevent the critical, accurate identification of the total loop dead time, especially for fast loops. In a conversation I just had with Peter Morgan, we are both on the same page that the compression and update rate set by information technology (IT) are often too large and too slow, which is particularly concerning for the investigation of incidents. The knowledge that is lost cannot be recovered, so we must be proactively involved with setting the compression and update rate before commissioning an automation system.
Additionally, I advocate for measurement spans to be as small as possible since the accuracy of most measurements other than Coriolis and magnetic flowmeters is a percent of span with limited rangeability. Also, control valves must be precise throttling valves, not oversized, since resolution is percent of span and [valve] friction near the closed position is much higher. Operation near the closed position has much worse nonlinearity, resolution and lost motion. Limit cycles from a poor valve response are confusing to say the least, hiding incidents and other problems, particularly in terms of tuning and interactions. The actions of on-off valves associated with control loops need to be included in the data history.
Since I have personally experienced stroking times of more than 100 seconds for some large valves, the historization of limit switch feedback is needed for state-based control to automatically deal with incidents and for procedure automation to automate startups, product grade and type transitions.
Online loop performance metric programs should automatically compute metrics from the data history, such as maximum absolute integrated error and peak error for load disturbances, and rise time, overshoot, undershoot and settling time for setpoint changes. To generate good data for data analytics and artificial neural networks, the manipulated flows and all sources of change (e.g., raw material and recycle stream flow, temperature and composition) need to be included. Hopefully, there are setpoint changes to capture setpoint response metrics. Ideally, tests are conducted where the controller is put in manual and a small output change is made to identify the open-loop and closed-loop load response metrics. The controller needs to be in manual only momentarily to see the closed-loop response, but a traditional identification of the open-loop response of a self-regulating process requires it to stay in manual for the 98% response time.
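As a rough illustration of the setpoint-response metrics named above, the sketch below computes rise time, overshoot, settling time and integrated absolute error from one recorded step. The 10%-to-90% rise-time definition, the 2% settling band and the function name are assumptions. Definitions differ between tools, and this is not the method used in any particular product.

```python
def setpoint_response_metrics(t, pv, sp_old, sp_new, band=0.02):
    """Metrics for one setpoint step applied at t[0].

    t and pv are equal-length lists; band is the settling band as a
    fraction of the step size (2% assumed here).
    """
    step = sp_new - sp_old
    # Normalize so the response runs from 0 (old setpoint) to 1 (new setpoint).
    y = [(p - sp_old) / step for p in pv]

    def first_time(level):
        for ti, yi in zip(t, y):
            if yi >= level:
                return ti
        return None

    t10, t90 = first_time(0.1), first_time(0.9)
    rise_time = t90 - t10 if t10 is not None and t90 is not None else None

    overshoot_pct = max(0.0, max(y) - 1.0) * 100.0

    # Settling time: first sample after the last excursion outside the band.
    settling_time = t[0]
    for i in range(len(y) - 1, -1, -1):
        if abs(y[i] - 1.0) > band:
            settling_time = t[i + 1] if i + 1 < len(t) else None
            break

    # Integrated absolute error by the trapezoidal rule (PV units x time).
    err = [abs(sp_new - p) for p in pv]
    iae = sum(
        0.5 * (err[i] + err[i + 1]) * (t[i + 1] - t[i])
        for i in range(len(t) - 1)
    )
    return {
        "rise_time": rise_time,
        "overshoot_pct": overshoot_pct,
        "settling_time": settling_time,
        "iae": iae,
    }


# Synthetic response to a setpoint change from 50 to 60 at t = 0 seconds.
times = [0, 1, 2, 3, 4, 5, 6, 7]
pvs = [50.0, 52.0, 58.0, 61.0, 60.5, 60.1, 60.0, 60.0]
metrics = setpoint_response_metrics(times, pvs, sp_old=50.0, sp_new=60.0)
print(metrics)
```

A settling time of `None` means the response had not settled by the end of the record, which is itself useful information when screening many loops.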
I created a method to identify essential open-loop dynamics in a test duration within six dead times by identifying dead time, slope and inflection point. I developed for the digital twin a mimic rapid modeler block that can identify the essential dynamics of a self-regulating, integrating and runaway process in less than six dead times. Processes with large time constants are treated as near-integrating, enabling a transition to integrating process tuning rules that have proven to be advantageous. I also developed for the digital twin a mimic PID [proportional-integral-derivative] performance block that can automatically capture the metrics for the PID responses to load disturbances and setpoint changes. I provided a user’s guide and shared test results via Word and PowerPoint files. (Send me a message on LinkedIn with your email if you want me to send you these supporting files.)
There is a concern about too much transmitter damping and signal filtering affecting data integrity. The running averages are started at the beginning of the representative part of a continuous or batch operation. Time periods could be about eight hours to capture shift performance. Shorter time periods are used to provide prediction of batch endpoints and the procedure automation capability to deal with startup and abnormal conditions.
While there are many KPIs [key performance indicators] used in industry, the most impressive are those that show the benefits in dollars from increases in process efficiency, capacity and flexibility on a dynamic, ongoing basis.
