Seventh sense for fault analysis

The gas/steam turbine power plant in Budong, South Korea detects faults up to an hour in advance and provides advice on how they can be avoided. This has been made possible by retrofitting diagnostic and simulation software which predicts alarms and shutdowns on the basis of current data.

Diversity is a proven concept in safety engineering: a second, functionally equivalent but differently structured surveillance system sits alongside the automation system which it monitors and checks; if faults are identified, it intervenes to correct them in order to avoid any adverse effects. infoteam Software, in conjunction with its Korean partner BNF Breakthrough and Fusion, took up this concept and implemented it in a software architecture for the monitoring and surveillance of complex systems. The architecture was intended to emulate the functionality of a complete distributed control system (DCS) on a single server. However, the attempt to use commercially available products failed at the trial stage. Chief engineer Jong-Hun Lee from Korea South East Power: “All the products we tried ended in failure. It took much too long to trace the cause of a fault, with the result that it was usually too late for countermeasures. It was also impossible to recreate the internal conditions of the control logic, so software faults simply could not be identified using such systems.” It became apparent that multi-core PCs with RAID architecture and, above all, operationally proven and extremely powerful software had to be used.

Requirements: The requirements for the new functions were determined as part of a research project in 2005. They are:
  • Synchronous recording of process data as with a transient recorder, but also of all relevant internal program data from the DCS system and the control actions
  • At least two hours’ recording at a resolution of 20 ms
  • Simultaneous display of the data as a control display screen, continuous function chart and trend chart
  • Free choice of the moment of observation for all historic data with a playback function as on a video recorder
  • Early detection of faults based on heuristic rules by means of automated database monitoring and alerting of the operators
  • Automated analysis of the cause of the fault and output to the continuous function chart
  • Recommended action by the operators for timely fault resolution
  • Validation of changes to the process and control logic with the aid of historic data to avoid similar faults in future

The “Trip Information System (TIS)” architecture is the result of the specification profile.
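
As a rough sizing check for the recording requirement listed above (two hours at 20 ms), the following sketch works out how many samples such a window holds. The signal count and the bytes per sample are illustrative assumptions and are not figures from the project.

```python
# Sizing sketch for a two-hour recording at 20 ms resolution.
# SIGNAL_COUNT and BYTES_PER_SAMPLE are assumed values for illustration only.
SAMPLE_PERIOD_MS = 20
WINDOW_HOURS = 2
SIGNAL_COUNT = 10_000      # assumption, not taken from the article
BYTES_PER_SAMPLE = 8       # assumption: one 64-bit value per signal per cycle

# Number of 20 ms samples per signal in the window.
samples_per_signal = WINDOW_HOURS * 3600 * 1000 // SAMPLE_PERIOD_MS

# Raw payload for the whole window, ignoring time stamps and database overhead.
total_bytes = samples_per_signal * SIGNAL_COUNT * BYTES_PER_SAMPLE

print(samples_per_signal)           # 360000
print(round(total_bytes / 1e9, 1))  # 28.8 (GB, under the assumptions above)
```

Under these assumptions the raw data volume runs to tens of gigabytes, which gives a feel for why storage and processing performance matter for such a system.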
     
TIS architecture and mode of operation
The plant in Budong is based on an ABB DCS, programmed as continuous function charts in Progress2, which supplies data to the higher-order control system via a field bus. This distributed control system has now been extended with a central TIS data server which stores all the I/O data from the field bus in a historic database. The associated DCS logic – more than 700 pages – is implemented on the data server as an IEC 61131-3-compliant continuous function chart. TIS benefits from the power of a modern soft PLC, which enables the functionality of around a dozen distributed control systems in total to be implemented on a single PC. The original application is divided into 35 tasks so that function units can be assigned to the application program in finer detail. This is one of the prerequisites for carrying out specific, speedy diagnostic operations at a later stage. This mirror application is executed by a soft PLC which receives the I/O data cyclically from the database and saves all the application’s local readings to the database every 20 ms. The database therefore contains all the data from the plant covering a time window of 1 to 2 hours. Clients on higher-order engineering stations can also access this database and output process displays and trend charts for the recorded readings, in addition to the program and variable status on the continuous function chart.
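
The cyclic recording described above can be pictured as a ring buffer that is filled once per cycle. The following is a minimal sketch under stated assumptions: `HistoryBuffer`, `run_cycle` and `overspeed_logic` are hypothetical names, the speed threshold is invented, and a plain in-memory `deque` stands in for the real-time database.

```python
from collections import deque

CYCLE_MS = 20
WINDOW_MS = 2 * 60 * 60 * 1000  # two-hour window


class HistoryBuffer:
    """Keeps only the most recent snapshots covering a fixed time window."""

    def __init__(self, window_ms=WINDOW_MS, cycle_ms=CYCLE_MS):
        # A deque with maxlen silently drops the oldest snapshot when full.
        self.snapshots = deque(maxlen=window_ms // cycle_ms)

    def record(self, timestamp_ms, inputs, internals):
        # Store copies so later changes to the live dicts do not alter history.
        self.snapshots.append((timestamp_ms, dict(inputs), dict(internals)))


def run_cycle(buffer, timestamp_ms, inputs, logic):
    """One mirror cycle: evaluate the logic, then store inputs and internal state."""
    internals = logic(inputs)
    buffer.record(timestamp_ms, inputs, internals)
    return internals


def overspeed_logic(inputs):
    # Hypothetical stand-in for one function block; the 3300 rpm limit is assumed.
    return {"overspeed": inputs["speed_rpm"] > 3300}


buf = HistoryBuffer(window_ms=100)  # tiny window (5 snapshots) for the demo
for i, speed in enumerate([3000, 3100, 3200, 3310, 3320, 3330]):
    run_cycle(buf, i * CYCLE_MS, {"speed_rpm": speed}, overspeed_logic)

print(len(buf.snapshots))    # 5
print(buf.snapshots[0][0])   # 20 (the snapshot at 0 ms has been dropped)
```

Because the calculated outputs are only stored and never written back to the process, such a loop can run alongside the real DCS without influencing it.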

Operating modes
The complexity of the remit requires the coordinated interaction of very many system components, on both the server and multiple clients. Operators are overburdened by such complexity when faced by a fault and when under stress. The TIS functionality was therefore adapted to the various issues which occur during a shift by introducing the Monitor, Alarm, History and Simulation modes:
  • Monitor: The normal operating mode in which the process data is recorded on the server, and the control stations permit the staff different views of the process being controlled – ranging from standard operation and monitoring, to trend displays to continuous function chart details whose logic is animated using real-time data from the plant.
  • Alarm: Using a defined rule, the automated fault detection system has detected that at least one of the monitored signals is approaching a critical limit. The cause of the TRIP signal triggering the alarm is determined and displayed in colour-coded form on the continuous function chart so that the operator can instigate countermeasures immediately. Where there are a number of signals, the sequence of events [SoE] is recorded.
  • History: For the purposes of accurate causal analysis, the moment of observation for all analyses can be moved to anywhere within a window of two hours before the alarm to half an hour after it. As with a video recorder, the operator can then automatically reconstruct the sequence of events in 20 ms increments.
  • Simulation: To validate planned modifications, the user can change the settings of signals in accordance with the KKS power station designation system so that they are different from the historic value and therefore test the impact on the control program. However, he can also make changes to the continuous function chart and verify correct functioning with historic data.
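
The History and Simulation modes can be sketched in a few lines. This is an illustration under stated assumptions and not the TIS implementation: the function names, the signal name `oil_temp_c` and the thresholds are hypothetical, and recorded inputs are modelled as a plain list of dictionaries.

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000
HALF_HOUR_MS = 30 * 60 * 1000
STEP_MS = 20


def clamp_observation_time(requested_ms, alarm_ms):
    """History mode: snap to the 20 ms grid, then clamp to the window
    from two hours before the alarm to half an hour after it."""
    snapped = (requested_ms // STEP_MS) * STEP_MS
    return max(alarm_ms - TWO_HOURS_MS, min(alarm_ms + HALF_HOUR_MS, snapped))


def replay(history, logic, overrides=None):
    """Simulation mode: re-run a logic function over recorded inputs,
    optionally forcing some signals to values other than the historic ones."""
    overrides = overrides or {}
    return [logic({**inputs, **overrides}) for inputs in history]


# Hypothetical recorded readings and logic variants.
history = [{"oil_temp_c": 70}, {"oil_temp_c": 85}, {"oil_temp_c": 92}]
original_logic = lambda s: s["oil_temp_c"] > 90   # assumed limit
modified_logic = lambda s: s["oil_temp_c"] > 80   # planned change

print(replay(history, original_logic))                      # [False, False, True]
print(replay(history, modified_logic))                      # [False, True, True]
print(replay(history, original_logic, {"oil_temp_c": 60}))  # [False, False, False]
print(clamp_observation_time(10_000_015, 10_000_000))       # 10000000
```

Comparing the replay results of the original and the modified logic against the same historic data is the essence of validating a planned change before it reaches the plant.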


Data processing
The data from the DCS is recorded directly in the switchgear cabinet via communication cards which monitor all readings and control signals and transmit them to the TIS server via a fast 1 Gbit/s Ethernet connection. The real-time database archives the data once it has been converted from the manufacturer-specific format to a standard format and time-stamped. Since the data in a distributed system comes from a variety of sources, it is transmitted cyclically and synchronously from the database interface to the runtime system for processing via an OPC server. The soft PLC executes the continuous function charts, previously drawn up using the IEC 61131-3-compliant OpenPCS programming environment, in several dozen processes. These accurately reproduce the function of the automation software in the DCS and contain functions such as the monitoring of vibrations, oil pressure and temperature, speed and temperature of the gas turbine and also the pressure, throughput and temperature in the steam turbine.

Auxiliary systems such as the condenser, boiler, and water and steam circuit are also fully simulated. “Simulated” because the calculated control signals are not transmitted to the process but are stored, likewise time-stamped, in the database together with the internal conditions of the program logic. The database therefore contains a complete, synchronous copy of all readings. During the normal Monitor mode, operating staff use various client stations to monitor and control the plant. Until now these display options were spread across a range of different software tools, and the simultaneous output of a continuous function chart, a user interface and a trend chart (8) of important readings in a framework application was not possible. Only with the advent of synchronised animation has it become possible to rule out misinterpretations arising from an inconsistent display.

The operator can compare current readings with their permissible limit values, for example by viewing the oil temperature trend alongside the current turbine speed displayed on the control panel. A glance at the control logic’s continuous function chart shows him the identical figures together with the oil cooling program and the overspeed protection system. The operator can rely on an accurate match between the representation of the plant displayed on the continuous function chart and the actual control algorithm. The hardcopy documentation generally used, by contrast, is usually out of date.

Alarm: the sooner, the better!
Every DCS has alarm functions which respond when limit values are reached and a recording function which logs the time of the alarm and its clearing by the operator. What is usually lacking, however, is a component that identifies the cause and suggests proven action to the operator for returning the plant from the exceptional situation to its normal status. This is where TIS comes in with its heuristic control system: first the chief engineer specifies which signals, identified in accordance with the KKS power station designation system, are relevant for alarms. He selects these from the TIS signal table and then uses the TRIP Rule Editor to specify the thresholds and evaluation rules that are to initiate a causal analysis. Selection is not limited to values from the process but covers all the database entries on the server.

The current values are displayed for the operator relative to their limit value. This enables warning stages to be defined before the DCS itself would trip an alarm. While the system is in its normal data recording and processing mode, the rule monitoring system continuously and automatically evaluates the rules for tripping an alarm. Since these also permit fuzzy conditions such as combinations of rules, some of the rules may be only partially complied with. However, as soon as a rule is fully complied with, TIS automatically moves to “Alarm” mode. While data is still being written to the server, the client freezes the display of the current status. The trip engine’s most important task now is to determine the cause of the alarm. This involves backtracking along the signal path in the function block program using the database. Starting from the alarm signal, the system determines which input signal in the function block changed last. Since function blocks can require a number of cycles to calculate relatively complex algorithms, the database is rolled back in 20 ms increments until a change in the input is identified.
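
How rule monitoring with partial compliance might look is sketched below. The article does not specify how fuzzy conditions are evaluated, so this sketch makes an assumption: a rule is a combination of sub-conditions, and its degree of compliance is the fraction of sub-conditions currently met. All names and thresholds are hypothetical.

```python
def evaluate_rule(conditions, signals):
    """Return the fraction of sub-conditions that hold (0.0 to 1.0).
    Treating partial compliance as a simple fraction is an assumption."""
    met = sum(1 for check in conditions if check(signals))
    return met / len(conditions)


def next_mode(compliance):
    """Switch to Alarm mode only once the rule is fully complied with."""
    return "Alarm" if compliance == 1.0 else "Monitor"


# Hypothetical rule combining a level condition and a rate-of-change condition.
condenser_rule = [
    lambda s: s["pressure_bar"] > 0.18,
    lambda s: s["rise_bar_per_s"] > 0.001,
]

partly = evaluate_rule(condenser_rule, {"pressure_bar": 0.19, "rise_bar_per_s": 0.0})
fully = evaluate_rule(condenser_rule, {"pressure_bar": 0.19, "rise_bar_per_s": 0.002})

print(partly, next_mode(partly))  # 0.5 Monitor
print(fully, next_mode(fully))    # 1.0 Alarm
```

A partially met rule could be used to drive the warning stages mentioned above, while only full compliance triggers the change of mode.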

This analysis requires close cooperation between various software components:
1. The rule engine identifies the alarm signal based on the violated rule.
2. The trip engine requests the signal names of the inputs of the last function block which output the alarm signal from the continuous function chart editor.
3. In response, the database supplies the signal name which last changed.
4. This whole process is repeated until a reading from the process has been identified as the cause in the left margin of the continuous function chart.
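
The four steps above can be sketched as a short backtracking routine. This is a simplified illustration, not the TIS trip engine: the chart is reduced to a hypothetical mapping from each block output to its input signals, the database is a list of per-cycle snapshots, and all signal names and values are invented.

```python
def last_change(history, signal, at_cycle):
    """Roll back cycle by cycle; return the latest cycle <= at_cycle
    in which `signal` differs from the previous cycle, or None."""
    for cycle in range(at_cycle, 0, -1):
        if history[cycle][signal] != history[cycle - 1][signal]:
            return cycle
    return None


def trace_cause(blocks, history, alarm_signal, alarm_cycle):
    """Follow the most recently changed input from block to block until a
    signal with no producing block (a process reading) is reached."""
    path = [alarm_signal]
    signal, cycle = alarm_signal, alarm_cycle
    while signal in blocks:
        changes = [(last_change(history, name, cycle), name) for name in blocks[signal]]
        changes = [(c, name) for c, name in changes if c is not None]
        if not changes:
            break  # no input changed inside the recorded window
        cycle, signal = max(changes)  # input that changed last
        if signal in path:
            break  # guard against feedback loops in the chart
        path.append(signal)
    return path


# Hypothetical chart: output signal -> inputs of the block that produces it.
blocks = {
    "trip": ["high_pressure", "pump_fault"],
    "high_pressure": ["condenser_pressure"],
}
# Hypothetical snapshots, one per 20 ms cycle.
history = [
    {"condenser_pressure": 0.15, "high_pressure": False, "pump_fault": False, "trip": False},
    {"condenser_pressure": 0.19, "high_pressure": False, "pump_fault": False, "trip": False},
    {"condenser_pressure": 0.19, "high_pressure": True, "pump_fault": False, "trip": False},
    {"condenser_pressure": 0.19, "high_pressure": True, "pump_fault": False, "trip": True},
]

print(trace_cause(blocks, history, "trip", 3))
# ['trip', 'high_pressure', 'condenser_pressure']
```

The returned path corresponds to the signal route that would be highlighted on the continuous function chart, ending at the process reading in the left margin.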

This process typically takes about a minute compared with hours or days for manual evaluation. Once the path information is available, the continuous function chart automatically displays the right program at the relevant location with the KKS signals’ identified path in red. Since the cause of the fault generally lies some time in the past, it is extremely beneficial for the operator to have an automatic display of all the information such as the HMI and trend chart at the time the fault was caused. The relevant signal which caused the fault has already been filtered out of the flood of many thousands of signals by TIS. However, additional information on eliminating the possible cause of the fault is also stored with each rule. For example: Alarm, pressure increase in the condenser, triggered by the rule “Pressure is less than 10% from the limit value of 0.2 bar”. Depending on whether one of the vacuum pumps is also indicating a fault and how quickly the pressure change is taking place, the recommendation is automatically made either to switch on a backup pump or to check the filter. This not only saves valuable time, it also ensures that less experienced shift staff take the right action without having to wait for the specialists to analyse the fault.
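
The condenser example can be written down as a small rule with an attached recommendation. The 0.2 bar limit and the 10% margin come from the rule quoted above; which situation leads to which recommendation and the rate threshold are assumptions made for illustration, since the article does not state them.

```python
LIMIT_BAR = 0.2               # limit value from the quoted rule
MARGIN = 0.10                 # "less than 10% from the limit value"
FAST_RISE_BAR_PER_S = 0.001   # assumption: what counts as a quick pressure change


def near_limit(pressure_bar):
    """True when the remaining distance to the limit is under 10% of the limit."""
    return (LIMIT_BAR - pressure_bar) / LIMIT_BAR < MARGIN


def recommend(pressure_bar, rise_bar_per_s, pump_fault):
    """Return a recommended action, or None if the rule is not met.
    The mapping of situations to actions is an assumed example."""
    if not near_limit(pressure_bar):
        return None
    if pump_fault or rise_bar_per_s >= FAST_RISE_BAR_PER_S:
        return "Switch on backup pump"
    return "Check filter"


print(recommend(0.15, 0.0, False))     # None
print(recommend(0.19, 0.0001, False))  # Check filter
print(recommend(0.19, 0.0001, True))   # Switch on backup pump
```

Storing the recommendation together with the rule keeps the knowledge of experienced engineers available to every shift.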

Accelerated fault analysis
Since data continues to be recorded for a certain time after an alarm is tripped, the History mode can show not only a detailed causal analysis but also the effect of the action taken. The complete recording of all readings and internal data means that a fault and the process leading up to it can be comprehensively analysed, as with an aircraft’s flight data recorder. Archived data from previous faults can also be used to check the effectiveness of rules and measures, so that the development of the knowledge base is validated by historic data. As with a video recorder, the moment of observation can be forwarded or rewound at high speed or in slow motion using the Data Manager (9). Every condition in the plant can therefore be reproduced and replicated in the HMI, on the trend chart or on the continuous function chart with historic data. The step-by-step execution of each individual cycle also helps in debugging the control logic. Whereas in the past the operator had no insight into software faults, the entire operation is now transparent.
