Institutionalizing Alarm Management

Introduction
What is alarm management? Most plant personnel equate alarm management with reducing alarms; however, this is only one piece of the puzzle. The whole puzzle involves providing operators with enough information to prevent abnormal situations and to prevent the escalation of those abnormal situations that cannot be prevented.
 
Poor alarm management results in billions of dollars lost every year to accidents, equipment damage, unplanned plant or unit outages, off-spec production, regulatory fines and huge intangible costs related to environmental and safety infractions.
 
At the end of the day, alarm management is one of the best, and best understood, activities that can bring significant hard benefits to your organization. This paper reviews the benefits of alarm management and the causes of alarm problems, and describes a proven approach that many industry leaders have adopted to realize the benefits of good alarm management practices.
 
The objectives of this document are to:
  1. Review alarm management benefits
  2. Provide a brief overview of the problem
  3. Review the phases needed to implement alarm management
  4. Review common approaches and tasks successfully used to realize results

Alarm management is about safety, the environment, optimizing operations, and increasing corporate profits!
 
Alarm Management Benefits
Often the return of an alarm management project is thought of as an insurance policy or as a hedge against the likelihood of costly process disturbances—rather than as a source of tangible financial benefits and direct ROI. But sound alarm management practices are becoming recognized as the cornerstone of regulatory compliance, improved operational integrity and productivity, and enterprise-wide business improvements that repay investment in such initiatives.
 
More and more, proper alarm management is becoming a precondition to extending a facility’s operating license. Also, with health and safety regulatory bodies closely monitoring the way facilities operate, alarm management provides a concrete demonstration of a plant’s commitment to the safety, environment and well-being of the local community. Too often it takes significant downtime due to an unexpected, costly incident before management will endorse an alarm management project. Fortunately, more companies are now proactively engaging in these types of initiatives.
 
So, is there a hard cost associated with alarm management? It is hard to argue with the HSE, Responsible Care™, OSHA, OSH, ISO, global regulatory bodies and insurance companies. For example, the insurance industry spends $22 billion per year on equipment damage claims and, as a result, is forcing many insured facilities to implement alarm management solutions. The UK Health and Safety Executive (HSE) is also cracking down on poor alarm management by enforcing deadlines that facilities must meet to retain their operating licenses. Finally, many companies that are members of the Responsible Care™ program are likely in the midst of improving their alarm management programs, as this has been strongly recommended over the past few years.
In addition to licensing, regulatory compliance and incident prevention, improved alarm management provides further business benefits:
 
    • Proper staffing levels must be maintained to handle abnormal situations, but alarm management practices make it possible to assess and streamline resource requirements. Loops per operator and consideration of other operator responsibilities provide an abstract picture of the true workload. By factoring alarm rates and operator actions into this equation, a more precise and concrete representation of workload may be developed. If control room consolidation is a goal, one should measure alarm loads and pursue alarm improvements to ensure the reduced staff count is able to safely and efficiently control the process.
    • Studies show that facilities lose 3 to 8% of production over a year to abnormal situations. This figure represents only the smaller, cumulative losses, not the larger, well-publicized catastrophes.
    • Depending on the insurer, insured facilities may qualify for reduced premiums upon completing an alarm management project.
    • Alarm Key Performance Indicators (KPIs) make it possible to benchmark and measure best-in-class sites throughout an organization. When set against a site’s unplanned outages, lost production, revenue, trips/incidents, and profit margins, these KPIs can be compared across all facilities to understand the relative performance of each and to identify opportunities for improvement.
    • Finally, alarms identify problems that cost money. If you have nuisance alarms, you need more operators or run the risk of significant upsets. If the alarms are legitimate, they indicate that the operation is unhealthy and costing money. A few simple examples that come up frequently are:
        • Alarms that indicate variability or tuning problems
        • Alarms that indicate valve problems
        • Operator actions showing the need for specific training or advice
    Fundamentally, alarms are indicators of lost revenue or profit.
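To illustrate the staffing point in the first bullet, the following minimal Python sketch folds alarm rates and operator actions into a single utilization figure per console. The handling times and the 50% utilization ceiling are hypothetical placeholders, not figures from this paper or from any standard; substitute measured values from your own site.

```python
from dataclasses import dataclass


@dataclass
class ConsoleLoad:
    """Hourly averages for one operator console (illustrative inputs)."""
    alarms_per_hour: float
    actions_per_hour: float
    # Hypothetical average handling times in seconds; replace with measured values.
    seconds_per_alarm: float = 60.0
    seconds_per_action: float = 30.0


def utilization(load: ConsoleLoad) -> float:
    """Fraction of each hour spent responding to alarms and making operator changes."""
    busy_seconds = (load.alarms_per_hour * load.seconds_per_alarm
                    + load.actions_per_hour * load.seconds_per_action)
    return busy_seconds / 3600.0


def can_consolidate(consoles: list[ConsoleLoad], ceiling: float = 0.5) -> bool:
    """True if one operator could handle all consoles within the utilization ceiling.

    The default ceiling (half of each hour left for monitoring and other
    duties) is an assumed planning margin, not a published limit.
    """
    return sum(utilization(c) for c in consoles) <= ceiling
```

Under these assumptions, two consoles each averaging 12 alarms and 20 operator actions per hour come to about 73% of one operator's hour, so consolidating them fails the check; cutting each console to 4 alarms per hour brings the total to about 47% and passes. The method, not the particular numbers, is the point.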

The Alarm Management Problem
A couple of decades ago, hard-wired alarms were the main mechanism for alerting control room operating staff to potential problems. Given the cost of running wire, alarms were added very sparingly, so operators were rarely flooded with alarms during abnormal situations and instead relied more heavily on strip charts, panel lights, and field operator support to diagnose problems.
 
The advent of the Distributed Control System (DCS) brought significant improvements in control, as well as in alerting operators to potentially costly or dangerous situations. But since adding alarms was perceived to cost nothing, a rigorous engineering process was not always employed when new alarms were configured. As a result, most facilities alarmed virtually every reading, creating a much more costly issue: alarm floods that prevented operators from properly assessing the root cause of problems. Increased levels of automation, and fewer operators taking responsibility for increasingly larger plant areas, compounded the difficulty of handling large numbers of alarms. Even seasoned operators faced the challenge of knowing how to monitor and handle specific events, especially during stressful, safety-related situations where their colleagues’ health and safety were at risk.
 
Another problem, often not considered, is misdiagnosis of a seemingly insignificant alarm. Understanding the meaning of an alarm, and the consequence of not responding to it, is crucial to operator effectiveness. For example, if a redundancy status alarm rings in on your DCS, do your operators know whether it will shut down your entire control system and thus your plant? Most engineers are unaware of the true implications of certain alarms, so how can operators be expected to know? Proper alarm engineering addresses these issues by providing online documentation and by alerting operators effectively, with a consistent display of each alarm’s severity and of how much time can elapse without a response before the situation escalates to a new, more urgent level.
 
These problems drove the fundamental alarm management effort to reduce the number of alarms and to provide operators with improved online alarm guidance.
 
The final issue that many face is a lack of experience in establishing and executing an alarm management project, which leads them to put off alarm management activities in favor of "higher priority" issues. Hundreds of plants have gone through alarm management exercises to date. With the experience now available in industry, a skilled facilitator and effective products enable alarm management to be implemented quickly and effectively, delivering the business benefits described above and more.
 
Institutionalizing an Effective Alarm Management Program
How do you go about tackling the alarm management problem?
 
Clearly, alarm management is necessary to leverage and sustain plant assets (equipment, devices, DCSs, and people). The first step in institutionalizing alarm management is getting site or corporate buy-in. Only when the majority of stakeholders agree can the cultural change be made and the maximum benefits realized. Chances are that if your plant manager has read this far, they have recognized the need; this section outlines the process and options (both business and technical) to institutionalize alarm management effectively.
 
A phased methodology is recommended: a proven process and infrastructure that can be adapted to each site’s needs, financial constraints, and available resources.
 

Phase 1: Installation of enterprise alarm and event historian

A history of alarm system performance is necessary to provide concrete evidence of your facility’s performance. First, it makes it possible to define practical, concrete goals in your alarm management philosophy that are realistically achievable. More importantly, this information guides your alarm management process, enabling you to focus on problem areas and achieve measurable (Six Sigma) improvements. An alarm and event historian also sustains the improvements to the alarm system, and your investment, by monitoring performance over an extended period of time. Time, after all, is part of what has caused alarms to be added and their performance to deteriorate (plant and equipment degradation, process and equipment changes, expansions, incidents).
 
Enterprise alarm and event historians that collect alarms and events in real time also serve as an operational and engineering support tool. For example, inside and outside operators can use a historian for shift hand-off procedures, or as a tool to review what happened on the previous shift. It can also be used to review incidents, by correlating alarms and operator actions with process data, and to verify safe shutdowns and startups.
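As an illustration of the shift hand-off use, the following minimal Python sketch pulls the previous shift's records out of an exported event list and counts them by type and tag. The Event fields, the record kinds, and the 12-hour shift length are assumptions; a real historian exposes its own query interface.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One record from the alarm and event historian (hypothetical schema)."""
    time: datetime
    tag: str
    kind: str  # e.g. "ALARM" or "OPERATOR_CHANGE"


def shift_events(events: list[Event], shift_end: datetime,
                 shift_hours: int = 12) -> list[Event]:
    """Events logged during the shift that ended at shift_end, oldest first."""
    start = shift_end - timedelta(hours=shift_hours)
    # Start is inclusive, end is exclusive, so an event at the hand-off
    # instant belongs to the incoming shift.
    return sorted((e for e in events if start <= e.time < shift_end),
                  key=lambda e: e.time)


def handoff_summary(events: list[Event], shift_end: datetime,
                    shift_hours: int = 12) -> Counter:
    """Count alarms and operator actions per (kind, tag) for the outgoing shift."""
    return Counter((e.kind, e.tag)
                   for e in shift_events(events, shift_end, shift_hours))
```

The chronological list supports a walk-through of what happened; the counts give the incoming crew a quick view of which tags were busiest.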
 
Studies resulting from partnerships with the top oil, gas, and chemicals producers in North America and Europe have identified the need for facilities to benchmark all plant areas across the entire enterprise, comparing alarm performance across like facilities. With Alarm Key Performance Indicators available online and in real time, the entire enterprise can monitor plant alarm performance, which is a key indicator of production reliability.
 
Depending on which performance category your facility falls under, your alarm management resources should be directed to the areas with the greatest potential for improvement and the highest risk. Financial and safety gains can also be measured by correlating corporate financial and operating KPIs with alarm performance. By measuring corporate KPIs against the alarm performance indicators throughout the life of the alarm management project, you can produce a reasonable benefits analysis.
 
A simple, rolled-up view of overall alarm performance is available for senior technical management or plant management, so that they do not need to check whether individual numerical targets are on track. Instead, a summary of the plant state is shown, as illustrated in Figure 1. If performance is better than the EEMUA targets, the state is "Predictive"; if it meets the EEMUA benchmarks, the state is "Robust"; if it falls a little short, the state is "Stable"; and a facility with more serious alarm problems is assigned a "Reactive" or "Overloaded" state.
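The rolled-up state can be thought of as a lookup on two measured rates. The following Python sketch computes the average and peak alarm counts per 10-minute window from alarm timestamps and maps them to the five states named above. The cut-off values in BANDS are placeholders chosen only for illustration, not the EEMUA 191 boundaries; take real values from EEMUA 191 and your alarm philosophy.

```python
import math
from collections import Counter
from datetime import datetime

WINDOW_SECONDS = 600  # ten-minute windows

# (state, max average per window, max peak per window), best state first.
# Placeholder numbers for illustration only; NOT the EEMUA 191 boundaries.
BANDS = [
    ("Predictive", 0.5, 5),
    ("Robust", 1.0, 10),
    ("Stable", 2.0, 20),
    ("Reactive", 10.0, 100),
]


def rates_per_10min(alarm_times: list[datetime], start: datetime,
                    end: datetime) -> tuple[float, int]:
    """Average and peak alarm counts per 10-minute window over [start, end)."""
    counts = Counter(int((t - start).total_seconds() // WINDOW_SECONDS)
                     for t in alarm_times if start <= t < end)
    n_windows = max(1, math.ceil((end - start).total_seconds() / WINDOW_SECONDS))
    average = sum(counts.values()) / n_windows  # empty windows count as zero
    peak = max(counts.values(), default=0)
    return average, peak


def performance_state(avg_per_10min: float, peak_per_10min: float) -> str:
    """Return the best state whose average and peak limits are both met."""
    for state, max_avg, max_peak in BANDS:
        if avg_per_10min <= max_avg and peak_per_10min <= max_peak:
            return state
    return "Overloaded"
```

Requiring both limits to be met means a site with a low average but severe floods cannot be rated better than its peak rate allows, which mirrors the distinction drawn in the next paragraph between average-rate problems and flooding problems.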
 
Figure 3 provides trends that show the details behind how the category was derived, which lets the appropriate engineer or technician start investigating the alarm problem by looking at the right issues. For example, you may have constant alarm issues because your average alarm rate is too high, in which case you would start by looking at your nuisance alarms; or you may have alarm flooding problems, in which case you might look at your chattering and parent-child alarms and how your control strategy affects them.
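Both diagnoses mentioned above, floods and chattering alarms, can be found in historian data with short scans. The following Python sketch assumes commonly used thresholds (a flood when more than 10 alarms arrive within 10 minutes, and chattering when the same tag activates three or more times within one minute); treat both as assumptions to be replaced with the values in your own alarm philosophy. Tag names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_floods(alarm_times: list[datetime], threshold: int = 10,
                window: timedelta = timedelta(minutes=10)) -> list[tuple[datetime, datetime]]:
    """Return (start, end) periods in which more than `threshold` alarms fall within `window`."""
    times = sorted(alarm_times)
    floods: list[list[datetime]] = []
    left = 0
    for right, t in enumerate(times):
        # Slide the left edge so times[left..right] spans no more than `window`.
        while t - times[left] > window:
            left += 1
        if right - left + 1 > threshold:
            if floods and times[left] <= floods[-1][1]:
                floods[-1][1] = t  # still in the same flood: extend it
            else:
                floods.append([times[left], t])  # a new flood starts
    return [(start, end) for start, end in floods]


def find_chattering(alarms: list[tuple[str, datetime]], count: int = 3,
                    window: timedelta = timedelta(minutes=1)) -> set[str]:
    """Return tags that activate `count` or more times within any `window`."""
    by_tag: dict[str, list[datetime]] = defaultdict(list)
    for tag, t in alarms:
        by_tag[tag].append(t)
    chattering = set()
    for tag, times in by_tag.items():
        times.sort()
        # Any run of `count` consecutive activations inside the window qualifies.
        for i in range(len(times) - count + 1):
            if times[i + count - 1] - times[i] <= window:
                chattering.add(tag)
                break
    return chattering
```

The flood scan uses a sliding window rather than fixed clock intervals, so a burst that straddles a 10-minute boundary is still caught.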
 
The bottom line is that measuring enterprise alarm performance and monitoring it over time is as simple as looking at a web page, with no additional work required.
 

Phase 2: Create an alarm philosophy

Once you know how your facility is operating, you can document objective, measurable goals in your alarm philosophy. Creating an alarm philosophy is an essential part of the process: the philosophy sets out the guiding principles and targets by which you configure alarms and measure alarm performance. Most philosophies cover at least the following criteria:
  1. What is an alarm
  2. How priorities are set based on criticality and time to respond
  3. General alarm considerations, e.g., how to deal with BADIO alarms
  4. Alarm performance criteria and resolution activities

These are but a few of the key categories to consider when developing your alarm philosophy. It is strongly recommended to consult the Engineering Equipment and Materials Users’ Association (EEMUA) #191 publication, available on the www.eemua.co.uk website. Generic philosophies based on manufacturing sites in various industries are also available.
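Writing the performance targets down as data makes them directly testable against historian output. The following Python sketch records a few measurable philosophy targets and lists those a review period misses. Every default value, including the 5/15/80 High/Medium/Low priority split and the tolerance, is a hypothetical example rather than a recommendation from EEMUA 191 or this paper; the metric names in the measured dictionary are also assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhilosophyTargets:
    """Measurable targets a site might record in its alarm philosophy (example values only)."""
    max_avg_per_10min: float = 1.0          # average alarms per operator per 10 minutes
    max_pct_windows_in_flood: float = 1.0   # percent of 10-minute windows in flood
    max_standing_alarms: int = 10           # alarms active longer than the philosophy allows
    priority_mix: dict[str, float] = field(
        default_factory=lambda: {"High": 0.05, "Medium": 0.15, "Low": 0.80})
    mix_tolerance: float = 0.05             # allowed deviation from each priority share


def philosophy_gaps(targets: PhilosophyTargets, measured: dict) -> list[str]:
    """Return the names of the targets that a measured review period misses."""
    gaps = []
    if measured["avg_per_10min"] > targets.max_avg_per_10min:
        gaps.append("average alarm rate")
    if measured["pct_windows_in_flood"] > targets.max_pct_windows_in_flood:
        gaps.append("time in flood")
    if measured["standing_alarms"] > targets.max_standing_alarms:
        gaps.append("standing alarms")
    for priority, target_share in targets.priority_mix.items():
        share = measured["priority_mix"].get(priority, 0.0)
        if abs(share - target_share) > targets.mix_tolerance:
            gaps.append(f"{priority} priority share")
    return gaps
```

An empty list means the period met every target; anything else becomes an agenda item for the review meetings.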


From an overall business perspective, the philosophy document outlines operational guidelines for alarm management and also, directly or indirectly, the business objectives and criteria to abide by. If your alarm performance objectives are achieved after phase 4 or phase 5, is it necessary to implement a high-end alarm management process?
 
Phase 3: Top 20 Review
The Top 20 review meeting is intended to identify problematic alarms and fix them.
 
The Top 20 review is typically incorporated into a site’s existing operational policies. It may be part of a monthly Tail Gate Safety meeting or, more often, a weekly operational or engineering meeting. The review is done regularly because the process, equipment, and outside environment are constantly changing, so alarm parameters need to be adapted to these changing conditions. Periodic monitoring keeps alarm counts down and helps identify other problems, such as tuning, valve sizing, transmitter issues, and many other performance-limiting or safety-related issues.
 
Once the Top 20 review is incorporated into a regularly scheduled meeting, the process takes very little time, typically 5 to 10 minutes, plus the time to fix problems that are costing money or that pose a safety hazard. The benefits far outweigh the time invested.
 
The process that is followed will vary, and should take into account each site’s operating practices and available time. However, in general most sites include analysis of the most frequent alarms and an analysis of any recent floods or trips for the week or month in their alarm reviews.
 
The review meeting can be as simple as bringing in a chart of the Top 20 alarms and flood analysis charts, then verbally reviewing what the alarm problems are and what should be done about them. In most cases, change and work orders can be signed on the spot to make the process as efficient as possible.
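Producing the Top 20 list is a simple frequency count over the review period. The following Python sketch ranks (tag, condition) pairs from an exported alarm log and reports the share of the total alarm load they represent. The export format and the condition names (PVHI, PVLO) are assumptions, not the format of any particular historian.

```python
from collections import Counter


def top_n_report(alarm_log: list[tuple[str, str]],
                 n: int = 20) -> tuple[list[tuple[str, str, int]], float]:
    """Rank (tag, condition) pairs by frequency for one review period.

    alarm_log: one (tag, condition) tuple per alarm occurrence, e.g.
    ("LI-301", "PVHI"); hypothetical export format.
    Returns (rows, share): rows is [(tag, condition, count), ...], most
    frequent first, and share is the fraction of all alarms those rows cover.
    """
    counts = Counter(alarm_log)
    total = sum(counts.values())
    rows = [(tag, cond, c) for (tag, cond), c in counts.most_common(n)]
    share = sum(c for _, _, c in rows) / total if total else 0.0
    return rows, share
```

The share figure is useful in the meeting itself: it shows how much of the total alarm load the items on the chart actually represent.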
 
If a dedicated alarm management champion is warranted, a more detailed report can be prepared for each meeting. A template for weekly or monthly reports can be reused to provide consistent, detailed information for each alarm and its potential cause.

Phase 4: Documentation and Operator Assist
In some cases, a more comprehensive alarm rationalization is required. The objective is to re-engineer alarm priorities and trip limits consistently, and to provide operators with online documentation of:
 
  • the causes of a specific alarm,
  • how to verify it is in fact the problem suspected,
  • corrective actions,
  • and the consequences if the alarm is not handled properly

This online information helps operators to respond more effectively to abnormal situations.
 
The review process entails reviewing and documenting every tag with the P&IDs in hand. In addition, the consequence of not responding to the alarm is documented, along with the maximum time that can elapse without action before the consequence occurs or escalates. Most companies then assign a priority following the guidelines in EEMUA #191.
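The record captured for each tag, and the step from consequence and response time to priority, might be sketched in Python as follows. The severity levels, time bands, and the priority matrix itself are hypothetical; a real matrix comes from your alarm philosophy, typically informed by EEMUA 191 guidance.

```python
from dataclasses import dataclass

# Hypothetical priority matrix. Rows: consequence severity if the alarm is
# not handled. Columns: maximum time to respond, banded as under 5 minutes,
# 5 to 15 minutes, and over 15 minutes.
PRIORITY_MATRIX = {
    "Minor": ("Medium", "Low", "Low"),
    "Major": ("High", "Medium", "Low"),
    "Severe": ("High", "High", "Medium"),
}


def time_band(max_response_minutes: float) -> int:
    """Map the maximum allowable response time onto a matrix column."""
    if max_response_minutes < 5:
        return 0
    if max_response_minutes <= 15:
        return 1
    return 2


@dataclass
class AlarmRecord:
    """Rationalization documentation for one alarm, as shown to the operator."""
    tag: str
    causes: list[str]
    verification: str        # how to confirm the suspected problem
    corrective_action: str
    consequence: str         # what happens if the alarm is not handled
    severity: str            # a key of PRIORITY_MATRIX
    max_response_minutes: float

    @property
    def priority(self) -> str:
        """Priority looked up from severity and available response time."""
        return PRIORITY_MATRIX[self.severity][time_band(self.max_response_minutes)]
```

Keeping the matrix as data rather than code means the philosophy owner can revise it without touching the lookup logic, and every alarm's priority is recomputed consistently.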
 
Another key benefit of this review is that it is an effective way of capturing the knowledge from veteran operating staff and passing it on to a less experienced crew.
 
For documentation and rationalization, a Management of Change product is instrumental in rationalizing alarms quickly. Conventional products can limit reviews to 100 tags per day, but a Management of Change product enables a more rapid review through its ease of use. Such a product speeds up the process by organizing and linking tags in a logical manner (by tag name or P&ID) and by offering intelligent auto-complete options in the documentation fields.
 
After all this effort, the alarm documentation can be viewed online, in real time, via the web: for example, in a real-time viewer, by selecting the alarm that just rang in and choosing to view its documentation.
 
Phase 5: Management of Change
This is the monitoring and verification of any authorized or unauthorized changes to the engineered alarm-related settings. This includes trip points, priority, controller mode, setpoint, alarm creation, deletion, disabling, and inhibiting on any tag.

To function effectively, an alarm management of change product should provide a master alarm database where the engineered settings are stored, along with the documentation for each alarm. By automatically synchronizing with the DCS, the product can notify appropriate personnel of any discrepancies.
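The core of such a product is a comparison between the master alarm database and the settings currently on the DCS. The following Python sketch shows that comparison on plain dictionaries; the data layout and setting names are assumptions, and reading from a real DCS or notifying personnel is left out because those interfaces are vendor-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One difference between the engineered value and the live DCS value."""
    tag: str
    setting: str
    engineered: object  # None if the setting is missing from the master database
    actual: object      # None if the setting is missing from the DCS


def audit(master: dict[str, dict[str, object]],
          dcs: dict[str, dict[str, object]]) -> list[Discrepancy]:
    """Compare engineered alarm settings with what is currently on the DCS.

    Both inputs map tag -> {setting_name: value}, for example
    {"LI-301": {"PVHI": 85.0, "priority": "High", "enabled": True}}.
    """
    found = []
    for tag in sorted(master.keys() | dcs.keys()):
        engineered = master.get(tag, {})
        actual = dcs.get(tag, {})
        for setting in sorted(engineered.keys() | actual.keys()):
            if engineered.get(setting) != actual.get(setting):
                found.append(Discrepancy(tag, setting,
                                         engineered.get(setting), actual.get(setting)))
    return found
```

Reporting a setting that exists on only one side with None covers alarms created or deleted outside the approved change process, as well as changed limits, priorities, or enable flags.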


Phase 6: Dynamic Alarming

Dynamic alarm management is often viewed as the ideal target: in essence, telling the operator what the problem is and how to handle it. First and foremost, the key to effective dynamic alarm management is ensuring that the alarms on the control system are good, so this should be one of the last phases in any alarm management process. If there are many ineffective or nuisance alarms on the DCS, the dynamic alarm management package must also deal with that excess information.
 
Once the DCS alarm system housekeeping is taken care of, a key question that typically comes up is whether the dynamic alarm management package should reside on or off the DCS. Dynamic alarm management solutions are, in reality, still in their infancy, and the appropriate answer can be both on and off the DCS. The following are but a few of the options that may be appropriate:
  1. Simple on-DCS cutout or suppression logic
  2. On- or off-DCS bulk changing of alarm settings for well-known plant states
  3. On-DCS logic for dynamic limit changes
  4. Off-DCS operator advisory for root-cause alarm problems (which alarm in the flood?)
  5. Off-DCS operator advisory for predictive process disturbances or equipment monitoring
From a safety perspective, fast response times are needed, and DCS logic is often the best solution. However, certain plant states are recognizable, and some can only be recognized by an intelligent external application.
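Option 2 in the list above, bulk changing of alarm settings for well-known plant states, can be sketched as a table lookup. In the following Python sketch, each plant state carries its own limits and enable flags, and suppression is simply an alarm disabled in states where it carries no information. The states, tags, limits, and PVHI/PVLO suffixes are hypothetical, and a real implementation would run in the DCS or an external application rather than in standalone Python.

```python
# Hypothetical per-state alarm settings for one unit; a real model must
# cover every operating state the unit can be in.
STATE_SETTINGS = {
    "RUNNING": {"FI-101.PVLO": {"limit": 40.0, "enabled": True},
                "PI-205.PVHI": {"limit": 12.0, "enabled": True}},
    "SHUTDOWN": {"FI-101.PVLO": {"limit": 40.0, "enabled": False},  # no flow expected
                 "PI-205.PVHI": {"limit": 12.0, "enabled": True}},
    "STARTUP": {"FI-101.PVLO": {"limit": 10.0, "enabled": True},    # relaxed while ramping
                "PI-205.PVHI": {"limit": 14.0, "enabled": True}},
}


def settings_for(state: str) -> dict:
    """Return the alarm settings for a plant state; unknown states are an error."""
    if state not in STATE_SETTINGS:
        raise KeyError(f"no alarm settings defined for state {state!r}")
    return STATE_SETTINGS[state]


def active_alarms(state: str, readings: dict[str, float]) -> list[str]:
    """Evaluate current readings against the state's settings; return tripped alarms."""
    tripped = []
    for alarm, cfg in settings_for(state).items():
        if not cfg["enabled"]:
            continue  # suppressed in this state
        tag, kind = alarm.split(".")
        value = readings[tag]
        if kind == "PVLO" and value < cfg["limit"]:
            tripped.append(alarm)
        elif kind == "PVHI" and value > cfg["limit"]:
            tripped.append(alarm)
    return tripped
```

Raising an error for an undefined state, instead of silently falling back to default limits, reflects the need for a complete operating model: a state nobody planned for should be surfaced, not guessed at.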
 
Regardless of the chosen solution, implementing dynamic alarm solutions on or off the DCS requires a complete operational model for the entire plant, or areas that will be monitored. All the different process conditions, operating states, and potential failures must be identified and handled appropriately.
 
Given the far-reaching possibilities of dynamic alarm management, it is important to identify which dynamic alarm objectives you wish to achieve, what it will take to maintain the solution, and the expected benefits, both short term and long term.
 
Conclusion
Providing operators with enough information to prevent abnormal situations, and to diminish the impact of those that cannot be prevented, is the key to an effective alarm management process. The phased alarm management approach offers a proven, measurable and easily implemented methodology that will guide a plant to a higher level of safety, environmental and production standards. The significant costs associated with accidents, equipment damage, unplanned plant or unit outages, off-spec production, regulatory fines, etc., will be drastically reduced as each phase is realized. Alarm management is not just about reducing alarms; it is about responsible plant management and increased plant efficiency.

About The Author


This article was written by Jeff Gould of Matrikon. Matrikon provides web-based products and optimization services to leaders in oil and gas, petrochemicals, cement, utilities, pulp and paper, automotive and heavy equipment, discrete manufacturing, pharmaceuticals, and food and beverage processing.
