Creating the Next-Generation Historian with the Cloud

By Jan Pingel, Product Director, Process History and Analytics, Honeywell Process Solutions

& Matthew Burd, Chief IIoT Solutions Architect, Honeywell Process Solutions

When discussing the new era of connected technologies, whether the consumer Internet of Things (IoT) or the emerging Industrial Internet of Things (IIoT), the cloud is often at the center of the conversation. After all, the cloud is considered a trendy and exciting technology, and almost everyone involved in technology has a point of view about it.

What about the process historian? It may not seem as exciting, but it is a very important piece of the IIoT puzzle. In fact, the process historian cannot be left out of IIoT discussions, because it plays one of the most critical roles in bringing IIoT to fruition. IIoT requires three main elements:

  • The ability to securely gather and access data
  • Analytics technology that helps make sense of the data
  • Domain expertise that can determine how to act upon the data

The process historian is the linchpin of the first element and, with new technological advancements, can provide a solid platform for the second.

As enterprises digitally transform, the historian must also evolve to bring IIoT to reality. And yes, the cloud plays a significant role in this evolution. So in many ways, conversations about the cloud and process historians go hand in hand.

For these reasons, it is more important than ever to understand the new technologies required to support massively scalable enterprise historians. It is equally vital to understand how historians will come to hold many other data types, and how that data can be presented in formats better suited to big data analysis and visualization.

Moving Historians to the Cloud

The process historian was introduced in the 1990s and remains a key component of every process facility’s system architecture. Higher-level enterprise historians were gradually introduced, while the main process historians continued to handle day-to-day management and plant-floor improvements. Typically, an enterprise historian is connected to two or more process historians through a firewall. Instead of collecting data directly from devices, it gathers information from the process historians, so the same data is available to all enterprise network users.
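The layering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the site names and the tag `FIC101.PV` are hypothetical, the firewall between the layers is not modeled, and no vendor's actual interface is implied.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessHistorian:
    """Plant-level historian that collects samples directly from devices."""
    site: str
    samples: dict = field(default_factory=dict)  # tag -> list of (timestamp, value)

    def record(self, tag, timestamp, value):
        # Store one sample for a tag, as a device interface would.
        self.samples.setdefault(tag, []).append((timestamp, value))

    def read(self, tag):
        # Return a copy of the tag's history (empty if the tag is unknown).
        return list(self.samples.get(tag, []))


@dataclass
class EnterpriseHistorian:
    """Higher-level historian that reads from process historians, not devices."""
    sources: list = field(default_factory=list)

    def read(self, tag):
        # Gather the tag's history from every connected process historian,
        # keyed by site, skipping sites that have no data for the tag.
        result = {}
        for source in self.sources:
            history = source.read(tag)
            if history:
                result[source.site] = history
        return result


# Hypothetical sites and tag, for illustration only.
plant_a = ProcessHistorian("plant_a")
plant_b = ProcessHistorian("plant_b")
plant_a.record("FIC101.PV", 0, 42.0)
plant_b.record("FIC101.PV", 0, 40.5)

enterprise = EnterpriseHistorian(sources=[plant_a, plant_b])
history = enterprise.read("FIC101.PV")
```

The enterprise layer never touches a device. It only re-exposes what the plant-level historians already hold, which is why every enterprise user sees the same values as the plant.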

The cloud marks the next evolution in this space. To understand the cloud’s impact on historians, though, it’s essential to understand the drivers behind cloud adoption.

The proliferation of smart devices demands more data storage, and the cloud is the obvious solution. Cloud technologies scale better and are better able to exploit large and complex data sets with so-called big data analysis technologies (a key component of IIoT). In fact, a recent survey of 200 manufacturing executives indicated that over two-thirds of companies are already investing in data analytics, and most plan to increase those investments.

Even as many companies are forced to cut budgets, cloud investments are rising, and it’s easy to see why. Flexibility and scalability are much higher, and cost of ownership is lower thanks to the cloud’s economies of scale. Systems can also be set up much faster, because existing systems are scaled up to cover customer needs rather than new systems being installed from scratch.

One of the most obvious benefits is that a cloud model shifts computing resources from the customers’ facilities and data centers to a cloud provider, reducing infrastructure requirements. The cloud provider can leverage greater economies of scale to offer this at an attractive cost.

The benefits of offloading computing resources apply whether an existing historian is simply re-hosted in the cloud or re-implemented as a Software as a Service (SaaS) offering. SaaS brings further benefits by taking over routine software management and maintenance, which is often more expensive than the computing resources themselves.

The real value of the cloud, though, is the ability to provide a better solution for managing and using historical data. Today, companies are demanding that more data be collected and made available, but the scalability of current historians is often limited. With the cloud, a site that traditionally collected basic process data at minute intervals can now look to collect two or three times more tags at much higher frequencies. Cloud technologies scale much further in both throughput and storage than current historian architectures, which tend to be bound to a single server. Cloud platforms also employ clustering and load balancing, as well as new storage options, to support virtually unlimited scaling.
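To get a feel for what “two or three times more tags at much higher frequencies” means for volume, the arithmetic can be worked through in a short script. All figures here are assumptions chosen for illustration, not numbers from any real site: a baseline of 10,000 tags sampled every 60 seconds, an expanded case of 30,000 tags sampled every second, and 16 bytes per stored sample before any compression.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(tag_count, interval_seconds):
    """Number of samples collected per day for a given tag count and interval."""
    return tag_count * SECONDS_PER_DAY // interval_seconds


def storage_gb_per_day(tag_count, interval_seconds, bytes_per_sample=16):
    """Uncompressed daily storage in gigabytes (bytes_per_sample is an assumption)."""
    return samples_per_day(tag_count, interval_seconds) * bytes_per_sample / 1e9


# Assumed baseline: 10,000 tags at one-minute intervals.
baseline = samples_per_day(10_000, 60)

# Assumed expanded case: three times the tags at one-second intervals.
expanded = samples_per_day(30_000, 1)

growth = expanded / baseline
print(f"baseline: {baseline:,} samples/day")
print(f"expanded: {expanded:,} samples/day ({growth:.0f}x)")
print(f"expanded storage: {storage_gb_per_day(30_000, 1):.1f} GB/day")
```

Under these assumptions, tripling the tag count while moving from minute to second intervals multiplies the daily sample count by 180. Growth of that order is what a historian bound to a single server struggles to absorb.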

The ability to scale also supports multi-site scenarios. Today, individual sites in an enterprise tend to have their own historians, making it difficult to perform cross-site analysis and troubleshooting (such as from corporate centers of excellence). With the cloud, it is possible to bring data from all sites into a common environment, making them equally accessible from the enterprise level.

Finally, cloud technology differentiates itself by supporting big data analytics. Technologies such as Hadoop, R and Python can support rich analysis of process data combined with other data types to uncover insights not achievable with existing historian tools.

The industry has used various approaches to pull process data into cloud-based solutions. The two most common appear to be virtualizing the process historian in the cloud and pulling the data into a so-called “data lake”.

Virtualizing the Historian in the Cloud

The obvious step to reduce on-site hardware infrastructure is to virtualize servers in the cloud. Most process historians support virtualization with VMware, Microsoft Hyper-V and similar platforms to make server management easier and more flexible, and this can also be used with the cloud. Manufacturers are already virtualizing server components in the cloud, including their process historians. Some process historian vendors also offer this approach as a first step toward cloud applications: instead of customers virtualizing their servers themselves, the vendor provides a pre-configured cloud deployment.

This approach provides better scalability because virtual images can scale more easily than physical computers and can better share the overall server farm resources. Scalability is still limited, though, by the historian’s traditional architecture. Although virtualized in the cloud, the historian is not using any new cloud technology, so it cannot scale as far, or scale on the fly, in the way that current cloud technologies allow.

Ultimately, this approach merely places a historian in the cloud rather than in the customer’s own server farm. It provides no new technology or features, and the primary reason to choose it is to reduce hardware infrastructure and cost of ownership.

The Data Lake Approach

This approach pulls any type of data into a common, less structured data store in the cloud, known as a “data lake”, where new cloud tools can manipulate the data and find correlations that are normally very hard to find with traditional tools.

Many customers and vendors are looking at big data technologies to derive more value from process data. Whether it’s Hadoop or other data lake technologies, wide-column databases like Cassandra or other approaches, the goal is generally to load them with large volumes of process data to support analysis. In most cases, customers and vendors are still evaluating the ability of the data and tools to support predictive analytics or other value-adding outcomes. Some have managed to populate the data, but are uncertain how to proceed.

One issue is that raw data itself isn’t sufficient. Despite some vendor claims, it’s not realistic to just “point the tools at the data” and expect meaningful results. Process data generally lacks structure, making it difficult to combine with and compare to other data. A useful step is organizing the data according to an asset model, giving context to the process values and allowing easy comparison of similar assets such as compressors or heat exchangers. It’s often necessary to relate this data to other sources such as maintenance records, which may identify failures or other periods of interest to correlate.
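A minimal sketch of what an asset model adds is shown below. The tag names, asset identifiers and readings are all invented for illustration, and a real asset model would be far richer (hierarchies, units, equipment classes). The point is only that once each tag carries context, similar assets can be compared with a single query.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TagContext:
    """Context an asset model attaches to an otherwise opaque tag name."""
    asset_id: str
    asset_type: str
    measurement: str


# Hypothetical asset model: tag name -> context.
ASSET_MODEL = {
    "C101_DT": TagContext("C-101", "compressor", "discharge_temp"),
    "C202_DT": TagContext("C-202", "compressor", "discharge_temp"),
    "E301_OT": TagContext("E-301", "heat_exchanger", "outlet_temp"),
}

# Made-up raw samples as (tag, value) pairs, with no structure of their own.
raw_samples = [
    ("C101_DT", 118.0), ("C101_DT", 122.0),
    ("C202_DT", 140.0), ("C202_DT", 144.0),
    ("E301_OT", 75.0),
]


def average_by_asset(samples, asset_type, measurement):
    """Average one measurement for every asset of the given type."""
    values = defaultdict(list)
    for tag, value in samples:
        ctx = ASSET_MODEL.get(tag)
        if ctx and ctx.asset_type == asset_type and ctx.measurement == measurement:
            values[ctx.asset_id].append(value)
    return {asset_id: mean(vals) for asset_id, vals in values.items()}


compressor_temps = average_by_asset(raw_samples, "compressor", "discharge_temp")
print(compressor_temps)
```

Without the model, nothing in the raw samples says that `C101_DT` and `C202_DT` measure the same thing on the same kind of equipment. With it, a side-by-side comparison of all compressors needs no knowledge of local naming conventions.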

The process of uploading the raw data, organizing it and relating it to other data (often termed “data wrangling”) can consume 80 percent of the effort of an analysis project, usually up front before any meaningful analysis can be performed. A solution that addresses these wrangling issues in a systematic way can greatly speed time to value for this type of analysis.
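One typical wrangling step, relating process values to maintenance records, might look like the sketch below. The samples, the failure record and the one-day window are all made up for illustration, and the function name `label_pre_failure` is hypothetical. Each sample is flagged if it falls within the window leading up to a recorded failure on the same asset, which is the kind of labeling a later predictive analysis would need.

```python
from datetime import datetime, timedelta

# Made-up process samples: (timestamp, asset_id, vibration reading).
samples = [
    (datetime(2024, 1, 1, 0, 0), "C-101", 7.1),
    (datetime(2024, 1, 2, 0, 0), "C-101", 7.4),
    (datetime(2024, 1, 3, 0, 0), "C-101", 9.8),
    (datetime(2024, 1, 3, 0, 0), "C-202", 7.0),
]

# Made-up maintenance records: (asset_id, time the failure was logged).
failures = [("C-101", datetime(2024, 1, 3, 12, 0))]


def label_pre_failure(samples, failures, window=timedelta(days=1)):
    """Flag samples taken within `window` before a failure of the same asset."""
    labeled = []
    for ts, asset, value in samples:
        pre_failure = any(
            asset == failed_asset and failed_at - window <= ts <= failed_at
            for failed_asset, failed_at in failures
        )
        labeled.append((ts, asset, value, pre_failure))
    return labeled


labeled = label_pre_failure(samples, failures)
for row in labeled:
    print(row)
```

Even this small step depends on the two sources agreeing on asset identifiers and time references, which is where much of the up-front effort tends to go.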

Another characteristic of typical big data tools is that they don’t differentiate time-series from other forms of data. This isn’t a major issue for off-line analysis, but these tools can struggle to deal with the type of interactive time-series queries that are typical in the process industries, especially when it comes to aggregation requirements and performance. An ideal solution would provide specialized time-series capabilities to support operational needs (fast queries, troubleshooting tools, time-series pattern matching, etc.) while also fitting into the big data analytics framework.
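The kind of aggregation query mentioned here can be illustrated with a small time-bucketing function. This is a plain-Python sketch rather than how any particular historian implements it. Timestamps are assumed to be integer seconds, and the sample values are invented.

```python
from collections import defaultdict
from statistics import mean


def bucket_average(samples, bucket_seconds):
    """Average (timestamp, value) samples over fixed-width time buckets.

    Timestamps are integer seconds; each bucket is keyed by its start time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up samples at irregular intervals.
samples = [(0, 10.0), (30, 12.0), (60, 20.0), (90, 22.0), (150, 30.0)]

per_minute = bucket_average(samples, 60)
print(per_minute)
```

A time-series engine answers this sort of query interactively over years of data, typically by keeping samples ordered by time. A general-purpose big data tool that treats a timestamp as just another column often has to scan far more data to do the same.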

Other Approaches

Other approaches are starting to emerge that combine data lake infrastructure with context and analysis tools. Customers then work with vendors to push the data to the cloud, typically by generating large files that are made available for off-line analysis.

These systems cannot be considered historians, because they do not present data in real time. Their analysis results can still be used to improve processes at the customer’s plants and sites.

The Next-Generation Historian

[Figure: A Model of the Honeywell Approach to IIoT]

With the advancement of IIoT, the lines between process and enterprise historians will eventually blur, if not altogether disappear. Cloud deployment is one of the biggest reasons for that change. There are four major aspects the next-generation cloud historian needs to support:

  1. Traditional time series data, alarm and event data, etc.: Traditional tools can be used to visualize and analyze data. Most analysis and root cause detection on process data is still done more efficiently by visualizing data over time to track anomalies and related process variables.
  2. Data lake for big data-type analysis: This is a key driver for medium-to-large organizations looking at cloud technologies. All plant and site data should be pulled into this environment, so that new advanced tools can be used to detect hard-to-find correlations.
  3. Broader data types: All relevant data is stored in the data lake, and the tools can be used on top of that system, without having to connect to anything else – both for simplicity and for performance. Aside from time-series data, it should store:
    • Alarms and Events
    • Production data
    • Transactional data
    • Application data
    • Geolocation data
    • Complex data
    • Internet data such as weather, real-time pricing etc.
  4. Enterprise asset context data: When working with massive data sets, it is very hard if not impossible to perform proper analysis without asset context. Tag names are primarily only known to local process engineers and operators. Once data is pulled into the cloud and available to the enterprise, more relevant and useful data context is needed for users to make sense of it, and to perform relevant correlations (either across plant and site data or across similar assets in the enterprise).

The next-generation historian based on cloud technologies must be more than a traditional historian virtualized or developed for the cloud. It also must be more than a lake of unstructured data. The cloud historian of the future must be a combination of both, and much more. It must be the data platform for all cloud applications, as well as for applications on site connecting to the cloud.
