Demystifying the Semantic Model | Automation.com

By Jay Funnell, David Dickson and Joel Chacon, Honeywell Process Solutions

A Canadian, a Venezuelan, and an Australian get into a car…

Far from being the opening line of a joke, this actually happened en route to a client site for a product demonstration. The driver was adeptly navigating the usual ebb and flow of traffic on the busy streets of Brisbane when a peculiar car drove past.

The Canadian said, "Look, it's a Ranchero." The Australian said, "That's a Ute." Not to be left out, the Venezuelan chimed in, "No, no, El Camino."

Each of the riders grew up in different surroundings yet had a shared understanding of the vehicle that looked part car, part truck. They shared their individual experiences of a vehicle with three different names, and did so without much formality.

This is the nature of semantic modeling. Humans intuitively describe, categorize, compare, and contrast things they encounter with their senses. When talking with others a sort of handshake takes place between minds that allows for the exchange of ideas while simultaneously increasing the understanding of the world. It seems counter-intuitive to first identify all possible aspects of something before speaking of it, yet this is exactly what traditional modeling tools ask of us.

The Traditional Approach
To illustrate the point, the half car, half truck is first described below using a traditional relational model. The conversation is then expanded and used as a guide to contrast that model with the process of creating a semantic model.

To continue the previous conversation:
Canadian: I remember seeing those back in 1979. They were part car and part truck. I saw a blue one that was made by Ford. That can’t be a Ute though, aren’t those farm vehicles?
Australian: No, that’s a Sports Ute. They are popular with the younger guys. The classic Holden Ute is the one you’re thinking about.
Venezuelan: The El Camino was very similar. It was a two-seater and some had V8 engines.

The Relational Model
The relational model starts with an existing set of data that is unique to each person’s experiences, much like the data silos in an enterprise. Imagine that each table is stored in a separate repository to mimic an enterprise environment.



Next, take for example a report that contains a summary of all known half car, half truck entities in the enterprise. The data is spread across the El Camino, Ranchero, and Ute repositories even though they describe the same thing – a “half car, half truck.” In an upstream oil and gas environment it might be necessary to deal with ten to fifteen repositories or sources of master data – perhaps a well data historian, well maintenance system, geological survey database, production database, environmental and compliance system, etc. There are several options to accomplish this.

1.    Keep the data isolated

  • As the path of least resistance, this is the current reality for most enterprises. It essentially ignores integration altogether.
  • Each repository has a separate user interface and data is not shared. There is no single place where someone can “at a glance” obtain a summary of key performance indicators (KPIs) that span data sources. This is a significant barrier to becoming an agile enterprise.

2.    Leave it up to the reports

  • Each report queries each data source separately then merges the data before presenting it.

The key problem with this approach is that each report must be intimately aware of the data structures in each repository and must know how to harmonize the data between silos. Further, each new report duplicates the effort already expended by the previous report builders. Clearly this is not an efficient solution.

3.    Create a data warehouse, introduce a point-to-point integration project, and synchronize the data

  • With the relational method, it is necessary to unify these concepts into a common table to create queries and reports.
  • The key challenges of this method include:
      • How recent is the data? For historians and real-time data sources, the synchronization process must run constantly to ensure that the warehouse is up to date. This is not an effective use of corporate bandwidth and places a huge load on the operational data sources. In practice, therefore, the information in data warehouses is often out of date.
      • What if a new column is added to the El Camino, Ranchero, or Ute tables to describe a different aspect? The “HalfCar-HalfTruck” table won’t know about it, so it is necessary to go back to the data warehouse, add a new column, and then tell it where to get the data. This is costly from a maintenance perspective.
      • Data is duplicated from around the enterprise into the warehouse, again introducing master data governance overheads and headaches.

Unfortunately, none of these solutions are ideal.



The Semantic Approach

After seeing the half car, half truck, the riders didn’t have to pull over and draw up a schema in a master data management system to talk about it. Instead they simply discussed it based on their diverse yet imperfect knowledge. The semantic approach works in much the same way.

Evolution of Semantic Modeling

The semantic database is built on the principle that the entire Internet is one large, federated database. This allows queries that span many external data sources without the need to replicate the information.

For example, Tank X’s disposition (the material in the tank) is in the scheduling database, the specification of the material is in the laboratory database, and the outstanding maintenance work orders are in the maintenance database. All of these can be federated into a single report about that tank without the need to replicate data. Better still, the semantic modeling approach does not require replicating the models for this federation to work. This significantly lowers the total cost of ownership by offering a more rapidly deployable solution and reducing ongoing operating expenditures. (Fig. 2)

Innovations in semantics now allow the master plant model to be stored in a semantic database structure. This combines a simple structure with the ability to express complex relationships.  A relational database relies on referential integrity to control the contents of the database, but a semantic database uses “rules” that are verified by the built-in inference engine. Relational databases, object databases and real-time historians are great at storing information, but they have limited or no intelligence. The semantic database is built on artificial intelligence principles, thus allowing intelligence to be gathered from the information.

For example, if FI101 is the flow of Pump X, and Pump X is upstream of Tank Y, then the flow into Tank Y can be inferred as FI101 directly from the semantic database: no programs, no duplicate data, just intelligence.

Or: if it has four wheels like a Ute; is the size of a Ute; has a tray like a Ute; then it is a Ute.
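
The pump and tank inference can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular semantic database implements its inference engine; the predicate names `measuresFlowOf`, `isUpstreamOf`, and `measuresFlowInto` are assumptions made for the example.

```python
# Stored facts as (subject, predicate, object) statements.
# Predicate names are hypothetical, chosen for this sketch only.
facts = {
    ("FI101", "measuresFlowOf", "PumpX"),
    ("PumpX", "isUpstreamOf", "TankY"),
}


def infer_inflow(triples):
    """Rule: if ?tag measuresFlowOf ?pump and ?pump isUpstreamOf ?tank,
    then ?tag measuresFlowInto ?tank."""
    inferred = set()
    for tag, p1, pump in triples:
        if p1 != "measuresFlowOf":
            continue
        for upstream, p2, tank in triples:
            if p2 == "isUpstreamOf" and upstream == pump:
                inferred.add((tag, "measuresFlowInto", tank))
    return inferred


# The derived statement is computed on demand; it is never stored.
derived = infer_inflow(facts)
print(derived)
```

The stored facts are untouched; the new statement exists only as the result of applying the rule.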

Breaking down the semantic approach


Conceptually, a semantic database is one table. The key is to reduce the metadata to simple statements that contain a subject, predicate, and object. This has its roots in the W3C RDF standard. In this example the resource called “thing1” is used to identify the half car, half truck that was spotted.

A new table was not created. In fact, a virtual table of statements was created instead. Further, it is not necessary to fully define the attributes and relationships at this time. At this point it is known that “thing1” exists and that it has three names.
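
As a rough sketch of what such a table of statements might look like, the snippet below holds the three names of “thing1” as subject, predicate, object triples and matches them against a pattern. The predicate name `hasName` and the helper `match` are assumptions for illustration; the original figure is not reproduced here.

```python
# Each statement is a (subject, predicate, object) triple.
# "hasName" is a hypothetical predicate chosen for this sketch.
statements = [
    ("thing1", "hasName", "Ranchero"),
    ("thing1", "hasName", "Ute"),
    ("thing1", "hasName", "El Camino"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in store
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


names = [o for (_, _, o) in match(statements, s="thing1", p="hasName")]
print(names)
```

Nothing about “thing1” had to be declared up front; the three statements are the whole definition so far.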

Take the following statement:
Canadian: I remember seeing those back in 1979. They were part car and part truck. I saw a blue one that was made by Ford. That can’t be a Ute though; aren’t those farm vehicles?



Additional statements have now been added to the semantic database, expanding on the concept called “Ranchero.”

And add another statement:
Australian: No, that one’s more of a sporty Ute. They’re popular with the younger guys. The classic Holden Ute is the one you’re thinking about.



Now there is additional knowledge about Utes.

Venezuelan: The El Camino was very similar. It was a two-seater and some had V8 engines.



These statements have built a set from which to share knowledge. From the subtext of the conversation, it can be asserted that the Ranchero, Ute, and El Camino are all half-car, half-truck vehicles. This is accomplished using an OWL “sameAs” predicate.



The power of the semantic model is that it allows for the harmonization of data from different repositories. An oil and gas “well” might be called an “asset” in an ERP system, a “producer” in a relational database, and a collection of “tags” in a historian. We can use the notion of “sameAs” to assert equality.
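
One way to picture what a “sameAs” assertion buys is the sketch below, which treats the links as symmetric and transitive and then gathers facts stated under any of the equivalent names. The facts are taken from the conversation; the predicate names and the helper functions are assumptions for illustration, not an OWL implementation.

```python
# sameAs links asserted from the conversation.
same_as = [("Ranchero", "Ute"), ("Ute", "El Camino")]

# Facts stated under different names (predicate names are hypothetical).
facts = [
    ("Ranchero", "madeBy", "Ford"),
    ("Ute", "popularWith", "younger drivers"),
    ("El Camino", "seats", "2"),
]


def equivalence_class(name, links):
    """All names reachable from `name` via sameAs (symmetric, transitive)."""
    group = {name}
    changed = True
    while changed:
        changed = False
        for a, b in links:
            if (a in group) != (b in group):
                group.update((a, b))
                changed = True
    return group


def describe(name, triples, links):
    """Everything known about `name` under any equivalent name."""
    group = equivalence_class(name, links)
    return sorted((p, o) for (s, p, o) in triples if s in group)


print(describe("Ranchero", facts, same_as))
```

Asking about “Ranchero” returns what was said about the Ute and the El Camino as well, without any of the source statements being rewritten.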

Common queries
Some common queries demonstrate the semantic approach. For example, say the Canadian wants to know what things are like a (of type) “Ranchero” and what they are called; the query would be written like this:

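PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://example.org/vehicles#>  # hypothetical namespace for this example

SELECT ?thing
WHERE
{
  ?thing rdf:type :Ranchero
}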



The data is left in the original tables and federated at query time. This allows the report writer to use the vocabulary that is most comfortable.

Another common request is to find out everything that is known about a particular thing. Unlike in a master data schema, this can be quite fluid in a semantic model. Building on the query above, suppose the Canadian wants to know everything there is to know about “thing1”.

PREFIX : <http://example.org/vehicles#>  # hypothetical namespace for this example

SELECT ?predicate ?object
WHERE
{
  :thing1 ?predicate ?object
}



This simply asks the system, “What do you know about thing1?” The response says that thing1 can be described as a Ranchero, a Ute, or an El Camino. Now it is possible to ask what is known about these types.

PREFIX : <http://example.org/vehicles#>  # hypothetical namespace for this example

SELECT ?property ?value
WHERE
{
  ?x ?property ?value .
  FILTER( ?x = :ElCamino || ?x = :Ranchero || ?x = :Ute )
}



Questions can also be asked based on the relationships created. “What vehicles are typically driven by someone in their twenties?”

PREFIX : <http://example.org/vehicles#>  # hypothetical namespace for this example

SELECT ?vehicleType
WHERE
{
  ?vehicleType :avgDriverAge ?age .
  FILTER( ?age > 19 && ?age < 30 )
}



Semantic models are an excellent approach to data federation for a number of reasons:

  • They embrace diversity. Traditional master data management techniques stifle diversity by forcing conformity with a master schema.
  • They can apply equality to similar concepts from separate ontologies, such as facts found about the same entity (“a pump” or “a well”) in different systems (historian, ERP, CMMS).
  • They can be built incrementally as information becomes available.
  • They are lightweight and flexible.
  • The semantic model leaves data in its original repository and federates it on demand; a significant benefit that drastically reduces master data management headaches.

Moving forward with the semantic model

The semantic model provides several benefits over other models. These include:

Errors can be accommodated through flexibility

Traditional relational meta-models have a great deal of flexibility – until they are put into production. Once data lands in a table and reports are built, it becomes difficult to change the tables and entity relationships. Those early decisions about table structure, keys, and data types will influence every future decision for applications. Semantic models allow for structure definition whenever necessary. They also allow for the inference of structure at runtime. Semantic models are flexible and adept at handling complexity and managing change. This is especially important for expanding production environments when, for example, managing the lifecycle of thousands of assets, from engineering through operation and maintenance to retirement, along with the associated information flows.

This is not to say that traditional models are obsolete or unimportant, especially for applications that have a fairly static information model or complex transactional processing. Semantic models are better suited for supporting an ecosystem of federated data, especially in an environment where change is the norm.

Using the vehicle example, add that “thing1” has a kangaroo in the back – something that would not likely have been anticipated in a master schema. Simply add a statement to the existing table asserting the fact that thing1 “isCarrying” kangaroo.

There is no schema change required because it is just another statement. It is simple to add new repositories and new attributes to existing tables.
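
The point that a new kind of fact is just another row can be sketched with Python's built-in `sqlite3` module, using a single three-column table to stand in for the table of statements. The table name `statements` and the predicate `hasName` are assumptions for this illustration; a real semantic database would not necessarily be stored this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table of statements stands in for the semantic store.
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?)",
    [
        ("thing1", "hasName", "Ranchero"),
        ("thing1", "hasName", "Ute"),
        ("thing1", "hasName", "El Camino"),
    ],
)

# A fact nobody anticipated: no ALTER TABLE, just one more statement.
conn.execute(
    "INSERT INTO statements VALUES (?, ?, ?)",
    ("thing1", "isCarrying", "kangaroo"),
)

# The table's columns are the same three as before the new fact arrived.
columns = [row[1] for row in conn.execute("PRAGMA table_info(statements)")]
known = conn.execute(
    "SELECT predicate, object FROM statements WHERE subject = ? ORDER BY rowid",
    ("thing1",),
).fetchall()
print(columns)
print(known)
```

In a conventional relational design, the same fact would first have required a new column or a new table.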



A virtual data warehouse can be created that spans multiple data silos, leaving the data as is

This is known as “data federation”. Traditional systems require creating a new data repository and copying all the necessary operational data to the data warehouse. This approach delays access to merged information, requires software and hardware to copy the data, and requires someone to maintain the new repository whenever a source system changes. The semantic model gets the benefits of federating data without requiring data to be copied. It automatically detects new data, which significantly simplifies the job of updating the model as things change.



The semantic model uses Resource Description Framework (RDF) to describe “things” and the relationships between them as statements, in the form of: subject, predicate, object. Taken together, these RDF statements form a “graph” or ontological model. The example above shows a plant model graph federating two other ‘databases’: a real-time database (tag, value), and a maintenance system database (work orders associated with an asset).

Different graphs can share nodes or branches. This shared information provides the ‘links’ that allow the graphs to be merged temporarily.



Once the merged graph is created, queries can be created that span multiple data sources. The data federation is temporary – it is merged for the duration of the query only and the master information source keeps the original data. This is a key benefit that supports the low overhead cost of semantic modeling. Federation also allows for the use of the semantic model without needing to know about the data structures and naming conventions of the other data sources.

Going back to the car example, the following tables illustrate how subsets of statements can exist in isolated repositories.

Conversation Repository


Ranchero Repository


Ute Repository


El Camino Repository


Each repository above can exist in isolation and then be stitched together at query time. This is the nature of federation.
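
Stitching repositories together at query time might look like the sketch below. The repository contents are illustrative, drawn from the conversation rather than from the original tables, and the predicate names and the `federated_query` helper are assumptions.

```python
from itertools import chain

# Four isolated repositories, each holding its own statements.
conversation_repo = [
    ("thing1", "hasName", "Ranchero"),
    ("thing1", "hasName", "Ute"),
    ("thing1", "hasName", "El Camino"),
]
ranchero_repo = [("Ranchero", "madeBy", "Ford")]
ute_repo = [("Ute", "popularWith", "younger drivers")]
el_camino_repo = [("El Camino", "seats", "2")]

repos = [conversation_repo, ranchero_repo, ute_repo, el_camino_repo]


def federated_query(sources, keep):
    """Scan every repository lazily at query time; nothing is copied
    into a separate warehouse."""
    for triple in chain.from_iterable(sources):
        if keep(triple):
            yield triple


# Step 1: what is thing1 called? (answered by the conversation repository)
names = {
    o
    for (_, _, o) in federated_query(
        repos, lambda t: t[0] == "thing1" and t[1] == "hasName"
    )
}

# Step 2: what is known about those names? (answered by the other three)
details = sorted(federated_query(repos, lambda t: t[0] in names))
print(details)
```

The shared names act as the links between repositories; the merged view exists only while the query runs.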

Embrace standards in a non-standard environment

Data repositories often grow organically and without standards in mind. It is a costly proposition to apply standards to migrate legacy data sources. The semantic model discussed allows for the creation of a layer on top of the native data sources and the re-shaping of it into a standard model. We can then create dashboards, reports, and other visualizations that are written as if the native data was using ISA-95, ISO-15926, MIMOSA, or other standards.

For instance, take the example dataset and create a new inference rule to show the application of a standard ontology to non-standard data.



This inference rule says, “Every Ranchero is an ISO-15926 Arranged Individual”. ISO-15926 defines the arranged individual as “An individual that is an arrangement of components.”

WITH
(
CONSTRUCT
{
?subject rdf:type iso15926:ARRANGED_INDIVIDUAL
}
WHERE
{
?subject rdf:type Ranchero
}
)

The following query asks the system for all “Arranged Individuals”.

SELECT ?subject
WHERE
{
?subject rdf:type iso15926:ARRANGED_INDIVIDUAL
}

The following result is produced through inference because it is known that “thing1” is a Ranchero and therefore also an Arranged Individual. It is unnecessary to explicitly store the fact that thing1 is an arranged individual, although it can be done for efficiency. This allows users to “think” in standards even though the underlying data is stored in non-standard formats.
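
The effect of the rule and query above can be imitated in plain Python, as a sketch only. The helpers `apply_rule` and `subjects_of_type` are hypothetical and say nothing about how a real inference engine evaluates rules.

```python
RDF_TYPE = "rdf:type"

# The only stored fact: thing1 is a Ranchero.
stored = {("thing1", RDF_TYPE, "Ranchero")}


def apply_rule(triples, if_type, then_type):
    """CONSTRUCT-style rule: every subject of if_type is also of then_type."""
    return {
        (s, RDF_TYPE, then_type)
        for (s, p, o) in triples
        if p == RDF_TYPE and o == if_type
    }


def subjects_of_type(triples, wanted):
    """SELECT ?subject WHERE { ?subject rdf:type wanted }"""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == wanted)


# Inferred statements are combined with stored ones only for the query.
inferred = apply_rule(stored, "Ranchero", "iso15926:ARRANGED_INDIVIDUAL")
result = subjects_of_type(stored | inferred, "iso15926:ARRANGED_INDIVIDUAL")
print(result)
```

The stored set still contains a single statement afterwards; materializing the inferred one would be an optional optimization.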



In all, employing the semantic model promotes a lower total cost of ownership throughout the lifecycle by better supporting rapid deployment and a lower operating expenditure within the federated data environments typical of the operational, production and asset management functions. Ultimately, this architecture will help to dramatically improve efficiency and cost-effectiveness for upstream oil and gas analysis, operations, and business.

 
