Digital twins evolved from physical system twinning with the advent of digital technologies. Digital technologies such as the industrial Internet of Things (IIoT), industrial Ethernet, cloud computing, data analytics and artificial intelligence (AI) have created a class of cyber-physical systems (CPS) that have enabled digital twin implementations across all industrial domains. Digital twins link a physical system with a virtual representation for organizational understanding, optimization and predictive capabilities.
The use of digital twins helps to mitigate process uncertainty due to variation and emergent properties typical of mass customization. Although any type of computational system can convert process data into process knowledge, a digital twin can combine aspects of human intelligence with machine intelligence to provide optimized process surveillance, control and planning. A review of enabling technologies and foundational concepts can enable a better understanding of digital twin design, implementation and operation.
Digital twins are a type of CPS widely used across industries to support smart manufacturing. Within the manufacturing domain, a digital twin is a “connection of data between a physical entity and its virtual representation that is made for the purpose of improving the performance of the physical part using computational simulations and techniques.”
Although currently championed in the manufacturing domain, digital twins began in the aerospace industry. They have since pervaded multiple industries and product/service lifecycles and generally support the Industry 4.0 (smart manufacturing) concept.
Digital twins use technologies such as IoT, industrial Ethernet, cloud computing, data analytics and AI to integrate physical systems with digitization technologies to increase operational visibility, system optimization and prediction. These enabling technologies allow digital twins to optimize the interaction of physical processes with human operators to overcome nonlinear and unpredictable cause-and-effect relationships between processes, resources and people. Unpredictable (non-deterministic) relationships result from emergent properties in complex systems.
Digital twins rely on the networking of components and the insights into performance available from the process data. Data can be processed through a combination of human and machine intelligence to turn the data into operational knowledge.
An understanding of enabling technologies can help with understanding how digital twins operate and are implemented. Furthermore, learning a bit about systems theory (concentrating on complex systems and emergent properties) can help one appreciate how digital twins can be used to solve complex problems that human operators may not be able to resolve in real time. Lastly, distinguishing how data is converted into information and then into knowledge, together with the differences between human intelligence and AI, can help automation professionals understand how digital twin designs that use a hybrid intelligence approach can provide an optimized digital twin solution.
Smart manufacturing and Industry 4.0
Mechanization of industry during the 1700s and early 1800s marked the first industrial revolution. The introduction of the internal combustion engine and electrification differentiated the second industrial revolution in the late 1800s. Electronic technologies began the third industrial revolution in the 1900s and replaced analog with digital devices. The digital revolution quickly enabled a fourth industrial revolution that shifted the focus of mass production to mass customization. This new industrial era is referred to as Industry 4.0 (or smart manufacturing).
Smart manufacturing “refers to the state of manufacturing, in which real-time transmission and analysis of data from across the product lifecycle, along with model-based simulation and optimization, create intelligence to yield positive impacts on all aspects of manufacturing.” Industry 4.0 is a phrase used to describe widespread integration of information and communication technology in industrial production. The two terms are generally used interchangeably, although smart manufacturing is strictly the narrower of the two.
Industry 4.0 uses information technology (IT)/operational technology (OT) tools such as digital twins to enhance organizational agility to successfully adapt to events and changes in the operating environment. Industry 4.0 provides organizations with the ability to facilitate, accelerate and optimize organizational decision making with adaptive capability to changing environments. Industry 4.0 is focused on “interconnection, information transparency and decentralized decisions” founded on automation to achieve optimization on a large scale.
Industry 4.0/smart manufacturing combines many different enabling digital technologies, such as the Industrial Internet of Things (IIoT), industrial Ethernet, data analytics, cloud computing and AI, to support mass customization. Each of the enabling technologies contributes to the creation of digitized models of a physical process for the purpose of analysis and support of organizational strategies. Further, each enabling technology provides for the horizontal and vertical integration of modern production systems, all of which are used by digital twins.
Internet of Things, industrial and otherwise
A high-level definition of IoT is “the networking capability that allows information to be sent to and received from objects and devices (such as fixtures and kitchen appliances) using the Internet.” A more thorough definition is “the Internet of Things describes the network of physical objects—‘things’—that are embedded with sensors, software and other technologies for the purpose of connecting and exchanging data with other devices and systems over the Internet. These devices range from ordinary household objects to sophisticated industrial tools.”
IoT is becoming as pervasive as the Internet. It is experiencing strong growth in various fields and is constantly evolving with continuing advances in technology. The number of IoT-enabled devices is expected to grow from 18.8 billion in 2024 to 41.1 billion in 2030.
The IoT is a network interface that is embedded into everyday objects, enabling them to seamlessly communicate with other devices and provide the end-user with a comprehensive and integrated set of data. For example, typical IoT devices are sensors integrated into relatively commonplace physical objects. The IoT is characterized by self-configuring capabilities, interoperable communication protocols and seamless integration over a global network infrastructure that can map any monitored physical property with a virtual representation.
The IoT connects devices such as utility meters (which monitor a process) and Fitbits (which monitor people) that collect data such as location, vibration, motion and temperature. Collected data can be shared with any other device over the Internet to provide real-time process and/or condition monitoring, thanks to the combination of sensing technology and communication capabilities. By connecting the physical world with the digital world, IoT connects intelligent buildings for improvements in efficiency and security, enables smart cities to intelligently manage traffic and also enables agricultural innovation. Some examples are asset tracking (e.g., parcels and pallets), process tracking (e.g., HVAC and water treatment plant operations) and living organism health monitoring (e.g., vital signs).
IIoT refers to the application of IoT within industrial settings. IIoT is a technology that is applied to and used in the overall product lifecycle stages. A formal definition of IIoT is provided as “a global infrastructure for the information society, enabling advanced services by interconnecting physical and virtual things based on the existing and evolving interoperable information and communication technologies.” IIoT adds value to organizations by creating an interchange of data and ideas between manufacturers, operators and the equipment.
The IoT and the IIoT provide digital twins with data representing past and current process conditions. Typical IIoT devices used for digital twins include location sensors, optical/vision sensors, light sensors, image sensors, sound sensors, temperature sensors, heat sensors, electrical sensors, pressure sensors, motion sensors, orientation sensors, physical movement sensors, biosensors, vital sign processing devices, wearable sensors, identification and traceability sensors, etc.
Industrial Ethernet
The application of Ethernet technologies in the industrial setting is referred to as industrial Ethernet. The digitization of industrial sensors, controllers and actuators created networking and integration requirements so that these devices could communicate. Previous fieldbus-style industrial connectivity gave way to Ethernet technologies to better enable the integration of different devices from different suppliers. Using Ethernet technologies allowed both horizontal and vertical integration of organizational process control and automation systems within the corporate-level IT domain.
Ethernet has been used in industrial settings since its adoption in the 1980s but has faced difficulties in some applications due to its non-deterministic behavior. The solution was to implement various industrial protocols over the Ethernet link. One of these is EtherNet/IP (Ethernet Industrial Protocol), which combines standard Ethernet hardware and TCP/IP protocols with additional control and information. Other examples include Modbus TCP (located in the application layer on top of TCP/IP), Profinet (can use either TCP or UDP) and EtherCAT (supports networked devices in the traditional master-slave relationship).
The variety of open and proprietary industrial Ethernet protocols has enabled the growth of IIoT integration into industrial settings because different protocols suit different network topologies. The additional control and information available in the industrial Ethernet environment has contributed to digital twinning of industrial environments.
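To make the idea of an industrial protocol layered on top of TCP/IP concrete, the following minimal Python sketch assembles the bytes of a Modbus TCP "read holding registers" request as they would travel in a TCP payload. It assumes the publicly documented Modbus TCP framing (a 7-byte MBAP header followed by the function code and its data). The function name, unit identifier and register addresses are hypothetical values chosen for illustration, and the sketch only builds the frame without opening a network connection.

```python
import struct


def build_read_holding_registers(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'Read Holding Registers' (function 0x03) request."""
    function_code = 0x03
    # PDU: function code (1 byte), start address (2 bytes), register count (2 bytes).
    pdu = struct.pack(">BHH", function_code, start_address, quantity)
    # MBAP header: transaction id, protocol id (0 = Modbus), length, unit id.
    # The length field counts the unit id byte plus the PDU that follows.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Hypothetical request: unit 17, two registers starting at address 0.
frame = build_read_holding_registers(
    transaction_id=1, unit_id=17, start_address=0, quantity=2
)
print(frame.hex())  # 000100000006110300000002
```

A client would send these 12 bytes over an ordinary TCP connection to the device (Modbus TCP conventionally uses port 502). The industrial payload is simply application data carried by standard Ethernet and TCP/IP.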
Cloud computing
Cloud computing is not a singular technology but a concept for computer architecture that takes advantage of remote computational resources. The National Institute of Standards and Technology (NIST) provided a definition for cloud computing as a model and not just a technology: “Cloud computing is a model for enabling ubiquitous, convenient [and] on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
Cloud computing is the concept of moving software services, computational resources and data storage services from local devices to remote data centers via the Internet. Cloud computing can deliver many more IT-related resources on demand than are typically available on local hosts.
Cloud computing is an enabling technology for AI because the large amount of processing power that AI requires is seldom available locally but can be accessed seamlessly from remote resources. Remote resources can also be moved to intermediate devices (i.e., neither local nor remote) that provide on-demand services closer to where the data are collected, i.e., the network edge.
Digital twins typically use cloud or edge computing services due to their large computational resource requirements. Although cloud computing gives the digital twin the benefit of virtually unlimited computational resources, it also introduces security challenges.
Big data and data analytics
Data volumes grew from megabytes (MB) to gigabytes (GB) in the 1980s and then to terabytes (TB) by the 1990s. By the 2000s, data was being measured in petabytes (PB). As of 2018, the global population was producing 2.5 quintillion bytes of data every day. Most of the data produced each day is distributed, unstructured and incomplete.
NIST defines big data as “the large amount of data in the networked, digitized, sensor-laden, information-driven world.” Big data is essentially very large and unstructured datasets. Data analytics refers to the use of computational resources to analyze big data to collate unstructured data into patterns, correlations and other useful information.
Data analytics is able to identify and transform data into information to support organizational decision-making. It is used to collect (mine) data, then clean it for outliers or errors, model attribute relationships and then provide analysis of the newly structured data. For example, data analytics can identify correlations and patterns that may hold value in understanding organizational processes or the environment.
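The collect, clean and model steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the sensor readings, field names and the outlier threshold `k` are all hypothetical. Outliers are filtered with a simple median-based rule, and the attribute relationship is modeled with a Pearson correlation coefficient.

```python
from statistics import mean, median

# Hypothetical collected readings; one temperature value is a sensor fault.
rows = [
    {"temp_c": 60.0, "vibration_mm_s": 2.0},
    {"temp_c": 62.0, "vibration_mm_s": 2.2},
    {"temp_c": 64.0, "vibration_mm_s": 2.4},
    {"temp_c": 66.0, "vibration_mm_s": 2.6},
    {"temp_c": 999.0, "vibration_mm_s": 2.5},
    {"temp_c": 68.0, "vibration_mm_s": 2.8},
]


def drop_outliers(records, key, k=5.0):
    """Keep records whose value lies within k median absolute deviations."""
    values = [r[key] for r in records]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(records)
    return [r for r in records if abs(r[key] - med) <= k * mad]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


clean = drop_outliers(rows, "temp_c")
r = pearson(
    [row["temp_c"] for row in clean],
    [row["vibration_mm_s"] for row in clean],
)
print(len(clean), round(r, 3))
```

With the faulty reading removed, the strong correlation between temperature and vibration in this made-up dataset becomes visible. This is the kind of pattern an analyst would then investigate further.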
When transforming big data into useful information, data analytics relies on a variety of data attributes: volume, velocity, variety, veracity and value. Volume refers to the quantity (scale) of data, velocity refers to the speed that data is produced and processed, variety refers to the different types and sources of data, veracity refers to the correctness of data and value represents the usefulness of the data.
Data analytics typically makes use of visualization tools to help organizations use the information and convert it into knowledge. It typically uses machine learning (ML), which is a subset of AI. Data analytics, ML and AI are core enabling technologies for digital twins.
Artificial intelligence
Computer science has developed programs and services that perform variable and complex tasks requiring human intelligence, going beyond simple repetitious activities. The goal of AI is to enable computational technologies to mimic human behavior. AI can be implemented in systems to provide real-time decisions based on data and user input. It can analyze much larger amounts of data, in a much shorter timeframe, than skilled personnel can.
Machine learning is a subset of AI technologies (i.e., ML is a type of AI, but AI is much more than ML). ML technologies allow computational devices to learn from a set of data. They apply predefined algorithms to the data to infer relationships, extract useful insights and build digital models, and then make decisions and predictions without being explicitly programmed to do so.
Digitization
Digitization is a general-purpose technology that refers to the process of gathering information from objects (e.g., IoT) and processes and then converting that information into a digital representation of the object or process. Computational technologies such as IoT, industrial Ethernet, cloud computing and data analytics enable information gathering and virtual modeling (digitization) to be automated. Automation lowers the cost of data production and integration into organizational infrastructure systems.
Digitization involves computerization and connectivity of organizational processes. Computerization transforms manual processes using computers to perform repetitive tasks more efficiently. Connectivity is the process of integrating and linking each separate computerized process together into a single system such as a manufacturing execution system (MES).
Data generation occurs in all activities across both the horizontal and vertical aspects of an organization. Digitization of objects and processes provides the means to coordinate previously uncoordinated activities, thereby improving efficiency at the system level.
Digitization efforts help break the barriers between product design and manufacturing through simulation and interaction and are the first part of the Industry 4.0 transformation. Digitization is the foundation for digital twins.
Cyber-physical systems
The advent of digitization and other Industry 4.0 enabling technologies links physical systems with software applications to create cyber-physical systems. A CPS is a physical and engineering system that monitors, controls, coordinates and integrates physical elements by using computing and communication technologies. Within production applications, a cyber-physical production system (CPPS) is a physical and engineering system that can overcome limitations of the personalized production process, as it is designed to improve the performance of manufacturing systems.
Similar to the open systems interconnection (OSI) model, CPSs can be abstracted in service layers such as the physical layer, network layer, virtual layer and application layer. As an example, the physical layer within a CPS is the actual data generated by plant instrumentation such as IIoT-enabled devices.
The CPS uses data to virtually (digitally) represent the physical system. CPS data can consist of static data required to define the physical element (configuration data), runtime data gathered from the continuous changes in system parameters and states and model data used to optimize the virtual representation. For example, runtime data can reflect the operation of the machine and the inventory data.
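The three kinds of CPS data described above can be pictured as separate types within one virtual representation. The following Python sketch is only an illustration: every class name, field and value is hypothetical and not taken from any particular CPS platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ConfigurationData:
    """Static data that defines the physical element."""
    machine_id: str
    max_spindle_rpm: int


@dataclass(frozen=True)
class RuntimeSample:
    """Runtime data: one snapshot of changing parameters and states."""
    timestamp_s: float
    spindle_rpm: int
    parts_in_inventory: int


@dataclass
class ModelData:
    """Model data used to tune the virtual representation."""
    wear_per_million_revs: float


@dataclass
class VirtualMachine:
    """Virtual representation that ties the three kinds of data together."""
    config: ConfigurationData
    model: ModelData
    history: List[RuntimeSample] = field(default_factory=list)

    def ingest(self, sample: RuntimeSample) -> None:
        # Use the static configuration to reject implausible runtime data.
        if not 0 <= sample.spindle_rpm <= self.config.max_spindle_rpm:
            raise ValueError("spindle speed outside configured range")
        self.history.append(sample)

    def latest(self) -> Optional[RuntimeSample]:
        return self.history[-1] if self.history else None


twin = VirtualMachine(ConfigurationData("mill-01", 12000), ModelData(0.02))
twin.ingest(RuntimeSample(timestamp_s=0.0, spindle_rpm=9000, parts_in_inventory=40))
print(twin.latest())
```

Keeping the configuration immutable while the runtime history grows mirrors the distinction drawn in the text: static data changes rarely, whereas runtime data arrives continuously.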
The service-oriented architecture (SOA) of a CPS helps structure the CPS in a horizontally and vertically scalable series of applications that can be joined together to provide a combination of services that are adaptive, flexible and extensible. The CPS allows various options and implementations that best fit the system requirements.
A digital twin is a type of CPS and uses an SOA that can be abstracted in terms of dimensions and frameworks just like any other computer system.
Systems, emergent properties and variability
A system is a set of elements (objects, processes, etc.) that have relationships between them. Systems can be natural (e.g., weather systems) or manmade (e.g., water drainage system). Systems range from simple to complex. Simple systems are completely predictable with highly visible inputs, transparent actions and predictable outputs. Complex systems cannot be completely understood or predicted by comprehensive knowledge of each element, relationship or process due to unpredictable emergent properties and system variability.
Emergent properties. Emergent properties are system states or outputs that cannot be defined in terms of the individual elements (objects, processes, etc.). Traits, characteristics, properties and behaviors emerge such that the system is much more than the sum of all its components. For example, a water molecule behaves in a much different manner than either the oxygen atom or the two hydrogen atoms. Emergent properties can be static and dynamic. Dynamic emergent properties are the result of randomness within the system or its inputs (i.e., changes within the system produce new and unexpected system properties) and are the hardest to predict.
In a simple system, inputs and environmental interaction form a consistent relationship and result in predictable properties. Simple systems can exhibit emergent properties, but such properties are completely predictable and emerge in a consistent manner.
Complex systems may have a larger number of elements, several many-to-many communication channels and higher levels of relationships between elements. The biggest marker of a complex system is the degree of predictability. Complex systems cannot be understood solely at the component level. No level of knowledge of elements and their relationships can provide an understanding of the emergent properties, whether they are new properties or the disappearance of existing properties. Complex systems exhibit system states that are difficult to predict and emergent properties that are not only unplanned/unexpected, but also undesirable.
In manmade systems, emergent properties can impact the system’s function, performance, utility and cost. People cannot model and simulate all possibilities of a complex system. However, current computational power can model and simulate most possibilities, even in complex systems. Using digital twin technologies, organizations can assess internal and external impacts to complex systems to identify unpredictable system states and emergent properties to aid in decision making.
Variability. The range of values for each element, relationship and process within a system, together with the different timescales over which they change, affects the variation or change in emergent properties and future states. For example, solar energy varies each second and can also vary over decades due to local and large-scale weather and climate. The greater the range of elements and timescales within a system, the greater the range of total possible outcomes and the harder it is to predict those outcomes accurately.
System variability creates uncertainty. For example, in manmade systems (such as a manufacturing plant), variability starts at the range of physical characteristics for each raw material/component parameter. Process variability adds to component variability due to the ranges of processing time, speed or other process variables such as level, pressure or temperature. External variability also exerts influence on the system, for example changes in the technology, process and market conditions.
Within a system, all elements and processes are related. The larger the number of processes (or states) within the system, the greater the number of correlations between elements, which increases system uncertainty. Variability and uncertainty create dynamic interference that hinders the planning, understanding and execution of operations.
Digital twin technologies enable organizations to automatically and continuously predict system outcomes based on internal and external variation (e.g., mass customization).
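A small Monte Carlo simulation can illustrate how variability accumulates as processes are chained together. The sketch below is a simplified illustration, not a model of any real plant. It assumes each hypothetical production stage takes between 8 and 12 minutes, uniformly distributed, and that stages are independent of one another. Correlated stages, as discussed above, would generally widen the spread further.

```python
import random
from statistics import pstdev


def simulate_lead_time(n_stages, rng):
    """Total time for one order passing through n_stages sequential stages."""
    # Hypothetical stage: nominal 10 minutes with +/- 2 minutes of variation.
    return sum(rng.uniform(8.0, 12.0) for _ in range(n_stages))


def lead_time_spread(n_stages, runs=5000, seed=42):
    """Standard deviation of total lead time across many simulated orders."""
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    totals = [simulate_lead_time(n_stages, rng) for _ in range(runs)]
    return pstdev(totals)


for stages in (1, 5, 10):
    print(stages, "stage(s): spread =", round(lead_time_spread(stages), 2), "min")
```

The spread of the total lead time grows as more stages are added, even though no individual stage has become less predictable. A digital twin runs this kind of what-if calculation continuously against live data rather than once against assumed ranges.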
Mass customization and smart manufacturing
Industry 4.0/smart manufacturing has promoted the mass customization of products and services that provide customers with a high differentiation level inexpensively. Mass customization relies on IT/OT and communication technologies (i.e., industrial Ethernet) across the entire value chain due to the continuous adjustment of organizational processes from idea conceptualization to final delivery. Mass customization relies on AI/ML technologies due to the dynamic interference of process requirements to deliver customized products and services in small batches that are characterized by multi-variety and large randomness.
Mass customization introduces a large degree of randomness in pre-planned processes. For example, a manufacturing process experiences variation in its production orders based on specific customer requirements, which in turn induces unplanned variation in the production time, production sequence, production quantity and production equipment of the products, which also affects distribution and logistics. Mass customization aggregates the variability and uncertainty at each stage of the production process. Further, inefficient collaboration between different organizational groups is also compounded by rapid product development and delivery.
Digitization technologies are used to enable mass customization. Digital twins are used to optimize mass customization by enabling flexible and sustainable production systems for individualized demands.
Getting from data to knowledge
Intelligence can be attributed to both humans and computers. It is based on data as the basic building block, transformed into information through context, and eventually realized as knowledge. Data is the raw material for information, and information is the raw material for knowledge.
Data. The raw building block of intelligence is data. Data is any information, symbol, fact or statistic gathered through observation or analysis but devoid of any meaning without context or purpose. Data can be quantified, measured, counted and stored and is used for both human learning and machine algorithms.
Data can originate from the IoT sensors (lowest level) to organizational enterprise-wide processes (highest level). It can originate from assets, people, activities and their interactions. Data can also originate externally to a process to provide a more comprehensive model of the system.
When used within a digital twin (or any CPS), data should be fresh (i.e., current and not out of date) and dense. Data density is the degree the data is linked to the actual object of measurement.
Information. Data can become meaningful information when it is collected or processed, for example by categorization or counting. When context is applied to raw data, the data can be interpreted. Information is defined as “data that has been categorized, counted and thus given meaning, relevance, or purpose.” For example, binary code (data) can be processed into graphical objects (information).
Knowledge. Knowledge is based on information that has been accumulated and analyzed to provide a new context or meaning. It is therefore not just a combined series of information, but information compiled in such a manner that gaps can be filled in to understand what is lacking.
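The progression from data to information to knowledge can be shown with a toy example in Python. Everything here is hypothetical: the readings, the temperature limit that supplies the context and the shift labels are invented for illustration only.

```python
from collections import Counter

HIGH_TEMP_LIMIT_C = 80.0  # hypothetical limit that supplies the context

# Data: raw (shift, temperature) readings with no meaning on their own.
raw_readings = [
    ("day", 72.0), ("day", 75.5), ("night", 83.1),
    ("night", 85.4), ("day", 81.2), ("night", 79.0),
]


def to_information(readings, limit):
    """Categorize and count the readings, which gives them meaning."""
    counts = Counter()
    for shift, temp_c in readings:
        label = "high" if temp_c > limit else "normal"
        counts[(shift, label)] += 1
    return counts


def to_knowledge(info):
    """Analyze the accumulated information to find which shift overheats most."""
    rates = {}
    for shift in {s for s, _ in info}:
        high = info[(shift, "high")]
        total = high + info[(shift, "normal")]
        rates[shift] = high / total
    worst = max(rates, key=rates.get)
    return worst, rates


info = to_information(raw_readings, HIGH_TEMP_LIMIT_C)
worst_shift, rates = to_knowledge(info)
print(worst_shift)  # night
```

The raw numbers are data and the categorized counts are information. The conclusion that the night shift runs hot more often is a small piece of knowledge that can drive a decision.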
Human intelligence. Intelligence can be defined in terms of the “ability of mental activities in knowing, perceiving, remembering, problem solving, reasoning and understanding.” Human intelligence further includes the ability to take general content and then process/analyze that content to support further relevant knowledge. It includes a large number of different components, including cognitive abilities, learning and memory, creativity, adaptability and communication. Human intelligence also includes categories such as emotional intelligence and social intelligence.
Human intelligence can perform computational problem solving but is generally limited by time and capacity. Any single person or group of people has a finite amount of capacity to think and a finite amount of time to learn. As a result of human limitations, humans have created artificial intelligence, but ultimately human intelligence decides artificial intelligence applications, their use and how they affect society.
Digital twins apply human intelligence via physics-based models.
Artificial intelligence via machine learning
ML is a subset of AI. ML applies optimization algorithms to historical data over a period of time to solve/optimize a solution based on an identified objective or function. It works by identifying patterns, rules and correlations in data and improves itself over time [through] repetitive trial and error. Similar to human intelligence, AI can detect outlying (potentially anomalous) information as well as fill in any missing data. Digital twins apply ML via probabilistic-based modeling. The data’s volume, velocity, variety, veracity and value largely determine which type of ML approach is implemented.
Hybrid intelligence. Hybrid intelligence refers to systems that combine artificial and human intelligence. While AI is better at rule-based systems involving big data and complex computations, human intelligence is still better at complex problem solving involving ambiguity and can more easily adapt to different contexts. Further, human intelligence is still required to generate new knowledge through the lens of creativity, emotions and social situations.
Digital twins often combine physics-based models (human intelligence) with probabilistic-based models (AI) for optimal performance. Human intelligence is also used in some digital twin applications to provide a secondary verification (e.g., human-in-the-loop).
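One common way to picture such a combination is a physics-based prediction corrected by a data-driven model of its errors. The Python sketch below is a deliberately simple illustration under stated assumptions. The "physics" is a hypothetical linear heating relationship with invented coefficients, the measurements are made up, and an ordinary least-squares line stands in for the probabilistic/ML model that a real digital twin would use.

```python
from statistics import mean


def physics_model(load_kw, ambient_c=25.0, k_c_per_kw=2.0):
    """Hypothetical first-principles estimate of motor temperature."""
    return ambient_c + k_c_per_kw * load_kw


def fit_residual(loads, measured):
    """Fit a least-squares line to the error left over by the physics model."""
    residuals = [m - physics_model(l) for l, m in zip(loads, measured)]
    l_bar, r_bar = mean(loads), mean(residuals)
    slope = sum((l - l_bar) * (r - r_bar) for l, r in zip(loads, residuals)) / sum(
        (l - l_bar) ** 2 for l in loads
    )
    intercept = r_bar - slope * l_bar
    return slope, intercept


def hybrid_predict(load_kw, slope, intercept):
    """Physics-based estimate plus the learned correction."""
    return physics_model(load_kw) + slope * load_kw + intercept


# Hypothetical plant measurements that run hotter than the physics predicts.
loads = [1.0, 2.0, 3.0, 4.0]
measured = [28.5, 31.0, 33.5, 36.0]

slope, intercept = fit_residual(loads, measured)
print(physics_model(5.0), hybrid_predict(5.0, slope, intercept))
```

The physics model supplies the structure that human experts already understand, while the fitted correction absorbs effects the equations leave out. In this made-up data the physics alone underestimates the temperature at every load, and the hybrid prediction closes that gap.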
Final thoughts
Digital technologies such as the IIoT, industrial Ethernet, cloud computing, data analytics and ML have enabled smart manufacturing and Industry 4.0. Smart manufacturing/Industry 4.0 relies on CPS that combine the organizational IT and OT domains for continuous computational modeling of complex production systems. A digital twin is a type of CPS that implements many different digitization and computational technologies to increase operational visibility, system optimization and prediction. An understanding of enabling technologies can further the understanding of digital twin design, implementation and operation.
Mass customization leads to process variability, which is a hallmark of smart manufacturing. Digital twins help solve nonlinear and unpredictable cause-and-effect relationships within complex systems involving processes, resources and people. Understanding variability and emergent properties can help understand how digital twins can provide process simulation, optimization and prediction.
Digital twins can use physics-based models predetermined by human intelligence or rely on ML for extracting process knowledge from raw data. A combination of human-based intelligence and AI can provide a hybrid digital twin for optimized performance.



