Machine Learning, Big Data, and Manufacturing: The Perfect Trifecta

May 16, 2019
Feature

Summary

By Michael Schuldenfrei, OptimalPlus

Big Data Analytics (BDA) tools have changed how organizations everywhere do business, and they are already saving many electronics manufacturers millions of dollars today.

By Michael Schuldenfrei, Chief Technology Fellow, OptimalPlus

What if you could learn how to improve your manufacturing from every item that you made? Every unit can teach us something if only we have the tools to enable us to learn. Big Data Analytics (BDA) tools have changed how organizations everywhere do business, and they are already saving many electronics manufacturers millions of dollars today.

Taking full advantage of BDA requires integrating modern tools such as data visualization, machine learning, and artificial intelligence into the manufacturing process. Maximizing value often depends on integrating these tools throughout the value chain.

The transformation into a data-driven organization is viewed by many as a strategic advantage. Top names in management consulting, finance, tech, and beyond have been touting the value of becoming a data-driven organization for years, and the importance of BDA tools has been accepted by most enterprises. According to a 2016 Forrester survey, 74% of organizations surveyed expressed a desire to be data-driven, but only 29% had been successful at gaining actionable insights from their data.

The utility of BDA tools is not industry specific. Organizations from nearly every sector have demonstrated benefits from their adoption of these tools. However, experience has also shown that success with data-driven endeavors requires either in-house expertise or help from third-party experts and partners.

Big Data and the Electronics Industry

The demand for electronics never really seems to diminish. And within the semiconductor industry, demand for fabrication capacity has never been so high. The rise of Solid State Drives (SSDs) has created a boon for semiconductor manufacturers, and this has translated into increased demand for the electronics that turn those semiconductors into finished products. Cloud computing, mobile and wearable devices, the Internet of Things (IoT), and increased automation of everyday systems are also keeping electronics manufacturers occupied by driving demand for new electronics.

However, while demand may be high, there is still a constant pressure internally to drive up gross margins. This, in turn, is countered by customers who are always demanding lower prices. New manufacturing technologies only deliver any noticeable improvement in manufacturing costs once a generation or so—and this is far too long to wait.

Fortunately, there is significant room for improving manufacturing processes with existing technologies. These process improvements are enabled by BDA tools and can lead to substantial savings without having to wait for a new generation of technologies to be invented, tested, and proven to work. Perhaps even more important than squeezing every last percentage point of efficiency out of existing processes is that BDA tools are not limited to existing manufacturing technologies. As new technologies are added to an electronics manufacturer’s production capabilities, these tools can dramatically shorten the time it takes to adapt to the new equipment and help ensure that maximum efficiency is achieved right out of the gate.

Machine Learning Basics

Data analytics, which includes machine learning, rests on three basic steps: collect, detect, and act, repeated in a continuous cycle. Everything flows from these three steps.

The collection and retention of test data is critical and has long been used by experts for failure analysis, PCB rework, welding inspection, crack detection, and integration of "cookie cutter" circuit components. Using test data, BDA tools can assist in real-time manufacturing. However, BDA tools can also provide value by collecting and retaining data for analysis later on in the product life cycle.

Combining test data with data from Manufacturing Execution Systems (MES) and Product Lifecycle Management (PLM) tracking allows BDA tools to examine more data than a human mind can hold. As an example, environmental variables associated with test runs can be combined with production data as well as product returns to identify environmental or process weaknesses that would otherwise go undetected.
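As a rough illustration of this kind of combination, the sketch below joins test records, MES records, and product returns on a shared serial number and then compares return rates across an environmental variable. All field names, values, and the 30 °C cut-off are assumptions made for the example, not details of any particular MES or PLM product.

```python
from collections import defaultdict

# Hypothetical records keyed by unit serial number; field names are illustrative.
test_data = [
    {"serial": "A1", "test_temp_c": 24.0},
    {"serial": "A2", "test_temp_c": 31.5},
    {"serial": "A3", "test_temp_c": 32.0},
    {"serial": "A4", "test_temp_c": 23.5},
]
mes_data = [
    {"serial": "A1", "line": "line-1"},
    {"serial": "A2", "line": "line-1"},
    {"serial": "A3", "line": "line-2"},
    {"serial": "A4", "line": "line-2"},
]
returned_serials = {"A2", "A3"}  # units that later came back from the field


def join_by_serial(test_rows, mes_rows, returns):
    """Merge test and MES rows per unit and tag each unit as returned or not."""
    mes_by_serial = {row["serial"]: row for row in mes_rows}
    joined = []
    for row in test_rows:
        mes = mes_by_serial.get(row["serial"])
        if mes is None:
            continue  # skip units with no production record
        joined.append({**row, **mes, "returned": row["serial"] in returns})
    return joined


def return_rate_by(rows, key_fn):
    """Return the fraction of returned units for each group produced by key_fn."""
    counts = defaultdict(lambda: [0, 0])  # group -> [returned, total]
    for row in rows:
        bucket = counts[key_fn(row)]
        bucket[0] += 1 if row["returned"] else 0
        bucket[1] += 1
    return {group: returned / total for group, (returned, total) in counts.items()}


units = join_by_serial(test_data, mes_data, returned_serials)
rate_by_temp = return_rate_by(
    units, lambda u: "hot" if u["test_temp_c"] >= 30.0 else "normal"
)
rate_by_line = return_rate_by(units, lambda u: u["line"])
```

In this toy data set the production line tells us nothing (both lines show the same return rate), while the test temperature separates returned units from the rest. That pattern only becomes visible once the three sources are joined.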

Teams of data scientists have been doing this sort of post-process analytics for some time. Machine learning, however, focuses more on real-time analysis and detection. Machine learning solutions need to be trained with initial data that is typically historical, and machine learning algorithms operate 24/7, generating proposed manufacturing rule changes in real time.
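The train-then-monitor pattern described above can be sketched in a few lines. The example uses a simple mean and standard deviation baseline as a stand-in for a real machine learning model; the class name, the threshold of three standard deviations, and the measurement values are all assumptions chosen for illustration.

```python
import statistics


class BaselineDetector:
    """Learns a baseline from historical measurements, then flags live outliers."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # allowed distance from the mean, in standard deviations
        self.mean = None
        self.stdev = None

    def train(self, historical):
        """Fit the baseline on historical data collected before going live."""
        self.mean = statistics.fmean(historical)
        self.stdev = statistics.stdev(historical)

    def is_anomalous(self, value):
        """Score a single live measurement against the trained baseline."""
        if self.stdev == 0:
            return value != self.mean
        return abs(value - self.mean) / self.stdev > self.threshold


# Historical measurements (hypothetical, e.g. a supply voltage in volts).
historical = [5.00, 5.02, 4.98, 5.01, 4.99, 5.00, 5.03, 4.97]

detector = BaselineDetector(threshold=3.0)
detector.train(historical)

# Measurements arriving from the production line, scored one at a time.
live_stream = [5.01, 4.99, 5.25, 5.00]
flags = [detector.is_anomalous(v) for v in live_stream]
```

Only the third live reading is flagged. A production system would retrain periodically and would likely use a far richer model, but the division of labor is the same: historical data sets the expectation, and live data is checked against it continuously.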

Artificial intelligence can be used to take action in an automated fashion. This can range from the automated generation of triggers to the autonomous application of rule changes suggested by machine learning algorithms.
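One way to picture that range of automation is a small dispatcher that either applies a proposed rule change on its own or raises a trigger for a human to review. This is only a sketch under stated assumptions: the `RuleChange` fields, the confidence cut-off, and the parameter limits are hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass


@dataclass
class RuleChange:
    parameter: str     # name of the process parameter to adjust
    current: float     # value in use today
    proposed: float    # value suggested by the (hypothetical) learning algorithm
    confidence: float  # 0.0 to 1.0, as reported by that algorithm


def dispatch(change, limits, auto_apply_confidence=0.95):
    """Return "apply" for autonomous application, or "alert" to trigger a human review."""
    low, high = limits[change.parameter]
    within_limits = low <= change.proposed <= high
    if within_limits and change.confidence >= auto_apply_confidence:
        return "apply"
    return "alert"


# Engineering limits inside which autonomous changes are allowed (illustrative values).
limits = {"reflow_peak_temp_c": (240.0, 250.0)}

confident = RuleChange("reflow_peak_temp_c", 245.0, 247.0, confidence=0.97)
uncertain = RuleChange("reflow_peak_temp_c", 245.0, 247.0, confidence=0.80)
out_of_range = RuleChange("reflow_peak_temp_c", 245.0, 270.0, confidence=0.99)

decisions = [dispatch(c, limits) for c in (confident, uncertain, out_of_range)]
```

Raising the confidence cut-off or narrowing the limits moves the system toward the trigger-only end of the spectrum; loosening them moves it toward full autonomy.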

Ultimately the results of BDA tools are integrated throughout the entire supply chain. The automated collection of data, detection of process errors, and the act of remediation simply become one more set of tools, albeit tools whose purpose is to greatly increase the efficiency of all the other tools in use.

Hyperscale Infrastructure Challenges

None of this is to say that BDA tools are easy to introduce into an organization. Even the largest companies frequently find themselves lacking the resources required to create an entire machine learning solution from scratch. And their main bottleneck is often finding people with the experience needed to do this.

Talent in Demand

There are two broad categories of individuals required to make any BDA effort successful: hyperscale infrastructure specialists and data scientists. Data scientists occupy themselves both with determining which data to collect and retain as well as with creating the algorithms that are used to interrogate the data collected.

Hyperscale infrastructure is in many ways different from traditional IT infrastructure. The most obvious difference is that BDA infrastructure uses different applications than traditional IT.

In traditional IT, one might have a financial package that runs on a relational (SQL) database. Analytics are largely pre-canned, and even large deployments don't typically encounter scaling issues. The data volumes involved in modern BDA solutions, however, are so large that they have become their own category of IT: Big Data.

Scaling Out Not Up

The data volumes collected and analyzed with Big Data are simply beyond what earlier tools could ever handle. Here, solutions such as NoSQL, Hadoop, and Vertica are used and are designed to scale linearly to nearly incomprehensible size, allowing organizations to grow as needed. Today’s Big Data solutions replace the old “Scale Up” paradigm, where the central database and storage systems become exorbitantly expensive as they grow. With “Scale Out”, growth is achieved by adding “commodity” servers, which distribute storage and computation evenly as well as cost-effectively.
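The core idea behind "Scale Out" can be sketched as hash partitioning: each record is assigned to one of several servers by hashing its key, so storage and work spread roughly evenly and capacity grows by adding servers. The node names and key format below are assumptions for the example, and the code is not how any particular product places data.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key by hashing the key and taking it modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name -> list of keys placed on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


# Hypothetical unit identifiers spread over a three-node, then a four-node cluster.
keys = [f"unit-{i:05d}" for i in range(10_000)]
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

share_three = {node: len(placed) for node, placed in three_nodes.items()}
share_four = {node: len(placed) for node, placed in four_nodes.items()}
```

Each node ends up with close to an equal share in both configurations. Note that this simple modulo scheme relocates most keys whenever a node is added; real distributed stores typically use consistent hashing or similar techniques to limit that movement.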

Third-Party Requirements

Incorporating BDA tools into an organization requires the skill sets of both hyperscale infrastructure specialists and data scientists. Unfortunately, both skill sets are still rare. Consider a survey done by DataStax Academy in March of 2016, which reported that 73% of respondents felt that having expertise in handling the BDA tool Apache Cassandra was vital to their jobs. Six months prior, 60% of respondents to a similar survey had not felt that it was an important skill to have.

Of the respondents to the DataStax Academy survey, only 8% believed there were enough experienced NoSQL technologists to meet demand. IBM, meanwhile, predicted last year that there will be 700,000 job openings for data scientists, data developers, and data engineers by 2020, with the total number of individuals employed in these positions by 2020 exceeding 2.7 million in the United States alone. Europe is facing similar challenges.

In the real world, this means organizations seeking to implement BDA tools typically end up turning to third-party hyperscale infrastructure specialists, allowing them to remain focused on acquiring top-notch data scientists.

Data Science

There is an old nerd joke that goes (roughly): biology is applied chemistry, which is applied physics, which is applied mathematics. In this same vein, data science is applied statistics.

Though academics have argued about the term data science for decades, in today's world data scientists are individuals who apply vast and deep statistical knowledge to large sets of data. The tools of the data scientist are grander in scale than the humble spreadsheet, and if nothing else sets them apart from the colloquial understanding of a statistician, the scale at which they work should. In data science, scale is almost always a concern. This is because the general rule of data science is that the more data that are available, the more results that can be gleaned from that data.

Data volume is influenced by many factors. The higher the quality of the data, the more accurate data scientists can be. Similarly, historical data often has value, especially when trying to explain product returns (RMAs) that occur many years after manufacture, but data retention can become problematic at Big Data scale.

Old Data

Particularly in manufacturing, old data can be reprocessed with new techniques, leading to new insights. Consider for a moment a product with a defect that occurs after the initial high-failure-rate phase. It can take several years for the product to go from initial design through to consumer purchase and eventual return.

If data is retained for that product from the moment design work begins, then data scientists can set about discovering which combination of environmental factors and design choices led to the increased failures. Algorithms could also examine the data to determine if there is an identifiable pattern in the circumstances surrounding the units that weren’t returned.

If, in our example above, it had been determined that a given design element was responsible for the increased failures, but the specific environmental conditions during the manufacture of those elements allowed for extended life, then this knowledge could be applied to existing production runs in real time, ultimately increasing usable yields while at the same time reducing quality escapes.

Data generated from this type of analysis can be used to rework testing procedures to detect the new type of known flaw. Real-time production machine learning algorithms can also be modified to be aware of the new type of flaw. This, in turn, allows for the ability to spot electronics that will present this flaw, leading to any number of resolutions, from component selection changes to reclassifying manufactured PCBs for duties that don’t involve certain areas of the PCB being utilized.

The data used to accomplish tasks like these doesn't come entirely from data collected within the organization. Through the use of open external data access, APIs, and open standards, manufacturing data can be given context and used for additional types of analysis, again leading to new insights.

This is one of the great dilemmas of Big Data: history matters, but storage capacity is finite. While data can be useful several different times, part of the job of data scientists is to determine how long data should be kept to provide optimal value.

Data Sharing

In order to realize the promise of BDA tools, companies have to share data with their partners. In the example above, the flaw in the value chain could have come from anywhere. Electronics manufacturers source components from numerous suppliers and both the quality of those components and how they respond to environmental stresses can vary.

No manufacturer produces products without deviation. Every piece of manufacturing equipment can and does suffer from calibration drift, micro-environmental deviations, and more. A batch of capacitors from a supplier, for example, may all pass initial testing. Subtle variations in those units may, however, cause them to respond differently during final assembly—especially since the equipment performing the final assembly is itself subject to small variations.

Perhaps at approximately 7:30 every morning a fleet of semi-trucks drives by the manufacturing facility, and the vibrations have a subtle effect on the calibration of one, and only one, of the ultrasonic welders. This could then cause a problem as components are joined by that welding unit.
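A pattern like the welder scenario above is the kind of thing that shows up when failures are grouped by machine and time of day. The sketch below builds synthetic inspection records in which one welder has an elevated failure rate during a single hour, then flags any machine and hour combination whose rate is well above the overall rate. The data, the machine names, and the factor of three are assumptions for illustration only.

```python
from collections import defaultdict


def failure_rate_by_machine_and_hour(records):
    """records: iterable of (machine, hour, passed). Returns {(machine, hour): rate}."""
    counts = defaultdict(lambda: [0, 0])  # (machine, hour) -> [failures, total]
    for machine, hour, passed in records:
        bucket = counts[(machine, hour)]
        bucket[0] += 0 if passed else 1
        bucket[1] += 1
    return {key: fails / total for key, (fails, total) in counts.items()}


def suspicious_slots(rates, overall_rate, factor=3.0):
    """Return machine/hour slots whose failure rate exceeds factor * overall rate."""
    return sorted(key for key, rate in rates.items() if rate > factor * overall_rate)


# Synthetic data: 100 welds per machine per hour, 2 failures in a normal slot,
# 20 failures for welder-2 during the 7 o'clock hour.
records = []
for machine in ("welder-1", "welder-2", "welder-3"):
    for hour in (6, 7, 8, 9):
        failures = 20 if (machine, hour) == ("welder-2", 7) else 2
        for i in range(100):
            records.append((machine, hour, i >= failures))

overall_rate = sum(1 for _, _, passed in records if not passed) / len(records)
rates = failure_rate_by_machine_and_hour(records)
flagged = suspicious_slots(rates, overall_rate)
```

Here only `("welder-2", 7)` is flagged. Looking at the machines or the hours separately would dilute the signal; the combination is what points to a cause worth investigating.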

Harnessing data from across the supply chain can help reduce failure rates. If each company reacts to product-return data independently, the root cause may never be identified. The failure may only manifest itself if multiple conditions are present, requiring data from throughout the supply chain in order to produce a viable remedy.

Once You Start, You Can’t Stop

If BDA tools have a fatal flaw, it is the scope of their utility. Once organizations discover how useful tools such as machine learning can be, it is tempting to try applying them everywhere, often all at once. Unfortunately, no organization can get ahold of enough data scientists for such a widespread endeavor, leading to a requirement for selectivity.

Picking the right projects isn't easy: not all data science projects are created equal. Further complicating matters is that industry experience is the key to optimizing returns on any data science investment.

Data science is useful in almost every industry, and an organization's data science team is likely to consist of data scientists who started in other industries. Those who have implemented BDA tools for electronics manufacturers before are far more likely to correctly identify the low-hanging fruit than those who have implemented BDA tools for other sectors.

An organization's first, cautious steps into this world are the most important. Companies that start by trying to roll out their own Big Data infrastructure will be at a disadvantage because they will waste time trying to learn the ins and outs of their infrastructure rather than focusing on expanding the use of machine learning and artificial intelligence throughout the organization.

The optimal solution is for organizations to begin their BDA journey by deploying an easy-to-use infrastructure solution from the outset, ensuring that they will be able to scale as needed and allowing their human resources to focus on the data science instead of the infrastructure.
