- By Eric Byres, Talha Siddiqui & Derek Kruszewski
- May 09, 2022
Machine learning and natural language processing outdo manual database research for critical product matching. This feature originally appeared in Automation 2022: Cybersecurity & Connectivity Volume 2.
Finding vulnerabilities in industrial control systems (ICS) is more difficult than many organizations realize. Most information technology (IT) and operational technology (OT) practitioners rely on the National Vulnerability Database (NVD), a large database that serves as the world’s primary collection of known vulnerabilities, hoping to identify every vulnerability associated with a software product or a device’s firmware. Unfortunately, this approach is mostly ineffective for three reasons:
- The NVD is far from complete. It’s estimated that more than 75% of ICS vulnerabilities are missing (Artem Zinenko, Kaspersky Industrial Cybersecurity Conference, 2019).
- The NVD rarely maps component vulnerabilities back to the products containing those components.
- Thanks to mergers and acquisitions, rebranding of products, and even simple typos, the vendor name on the product in your facility is often different from the vendor name in the NVD details or the Common Platform Enumeration (CPE) listing.
Even for experienced security analysts, matching vulnerabilities with their installed products (or the other way around) is inefficient, error-prone, expensive, and excruciatingly tedious. In an industry where there is already a critical shortage of cybersecurity professionals, companies that task them to hunt for vulnerabilities may well find themselves with staff retention challenges.
Fortunately, an alternative to manual vulnerability research and management is emerging. Artificial intelligence (AI), including machine learning (ML) and natural language processing, can be used to create vulnerability associations quickly and comprehensively.
Uncovering PLC vulnerabilities
To illustrate just one of the challenges vulnerability research presents to humans, consider the case of this common GE-Fanuc programmable logic controller (PLC) (Figure 1). If you had this device (or more likely, many of them) in your facility, how would you determine if it has any vulnerabilities?
The NVD is a reasonable place to start—but what name would you search for? GE is an obvious choice, but there are many names associated with General Electric. You could also try Fanuc. A whole spectrum of naming conventions arises from the history of these two companies (Figure 2), such as:
- When GE and Fanuc joined forces, they named their product line GE Fanuc Automation, then a few years later renamed it GE Fanuc Intelligent Platform.
- In 2009, they dissolved the joint venture. GE kept the product line but changed the name to GE Intelligent Platforms.
- Upon acquiring Alstom in 2015, GE again rebranded, this time to GE Automation and Controls.
- Then, in 2019, Emerson acquired the portfolio.
In addition to all the M&A activity and rebranding, human error adds to the namespace problem. Engineers who type in the company name may not always be fastidious about commas, periods or suffixes like Ltd. or Inc., resulting in even more variants. A startling example is our analysis of one major OT supplier’s software that found at least 47 company name variations used by the developers.
With all this information, what name should you search for? If you cast your net too wide and try GE, expect more than 130,000 hits that will grind your NVD search to a halt. If you get too specific, you’ll need to try many different names with different punctuation permutations, and most of them will turn up nothing. What’s important to understand is that you’re not searching for a single name; you are searching for thousands of names.
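The punctuation and suffix part of this problem can be illustrated with a minimal Python sketch. The suffix list and the sample variants below are hypothetical, chosen only to show the idea, and this is not the aDolus implementation. It collapses several spellings of one company name into a single search key:

```python
import re

# Hypothetical list of legal suffixes to ignore; a real list would be much longer.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "co", "company", "corporation", "gmbh"}

def normalize_vendor(raw: str) -> str:
    """Lowercase the name, drop punctuation, and strip trailing legal suffixes."""
    # Replace every run of characters that is not a letter or digit with a space.
    words = re.sub(r"[^a-z0-9]+", " ", raw.lower()).split()
    # Remove legal suffixes from the end of the name only.
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

# Hypothetical variants of the kind a developer might type.
variants = [
    "GE Fanuc Automation, Inc.",
    "GE-Fanuc Automation Inc",
    "ge fanuc automation",
    "GE Fanuc Automation Co., Ltd.",
]
normalized = {normalize_vendor(v) for v in variants}
print(normalized)  # a single key covers all four spellings
```

String cleanup of this kind only handles typing differences. It does nothing for rebranding or acquisitions, where the names share no text at all.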
Spoiler alert: to find out if this particular PLC has any vulnerabilities listed in the NVD, you’ll need to search using “Emerson” as the keyword (Figure 3).
Even then, this was a lucky find. Remember that an estimated 75% or more of ICS vulnerabilities are missing from the NVD, so you have at best a 25% chance of actually getting a result.
AI overcomes the namespace problem
The backbone of vulnerability management is robust vendor management. Knowing all the vendors that make up your software supply chain (and considering the likelihood of typos, rebranded names, etc.) enables you to assess and remediate the exploitability of identified vulnerabilities.
To find vulnerabilities reliably and efficiently, and to manage the risk they pose to your business, AI is needed. aDolus uses AI in its Framework for Analysis and Coordinated Trust (FACT) platform to clean up vendor data, making it far more useful than a raw NVD search. For each vendor, the platform identifies:
- a normalized name for each vendor.
- all variations and synonyms of that normalized name.
- the relationships between vendors across the dynamic mergers and acquisitions landscape.
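One way to picture the third item is as a chain of successor names. The sketch below encodes the GE Fanuc timeline described earlier. The dictionary layout and function names are illustrative assumptions, not the platform's actual data model:

```python
# Each historical name points to the name that followed it, taken from the
# GE Fanuc timeline earlier in this article. The layout is an assumption.
SUCCESSOR = {
    "ge fanuc automation": "ge fanuc intelligent platform",
    "ge fanuc intelligent platform": "ge intelligent platforms",
    "ge intelligent platforms": "ge automation and controls",
    "ge automation and controls": "emerson",
}

def current_owner(name: str) -> str:
    """Follow the successor chain until reaching a name with no successor."""
    seen = set()
    while name in SUCCESSOR and name not in seen:
        seen.add(name)  # guard against accidental cycles in the data
        name = SUCCESSOR[name]
    return name

def search_terms(name: str) -> list:
    """Return the given name plus every later name in its lineage."""
    terms = [name]
    while terms[-1] in SUCCESSOR and SUCCESSOR[terms[-1]] not in terms:
        terms.append(SUCCESSOR[terms[-1]])
    return terms

print(current_owner("ge fanuc automation"))
print(search_terms("ge intelligent platforms"))
```

With a table like this, a nameplate reading "GE Fanuc" can be expanded automatically into every name worth searching, including the current owner.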
The platform performs a complex, weighted selection from more than a dozen input streams to determine a package’s manufacturer. These include aDolus’s own file processing agent, the file certificate issuer, antivirus data, vendor download locations, the file submitter, and the National Software Reference Library (NSRL), plus aliases in file names, product names, file descriptions, copyrights, and trademarks.
When multiple input streams agree about the author of a software package, we can be confident it is correct. When there is disagreement, the AI’s vendor selection algorithm evaluates the metadata input streams and determines the correct name of the package’s manufacturer.
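A weighted selection of this kind can be sketched as a simple weighted vote. The stream names, weights, and sample observations below are hypothetical, since the real inputs and weighting are not published here, and the actual algorithm is presumably more sophisticated:

```python
from collections import defaultdict

# Hypothetical stream names and weights, for illustration only.
WEIGHTS = {
    "certificate_issuer": 3.0,
    "vendor_download_location": 2.0,
    "file_description": 1.0,
    "copyright": 1.0,
}
DEFAULT_WEIGHT = 0.5  # assumed weight for any stream not listed above

def select_vendor(observations: dict) -> tuple:
    """Return (vendor, share of total weight) for the best-supported vendor."""
    if not observations:
        raise ValueError("at least one observation is required")
    scores = defaultdict(float)
    for stream, vendor in observations.items():
        scores[vendor] += WEIGHTS.get(stream, DEFAULT_WEIGHT)
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / sum(scores.values())

# Hypothetical observations: three streams agree and one disagrees.
observations = {
    "certificate_issuer": "rockwell automation",
    "vendor_download_location": "rockwell automation",
    "file_description": "allen bradley",
    "copyright": "rockwell automation",
}
vendor, confidence = select_vendor(observations)
print(vendor, round(confidence, 2))
```

The share of total weight gives a rough confidence figure: it equals 1.0 when every stream agrees and falls as the streams disagree.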
Once the vendor management foundation is in place, the platform proactively searches for vulnerabilities across different libraries, going beyond the usual sources like the NVD or ICS-CERT advisories, because so many ICS vulnerabilities are missing from these sources. Often these searches include PDFs and other textual vulnerability notices, presenting a new challenge of more unstructured data. Again, AI comes to the rescue, with natural language processing (NLP) in particular, which allows an AI system to make sense of this text.
Making sense of the text using tokens
The first step to processing English words using AI and NLP is a technique called “tokenization.” Tokenization breaks down raw text into smaller pieces called tokens. These help the AI create context so that it can interpret the meaning of the text by analyzing the sequence. Tokens enable the AI to convert unstructured information (like a text document) into a numerical data structure. Once you have that, NLP can begin to learn what the words mean in relation to each other.
As an example, Figure 4 shows a selection of text from an advisory regarding Allen-Bradley Stratix switches. To make sense of this text, the aDolus platform’s AI converts it into tokens that it can recognize:
“15.2(4a)EA5, Allen-Bradley, Stratix, 8300, Modular, Managed, Ethernet, Switches”
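As a rough illustration of tokenization, the Python sketch below splits a sentence into tokens and then maps each token to an integer ID. The sample sentence is a hypothetical stand-in for the advisory text in Figure 4, and the regular expression is an assumed, simplified tokenizer rather than the one the platform uses:

```python
import re

# Keep hyphens, dots, and parentheses inside a token so that strings such as
# "Allen-Bradley" and "15.2(4a)EA5" are not broken apart.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.()\-]*")

def tokenize(text: str) -> list:
    """Split text into tokens, trimming trailing dots and hyphens."""
    return [t.rstrip(".-") for t in TOKEN_PATTERN.findall(text)]

def encode(tokens: list) -> tuple:
    """Map each token to an integer ID, assigning new IDs in order of first use."""
    vocab = {}
    ids = [vocab.setdefault(tok.lower(), len(vocab)) for tok in tokens]
    return ids, vocab

# Hypothetical sentence; the real advisory wording is not reproduced here.
sample = ("Allen-Bradley Stratix 8300 Modular Managed Ethernet Switches, "
          "versions 15.2(4a)EA5 and earlier.")
tokens = tokenize(sample)
ids, vocab = encode(tokens)
print(tokens)
print(ids)
```

The list of integer IDs is the "numerical data structure" that later NLP stages work on in place of the raw text.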
Once it has these tokens, the AI must next be able to recognize what vendor is being talked about. The normalized lists of vendor names mentioned earlier allow AI to infer from the tokens that one of them is a vendor, and the vendor in question is Allen-Bradley according to the notice.
Also, thanks to the normalized vendor names, the AI knows that Allen-Bradley is actually Allen-Bradley Company. Further, it knows that Allen-Bradley is a subsidiary of Rockwell Automation. The platform can now access its collection of files for Rockwell Automation and use the tokens it has to deduce more about the advisory by checking against existing hierarchical data in the platform database.
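The lookup described in the last two paragraphs can be sketched as two small tables: one that maps aliases to a normalized vendor name and one that maps a vendor to its parent company. The table contents reflect only the Allen-Bradley and Rockwell Automation relationship stated above. The table names and function are hypothetical:

```python
# Alias table: lowercase spelling -> normalized vendor name.
# The entries are illustrative; a real table would hold many aliases per vendor.
ALIASES = {
    "allen-bradley": "Allen-Bradley Company",
    "rockwell": "Rockwell Automation",
}

# Parent table: normalized vendor name -> parent company.
PARENT = {"Allen-Bradley Company": "Rockwell Automation"}

def identify_vendor(tokens: list):
    """Return (vendor, top-level owner) for the first token that is a known alias."""
    for tok in tokens:
        vendor = ALIASES.get(tok.lower())
        if vendor is not None:
            # Vendors without a listed parent are treated as their own owner.
            return vendor, PARENT.get(vendor, vendor)
    return None

tokens = ["15.2(4a)EA5", "Allen-Bradley", "Stratix", "8300",
          "Modular", "Managed", "Ethernet", "Switches"]
print(identify_vendor(tokens))
```

Once the owner is resolved, the remaining tokens can be checked against that company's known products and versions rather than against the whole database.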
The platform uses the Ethernet token to deduce that the advisory has something to do with networking and communications, and it recognizes Stratix and even Stratix 8000 as products in the system (Figure 5). The token “15.2(4a)EA5” further narrows the search to the firmware version to which this advisory pertains. Eventually, the AI can identify a likely file in the platform (in this case a .tar file) tied to a vulnerability through a Common Vulnerabilities and Exposures (CVE) record. Thus, the platform makes the connection.
The original advisory was for Rockwell, which is not in the NVD. In fact, the CVE details do not mention Rockwell at all.
Unless you are using a diverse input stream for vulnerabilities, your vulnerability detection capabilities are not going to be adequate, and it’s difficult to manually search through so many different sources. It’s much easier to let a machine iterate through these for you.
The future of vulnerability management with AI
Identifying vulnerabilities in your software supply chain should be a priority. Supply chain attacks are becoming more frequent: Sonatype’s 2020 State of the Software Supply Chain Report recorded a 430% surge in just 12 months, and these attacks now regularly make front-page news.
As regulatory pressure increases on operators of critical systems and the vendors who supply them, the future of vulnerability management must include AI. Once these techniques are in place to detect and match the vulnerability notices to particular products, many more insights become possible. One emerging example is the creation of VEX (Vulnerability Exploitability eXchange) documents to communicate which vulnerabilities are actually exploitable and dangerous. But that’s a topic for another article.
For more information on how the aDolus platform can help you manage vulnerabilities and gain full visibility into your software supply chain, contact aDolus.