Detecting/Explaining Industrial Hacks

The Industrial Control System (ICS) is the pivotal computer system managing industrial processes worldwide, from manufacturing and packaging goods to generating and controlling our power systems, chemical plants, and environmental infrastructure with securely networked programmable logic controllers (PLCs). Machine learning (ML), rather than traditional behavior modeling, is the leading-edge approach to detecting intrusions into an ICS. To encourage system operators to adopt ML, researchers at the University of Bristol, U.K., have implemented eXplainable artificial intelligence (XAI) that provides clear descriptions and interpretations of ICS anomalies in more accessible terms, effectively identifying their threat levels and/or attack vectors.

“Machine-learning-based anomaly detection will inevitably replace existing model-based methods due to their superior capabilities in identifying anomalous data patterns. However, the obscure interpretability and the utter lack of explanations provided by ML-based anomaly detectors remain a major barrier to their application in real-world systems. This University of Bristol work bridges that gap by integrating explainable features into state-of-the-art ML anomaly detection. Explainable AI can be a key tool to support grid operators in making control-room decisions and protecting power grids from malicious actors,” said ICS expert associate professor Subhash Lakshminarayana, in the School of Engineering at the University of Warwick, U.K., who was not involved with the University of Bristol research.

In theory, an ICS operates with minimal human intervention, using set-and-forget system-operator inputs, with sensors monitoring and fine-tuning the actuators that control its cyber-physical devices. Because an ICS depends on humans only to set up initial conditions and perform maintenance, its lack of humans in the loop makes its operational technology (OT) an even more vulnerable target for hackers than information technology (IT), which is full of human-monitored cybersecurity checks and balances.

For that reason, most OT is separated from corporate IT by virtually impenetrable firewalls or, for critical OT, by a physical “air-gap” that completely disconnects OT from IT, eliminating attack vectors originating on the Internet. Nevertheless, OT remains vulnerable to novel attack vectors, especially inside-job hacks delivered by maintenance personnel unknowingly using tainted replacement parts, by nefarious intruders gaining physical access to a normally unmanned OT, or even by disgruntled employees supervising a manned OT. Consequently, even an air-gapped OT needs to be scanned for anomalies in its behavior to prevent intrusions, according to the latest OT research.

“The air-gap—an absence of a communication channel between OT and IT—may be insufficient to thwart attacks against a plant OT. This is due to practical considerations such as supply-chain infiltration and/or insider attacks. For example, it may suffice for attackers to infiltrate and introduce malware at remote contractor sites that supply components, firmware, and software used in a plant OT—effectively ‘jumping’ the air-gap. An insider attack is also a powerful adversary who is able to bypass many OT security measures due to direct physical access. The air-gap provides a useful layer of security, but additional layers are today recommended against powerful adversaries. In our work, we use anomaly detectors that monitor sensor and actuator readings from a plant OT as the final layer of defense,” said Sarad Venugopalan, Knowledge Transfer Project Research Fellow at the U.K.’s University of Bristol who—together with professor Sridhar Adepu and doctoral candidate Kornkamon Mathuros—received the Best Paper Award for their OT protection solution with XAI at the ACM Cyber-Physical System Security Workshop (July 10, 2024 in Singapore).

Deep neural network (DNN) anomaly detectors, such as those available in repositories on GitHub, learn from labeled “normal” and “anomalous” behaviors in an ICS. Unfortunately, because of their “black box” nature, DNNs (and, to a certain degree, even probabilistic approaches) cannot pinpoint the precise ICS component(s) involved in an anomaly’s attack vector. In their research, Venugopalan et al. say they are solving this problem by adding, at the back end of an anomaly detector, an eXplainable AI (XAI) module that analyzes the causes of each flagged anomaly, thus increasing both the model’s transparency and the system operator’s trust in it.

Secure Water Treatment (SWaT) testbed flow chart. The researchers’ DeepSVDD deep neural network model, combined with a probabilistic method, detected and identified all but one of 41 anomalies in the 946,722 records for the SWaT industrial control system.
Credit: University of Bristol

“Use of AI for cybersecurity of industrial control systems is as important to help system operators pinpoint the cause of the anomaly, as it is to detect the anomalies in the first place,” said ICS expert Daisuke Mashima, an associate professor at the Singapore University of Technology and Design who was not involved in the University of Bristol research. “Thus, it is crucial for the AI to be interpretable and give insight about its decision processes to the system operator. To my knowledge, this [University of Bristol] research is one of the first attempts to utilize and evaluate explainable AI for ICS cybersecurity.”

Detectors first, then XAI

To start with the best anomaly detector, the researchers surveyed more than a dozen first-generation efforts (published between 2017 and 2023), including both machine-learning-based and probabilistic approaches. That survey led them to implement two second-generation anomaly detectors with shorter overall training times than earlier attempts, faster anomaly detection, and fewer false alarms (false positives). Of the two, Empirical Cumulative distribution-based Outlier Detection (ECOD) outperformed Deep Support Vector Data Description (DeepSVDD) in the total number of attacks detected, although it produced more false positives. DeepSVDD detected attacks that ECOD missed (and missed some that ECOD caught), and it produced fewer false positives than ECOD. As a result, the researchers used DeepSVDD and ECOD together on their testbed ICS.
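To give a feel for how a probabilistic detector in the ECOD family scores a record, here is a minimal sketch in plain Python. It is not the researchers' implementation: it is a simplified version that estimates, per sensor, how far into the left or right tail of the training data a reading falls, and it omits the skewness-corrected variant of the published method. The sensor columns, the sample values, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from bisect import bisect_left, bisect_right

def fit_columns(train_rows):
    """Sort each sensor column once so tail counts become binary searches."""
    return [sorted(col) for col in zip(*train_rows)]

def ecod_style_score(sorted_cols, row):
    """Return a simplified ECOD-style outlier score (larger = more unusual)."""
    left = right = 0.0
    for col, x in zip(sorted_cols, row):
        n = len(col)
        # Empirical tail probabilities with add-one smoothing (an assumption
        # of this sketch) so values beyond the training range avoid log(0).
        p_left = (bisect_right(col, x) + 1) / (n + 1)      # share of values <= x
        p_right = (n - bisect_left(col, x) + 1) / (n + 1)  # share of values >= x
        left -= math.log(p_left)
        right -= math.log(p_right)
    # The published method also has a skewness-corrected variant; omitted here.
    return max(left, right)

# Hypothetical "normal" readings: (tank_level, flow_rate)
normal = [(500 + i % 7, 2.4 + 0.01 * (i % 5)) for i in range(200)]
cols = fit_columns(normal)

typical_score = ecod_style_score(cols, (503, 2.42))
attack_score = ecod_style_score(cols, (820, 9.9))  # both readings far too high
print(f"typical={typical_score:.2f}  attack={attack_score:.2f}")
```

A record is flagged when its score exceeds a threshold chosen from the training scores. Because each sensor contributes its own term to the sum, a detector of this kind needs no gradient computation, which is one reason it trains quickly.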

The researchers added different explainable AI back ends to DeepSVDD and ECOD, which were run whenever an anomaly was flagged. They surveyed existing XAI methods and chose the four best candidates to adapt to the testbed ICS: SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), Accumulated Local Effects (ALE), and Integrated Gradients (IG). All were run on the anomalies detected by DeepSVDD and ECOD, except for IG, whose gradient-based method was unsuitable for use with the probabilistic ECOD.
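The idea behind a Shapley-style explanation can be sketched in a few lines: given a flagged record, ask how much each sensor reading pushes the anomaly score above the score of a normal baseline. The brute-force computation below is exact but exponential in the number of features, so it is only a toy; it is neither the SHAP library nor the researchers' code. The scoring function, sensor names, and values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(score_fn, x, baseline):
    """Exact Shapley attribution of score_fn(x) - score_fn(baseline) per feature."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the flagged reading; the rest stay at baseline.
        return score_fn([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical sensors and a toy anomaly score: squared scaled distance from normal.
names = ["tank_level", "flow_rate", "pump_state"]
center = [500.0, 2.5, 1.0]
scale = [10.0, 0.5, 1.0]

def score(row):
    return sum(((v - c) / s) ** 2 for v, c, s in zip(row, center, scale))

flagged = [560.0, 2.6, 1.0]  # record the detector flagged (made up)
phi = shapley_values(score, flagged, center)
top_sensor = names[phi.index(max(phi))]
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions add up to the gap between the flagged record's score and the baseline score, so an operator can read them as a ranked list of which components drove the alarm.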

“It is vital to develop context-specific indicators for anomaly detectors. Otherwise, there will be a lot of false positives alerting system operators by virtue of applying the incorrect context to a scenario,” said ICS expert Neetesh Saxena, an associate professor at the U.K.’s Cardiff University, who was not involved with the University of Bristol research. “As supported in the recent work by these [University of Bristol] researchers, explainable AI deselects unrelated features and highlights the most crucial characteristics, thus helping to correctly interpret anomalies within its proper context and thus improving the accuracy and efficiency of the system operator’s decision-making.”

The researchers used the Secure Water Treatment (SWaT) testbed as their training dataset, which consists of 946,722 records, each labeled as either attack or normal. Just 5.77% of the records were attack records, covering 41 attacks, all but one of which were flagged by one or both of the anomaly detection modules. ECOD flagged more anomalies, but DeepSVDD flagged anomalies that ECOD missed. DeepSVDD also demonstrated greater precision, resulting in a lower false positive rate.
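The trade-off described here can be made concrete with a small sketch. Taking the 5.77% figure at face value, it corresponds to roughly 54,600 of the 946,722 records. The code below then shows, on a made-up ten-record example, how precision, recall, and false positive rate are computed and what happens when two detectors' flags are combined with a logical OR. The flag vectors are invented for illustration and are not the paper's results.

```python
def rates(flags, labels):
    """Precision, recall, and false positive rate for boolean flags vs. labels."""
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    fn = sum(l and not f for f, l in zip(flags, labels))
    tn = sum(not f and not l for f, l in zip(flags, labels))
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

# 5.77% of 946,722 records (approximate, since the percentage is rounded)
approx_attack_records = round(0.0577 * 946_722)

T, F = True, False
labels     = [F, F, F, F, F, F, T, T, T, T]  # ground truth: last four are attacks
detector_a = [T, T, F, F, F, F, T, T, T, F]  # catches more, more false alarms
detector_b = [F, F, T, F, F, F, F, T, F, T]  # catches fewer, but one that A missed

# Flag a record if either detector flags it.
combined = [a or b for a, b in zip(detector_a, detector_b)]

for name, flags in [("A", detector_a), ("B", detector_b), ("A or B", combined)]:
    print(name, rates(flags, labels))
```

In this toy case the combined detector catches every attack, but it also inherits the false alarms of both detectors, which is the price of covering each one's blind spots.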

The researchers recommended ICS system operators use both anomaly detection modules to cover all bases.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.
