Machine Learning vs. Machine Learning

Detecting a deepfake video of U.S. President Trump. Machine learning is increasingly being used to detect deepfake videos created with machine learning.

Cyber defenders and criminals alike are starting to use machine learning (ML), a form of artificial intelligence (AI), to enhance their security technologies and attack techniques, respectively.

Vendors such as Bitdefender and BlackBerry Cylance incorporate ML capabilities into their security products to detect elusive attacks such as those from Advanced Persistent Threats (APTs). APTs use stolen usernames and passwords to move laterally and stealthily across the network for as long as the usurped credentials remain trusted. APTs are among the most insidious cyberthreats because they can go undetected for months (or even years) inside an organization's network while siphoning out terabytes of sensitive information.

"Human security analysts use cybersecurity tools with ML capabilities to find APTs more efficiently and analyze them better," says Neil Gong, an assistant professor of electrical and computer engineering at Duke University. ML quickly analyzes and identifies the behaviors of probable APTs, as well as malicious software. However, ML is not foolproof; human analysts are necessary to distinguish ML's errant detections (false positives) from genuine APTs and other malicious activity, according to Gong.

Organizations need to get real value from the ML in the cybersecurity tools they use, because traditional security programs are losing their effectiveness. Those programs rely on malware signatures, a sort of digital fingerprint, to identify known malware.

Today's cybercriminals are decompiling their malicious software programs, returning them to their original programming language so they can alter the code, then recompiling them into ready-to-run software applications that don't match any known signature. Fortunately, malware detection that uses ML detects the abnormal behaviors of unknown malicious software without the use of signatures.
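The contrast between the two detection styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's method: the sample bytes, the behavior counts, the function names, and the threshold are all hypothetical, and a simple z-score against benign runs stands in for a trained ML model.

```python
import hashlib
from statistics import mean, stdev

# Hypothetical signature database: SHA-256 digests of known malware samples.
original_sample = b"malicious-payload-v1"
KNOWN_SIGNATURES = {hashlib.sha256(original_sample).hexdigest()}


def signature_match(sample: bytes) -> bool:
    """Signature-based check: flags only byte-identical known samples."""
    return hashlib.sha256(sample).hexdigest() in KNOWN_SIGNATURES


# Hypothetical behavior counts observed in benign runs:
# (files_written, outbound_connections, processes_spawned)
BENIGN_RUNS = [(3, 1, 1), (4, 2, 1), (2, 1, 2), (5, 2, 1), (3, 1, 1), (4, 1, 2)]


def anomaly_score(behavior) -> float:
    """Largest per-feature z-score of a run relative to the benign baseline."""
    scores = []
    for i, value in enumerate(behavior):
        column = [run[i] for run in BENIGN_RUNS]
        sigma = stdev(column) or 1.0  # guard against a zero spread
        scores.append(abs(value - mean(column)) / sigma)
    return max(scores)


def behaves_abnormally(behavior, threshold: float = 3.0) -> bool:
    """Behavior-based check: no signature needed, only observed activity."""
    return anomaly_score(behavior) > threshold


# A recompiled variant: different bytes, so a different hash, same bad behavior.
recompiled_sample = b"malicious-payload-v1-recompiled"
recompiled_behavior = (250, 40, 12)  # hypothetical observed counts

print("signature hit:", signature_match(recompiled_sample))
print("behavior flagged:", behaves_abnormally(recompiled_behavior))
```

In this toy setup the altered sample slips past the hash lookup but is still flagged by its activity, which is the property the behavior-based approach relies on. A production system would learn the baseline from far richer features than three counters.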

Conversely, cybercriminals are using ML to determine which characteristics they can build into their attacks and malicious software to evade detection by both traditional and ML-based cybersecurity tools. They use ML algorithms to learn which types of interactions with security software correlate with which responses, and those assessments yield practical evasion techniques.

Giuseppe Ateniese, David and GG Farber Endowed Chair in computer science and department chair at the Stevens Institute of Technology, offers the following example: "ML enables cybersecurity tools to trap malware and run it safely in a virtual environment—a sandbox—to confirm that it is malicious. But, an attacker can use ML to detect whether their malware is in a virtual environment so it can remain dormant to evade detection."

ML for good and ML for evil compete in other ways. Cybercriminals are using ML to create deepfake audio and video to impersonate executives and defraud organizations. Cybercriminals recently used deepfake audio to direct a U.K. energy company executive to wire €200,000 (nearly $220,000) to a supposed Hungarian supplier that was actually the fraudster, according to The Verge. The cybercriminal used software to mimic the voice of the executive's boss and instruct the executive to make the payment; the software imitated the voice, tone, punctuation, and German accent of the executive's superior, according to the article.

Gong believes cybersecurity defenders using ML will be able to detect deepfakes by analyzing the statistical properties of the audio and video, comparing suspect recordings with genuine examples of the person they imitate.
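The idea of comparing a suspect recording's statistics with genuine examples can be sketched as follows. This is a toy illustration, not Gong's method: synthetic sine tones stand in for voice recordings, the two features (zero-crossing rate and RMS energy), the sample rate, the function names, and the threshold are all assumptions, and real detectors rely on much richer learned features.

```python
import math
from statistics import mean, stdev

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz: float, amplitude: float = 1.0, seconds: float = 0.5):
    """Synthetic stand-in for a voice recording: a plain sine wave."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n)]


def features(signal):
    """Two simple statistical properties: zero-crossing rate and RMS energy."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(signal) - 1)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return (zcr, rms)


# Hypothetical genuine recordings of the person being imitated.
GENUINE = [tone(118, 0.90), tone(119, 0.95), tone(120, 1.00),
           tone(121, 1.05), tone(122, 1.10)]
GENUINE_FEATURES = [features(s) for s in GENUINE]


def deviation(signal) -> float:
    """Largest per-feature z-score of a recording versus the genuine examples."""
    scores = []
    for i, value in enumerate(features(signal)):
        column = [f[i] for f in GENUINE_FEATURES]
        sigma = stdev(column) or 1e-9  # guard against a zero spread
        scores.append(abs(value - mean(column)) / sigma)
    return max(scores)


def looks_suspect(signal, threshold: float = 3.0) -> bool:
    """Flag recordings whose statistics sit far from the genuine examples."""
    return deviation(signal) > threshold


suspect_recording = tone(180, 1.0)    # statistics unlike the genuine set
held_out_genuine = tone(120.5, 1.0)   # statistics close to the genuine set

print("suspect flagged:", looks_suspect(suspect_recording))
print("genuine flagged:", looks_suspect(held_out_genuine))
```

The design choice here is to score a recording only against examples of the same speaker, which mirrors the comparison described above; the threshold of 3.0 is arbitrary and would need tuning on real data.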

Another example pitting ML against ML comes from "The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation," a study by the Future of Humanity Institute, the University of Oxford, and others. According to the study, applying AI to automate software vulnerability discovery can likewise serve malicious purposes by easing attackers' labor constraints.

Either side can train ML models to recognize vulnerabilities in software programs: defenders do it to fix the vulnerabilities, while attackers do it to exploit the vulnerabilities. "It's pretty much what people used to do, but now with ML, it's automatic. So, it's much cheaper and faster for both sides," says Ateniese.

These observations lead to one conclusion: if cybersecurity vendors don't keep up with new ML techniques that enable a direct response to cybercriminals' use of ML, the number and severity of ML-enhanced attacks will multiply beyond the vendors' ability to detect them at all, let alone keep pace with them, according to Ateniese.

ML has a bright future in fueling cyber-altercations, though that future may appear in staggered installments. ML for cybersecurity and cybercrime will see minimal penetration in the marketplace over the next three to five years, according to Eliezer Kanal, a technical manager in the CERT (Computer Emergency Response Team) division of the Software Engineering Institute of Carnegie Mellon University.

The application of ML in cybersecurity and cybercrime is coming, but it won't develop overnight. "Over the next two years, we'll start to see some ML techniques come into play, where you can detect attacks that cybercriminals execute against you and automatically mitigate them," says Kanal. He adds, "but it's going to be a long time before we see practitioners in the field view this as enough of a challenge that educational institutions will incorporate it into their curriculums."

David Geer is a journalist who focuses on issues related to cybersecurity. He writes from Cleveland, OH, USA.