Is It Difficult to Bypass the Protection That Uses Big Data?

Today I want to talk about big data in cybersecurity, or rather, about how easy or difficult it is to bypass protection systems that use big data. In other words, how can one fool advanced threat detection systems whose all-seeing eye, according to marketers, no extra byte can slip past? I am talking about systems such as SIEM and XDR, which use big data analytics as one of their main tools for detecting suspicious activity.

Such platforms are usually used by medium and large businesses. They have big networks and cloud infrastructure where millions of events occur every hour. Naturally, there is no way to analyze these events manually, so the analysis relies heavily on automation. It is important to note that qualified specialists, both in big data and in cybersecurity, remain a necessary component here.

What do such systems do?

They identify signs of unauthorized activity in vast arrays of structured and unstructured data. Considering that an average network of 10,000 endpoints transmits about 25 terabytes of data per day, scanning all this data is extremely hard. However, several algorithms can help.
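To put the figures above in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly across endpoints and across the day, which real networks do not do; the variable names are illustrative only.

```python
# Rough scale estimate for the figures quoted in the text.
# Assumptions: decimal terabytes, traffic spread evenly over endpoints and time.
TB = 10**12
SECONDS_PER_DAY = 86_400

daily_bytes = 25 * TB
endpoints = 10_000

# Average daily volume per endpoint, in gigabytes.
per_endpoint_gb = daily_bytes / endpoints / 10**9

# Average sustained rate the analytics pipeline must keep up with, in MB/s.
avg_rate_mb_s = daily_bytes / SECONDS_PER_DAY / 10**6

print(f"{per_endpoint_gb} GB per endpoint per day")
print(f"about {avg_rate_mb_s:.0f} MB/s sustained")
```

Even under these simplifying assumptions, the pipeline has to process hundreds of megabytes every second around the clock, which is why manual review is out of the question.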

An essential quality criterion for threat detection platforms (specifically XDR) is anomaly detection accuracy. XDR solutions, as a rule, include SIEM platforms responsible for collecting and processing events, EDR modules needed to detect and respond to anomalies, and UEBA systems that collect large arrays of data about user actions and/or endpoints, servers and network equipment, and then use machine learning algorithms to build patterns of behavior and try to identify anomalies.

The simplest example of such an anomaly is a server that, in the middle of the night, suddenly begins to actively communicate with a remote host that has never been seen in the logs before. It happens occasionally, not regularly, but it still looks suspicious. Another example: every three or four days, several tens of megabytes of data leave an office device assigned to a single employee while that employee, judging by the access control system, is not on the premises.
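The first example can be sketched in a few lines. This is a minimal illustration, not how any particular SIEM or XDR product works: the event format (timestamp, source, destination), the `baseline` dictionary of previously seen destinations, and the night-hours window are all assumptions made for the example.

```python
from datetime import datetime

def find_new_night_contacts(events, baseline, night_start=0, night_end=6):
    """Return events where a source contacts a never-seen destination at night.

    events:   iterable of (timestamp, source, destination) tuples
    baseline: dict mapping each source to the set of destinations seen before
    """
    alerts = []
    for ts, src, dst in events:
        known = baseline.get(src, set())
        is_night = night_start <= ts.hour < night_end
        if is_night and dst not in known:
            alerts.append((ts, src, dst))
    return alerts

# Hypothetical data: srv-01 normally talks to two internal hosts only.
baseline = {"srv-01": {"10.0.0.5", "10.0.0.9"}}
events = [
    (datetime(2023, 5, 2, 14, 0), "srv-01", "10.0.0.5"),     # daytime, known host
    (datetime(2023, 5, 3, 2, 30), "srv-01", "203.0.113.7"),  # night, unknown host
    (datetime(2023, 5, 3, 3, 0), "srv-01", "10.0.0.9"),      # night, known host
]
alerts = find_new_night_contacts(events, baseline)
for ts, src, dst in alerts:
    print(f"{ts} {src} -> {dst}: first contact during night hours")
```

A real platform would combine many such signals and weigh them, but the principle is the same: compare what is happening now with what has been seen before.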

The examples provided above are generally quite obvious. There are also less regular events, the connection between which is not obvious at all, but the machine sees everything.

Do the machines see everything?

In Frank Herbert's Dune, sandworms destroy anything that makes a rhythmic sound, whether it be people or mechanisms. However, the inhabitants of the desert have learned to deceive the vigilant creatures using a special irregular gait that imitates the natural noises of the desert. Well, for greater reliability, they use special distracting devices which, when activated, begin to emit a loud rhythmic knock. Sandworms rush to this sound, leaving people time to get where they need to go.

The analogy is not accidental. Cybercriminals who wish to bypass protection that uses big data analytics will have to spend a lot of time and effort, and that is good news. The bad news is that, eventually, weak points can be found in any system.

The basis of the above-described security systems is a kind of knowledge base, which is a combination of information about potential threats and about the structure and functioning of the protected resources. This knowledge base helps to determine what is considered an ordinary course of events and what is an anomaly.

Big data analytics systems can work in several ways. For example, a Hadoop-based pipeline can be set up to analyze records of everything that enters or leaves the network. This way you can identify suspicious communications between infected PCs or servers and hosts under the control of cybercriminals. You can also configure monitoring of system logs.

The early warning platform can collect and accumulate data from inside the protected infrastructure, determine what is considered normal behavior, collect data on potential threats and risks from the outside, and use big data analytics to determine if something similar is observed beyond its protective perimeter.
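The idea of learning "normal behavior" and then flagging deviations can be illustrated with the simplest possible statistical baseline, a z-score. This is a sketch under stated assumptions: real platforms use far richer models, and the metric (daily outbound megabytes for one host), the sample values, and the threshold of 3 standard deviations are chosen here purely for illustration.

```python
from statistics import mean, stdev

def zscore(history, value):
    """How many sample standard deviations `value` lies from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any different value is maximally unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma

def is_anomalous(history, value, threshold=3.0):
    """Flag values that deviate from the baseline by more than `threshold` sigmas."""
    return abs(zscore(history, value)) > threshold

# Hypothetical baseline: daily outbound traffic of one host, in megabytes.
history_mb = [120, 135, 110, 128, 140, 125, 118]

print(is_anomalous(history_mb, 130))  # within the usual range
print(is_anomalous(history_mb, 400))  # far outside the usual range
```

The weakness of any such baseline is also visible here: an attacker who keeps each day's activity inside the usual range never crosses the threshold.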

How hackers can bypass your protection

Naturally, cyber attackers know how such systems work. What can they do? First, operators of targeted attacks will engage in reconnaissance. This step may take plenty of time. The object of reconnaissance will be not only the hardware and software of the target infrastructure, but also its operators: people. The more information employees share about themselves, the easier it is to carry out a phishing attack against them or their colleagues. And we know that a considerable number of successful attacks start with phishing.

The attackers' next task is to minimize their visibility to the security system. There are several options here. They may move around the attacked network using legitimate tools already present in the target infrastructure, for example PowerShell and other administrative tools. In addition, they can abuse system tools by means of fileless malware; if it goes undetected, the attackers can move across the network unnoticed.

However, if their activity is too intense or too regular, the detection system may react, which means attackers have to act as slowly as possible and at irregular intervals.

For example, if only one or two machines in the target infrastructure are scanned once a week or even a month, there is little likelihood that the security system will detect anything.
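Why slow, irregular activity is harder to catch can be shown with a simple periodicity check. The sketch below uses the coefficient of variation of the gaps between events, a common heuristic for spotting beacon-like traffic; it is an illustration, not the method of any specific product, and the thresholds and timestamps are assumptions.

```python
from statistics import mean, pstdev

def interval_cv(timestamps):
    """Coefficient of variation of the gaps between events (timestamps in seconds).

    Returns None when there are too few events to judge regularity.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    mu = mean(gaps)
    if mu == 0:
        return 0.0
    return pstdev(gaps) / mu

def looks_periodic(timestamps, max_cv=0.1, min_events=5):
    """Flag a series as beacon-like if it has enough events and very regular gaps."""
    cv = interval_cv(timestamps)
    return len(timestamps) >= min_events and cv is not None and cv <= max_cv

# Hypothetical event times, in seconds from the start of observation.
regular = [0, 3600, 7200, 10800, 14400, 18000]     # exactly once an hour
irregular = [0, 900, 20000, 21000, 90000, 400000]  # same count, uneven gaps
sparse = [0, 604800, 1900800]                      # a few events weeks apart

print(looks_periodic(regular))
print(looks_periodic(irregular))
print(looks_periodic(sparse))
```

Only the hourly series is flagged. The irregular series has gaps that vary too much, and the sparse one never accumulates enough events within the observation window to be judged at all.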

If the data the attackers need is collected not by one compromised account but by a dozen, and if it is sent not to one remote server but to many, the threat model on which the detection system is built may turn out to be inadequate.
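The following sketch shows how spreading an exfiltration over many accounts and destinations can defeat a rule that looks at each account in isolation, and why an aggregate view helps. The record format, the limits, and the convention that addresses starting with "10." are internal are all hypothetical.

```python
from collections import defaultdict

def per_account_alerts(transfers, limit_mb):
    """Return accounts whose own outbound volume exceeds `limit_mb`."""
    totals = defaultdict(float)
    for account, _dest, mb in transfers:
        totals[account] += mb
    return {account for account, total in totals.items() if total > limit_mb}

def total_outbound_mb(transfers, internal_prefix="10."):
    """Sum the volume sent to destinations outside the (assumed) internal range."""
    return sum(mb for _acct, dest, mb in transfers
               if not dest.startswith(internal_prefix))

# Hypothetical day of activity: 12 compromised accounts, each sending
# a modest 40 MB to a different external address.
transfers = [(f"user{i:02d}", f"198.51.100.{i}", 40.0) for i in range(12)]

flagged = per_account_alerts(transfers, limit_mb=100)
total = total_outbound_mb(transfers)

print(f"accounts over the per-account limit: {len(flagged)}")
print(f"total sent to external hosts: {total} MB")
```

No single account crosses its limit, yet the organization as a whole has lost 480 MB. A threat model that only asks "is this account behaving unusually?" misses the pattern; one that also tracks the aggregate has a chance of seeing it.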

Another bad scenario looks like this: an employee copies some files onto a flash drive. After leaving the building, he empties the trash from his pockets into a garbage can, flash drive included, and later picks up a package with money near his house. A DLP system (if one is present) cannot always protect against insider threats, especially if highly motivated professionals carry out the attack.

Another aspect worth paying attention to is that out-of-the-box detection systems are trained against a small number of typical scenarios. It takes some time before they (if they have such a capability) receive enough information about potential threats from other systems (threat feeds) and adapt their models to it.

If attackers come up with an atypical attack scenario, they may have time to carry it out before the security system detects it.

There is, of course, the option of a classic diversion: when the infrastructure has several points of entry, attackers stage a conspicuous event, for example a DDoS attack or a deliberately detectable attempt to transfer data to a remote host. It is just a smokescreen that distracts attention from the actual attack, which is carried out in a completely different segment.

You can find vulnerabilities in any system and come up with scenarios for their exploitation. In general, neither human nor artificial intelligence can foresee everything, but it is possible to make the attackers' task extremely difficult. It is possible and necessary to provide the corporate infrastructure with all available advanced security tools.

How to protect

Now there is a kind of arms race between cybercriminals and XDR systems. Hackers are looking for weak points, and the task of defenders is to reduce the number of entry points to a minimum.

Technologies and tools based on big data available to large companies are often too expensive for most cybercriminals. Significant attacks and data breaches are performed by advanced cyber gangs (which often are state-sponsored). Still, in most cases, cyber incidents today begin with targeted attacks on specific users.

What does this mean? First, in addition to using advanced technical means of protection, you need to prepare your users for possible phishing attacks and train them to counter social engineering tricks. All employees should have at least a basic understanding of how to ensure their own information security, both at the workplace and beyond. Because of the massive transition to remote work, this is especially important. Second, it is necessary to minimize the number of possible entry points. These can be any devices connected to the corporate network and accessible from the outside. Intruders can make use of incorrectly configured drives, very old but still functioning routers, and IoT devices.

Alex Vakulov is a cybersecurity researcher with over 20 years of experience in malware analysis and strong malware removal skills.
