Defending the Enterprise

Cyberattacks bring turbulence and disruption, leaving unpredictable and chaotic system failures in their wake. Organizations using cybersecurity chaos experiments simulate cyber events to uncover deficits in their cyber-elasticity. Test results lead developers and engineers to repair or rearchitect applications and supporting infrastructure for security and continuity under trial.

In the context of cybersecurity, chaos is any security failure that can happen, says Kelly Shortridge, senior principal product technologist, Fastly, an edge cloud platform provider. "Security chaos testing is the practice of continual experimentation to verify that systems operate as we think to improve their resilience to attack," explains Shortridge.

"Cybersecurity chaos engineering is resiliency testing adopted to combat the ever-changing threat landscape. Chaos engineering applies new threats to systems to see what happens to an ecosystem if components fail," says Doug Saylors, partner and co-lead, cybersecurity, for ISG, a global technology research and advisory firm. "Running a penetration test or attack simulation with a zero-day exploit is the most common method of chaos testing in cybersecurity," notes Saylors.

Web applications, distributed systems, and network infrastructure, including infrastructure-as-code, break under the pressure of attacks that bring chaos. "Systems have varying response characteristics depending on the type of attack. Applications stop working or provide erroneous outputs that have downstream effects. Network and infrastructure components see significant performance spikes, which affect users negatively through increased latency or limited access to critical applications," says Saylors.

Criminal hackers are willing to cause chaos to get to the underlying data, says Jenn Bergstrom, senior technical director for Parsons X, a group within Parsons, a digitally-enabled solutions provider. They hope you are not monitoring closely enough to quickly notice the signs, such as packet loss. "Packet loss may happen because they send a jumbo packet with an embedded command that will affect your database. So, the chaos is more of a side effect of what they are doing," says Bergstrom.

"It's important to see how your system behaves" under such circumstances, "so you see those minor differences between standard behavior and the unexpected," says Bergstrom.
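One way to make the gap between standard and unexpected behavior concrete is to record a baseline during normal operation and flag readings that stray far from it. The following is a minimal sketch, not a production detector: the function name `flag_deviations`, the sample packet-loss percentages, and the three-standard-deviation threshold are all illustrative assumptions rather than anything the sources describe.

```python
from statistics import mean, stdev


def flag_deviations(baseline, observed, threshold=3.0):
    """Return (index, value) pairs in `observed` that lie more than
    `threshold` sample standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different reading is a deviation.
        return [(i, x) for i, x in enumerate(observed) if x != mu]
    return [
        (i, x)
        for i, x in enumerate(observed)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical packet-loss percentages sampled during normal operation.
baseline_loss = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11]

# Hypothetical samples taken while a chaos experiment is running.
experiment_loss = [0.11, 0.10, 0.45, 0.12, 0.09]

flagged = flag_deviations(baseline_loss, experiment_loss)
print(flagged)
```

In this made-up data, only the third experiment sample stands out from the baseline. A real deployment would draw these numbers from its own monitoring and would likely need a more robust statistic than a simple standard-deviation cutoff.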

Certain attacks create a great deal of chaos. "The main attack space for security chaos is ransomware, even though its goal is to make money," says Shortridge. However, ransomware causes security failures beyond the encryption of critical data. It locks enterprises out of systems and machines, and leads to downtime until an organization pays the ransom or restores from backups.

Ransomware attacks are resource-intensive, requiring network bandwidth, CPU cycles, and hard drive operations to encrypt many files quickly, completing an attack before the organization has time to stop it. Ransomware can take down entire networks. It can encrypt all backups before proceeding to production data to ensure the organization cannot restore it. Any services that count on that data come to a halt. Any software with dependencies on those services suffers, and those operations cease or falter. Cybersecurity chaos experiments must evolve to meet the challenges of modern ransomware.

Some disruptive cyberattacks are of greater concern than ransomware. "The most worrisome of modern cyberattacks with the most chaotic outcomes is an IoT attack rendering thousands of medical devices and even implants to work improperly or not at all," says Michael Nizich, director, Entrepreneurship & Technology Innovation Center and Cyber Defense Education at New York Institute of Technology.

There are limitless possibilities in the chaos experiments you can run, since the results of one test will lead to more questions and further testing. For example, chaos experiments reveal hidden dependencies that frustrate systems under attack. As Shortridge explains, "if you're conducting a chaos experiment and you see a sudden flood of API requests as the new unit pricing microservice slows, and you can duplicate the experiment with the same results, you realize there is a dependency and perhaps some shared infrastructure or orchestration service in place." The resulting questions include how and why the APIs and services are connected, what happens to them under different attack scenarios, and how you can make them more nimble under the force of these stressors. 
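The kind of experiment Shortridge describes can be illustrated with a toy simulation. The sketch below is an assumption-laden model, not her method or any real tool: `PricingService`, `checkout`, `run_experiment`, the 200 ms timeout, and the three-attempt retry policy are all hypothetical. It injects latency into a stand-in pricing service and counts how many requests the caller sends, then repeats the run to check that the result is reproducible.

```python
from dataclasses import dataclass


@dataclass
class PricingService:
    """Hypothetical stand-in for the unit pricing microservice."""
    latency_ms: int  # simulated time to answer one request
    calls: int = 0   # total requests received

    def handle(self, timeout_ms: int) -> bool:
        """Record one request; succeed only if it answers within the timeout."""
        self.calls += 1
        return self.latency_ms <= timeout_ms


def checkout(service: PricingService, timeout_ms: int = 200,
             max_attempts: int = 3) -> bool:
    """Hypothetical caller that retries the pricing lookup on timeout."""
    for _ in range(max_attempts):
        if service.handle(timeout_ms):
            return True
    return False


def run_experiment(latency_ms: int, orders: int = 100) -> dict:
    """Run one trial and report request volume and failed orders."""
    service = PricingService(latency_ms=latency_ms)
    failures = sum(not checkout(service) for _ in range(orders))
    return {"requests": service.calls, "failed_orders": failures}


baseline = run_experiment(latency_ms=50)   # steady state
injected = run_experiment(latency_ms=500)  # fault: pricing service slowed
repeat = run_experiment(latency_ms=500)    # rerun to confirm the result holds

amplification = injected["requests"] / baseline["requests"]
print(baseline, injected, amplification)
```

Under these assumptions, slowing the service past the caller's timeout triples the request volume because every order exhausts its retries, and the rerun gives the same numbers. That reproducible jump is the signal that the caller depends on the pricing service more tightly than anyone may have documented.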

Chaos engineering has already saved organizations from havoc, such as the loss of critical service availability. "A recent outage at a major time and attendance provider caused disruption for a large number of customers over an extended period. Customers that used chaos engineering as a resiliency tool could quickly implement workarounds, while others were left with no way to pay employees or comply with terms of union and personal employment contracts," says Saylors.

As a result, "the unprepared customers had to rely on the goodwill of their employees and suppliers to continue operating," says Saylors.


David Geer is a journalist who focuses on issues related to cybersecurity. He writes from Cleveland, OH, USA.

Communications of the ACM (CACM) is now a fully Open Access publication.
