Credit: Box Six
Web production software systems currently operate at an unprecedented scale, requiring extensive automation to develop and maintain services. The systems are designed to adapt regularly to dynamic load to avoid the consequences of overloading portions of the network. As the scale and complexity of these software systems grow, it becomes more difficult to observe, model, and track how they function and malfunction. Anomalies inevitably arise, challenging incident responders to recognize and understand unusual behaviors as they plan and execute interventions to mitigate or resolve the threat of service outage. This is anomaly response.1
The cognitive work of anomaly response has been studied in energy systems, space systems, and anesthetic management during surgery.10,11 Recently, it has been recognized as an essential part of managing Web production software systems. Web operations also provide the potential for new insights because all data about an incident response in a purely digital system is available, in principle, to support detailed analysis. More importantly, the scale, autonomous capabilities, and complexity of Web operations go well beyond the settings previously studied.8,9