In past issues we have discussed various system-related disasters and their causes, both accidental and intentional. In almost all cases, the difficulties allegedly attributed to “computer problems” can be traced, directly or indirectly, to people. But too much effort seems directed at placing blame and identifying scapegoats, and not enough at learning from experience and avoiding such problems [1,2,5,6,7]. Besides, the real causes may implicitly or explicitly involve a multiplicity of developers, customers, users, operators, administrators, others involved with computer and communication systems, and sometimes even unsuspecting bystanders. In a few cases the physical environment also contributes, e.g., power outages, floods, extreme weather, lightning, and earthquakes. Even in those cases there may have been system people who failed to anticipate the possible effects. In principle, at least, we can design redundantly distributed systems that are able to withstand, within limits, certain hardware faults, component unavailabilities, extreme delays, human errors, malicious misuse, and even “acts of God.” Nevertheless, in surprisingly many systems (including systems designed to provide continuous availability), an entire system can be brought to a screeching halt by a simple event just as easily as by a complex one [4].