Communications of the ACM

ACM TechNews

Cheriton Computer Scientists Create Nifty Solution to Catastrophic Network Fault

Chart of a partial network partition.

The study revealed that more than two-thirds (69%) of failures require three or fewer events to manifest, and that 24% of the failures were permanent: even after the partial partition is fixed, the failure persists.

Credit: University of Waterloo

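Computer scientists at the University of Waterloo's Cheriton School of Computer Science in Canada have engineered a solution to partial network partitioning, a fault in which some nodes in a cluster cannot communicate with each other while both remain reachable by other nodes, and which can cause catastrophic system failures.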

The Cheriton team reviewed 51 partial network partitioning failures across 12 popular cloud-based computer systems. Of these, 75% had a catastrophic impact, such as data loss, system or data unavailability, data corruption, or stale and dirty reads.

Cheriton's Samer Al-Kiswany said, "The partition in only one node was responsible for the manifestation of all failures, which is scary because even misconfiguring one firewall in a single node leads to a catastrophic failure."

The team developed Nifty (network partitioning fault-tolerance layer), a simple and transparent internode communication layer that detours traffic around partial partitions.
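The detour idea can be illustrated with a minimal sketch, assuming each node knows which peers it can currently reach directly. When two nodes are cut off from each other but share a peer that reaches both, traffic can be relayed through that peer. The `route` function and the reachability map are hypothetical names for illustration; this is a simple single-hop relay search, not Nifty's actual implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical reachability map: node -> set of peers it can reach directly.
Reachability = Dict[str, Set[str]]


def route(reach: Reachability, src: str, dst: str) -> Optional[List[str]]:
    """Return a path from src to dst, detouring through one relay if needed.

    Returns None when no direct link or single-hop detour exists.
    """
    neighbors = reach.get(src, set())
    # Direct link still works: no detour needed.
    if dst in neighbors:
        return [src, dst]
    # Partial partition: look for a peer that src reaches and that reaches dst.
    # Sorted iteration keeps the chosen relay deterministic.
    for relay in sorted(neighbors):
        if relay != src and dst in reach.get(relay, set()):
            return [src, relay, dst]
    # No single-hop detour exists (e.g., a complete partition).
    return None


if __name__ == "__main__":
    # A and B cannot talk to each other, but both can talk to C.
    partial = {"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}
    print(route(partial, "A", "B"))
    print(route(partial, "A", "C"))
```

With the partial partition above, a message from A to B is relayed as A, C, B, while A still reaches C directly. A real layer would also have to keep reachability information current and handle multi-hop detours, which this sketch deliberately omits.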

Cheriton's Mohammed Alfatafta said, "Our prototype evaluation ... shows that Nifty overcomes the shortcomings of current fault tolerance approaches and effectively masks partial partitions while imposing negligible overhead."

From University of Waterloo Cheriton School of Computer Science


Abstracts Copyright © 2020 SmithBucklin, Washington, DC, USA

