The internet is increasingly the place where both users and organizations store their information. Storage is becoming a commodity; for example, consider the storage offerings by companies such as Google and Amazon.
A key benefit for individual users of commodity Internet storage is that they can access their information from anywhere, at any time, from any device. Thus they no longer have to use their PC to access their files or their email. Furthermore, online information can easily be shared with others, giving rise to new applications based on support for collaboration.
However, the full benefit of online storage will be realized only if users can access their data whenever they want. Users need storage that is highly reliable (it is not lost) and highly available (accessible when needed). They will not be satisfied with less reliability or availability than they can get by storing their information locally. Providing these guarantees requires replication: by storing copies of information on multiple computers, it is possible to prevent loss and provide accessibility.
Replication has been the subject of research for over 20 years. The details of the replication protocols depend on the failure model, and two are in common use. The first is the crash model, in which either a computer is up and running as required by the protocol, or it has crashed and is doing nothing. The second is the Byzantine model, in which computers are allowed to fail in arbitrary ways. The Byzantine model is more general: in addition to crashes, it handles failures in which a faulty computer continues to run the protocol while misbehaving. For example, a machine might indicate it has executed a command to update information, but discard the new information.
During the 1980s there was a great deal of research on replication protocols that handle crash failures, for two reasons: crashes were the most common failures at the time, and it is much easier to think about crashes than Byzantine failures. This work led to protocols that survive f failures using 2f + 1 replicas, which is the minimum needed in an asynchronous setting like the Internet. Also, the protocols provide good performance, close to what an unreplicated system can provide.
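The 2f + 1 bound can be seen with a little quorum arithmetic. Under the crash model, a write recorded at a majority (f + 1 replicas) is guaranteed to be seen by any later majority, because two majorities of a 2f + 1 group always share at least one replica. A minimal sketch (my illustration, not from the article; the helper name is mine):

```python
def min_overlap(n: int, q: int) -> int:
    """Smallest possible intersection of two quorums of size q among n replicas."""
    return max(0, 2 * q - n)

# With n = 2f + 1 replicas and quorums of size f + 1, any two quorums
# overlap in at least one replica -- enough under the crash model, where
# the surviving common replica reports the truth.
for f in range(1, 6):
    n, q = 2 * f + 1, f + 1
    assert min_overlap(n, q) >= 1
```

One common replica suffices here precisely because crashed replicas never lie; as the next paragraphs note, that assumption fails under Byzantine faults.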
However, these protocols are unable to handle arbitrary (Byzantine) failures, which are becoming more common. One source of Byzantine failures is software errors; typically these are non-deterministic errors, because deterministic errors are much more likely than non-deterministic ones to be removed during testing. The second source, and one of increasing concern today, is malicious attacks in which an adversary manages to get control of a computer and cause it to misbehave. To handle these problems, researchers have developed replication protocols that provide Byzantine fault tolerance.
Prior to the late 1990s, work on Byzantine-fault-tolerant replication was only of theoretical interest because the protocols were so costly or worked only in a synchronous network. This changed with the invention of PBFT, the first Byzantine-fault-tolerant replication protocol that could be used in practice in an asynchronous network. PBFT provides state machine replication; that is, it handles arbitrary operations on the service state. It requires the minimum of 3f + 1 replicas to tolerate f failures.
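The larger 3f + 1 bound follows from the same quorum arithmetic, adjusted for replicas that can lie. With quorums of size 2f + 1, any two quorums intersect in at least f + 1 replicas, so even if f of the common replicas are faulty, at least one correct replica is in both quorums. A minimal sketch (my illustration; the helper name is mine):

```python
def min_overlap(n: int, q: int) -> int:
    """Smallest possible intersection of two quorums of size q among n replicas."""
    return max(0, 2 * q - n)

# With n = 3f + 1 replicas and quorums of size 2f + 1, two quorums share
# at least f + 1 replicas.  Since at most f are Byzantine, at least one
# correct replica is common to both, so it can vouch for the earlier quorum.
for f in range(1, 6):
    n, q = 3 * f + 1, 2 * f + 1
    assert min_overlap(n, q) >= f + 1
```

This is why PBFT needs 3f + 1 replicas where crash-tolerant protocols get by with 2f + 1: a single common replica is no longer enough, since it might be one of the liars.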
The development of PBFT led to renewed interest in Byzantine-fault-tolerant replication protocols. Researchers have investigated a number of research questions, including:
Level of Consistency. At one extreme is a replication protocol like PBFT that appears to behave as if there were just one copy of the data. But performance can be improved by providing weaker guarantees.
Other Approaches. PBFT makes use of a primary replica to direct the protocol; researchers have invented protocols that avoid the use of a primary either completely or partially.
Failure Analysis. Replication protocols like PBFT work correctly provided no more than f replicas are faulty. But if that bound is exceeded, can any guarantees be made?
Performance, performance, performance. Improved protocols that have better performance (lower latency, higher throughput) are always of interest.
The work on Zyzzyva presented here is concerned with the last topic. Zyzzyva achieves excellent performance when all replicas are non-faulty, and it pays for this gain by offering reduced performance when there are failures. Importantly, its techniques should allow it to achieve performance close to that of an unreplicated system most of the time.
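The essence of Zyzzyva's fast path is speculation: replicas execute a request and reply before agreeing on its order, and the client accepts the result in a single round trip only if all 3f + 1 speculative replies match; any mismatch or missing reply forces a slower path in which the client helps the replicas converge. A hedged client-side sketch (function and variable names are mine, not Zyzzyva's):

```python
def fast_path_complete(replies: list[str], f: int) -> bool:
    """Client-side check: the request completes in one round trip only if
    all 3f + 1 replicas returned identical speculative replies."""
    return len(replies) == 3 * f + 1 and len(set(replies)) == 1

# All four replicas (f = 1) agree: the fast path succeeds.
assert fast_path_complete(["ok:v2", "ok:v2", "ok:v2", "ok:v2"], f=1)

# One reply differs or is missing: the client must fall back to a
# slower agreement path, which is where the performance cost appears.
assert not fast_path_complete(["ok:v2", "ok:v2", "ok:v2", "ok:v1"], f=1)
assert not fast_path_complete(["ok:v2", "ok:v2", "ok:v2"], f=1)
```

This makes concrete the trade-off described above: when nothing fails, agreement is essentially free; when anything goes wrong, extra rounds are needed.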
Today there are Byzantine-fault-tolerant replication protocols efficient enough to be deployed in a real setting. But when might this happen? Here we can learn from the work on crash replication. Although developed in the 1980s, these protocols weren’t used in real systems until around 2000. The reason for this delay was a perception that the reliability provided by these approaches wasn’t really needed in practice. This perception changed as more critical state was stored online. The concern about cost also changed, since computers are much cheaper, and the network is much faster.
I expect that someday there will be a practical deployment that tolerates Byzantine failures. The decision to take this step will depend on the criticality of the data. At some point incurring the cost of replication will be preferable to being liable should the data be lost or unavailable.