When distributing a storage system across many machines, designers face a critical question: Where is my data? At the heart of every storage system is a mechanism for determining where to find data based on an identifier. In file systems, file names identify the file data. In key-value stores, the key is the identifier, and the value is the data. Thus, to distribute a storage system across many machines, there must be some way to determine which machine stores the data for a given identifier.
The obvious solution is a phone-book directory approach from the good old days: there are already many phone books for various locations, so all that's needed is a global phone book that links a name (that is, a key) to the phone book containing the data (that is, a location). This directory-based approach works and has been employed in practice, but it is limited by the size of the global directory. Keys are small (for example, tens of bytes), but when the data is also small, the relative overhead of storing a global directory is high.
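As a minimal sketch (in Python, with hypothetical names), a directory is simply an explicit key-to-location map: placement is a free per-key choice, but the map must store every key verbatim, so its memory footprint grows with the total key bytes even when the values are tiny.

```python
# A global directory: an explicit key -> location map.
# Flexible (any key can be placed on any machine), but the directory
# itself must store every key, so memory grows with total key bytes.
directory: dict[str, str] = {}

def place(key: str, node: str) -> None:
    directory[key] = node   # placement is an arbitrary, per-key choice

def locate(key: str) -> str:
    return directory[key]   # one lookup, at the cost of storing all keys
```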
Over a decade of research in key-value stores has paved the way for another approach to looking up data: Compute the location. This approach achieves high scalability and low overhead by having clients (that is, users) compute the hash of the key to determine which machine has the data. This idea, known as consistent hashing, works by assigning machines to ranges of the hash space. Much research has been dedicated to improving and optimizing this approach, but it fundamentally suffers from an important limitation: Users cannot control placement decisions. Since placement is based on a hash, it is approximately random. If we wanted to store a phone book for people within a geographic region (for example, to optimize performance or to enforce privacy requirements), this would not be possible with consistent hashing, as the global phone book would randomly partition people across the various phone books.
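A bare-bones consistent-hashing ring (again hypothetical Python, far simpler than Ceph's CRUSH) makes the limitation concrete: the owning machine falls out of hash arithmetic alone, so there is no per-key knob with which to pin data to a particular machine or region.

```python
import bisect
import hashlib

def _point(s: str) -> int:
    """Map a string to a point on the hash ring."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each machine owns ranges of the hash space via virtual nodes."""
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._ring = sorted((_point(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def locate(self, key: str) -> str:
        """A key belongs to the first machine clockwise of hash(key)."""
        i = bisect.bisect_right(self._points, _point(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
ring.locate("alice:phone")  # decided by the hash -- not by the user
```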
At the core, then, is a trade-off between the placement inflexibility of hashing-based approaches and the storage inefficiency of traditional directory-based approaches. Hybrid approaches try to balance this trade-off but inherit the disadvantages of both. What's needed is an entirely new approach, and the accompanying paper presents such a solution based on recent advances in theoretical research on minimal perfect hashing (MPH). The paper introduces Smash, the first storage system to apply MPH to the data-lookup problem. Using MPH, Smash provides full placement flexibility in assigning data to any machine while reducing memory overhead by more than 60% compared to current approaches used in practice (that is, CRUSH/Ceph). Full placement flexibility comes from storing a directory that maps keys to locations, and Smash overcomes the space inefficiency of traditional directories by using an MPH approach to significantly reduce space (that is, memory) usage. In addition to reducing memory costs, Smash achieves low-latency data retrieval/update operations and low rebalancing overhead when adding or removing storage machines.
Minimal perfect hashing solves the space problem of directories by not storing keys: the hash function directly identifies the location. How is this possible, given that keys are normally required to resolve hash collisions? The idea behind MPH is to construct hash functions that have no collisions at all. Rather than dealing with collisions on lookup, MPH approaches resolve collisions at construction time. This potentially makes updates expensive, as a single update may require reconstructing the entire data structure. Fortunately, a recent advance in MPH research enables efficient updates: Ludo hashing. Ludo hashing is a new data structure that supports fast dynamic updates with low space overhead. Smash incorporates this idea to significantly reduce the memory overhead of storing a directory. That is, Smash uses a directory-based approach for full placement flexibility and uses Ludo hashing to represent this directory space-efficiently, where the efficiency comes from not storing keys. As a result, Smash needs only 1.6GB to store a directory with 100 million data objects, whereas other approaches require over 5GB. Smash also decentralizes directory lookup across the storage machines to enable scalability and avoid a single point of failure.
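To illustrate the core idea (not Ludo hashing itself, which additionally supports dynamic updates and cleverer encodings), here is a toy static MPH in the classic "hash and displace" style; all names are hypothetical. Collisions are resolved at construction by searching for a per-bucket seed that places every key in that bucket into a distinct free slot; a lookup then needs only the seed table and a flat array of locations, never the keys themselves.

```python
import hashlib

def _h(key: str, seed: int, n: int) -> int:
    """Deterministic hash of (seed, key) into [0, n)."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % n

def build_mph(keys: list[str]) -> list[int]:
    """Construct a minimal perfect hash: n keys map to n distinct slots."""
    n = len(keys)
    buckets: list[list[str]] = [[] for _ in range(n)]
    for k in keys:                       # first level: partition into buckets
        buckets[_h(k, 0, n)].append(k)
    seeds = [0] * n
    occupied = [False] * n
    # Resolve collisions now, at construction: largest buckets first,
    # retrying seeds until the whole bucket lands on distinct free slots.
    for b in sorted(range(n), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        seed = 1
        while True:
            slots = [_h(k, seed, n) for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                break
            seed += 1
        seeds[b] = seed
        for s in slots:
            occupied[s] = True
    return seeds

def mph_lookup(key: str, seeds: list[int]) -> int:
    # Note: like any MPH, looking up a key outside the built set
    # silently returns an arbitrary slot.
    n = len(seeds)
    return _h(key, seeds[_h(key, 0, n)], n)

# A Smash-style directory: we choose the machine for each key (full
# placement flexibility), yet store only seeds plus a location array.
placement = {"alice": "node-a", "bob": "node-b", "carol": "node-a"}
seeds = build_mph(list(placement))
locations = [""] * len(placement)
for k, node in placement.items():
    locations[mph_lookup(k, seeds)] = node
assert all(locations[mph_lookup(k, seeds)] == node
           for k, node in placement.items())
```

Because no keys appear in the final structure, the memory cost is a few bytes per object regardless of key length, which is the source of the space savings the directory approach otherwise lacks.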
From file systems to distributed object stores, storage systems all rely on a mechanism for data lookup. Storage is useful only when users can find their data, and Smash presents a new approach for making data lookup efficient. By developing a new design based on state-of-the-art theoretical research on minimal perfect hashing, Smash demonstrates that it is possible to achieve both full placement flexibility and space-efficient lookup.