Technical Perspective: Opening the Door to SSD Algorithmics
Solid-state drives (SSDs) have revolutionized the world of non-volatile storage over the past decade. Indeed, SSDs have found their way into every nook and cranny of computing—from cell phones to powerful servers deployed in cloud datacenters. A key reason for the rising popularity of SSDs is that they are faster and consume less energy than their older rivals, hard disk drives (HDDs).

Why should an algorithm designer care about the emerging dominance of SSDs? Historically, new types of computer hardware have seldom required new models of computation or new algorithmic paradigms. Much of algorithmic science uses models that are hardware independent. An enduring example is the random-access machine (RAM), which models an abstract computer that can read or write any cell in its unbounded memory in constant time. Traditionally, the RAM and related models have freed the algorithm designer from worrying about the nitty-gritty details of the underlying computer hardware.

However, hardware-independent models of computation have their limitations. As the aphorism goes, “all models are wrong, but only some are useful.” Ignoring the peculiarities of the hardware can sometimes lead to models of computation that are too simplistic to be useful for algorithm design. SSDs are a case in point. An SSD stores data as a set of blocks, with each block consisting of hundreds of pages. While pages can be randomly accessed for read operations, they cannot be modified in place by write operations. Further, a page must be erased before it can be rewritten, and erasures work only on entire blocks, not individual pages. As a result, writing a single page in an SSD can cascade into rewriting multiple pages—a phenomenon called “write amplification.” For instance, when a page is modified, its contents must be copied over to a clean page. If no clean page is available, a “victim” block must be selected and erased to replenish the supply of clean pages—such an erasure entails rewriting all valid pages in the victim block, causing write amplification. Besides degrading performance, write amplification also shortens the SSD’s lifetime, since each block can tolerate only a limited number of erasures before it fails. Thus, extending the lifetime of an SSD requires “wear leveling,” which aims to spread erasures evenly across blocks. Hence, a traditional model of computation that ignores the idiosyncrasies of writes and erasures in SSDs is likely too wrong to be useful.
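To see write amplification in action, consider the following toy simulation—a sketch of my own under assumed parameters (a miniature 8-block, 4-pages-per-block geometry and a greedy collector), not anything taken from the paper. The collector always erases the block holding the fewest valid pages, and the ratio of physical to logical writes is the write amplification.

```python
import random

# Toy SSD geometry (assumed for illustration; real blocks hold hundreds of pages).
BLOCKS, PAGES_PER_BLOCK = 8, 4      # 32 physical pages in total
LOGICAL_PAGES = 24                  # over-provisioned: 24 logical < 32 physical

class TinySSD:
    def __init__(self):
        # blocks[b][s] holds a logical page number, or None if the slot is clean.
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(BLOCKS)]
        self.where = {}              # logical page -> (block, slot) of its valid copy
        self.physical_writes = 0
        self.erases = 0

    def _valid_in(self, b):
        # A slot is valid only if it holds the current copy of its logical page.
        return [p for s, p in enumerate(self.blocks[b])
                if p is not None and self.where.get(p) == (b, s)]

    def _clean_slot(self):
        for b in range(BLOCKS):
            for s in range(PAGES_PER_BLOCK):
                if self.blocks[b][s] is None:
                    return b, s
        return None

    def _garbage_collect(self):
        # Greedy policy: erase the block holding the fewest valid pages.
        victim = min(range(BLOCKS), key=lambda b: len(self._valid_in(b)))
        survivors = self._valid_in(victim)          # data to relocate
        self.blocks[victim] = [None] * PAGES_PER_BLOCK   # block-level erase
        self.erases += 1
        for p in survivors:          # forced relocation = write amplification
            self.write(p)

    def write(self, p):
        self.where.pop(p, None)      # the old copy (if any) becomes stale
        if self._clean_slot() is None:
            self._garbage_collect()
        b, s = self._clean_slot()
        self.blocks[b][s] = p
        self.where[p] = (b, s)
        self.physical_writes += 1

random.seed(0)
ssd, n = TinySSD(), 10_000
for _ in range(n):
    ssd.write(random.randrange(LOGICAL_PAGES))
print(f"write amplification: {ssd.physical_writes / n:.2f}")
```

In simulations like this one, the degree of over-provisioning (here, 24 logical pages mapped onto 32 physical ones) is the main knob controlling how much amplification the greedy policy incurs.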


To address this critical need, the authors of the accompanying paper propose a more accurate theoretical model of SSDs that views each page as containing either valid data, invalid data (that is, stale), or no data (clean). Their model incorporates read, write, and erase operations, enabling the formal algorithmic study of key phenomena associated with SSDs, such as write amplification and wear leveling.
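To make the model concrete, a minimal rendering of its page states and operations might look as follows. The class and method names are my own illustrative choices, not the paper's notation.

```python
from enum import Enum, auto

class PageState(Enum):
    VALID = auto()    # holds the current copy of some logical page
    INVALID = auto()  # holds a stale copy; reclaimable only via erase
    CLEAN = auto()    # erased and empty; the only writable state

class Block:
    """One erase unit: a fixed array of pages sharing a single erase count."""
    def __init__(self, pages_per_block=256):
        self.states = [PageState.CLEAN] * pages_per_block
        self.erase_count = 0   # wear: bounded before the block fails

    def read(self, slot):
        # Reads are random-access and leave the page state unchanged.
        assert self.states[slot] is PageState.VALID

    def write(self, slot):
        # No in-place update: only a clean page may be written.
        assert self.states[slot] is PageState.CLEAN, "erase before rewriting"
        self.states[slot] = PageState.VALID

    def invalidate(self, slot):
        # The logical page was rewritten elsewhere; this copy is now stale.
        self.states[slot] = PageState.INVALID

    def erase(self):
        # All-or-nothing at block granularity; each erase adds wear.
        self.states = [PageState.CLEAN] * len(self.states)
        self.erase_count += 1
```

In this vocabulary, wear leveling amounts to keeping the erase_count values of all blocks roughly equal.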

The paper also takes a key step toward algorithm design in the SSD model of computation. In particular, the authors derive SSD page-management algorithms that minimize the write amplification of an arbitrary sequence of write operations. These algorithms expose the vital role “death times” play in minimizing write amplification, where the death time of a page is the next time at which the page will be modified. Those familiar with the decades-old literature on memory cache management may recall Belady’s algorithm, which achieves the maximum cache hit rate by evicting the page that will be accessed farthest in the future. The analogy between the two research areas, in both of which the future access times of pages play a critical role in algorithm design, is difficult to miss.
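As a small illustration of the death-time notion—again a sketch of my own, not the paper's algorithm—death times can be computed offline from a known write sequence with a single backward scan, much as Belady's algorithm precomputes next-use times. Writes with nearby death times are natural candidates to co-locate in the same block, so that the block's pages become invalid together and it can be erased with little relocation.

```python
import math

def death_times(writes):
    """death[i] = index of the next write to writes[i], or inf if never."""
    next_write = {}                            # logical page -> next index seen
    death = [math.inf] * len(writes)
    for i in range(len(writes) - 1, -1, -1):   # backward scan, as in Belady
        death[i] = next_write.get(writes[i], math.inf)
        next_write[writes[i]] = i
    return death

print(death_times([1, 2, 1, 3, 2, 1]))         # [2, 4, 5, inf, inf, inf]
```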

The formal study of algorithms and data structures for SSDs that explicitly account for write amplification and wear leveling is still in its infancy. In the current state of the art, software initially designed for other storage media is often retrofitted to work for SSDs based solely on empirical evaluation. A good model has historically been the key that opens the door to theoretical advances and algorithmic discoveries. The most significant impact of this paper is likely to come by attracting more research and researchers to the foundational study of SSD algorithmics. Given the ever-expanding role of SSDs in the modern computing universe, the benefits of such research can hardly be overstated.

 
