Flash-based solid-state drives (SSDs) are a key component in most computer systems, thanks to their ability to support parallel I/O at sub-millisecond latency and consistently high throughput. At the same time, due to the limitations of the flash media, they perform writes out-of-place, often incurring a high internal overhead which is referred to as write amplification. Minimizing this overhead has been the focus of numerous studies by the systems research community for more than two decades. The abundance of system-level optimizations for reducing SSD write amplification, which are typically based on experimental evaluation, stands in stark contrast to the lack of theoretical algorithmic results in this problem domain. To bridge this gap, we explore the problem of reducing write amplification from an algorithmic perspective, considering it in both offline and online settings. In the offline setting, we present a near-optimal algorithm. In the online setting, we first consider algorithms that have no prior knowledge about the input and show that in this case, the greedy algorithm is optimal. Then, we design an online algorithm that uses predictions about the input. We show that when predictions are relatively accurate, our algorithm significantly improves over the greedy algorithm. We complement our theoretical findings with an empirical evaluation of our algorithms, comparing them with the state-of-the-art scheme. The results confirm that our algorithms exhibit improved performance for a wide range of input traces.
Flash-based solid-state drives (SSDs) have gained a central role in the infrastructure of large-scale datacenters, as well as in commodity servers and personal devices. Unlike traditional hard disks, SSDs contain no moving mechanical parts, which allows them to provide much higher bandwidth and lower latencies, especially for random I/O. The main limitation of flash media is its inability to support update-in-place: after data has been written to a physical location, that location has to be erased (cleaned) before new data can be written to it. In particular, SSDs support read and write operations at the granularity of pages (typically sized 4KB–16KB), while erasures are performed on entire blocks, often containing hundreds of pages.
To address this limitation, writes are performed out-of-place: whenever a page is written, its existing copy is marked as invalid, and the new data is written to a clean location. The flash translation layer (FTL), which is part of the SSD firmware, maintains the mapping of logical page addresses to physical locations (slots). To facilitate out-of-place writes, the physical capacity of an SSD is larger than the logical capacity exported to the application. This "spare" capacity is referred to as the device's overprovisioning. When the clean pages are about to be exhausted, a garbage collection (GC) process is responsible for reclaiming space: it chooses a victim block for erasure, rewrites any valid pages still written on this block to new locations, and then erases the block.
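The mechanism described above can be illustrated with a small simulation. The following Python sketch is not the firmware of any real device, nor one of the algorithms developed in this article. It is a toy model under simplifying assumptions: a single active block receives all writes, GC runs only once every block is full, the victim is chosen greedily as the block with the fewest valid pages, and valid pages are held in memory while the victim is erased and then written back into it. The class name `TinyFTL` and all of its fields are hypothetical names introduced for illustration. Write amplification is measured in the usual way, as the number of pages physically written to flash divided by the number of pages written by the host.

```python
class TinyFTL:
    """Toy page-mapped FTL with greedy garbage collection (illustrative only)."""

    def __init__(self, num_blocks, pages_per_block, logical_pages):
        # Overprovisioning: physical capacity must exceed logical capacity.
        if logical_pages >= num_blocks * pages_per_block:
            raise ValueError("logical capacity must leave spare physical slots")
        self.pages_per_block = pages_per_block
        self.logical_pages = logical_pages
        self.valid = [set() for _ in range(num_blocks)]  # valid logical pages per block
        self.used = [0] * num_blocks       # slots written in each block since its last erase
        self.location = {}                 # logical page -> block holding its valid copy
        self.clean_blocks = list(range(num_blocks))
        self.active = self.clean_blocks.pop()  # block currently receiving writes
        self.host_writes = 0               # pages written by the application
        self.flash_writes = 0              # pages physically written, including GC rewrites

    def _program(self, page):
        """Write one page into the next clean slot of the active block."""
        self.valid[self.active].add(page)
        self.used[self.active] += 1
        self.location[page] = self.active
        self.flash_writes += 1

    def _ensure_clean_slot(self):
        """Make sure the active block has a clean slot, running GC if needed."""
        if self.used[self.active] < self.pages_per_block:
            return
        if self.clean_blocks:
            self.active = self.clean_blocks.pop()
            return
        # Every block is full: choose the victim with the fewest valid pages.
        victim = min(range(len(self.valid)), key=lambda b: len(self.valid[b]))
        survivors = sorted(self.valid[victim])
        self.valid[victim] = set()   # erase the victim block
        self.used[victim] = 0
        self.active = victim
        for page in survivors:       # rewrite the pages that were still valid
            self._program(page)

    def write(self, page):
        """Host write: invalidate the old copy, then write out-of-place."""
        if not 0 <= page < self.logical_pages:
            raise IndexError("logical page out of range")
        self.host_writes += 1
        old_block = self.location.get(page)
        if old_block is not None:
            self.valid[old_block].discard(page)  # mark the existing copy invalid
        self._ensure_clean_slot()
        self._program(page)

    def write_amplification(self):
        """Physical page writes per host page write (1.0 means no GC overhead)."""
        return self.flash_writes / self.host_writes if self.host_writes else 1.0


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    ftl = TinyFTL(num_blocks=64, pages_per_block=32, logical_pages=1792)
    for _ in range(100_000):
        ftl.write(rng.randrange(ftl.logical_pages))
    print(f"write amplification: {ftl.write_amplification():.2f}")
```

In this model, a workload that overwrites the logical pages in sequential order leaves every victim block fully invalid, so no page is ever rewritten and the write amplification stays at 1. Random overwrites with little spare capacity force GC to rewrite valid pages, which pushes the ratio above 1.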