Data structures that filter data for point or range queries are prevalent across data-driven applications, from analytics and transactions to modern machine learning. The primary objective is simple: determine whether one or more data items exist in the database. Yet this simple task is exceptionally difficult to perform efficiently, and it is surprisingly critical for the overall properties of the data-intensive applications that rely on filtering.
The problem is hard because there are numerous critical parameters and trade-offs. Many parameters come from the workload: for example, the exact ratio of point queries to updates and the percentage of empty-result queries. Other parameters come from the underlying hardware: for example, filters typically reside in memory, but with exponentially increasing data sizes we need to be mindful of the filter size and the memory hierarchy. Overall, there are complex trade-offs to navigate among memory, read, and write amplification. For example, a data structure cannot be efficient for both point and range queries while also supporting efficient writes. Yet numerous applications need to expose both read patterns.