Most database research papers are prescriptive. They identify a technical problem and show us how to solve it. They present new algorithms, theorems, and evaluations of prototypes. Other papers follow a different path; they are descriptive rather than prescriptive. They tell us how data systems behave in practice and how they are actually used. They employ a different set of tools, such as surveys, software analyses, or user studies. These papers are much rarer at database research conferences, and they’re all the more valuable for that.
The paper I’m introducing here, Many Faces of Ad Hoc Transactions, by Tang et al., is of this descriptive type. It presents an analysis of open source database applications, focusing on how these applications synchronize concurrent operations. According to the authors, this analysis is the result of five person-years of effort. (I suspect this is one of the reasons work like this is not more common.) This paper originally appeared at SIGMOD 2022 [2], and it extends a (short) research thread that first came to my attention through earlier work by Warszawski, Bailis, and others [1, 3].
“Wait”, I hear you say. “Database systems provide transactions, and applications use transactions to synchronize their database accesses. What’s to study here?” If this is your immediate reaction, I urge you to read this paper. Transactions are, of course, widely used, but database applications also have other tools at their disposal for coordinating concurrent actions. These include synchronization primitives provided by database systems, such as explicit locks, as well as synchronization mechanisms in the application itself. Tang et al. refer to applications’ use of these mechanisms as “ad hoc transactions” to distinguish them from transactions implemented in the database system.
As it turns out, ad hoc transactions are quite common. Tang et al. analyzed eight popular database applications and found ad hoc transactions in every one of them, more than 90 examples in total. This ad hoc transaction corpus is the focus of the paper.
Let’s consider one example from the corpus, as an illustration. It is an ad hoc transaction used in the Discourse forum application to update the content of a forum post. It spans two application-level HTTP requests. In the first, a client requests the current content of a forum post for editing. In response, the server-side application uses a read-only database transaction to retrieve the post content and version number and returns them to the client. The client edits the content and then issues a second HTTP request to install the new post content, providing the original version number. In response, the server-side application first uses a read-only database transaction to verify that the version number has not changed since the content was originally retrieved. If it has, the update fails. Otherwise, the application uses another database transaction to install the new content and update the version number. The application also uses an explicit lock on the post identifier to prevent concurrent updates to the post between the version check transaction and the update transaction.
Logically, this whole process is a single, long-lived transaction spanning two HTTP requests. In practice, the application uses an ad hoc synchronization strategy involving three database transactions, an explicit lock, and explicit optimistic concurrency control (via the version number check) to coordinate the activity.
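To make the pattern concrete, here is a minimal sketch of the flow described above. It is an illustration only, not Discourse's code (Discourse is a Ruby on Rails application). The table layout, the function names fetch_for_edit and save_edit, the in-memory SQLite database, and the in-process lock standing in for the application's explicit lock are all assumptions made for the example.

```python
import sqlite3
import threading
from collections import defaultdict

# Hypothetical per-post locks, standing in for the application's explicit lock
# on the post identifier. A real deployment would need a lock shared across
# server processes; this in-process lock is only for illustration.
_post_locks = defaultdict(threading.Lock)
_post_locks_guard = threading.Lock()


def _lock_for(post_id):
    # Guard the dictionary itself so two threads get the same lock object.
    with _post_locks_guard:
        return _post_locks[post_id]


def open_db():
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT statements below mark the transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT, version INTEGER)"
    )
    conn.execute("INSERT INTO posts VALUES (1, 'hello', 1)")
    return conn


def fetch_for_edit(conn, post_id):
    """First HTTP request: read-only transaction returning (content, version)."""
    conn.execute("BEGIN")
    row = conn.execute(
        "SELECT content, version FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    conn.execute("COMMIT")
    return row


def save_edit(conn, post_id, new_content, seen_version):
    """Second HTTP request: version check, then update, both under one lock."""
    with _lock_for(post_id):
        # Transaction 2: verify the version the client originally saw.
        conn.execute("BEGIN")
        (current,) = conn.execute(
            "SELECT version FROM posts WHERE id = ?", (post_id,)
        ).fetchone()
        conn.execute("COMMIT")
        if current != seen_version:
            return False  # someone else updated the post; the edit fails

        # Transaction 3: install the new content and bump the version.
        conn.execute("BEGIN")
        conn.execute(
            "UPDATE posts SET content = ?, version = version + 1 WHERE id = ?",
            (new_content, post_id),
        )
        conn.execute("COMMIT")
        return True
```

In this sketch, if two editors both call fetch_for_edit and receive the same version number, the first call to save_edit succeeds and the second returns False, because the stored version has moved on. The lock matters because the check and the update are separate database transactions: without it, two concurrent save_edit calls could both pass the check before either one writes.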
The paper does a great job addressing the important questions about these kinds of ad hoc transactions. First, what kinds of ad hoc transactions are found in practice? Here the paper identifies a number of common patterns found in the corpus. Second, how are these ad hoc transactions being implemented, that is, what synchronization primitives are the applications using? Third, why are applications using ad hoc synchronization rather than simply relying on database transactions?
Synchronization is tricky, so the paper also examines what can go wrong when applications rely on ad hoc transactions. You may be unsurprised to hear that the authors found dozens of correctness issues related to ad hoc transactions in these applications. They nicely summarize and classify these issues.
This is a super informative paper, representing a whole lot of work. It should be valuable for engineers who build database applications. For researchers interested in designing useful synchronization mechanisms and abstractions for data systems, it should be considered a must-read.
Editor’s Note: This Technical Perspective first appeared in the ACM SIGMOD Record 52, 1 (Jun. 8, 2023), 6.