Practice
Software Engineering and Programming Languages

Deterministic Record-and-Replay

Zeroing in only on the nondeterministic actions of the process.

Posted
coding icon on film strip background
This installment of Research for Practice covers a topic that, despite its maturity, continues to produce cutting-edge research: deterministic record-and-replay. Deterministic record-and-replay technologies enable a faithful re-execution (replay) of a program that ran in the past (and perhaps encountered a rare bug, a performance anomaly, or an intrusion by an adversary). But accomplishing this requires that any nondeterministic inputs to the program be logged (recorded) during execution.
The selection of techniques presented here is curated by Andrew Quinn, assistant professor of computer science and engineering at University of California Santa Cruz. We chose the topic because of its growing relevance to our audience of practitioners, and we chose Professor Quinn because of his expertise in the area. His selections here all come from recent publications in top CS conferences, and they cover a range of topics from core technologies providing record-and-replay to somewhat more exotic applications. They reveal both how broadly applicable this technique is across domains and just how hot record-and-replay remains even though it has been an active area of research for a long time.
Professor Quinn’s summaries are detailed, so I will keep mine short. His first selection, which is called RR, provides a fantastic opportunity to get your feet wet with a system that “maximizes deployability” and works right out of the box by taking advantage of a mix of hardware and software features available on any modern system.
The second paper (which just came out in 2024) uses record-and-replay techniques not to re-execute a program per se, but instead to address the problem of lightweight auditing—for example, to ensure a cloud provider is indeed running the program you submitted without needing to resort to the massive compute resources offered by the provider.
Finally, he shares a paper that creatively applies record-and-replay ideas to a seemingly unrelated domain: securing GPU computations on mobile devices by treating each execution on new inputs as a “replay” of a canonical execution.
I cannot think of a better way to spend a weekend afternoon than reading these three papers, along with Professor Quinn’s expert summary.
—Peter Alvaro

Deterministic record-and-replay is a technique that allows a user to record a program execution and then replay the exact same execution at a later time. Think of it as being like TiVo, only for processes that execute on a computer. Researchers have created hundreds of systems that use deterministic record-and-replay across many domains. Security forensic systems (for example, Backtracker5) use record-and-replay to determine what data was exfiltrated during a security breach, while replay-based debuggers (for example, Instant Replay6) use record-and-replay to debug rare production issues and heisenbugs (bugs that go away once you observe them).

The key challenge with deterministic record-and-replay is that processes produce a lot of execution state, far too much to capture and store. To illustrate, let’s do a back-of-the-envelope calculation for the size of a process’s instruction sequence, which is a subset of the information that must be encoded for record-and-replay. Assume that each instruction pointer is 8 bytes, that the process is single-threaded and executes for 10 minutes, and that the system uses a 2GhZ processor. In this scenario, the instruction sequence would require 1.2TB of storage. Imagine the storage demands to do this for all your workloads, which execute longer, have more threads, and use faster processes.

The key insight enabling deterministic record-and-replay is that most of a process’s actions are deterministic—meaning their behavior depends entirely on the current state of the process. Thus, a deterministic record-and-replay system can eschew storing most execution states and instead store only information about the nondeterministic actions of the process. During recording, such systems store the nondeterministic inputs that were passed to the process (for example, the bytes read from the network) into a log; during replay, the systems then re-execute the process with the values in the log to re-create the execution state.

This column describes three recent research results that use deterministic record-and-replay, focusing on diverse use cases of the technique.

Readily Deployable Record-and-Replay

Source: 

Engineering Record and Replay for Deployability (R. O’Callahan, C. Jones, N. Froyd, K. Huey, A. Noll, and N. Partush)a

This system, called RR, is something you could probably add to your engineering workflow right now since it’s a record-and-replay debugger that’s easy to deploy. The system’s key novelty is that it performs record-and-replay on an unprivileged user-space implementation that supports unmodified user-space applications on a stock Linux kernel, compiler, language runtime, and standard x86/x86-64 CPU. RR even supports programs that have bugs, such as data races, which have been a major source of issues for past record-and-replay systems. RR is also an open source system, so you should build and tinker around with it if you haven’t had the chance already.

RR works by building on software and hardware features that were originally intended for traditional debugging and performance profiling. It uses ptrace, which was designed to implement software debuggers (for example, gdb) that capture system call results and signals. The system accelerates the performance of ptrace by introducing in-process, system-call intersection, a technique that selectively suppresses ptrace traps and hence minimizes context switches, using seccomp-bpf (secure computing-Berkeley Packet Filter). RR executes only one thread at a time to avoid nondeterministic data races, and it uses CPU hardware performance counters (originally designed for performance profiling) to measure application progress so it can deliver signals and perform context switches at the right point in the program. This evaluation indicates whether the system meets reasonable compute, memory, and storage requirements for regular use as a debugger.

Auditing Event-Driven Web Applications

Source:

Efficient Auditing of Event-Driven Web Applications (I. Tzialla, J. Wang, J. Zhu, A. Panda, and M. Walfish.)b

Suppose you deploy a Web server on a remote cloud machine. How can you be sure the machine is actually executing your Web server? This brings us to the second system of interest: Karousos, which uses a variant of record-and-replay to answer that question of how your Web server is faring, a concern these researchers refer to as “execution integrity.” Specifically, Karousos offers a solution that provides a principal (you, that is) with confidence that running a given program (your Web server) on given inputs (user requests) actually produces the alleged outputs (user responses) observed from a third-party machine (the remote cloud machine).

Karousos uses the model from Orochi,7 in that a trusted collector machine captures all inputs and outputs to the program, while a verifier re-executes the inputs in the trace to check whether re-executed outputs match the trace. The model accelerates the process by re-executing requests in batches, using untrusted advice produced by the untrusted machine that consists of some of the nondeterministic actions made by the program.

Karousos specifically targets event-driven Web applications, for which prior work provides poor support. The system’s key contributions center around techniques that balance the trade-off between re-execution throughput and the size of advice. The technique and analysis are deep and beyond the scope of this column. I encourage interested readers to dive into the paper. (I promise you’ll learn something!)

In the end, Karousos’s re-execution is essentially deterministic record-and-replay, with two key caveats. First, its re-executions aim to require less computation than the recorded execution; otherwise, it would not make sense to use the cloud in the first place. Second, deterministic record-and-replay assumes a correct log of nondeterministic events, whereas Karousos targets scenarios where the third-party adversarially constructs the advice.

GPUReplay for Client ML

Source:

GPUReplay: a 50-KB GPU Stack for Client ML (H. Park and F.X. Lin)c

Let’s say you design and maintain a mobile application that uses a GPU to accelerate a machine-learning (ML) component. Moreover, let’s say this application executes on a GPU stack consisting of an ML framework (for example, TensorFlow), a runtime (for example, OpenCL), and a GPU driver. Together, the GPU stack converts a high-level language definition of your program into low-level GPU commands and places them on the GPU. Unfortunately, these sophisticated stacks come with three main concerns: weak security, deployment difficulty, and slow startup.

GPUReplay uses deterministic record-and-replay to improve this state of affairs. At development time, it executes your ML application and records the GPU stack executions. The executions encode how the GPU stack interacts with the GPU to initialize each execution. At deployment time, GPUReplay executes your ML application by invoking the recorded GPU stack executions when given new input data. In doing so, it needs to deal with nondeterminism that may arise between the recorded executions and the replay ones, which it does by allowing nondeterminism that does not affect program output (for example, timing variations), preventing the nondeterminism from occurring in the first place, or detecting nondeterminism and attempting re-execution. The authors show that GPUReplay eliminates a large number of high-impact CVEs (common vulnerabilities and exposures) from occurring in deployment and validates that the replayed executions are indeed correct.

Summary

This column describes three recent research advances related to deterministic record-and-replay, with the goal of showing both classical use cases (replay-based debugging) and emerging use cases (execution integrity and GPU acceleration). A growing number of systems use a weaker form of deterministic record-and-replay (which I call “mostly deterministic” record-and-replay). Essentially, these systems exploit the determinism that exists across many program executions but intentionally allow some nondeterminism for performance reasons. This trend is exemplified in GPUReplay in particular, but also in systems such as ShortCut4 and Dora.8

For more information on deterministic record-and-replay, I suggest any of the detailed surveys on the topic.1,2,3

    References

    • 1. Chen, Y. et al. Deterministic replay: A survey. ACM Computing Surveys 48, 2 (2015); https://tinyurl.com/294ykf86
    • 2. Cornelis, F. et al. A taxonomy of execution replay systems. In Proceedings of the Intern. Conf. on Advances in Infrastructure for Electronic Business, Education, Science, Medicine, and Mobile Technologies on the Internet (2003); https://tinyurl.com/29vu5p4n
    • 3. Dionne, C., Feeley, M., and Desbiens, J. A taxonomy of distributed debuggers based on execution replay. In Proceedings of the Intern. Conf. on Parallel and Distributed Processing Techniques (PDPTA)  (1996); https://tinyurl.com/2boqkcoa
    • 4. Dou, X., Chen, P.M., and Flinn, J. Accelerating mostly deterministic code regions. In Proceedings of the 27th ACM Symp. on Operating Systems Principles (2019), 570585; https://tinyurl.com/28c4ap4x
    • 5. King, S.T. and Chen, P.M. Backtracking intrusions. In Proceedings of the 19th ACM Symp. on Operating Systems Principles (2003), 223236; https://tinyurl.com/2dasjjx6
    • 6. Leblanc, T.J. and Mellor-Crummey, J.M. Debugging parallel programs with Instant Replay. IEEE Transactions on Computers 36, 4 (1987), 471482; https://tinyurl.com/26hwpbkg
    • 7. Tan, C., Yu, L., Leners, J.B., and Walfish, M. The efficient server audit problem, deduplicated re-execution, and the Web. In Proceedings of the 26th Symp. on Operating Systems Principles (2017), 546564; https://tinyurl.com/2879o3by
    • 8. Viennot, N., Nair, S., and Nieh, J. Transparent mutable replay for multicore debugging and patch validation. In Proceedings of the 18th Intern. Conf. on Architectural Support for Programming Languages and Operating Systems (2013); https://tinyurl.com/2df9ycnm

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More