Technical Perspective: A Compiler’s Story

In the early 1970s, pioneers like Floyd, Dijkstra, and Hoare argued that programs should be formally specified and proven correct. But for the past 40 years, most of the computer science community has discounted this vision as an unrealistic goal. Perhaps the most important reason has been simple economics: Throughout the 1980s and 1990s, the industry tended to be more interested in factors such as time-to-market than issues involving correctness.

But the context has changed dramatically over the intervening years: Security and reliability have become key concerns in the fastest growing sectors like embedded systems where lives may be at stake. Even in non-critical domains, developers must worry about bugs, including buffer overruns and race conditions that can lead to security exploits. Researchers have developed a variety of tools, including strong static type checkers, software model checkers, and abstract interpreters, all of which can (and are) used to help enforce a range of safety properties. Consequently, formal methods are in wide use today, albeit disguised within tools.

However, these tools typically operate at the source-level (or, at best, a VM bytecode level), and not at machine code level. To scale, they must make assumptions about how a compiler will refine the source code to machine code. For example, even though the C language specification does not specify an order of evaluation for function arguments, most analysis tools assume the compiler will use a left-to-right strategy, since analyzing all possible evaluation strategies would take too much time. This mismatch of assumptions, or a bug in the compiler itself, can easily render the analysis tool useless.

In the following paper, Xavier Leroy addresses these problems by describing a compiler he built and verified using the Coq proof development system. Although he is not the first to build a verified translator, Leroy’s compiler is notable and exciting for three key reasons: First, it maps a useful source language (a significant subset of C) to PowerPC assembly, making it directly usable for a range of embedded applications. Second, his compiler includes a number of analyses and optimizations, such as liveness analysis and graph-coloring register allocation, that makes the resulting code competitive with gcc -0. Third, the proof of correctness is mechanically checked, yielding the highest level of assurance we can hope to achieve. In short, developers can be assured that, in spite of optimization, the output of Leroy’s compiler will behave the same as the source code input.

There are a number of hidden contributions to this work, beyond the construction of a fully verified, optimizing compiler: The compiler was built in a modular, pipelined fashion as a series of translations between well-specified intermediate languages, making it possible to add new optimizations or reuse components for other projects. For example, the specification for the C-subset can be reused to build new verified tools, such as a source-level analysis.

The compiler also demonstrates judicious use of translation validation in lieu of full verification. With translation validation, we can use unverified components to compute something (for example, an assignment of variables to registers) and need only check the output is valid (no interfering variables are assigned to the same register). Only the checker must be verified to ensure soundness, and this is often much easier than validating a full analysis and transformation. Translation validation is one engineering technique that can drastically reduce the burden of building verified systems.

This paper also shows how far we must go before verification becomes routinely feasible for production compilers or any other setting. Foremost is the cost of constructing and maintaining the proof as the code evolves. Leroy’s proof of correctness is about five to six times as big as the compiler itself, making it difficult to significantly change the code without breaking the proof. However, the proof was constructed largely by hand, and for the most part, does not take advantage of semi-automated decision procedures or proof search, a research area that has seen tremendous progress over the past decade. Indeed, work by other researchers following Leroy’s lead suggests we can potentially cut the size of the proofs by up to an order of magnitude.

Perhaps the biggest challenge we face is specification. Compilers have a fairly clean notion of "correctness" (the output code should behave the same as the input code), but most software systems do not. For example, what does it mean for an operating system or Web browser to be correct? At best we can hope to formalize some safety and security properties that these systems should obey, and be willing to adapt these properties as our understanding of failures and attacks improves. In turn, this demands a verification architecture that allows specifications to be modified and adapted almost as frequently as the code. Fortunately, verified compilers make it possible to do this sort of adaptation using high-level languages without sacrificing assurance for the generated machine code.

Consequently, I think we are on the verge of a new engineering paradigm for safety- and security-critical software systems, where we rely upon formal, machine-checked verification for certification, instead of human audits. Leroy’s compiler is an impressive step toward this goal.

Footnotes

DOI: http://doi.acm.org/10.1145/1538788.1538813