Computing Applications Research highlights

Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System

Posted Dec 1 2011

Abstract
1. Introduction
2. Tools For Building a Safe OS
3. The Nucleus Interface
4. The Verified Nucleus
5. Kernel
6. Measurements
7. Related Work
8. Conclusion and Future Work
References
Authors
Footnotes
Figures

Typed assembly language (TAL) and Hoare logic can be used to verify the absence of many kinds of errors in low-level code. We use TAL and Hoare logic to achieve highly automated, static verification of the safety of a new operating system called Verve. We have developed techniques and tools to mechanically verify the safety of every assembly-language instruction in the operating system, runtime system, drivers, and applications (in fact, every part of the system software except the boot loader). Verve consists of a “Nucleus” that provides primitive access to hardware and memory, a kernel that builds services on top of the Nucleus, and applications that run on top of the kernel. The Nucleus, written in verified assembly language, implements allocation, garbage collection, multiple stacks, interrupt handling, and device access. The kernel, written in C# and compiled to TAL, builds higher-level services, such as preemptive threads, on top of the Nucleus. A TAL checker verifies the safety of the kernel and applications. A Hoare-style verifier with an automated theorem prover verifies both the safety and correctness of the Nucleus. Verve is, to the best of our knowledge, the first operating system mechanically verified to guarantee both type and memory safety. More generally, Verve’s approach demonstrates a practical way to mix high-level typed code with low-level untyped code in a verifiably safe manner.

1. Introduction

High-level computer applications build on services provided by lower-level software layers, such as operating systems and language runtime systems. These lower-level software layers should be reliable and secure. Without reliability, users endure frustration and potential data loss when the system software crashes. Without security, users are vulnerable to attacks from the network, which often exploit low-level bugs such as buffer overflows to take over a user’s computer. Unfortunately, today’s low-level software still suffers from a steady stream of bugs, often leaving computers vulnerable to attack until the bugs are patched.

Many projects have proposed using safe languages to increase the reliability and security of low-level systems. Safe languages ensure type safety and memory safety: accesses to data are guaranteed to be well-typed and guaranteed not to overflow memory boundaries or dereference dangling pointers. This safety rules out many common bugs, such as buffer overflow vulnerabilities. Unfortunately, it is difficult to express a complete computer system entirely in a safe language, because safe languages deliberately omit unsafe, low-level features, such as explicit memory deallocation. To perform low-level tasks like memory management, a safe language usually relies on a runtime system written in an unsafe language (e.g., C), and any bugs in this runtime system can undermine the safety of the entire language. For example, such bugs have left popular Web browsers, including Mozilla and Internet Explorer, open to attack.¹⁰

This paper presents Verve, an operating system and run-time system that we have verified to ensure type and memory safety. Verve has a simple mantra: every assembly-language instruction in the software stack must be mechanically verified for safety. This includes every instruction of every piece of software except the boot loader: applications, device drivers, thread scheduler, interrupt handler, allocator, garbage collector, etc. Because of this, Verve does not have to trust a high-level language compiler to enforce safety, nor does it have to rely on unverified library code.

The goal of formally verifying low-level OS and runtime system code is not new. Nevertheless, very little mechanically verified low-level OS and runtime system code exists, and that code still requires man-years of effort to verify.^5,8 This paper argues that recent programming language and theorem-proving technologies reduce this effort substantially, making it practical to verify strong properties throughout a complex system. The key idea is to split a traditional OS kernel into two layers: a critical low-level “Nucleus,” which exports essential runtime abstractions of the underlying hardware and memory, and a higher-level kernel, which provides more fully fledged services. Because of these two distinct layers, we can leverage two distinct automated technologies to verify Verve: TAL (typed assembly language¹¹) and automated theorem provers. Specifically, we verify the Nucleus using automated theorem proving (based on Hoare Logic) and we ensure the safety of the kernel using TAL (generated from C#).

A complete Verve system consists of a Nucleus, a kernel, and one or more applications. We wrote the kernel and applications in safe C#, which is automatically compiled to TAL. An existing TAL checker³ verifies this TAL code (again, automatically). We wrote the Nucleus directly in assembly language, hand-annotating it with assertions (preconditions, postconditions, and loop invariants). An existing Hoare-style program verifier called Boogie¹ verifies the assembly language against a specification of safety and correctness. This ensures the safety and correctness of the Nucleus’s implementation, including safe interaction with the TAL code and safe interaction with hardware (including memory, interrupts, timer, keyboard, and screen). Boogie relies on Z3,⁴ an automated theorem prover, to check that the assertions are satisfied. Writing the assertions requires human effort, but once they are written, we can use Boogie and Z3 to verify them completely automatically. As a result, the Verve Nucleus requires only 23 lines of proof annotation per executable statement. Although it is difficult to compare annotation burdens across systems that use different proof environments and programming languages, similar projects based on interactive theorem provers^5,8 have required more than 10 lines of proof annotation per line of code.

Verve boots and runs on real, off-the-shelf x86 hardware, and provides efficient support for realistic language features, including classes, virtual methods, arrays, and preemptive threads. Nevertheless, the current Verve system is still small compared to commodity operating systems and has many limitations. It lacks support for many C# features: exception handling, for example, is implemented by killing a thread entirely, rather than with try/catch. It lacks dynamic loading of code. It runs on only a single processor. Although it protects applications from each other using type safety, it lacks a more comprehensive isolation mechanism between applications such as Java Isolates or C# AppDomains. The verification does not guarantee termination. Finally, Verve uses verified garbage collectors⁷ that are stop-the-world rather than incremental or real time, and Verve keeps interrupts disabled throughout the collection. Except for multi-processor support, none of the limitations in Verve’s present implementation are fundamental.

We expect that with more time, the high degree of automation in Verve’s verification will allow Verve to scale to a more realistic feature set, such as a large library of safe code and a verified incremental garbage collector. Indeed, we have already ported about 35,000 lines of safe C# code to run on top of Verve, including standard C# libraries, device drivers, and implementations of several Internet protocols.

In this paper, we describe our verification tools (Section 2), the interface that our Nucleus exports to the rest of the kernel (Section 3), the verification of the Nucleus (Section 4), the kernel (Section 5), the time it takes to verify Verve (Section 6), and related work in systems verification (Section 7).

2. Tools For Building a Safe OS

Two verification technologies, TAL and automated theorem proving, drive Verve’s design. TAL is assembly language annotated with type information at the level of the assembly-language registers. We can use a simple type checker to verify that each assembly-language instruction in a TAL program respects the types of the instruction’s operands. For instance, the checker would reject the use of an integer as a memory address.

While TAL reasons about types, theorem provers reason about logical formulas, attempting to prove formulas valid or invalid. Automated theorem provers run with little or no human assistance, in contrast to interactive theorem provers, which can prove a wider variety of formulas but often require considerable human assistance. Modern automated theorem provers can reason about various theories, such as integer arithmetic, bitwise arithmetic, and arrays. Based on this reasoning, automated theorem provers can soundly prove deep properties about computer programs.

TAL and automated theorem proving are complementary technologies. On one hand, TAL is relatively easy to generate: because of the similarity between TAL types and high-level language types, a compiler can automatically turn high-level language code into TAL code, relying on the type annotations already present in the high-level language code. This enables TAL to scale easily to large amounts of code. Verve uses the Bartok compiler,³ which automatically generates TAL code from type-safe C# code.

On the other hand, we can use automated theorem provers to verify deeper logical properties about the code than a typical TAL type system can express, using a methodology discussed by Turing¹² and described by Floyd and Hoare in the 1960s, now commonly known as “Hoare logic.” In this methodology, a programmer annotates various points in the program, such as procedure entry points and loop entry points, with annotations describing the state of the machine. Such annotations are similar to type annotations, but specify properties of variables in much greater detail than usually found in type annotations. For example, a type annotation might merely say that registers eax and ebx both have type int, while a Hoare-style annotation might specify a precise formula about the values in eax and ebx, such as “eax >= 10 && eax + ebx < 20.” Because of this level of detail, writing these annotations requires substantial programmer effort.

To exploit the tradeoff between TAL and automated theorem proving, we decided to split the Verve operating system code into two parts, shown in Figure 1: a Nucleus, verified with Hoare logic and automated theorem proving, and a kernel, verified with TAL. The relative difficulty of Hoare logic motivated the balance between the two parts: only the functionality that we could not use TAL to verify as safe went into the Nucleus; all other code went into the kernel.

The Nucleus’s source code is not expressed in TAL, but rather in Boogie’s programming language, called BoogiePL (or just Boogie), so that the Boogie verifier can check it. Since the Nucleus code consists of assembly-language instructions, these assembly-language instructions must appear in a form that the Boogie verifier can understand. As described in detail below, we decided to encode assembly-language instructions as BoogiePL statements, so that the Boogie verifier can check that each instruction’s requirements are satisfied. After Boogie verification, a separate tool called “BoogieAsm,” developed for an earlier project,⁷ extracts standard assembly-language instructions from the BoogiePL code. A standard assembler then turns these instructions into an object file.

A bug in the Boogie checker or TAL checker could allow unsafe code to pass into the Verve system. Therefore, we currently must trust that these checkers are correct; they are part of Verve’s “trusted computing base.” Figure 2 shows the various components of Verve’s trusted computing base. The only trusted components are the tools used to verify, assemble, link, and boot the verified Nucleus and kernel. Note that although we rely on various compilers (the Bartok compiler generates TAL code, while a separate compiler called “Beat” generates much of the BoogiePL code), none of our compilers are part of our trusted computing base: we do not need to trust the compilers to ensure the correctness of the Nucleus and safety of the Verve system as a whole.

The trusted computing base includes the specification of correctness for the Nucleus’s BoogiePL code. This includes specifications of the behavior of functions exported by the Nucleus, shown in Figure 1. (For example, the specification of “YieldTo” ensures that the Nucleus sets the stack pointer to the top of the correct stack during a yield.) It also includes specifications for assembly-language instructions and for interaction with hardware devices and memory; we took some of these specifications from existing work,⁷ and wrote some of them from scratch. All Boogie specifications are written as logical formulas in the BoogiePL language.

By expressing and checking properties at a low level (assembly language), we can ensure nontrivial properties with high confidence. The bulk of this paper focuses on these properties, with an emphasis on the specification and verification of the Nucleus’s correctness properties. The next two sections discuss the Nucleus’s design and verification.

3. The Nucleus Interface

The core of our verification is the Nucleus, which provides a verified interface to the low-level functionality of the operating system. We verify the Nucleus using Hoare logic in Boogie, based on a trusted specification for x86 assembly-language instructions. In Verve, all access to low-level functionality must occur through the Nucleus—the kernel’s TAL code and application’s TAL code can access low-level functionality only indirectly, through the Nucleus. For example, TAL code cannot directly access devices. Furthermore, even though TAL code can directly read and write words of memory, it can read and write only words designated as safe-for-TAL by the Nucleus’s garbage collector.

The Nucleus consists of a minimal set of functions necessary to support the TAL code that runs above it. We wanted a minimal set because even with an automated theorem prover, Hoare-style verification is still hard work; less code in the Nucleus means less code to verify. At the same time, the set has to guarantee safety in the presence of arbitrary TAL code; it can assume that the TAL code is well typed, but can make no further assumptions about the behavior of the TAL code. For example, when an interrupt occurs, the Nucleus tries to transfer control to a designated TAL interrupt handler. The Nucleus cannot assume that this handler is in the correct state to handle the interrupt, and must therefore check the handler’s state at runtime.

One design decision greatly simplified the Nucleus: following a design used in recent microkernels,^6,8 no Nucleus function ever blocks. In other words, every Nucleus function performs a finite (and usually small) amount of work and then returns. The Nucleus may, however, return to a different thread than the thread that invoked the function. This allows the kernel built on top of the Nucleus to implement blocking thread operations, such as waiting on a semaphore.

Another design decision simplified reasoning about the Nucleus: following the approach taken by the recent verified L4 microkernel, seL4,⁸ Verve keeps interrupts disabled throughout the execution of any single Nucleus function. (On the other hand, interrupts may be enabled during the TAL kernel’s execution, with no loss of safety.) Since Nucleus functions do not block, Verve still guarantees that eventually, interrupts will always be re-enabled, and usually will be re-enabled very quickly. However, Verve’s current implementation sacrifices real-time interrupt handling because of one particularly long function: “GarbageCollect,” which performs an entire stop-the-world garbage collection. In the future, we hope to improve real-time behavior by using a verified incremental collector.¹⁰

Such design decisions led us to a small Nucleus API consisting of just 20 functions, all shown in Figure 1. These 20 functions, implemented with a total of about 1500 x86 instructions, include memory management (AllocObject, AllocVector, GarbageCollect, readField, writeField, readStack, and writeStack), stack management (GetStackState, ResetStack, YieldTo, Throw), device access (VgaTextWrite, TryReadKeyboard, StartTimer, SendEoi), fault handling (FaultHandler, ErrorHandler, FatalHandler, and InterruptHandler), and startup (NucleusEntryPoint). Most of the functions are intended for use by only the kernel, not by applications. However, applications may call AllocObject, AllocVector, and Throw directly.

4. The Verified Nucleus

To verify that the Nucleus behaves correctly, we have to specify what correct behavior is. Formally, this specification consists of preconditions and postconditions for each of the 20 functions exported by the Nucleus (Figure 1). The preconditions reflect the guarantees made by other components of the system when calling the Nucleus. For example, the precondition to NucleusEntryPoint describes the state of memory when the Nucleus begins execution; the (trusted) boot loader is responsible for establishing this precondition. The preconditions for functions exported to the kernel and applications describe the state of registers and the current stack when making a call to the Nucleus; the (trusted) TAL checker is responsible for guaranteeing that these preconditions hold when the (untrusted) kernel and applications transfer control to the Nucleus. Nucleus postconditions describe changes to the Nucleus state and reflect memory and guarantees the Nucleus makes to the rest of the kernel. Because preconditions are relatively weak for certain functions, the Nucleus must occasionally perform run-time checks to validate the values passed from the kernel and applications.

The Nucleus specification describes what the Nucleus must do, but does not specify exactly how the Nucleus must be implemented. For example, the Verve specification of garbage collection does not specify which algorithm the garbage collector should implement. Instead, following the approach of McCreight et al.,¹⁰ the specification just says that the garbage collector must ensure that the stack frames and heap objects contain the correct data, with no dangling pointers. We have built a verified mark-sweep and a verified copying collector for Verve, both obeying this same specification.

The Nucleus interacts with five components: memory, hardware devices, the boot loader, interrupt handling, and TAL code (kernel and application code). Memory and hardware devices export functionality to the Nucleus, such as the ability to read memory locations and write to hardware devices. The verification process ensures that the Nucleus satisfies the preconditions to each operation on memory and hardware. In turn, the Nucleus exports functionality to the boot loader (the Nucleus entry point), the interrupt handling (the Nucleus’s interrupt handlers), and the TAL code (AllocObject, YieldTo, etc.).

4.1. Specification logistics

We express Verve’s specification as first-order logical formulas in BoogiePL.¹ These formulas follow C/Java/C# syntax and consist of:

Arithmetic operators: +, -, *, >, ==, ! =, ...
Boolean operators: !, &&, ||, ==>, ...
Variables: foo, Bar, old(foo), ...
Boolean constants: true, false
Integer constants: 5, ...
Bit vector constants: 5bv16, 5bv32, ...
Function application: Factorial(5), Max(3,7), IsOdd(9), ...
Array indexing: foo[3], ...
Array update: foo[3:= Bar], ...
Quantified formulas: ( i: int:: foo[i] = = Factorial(i)), ...

BoogiePL bit vectors correspond to integers in C/Java/C#, which are limited to numbers that fit inside a fixed number of bits. BoogiePL integers, on the other hand, are unbounded mathematical integers. BoogiePL arrays are unbounded mathematical maps from some type (usually integers) to some other type. Unlike arrays in C/Java/C#, BoogiePL arrays are immutable values (there are no references to arrays and arrays are not updated in place). An array update expression a[x:= y] creates a new array, which is equal to the old array a at all locations except x, where it contains a new value, y. For example, (a[x:= y])[x] == y and (a[x:= y]) [x + 1] == a[x + 1].

BoogiePL procedures have preconditions and postconditions, written as BoogiePL logical formulas:

In this example, the procedure P can be called only in a state where global variable a is less than global variable b, and the parameter x is less than the parameter y. Upon exit, the procedure’s postconditions ensure that a is still less than b, and that a is equal to x plus the old version of a (before P executed). Note that the procedure must explicitly reveal all the global variables that it modifies ("modifies a, b;" in this example), so that callers to the procedure will be aware of the modifications.

The Boogie tool relies on the Z3⁴ automated theorem prover. Z3 automatically checks logical formulas involving linear integer arithmetic (addition, subtraction, comparison), arrays, bit vectors, and functions. Z3 checks integer formulas faster than it checks bit-vector formulas, so we chose integers over bit vectors where possible. Z3 also checks quantified formulas (formulas with forall and exists); however, Z3 relies on programmer-supplied hints in the form of triggers to help with quantifiers, since checking quantified formulas with arithmetic and arrays is undecidable in general.

For each function exported to the boot loader, interrupt handling, and TAL code, the Nucleus implements a BoogiePL procedure whose specification is given in terms of “requires,” “ensures,” and “modifies” clauses. The Nucleus implements these procedures in terms of more primitive hardware procedures: each BoogiePL procedure exported from the memory and hardware devices to the Nucleus corresponds to exactly one assembly-language instruction, such as an instruction to read a single memory location or write to a single hardware register. The rest of this section presents specifications for various BoogiePL procedures in Verve, and describes how the Nucleus implementation is verified against these specifications.

4.2. Memory

Verve’s initial memory layout is set by the boot loader. Verve uses an off-the-shelf boot loader, which sets up an initial virtual-memory address space (i.e., it sets up a page table), loads the executable image into memory, and jumps to the executable’s entry point, passing detailed information about the memory layout to the entry point. The boot-loader-supplied address space simply maps virtual memory directly to physical memory, except for a small range of low addresses that are left unmapped (to catch null pointer dereferences). A traditional operating system would create new virtual memory address spaces to protect applications from each other. Because Verve guarantees type safety, however, it can rely on type safety for protection and keep the initial boot-loader-supplied address space.

Verve’s mapped address space consists of three parts. First is the memory occupied by the executable image, including code, static fields, method tables, and memory layout information for the garbage collector. Verve may read this memory, but may write to only the static fields, not the code, method tables, or layout information. Second, the Verve specification reserves the memory just above the executable image for the interrupt table. Verve may write to the table, but it can write only values that obey the specification for interrupt handlers. Third, the remaining memory above the interrupt handler is general-purpose memory, free for arbitrary use. The specification describes the state of generalpurpose memory using a global variable Mem, which is an array that maps integer byte addresses to integer values. For any 4-byte-aligned address i in general-purpose memory, Mem [i] contains the 32-bit memory contents stored at address i, represented as an integer in the range 0 … 2³² 1.

Each part of memory exports its own access functions to the Nucleus. The general memory exports two operations to the Nucleus, Load and Store:

Each of these two operations requires a 4-byte-aligned pointer (“Aligned (…) “) to memory inside the general-purpose memory region (“memAddr (…) “). The loaded or stored value must be in the range 0 … 2³² – 1 (“word (…) “). Any Store operation updates the contents of Mem, so that subsequent Load operations are guaranteed to see the updated value. Loads and stores have an additional side effect, noted in the modifies clause: they modify the current instruction pointer (program counter), “Eip.“

4.3. Hardware devices

Verve supports four basic hardware devices: a programmable interrupt controller (PIC), a programmable interval timer (PIT), a VGA text screen, and a keyboard. Verve specifies the interaction with this hardware using unbounded streams of events. The Nucleus delivers events to the PIC, PIT, and screen, and it receives events from the keyboard. For the screen, the events are commands to draw a character at a particular position on the screen. For the keyboard, events are keystrokes received from the keyboard. For the PIC and PIT, particular sequences of events initialize interrupt handling and start timers.

We present the keyboard specification as an example. Verve represents the stream of events from the keyboard as an immutable array KbdEvents mapping event sequence numbers (represented as integers, starting from 0) to events (also represented as integers). As the Nucleus queries the keyboard, it discovers more and more events from the stream. Two indices into the array, KbdAvailable and KbdDone, indicate the state of the Nucleus’s interaction with the keyboard. Events 0 ... KbdDone-1 have already been read by the Nucleus, while events KbdDone ... KbdAvailable-1 are available to read but have not yet been read.

Two operations, KbdStatusIn8 and KbdDataIn8, query the keyboard. Each of these procedures represents a single x86 assembly-language 8-bit I/O instruction, and BoogieAsm translates each call to these procedures into a single x86 “in” instruction. By invoking KbdStatusIn8, the Nucleus discovers the current state of KbdAvailable and KbdDone. If this operation places a 0 in the eax register’s lowest bit, then no events are available; if the operation places a 1 in eax’s lowest bit, then at least one event is available. If the Nucleus can prove that at least one event is available (KbdAvailable > KbdDone), it may call KbdDataIn8 to receive the first available event.

Given primitive x86 operations like Load, Store, KbdStatusIn8, and KbdDataIn8, we can implement and verify the procedures that make up the Nucleus. We illustrate this process with a small, but complete, example—the verified source code implementing TryReadKeyboard from Figure 1, along with a portion of its specification:

TryReadKeyboard‘s specification requires that the Nucleus return a keystroke (in the range 0255) if one is available, and otherwise return the value 256. The implementation of the TryReadKeyboard function calls KeyboardStatusIn8, performs a bitwise AND operation to discover the status, and branches based on the status. To verify that the implementation meets the specification, we simply run the Boogie tool on TryReadKeyboard‘s implementation. Boogie queries the Z3 theorem prover to check that the procedure satisfies its postconditions, and that all calls inside the procedure satisfy the necessary preconditions. Given the BoogiePL source code, this process is entirely automatic, requiring no scripts or human interactive assistance to guide the theorem prover.

BoogieAsm checks that each statement in the verified BoogiePL code corresponds to a simple, predetermined sequence of 0, 1, or 2 assembly-language instructions (e.g., “call eax:= And (eax, 1)” corresponds to “and eax, 1“), and then transforms the BoogiePL code into valid assembly code:

Note that some variables in the BoogiePL code, like KbdEvents, KbdAvailable, and KbdDone, are “specification variables” that exist only during verification, and do not exist in the generated assembly code.

4.4. Stacks and threads

Verve supports preemptive multithreading, which requires periodic switching between stacks. This stack switching requires the Nucleus specification to be explicit about the flow of control between the kernel and the Nucleus. Most Nucleus procedures have a simple control flow—the Nucleus procedure performs some work, and then uses the x86 ret instruction to return directly to the caller that called the Nucleus procedure (see TryReadKeyboard in the previous section, e.g.). However, a few Nucleus procedures return in more complicated ways. Interrupt handlers, for example, use a special x86 “return-from-interrupt” instruction, rather than the standard x86 ret instruction. More interestingly, several procedures, including YieldTo and interrupt handlers, may return to a caller in a different stack.

Verve uses a specification variable "RET" to specify how each procedure must return. RET equals one of two values: ReturnToAddr(i), which specifies that the procedure must perform a normal return (the x86 ret instruction) to return address i, or ReturnToInterrupted (i, cs, eflags), which specifies that the procedure must perform an interrupt return (the x86 iretd instruction) to return address i, restoring code segment cs and status flags eflags. Most Nucleus procedures are required to return to the return address pushed on the stack, pointed to by the stack pointer esp:

The Nucleus entry point, on the other hand, does not return to its caller (the boot loader), but instead returns to the TAL code located at address KernelEntryPoint, so that the kernel begins execution after the Nucleus has finished initialization:

The kernel scheduler calls the YieldTo procedure to switch stacks. For example, the kernel’s timer interrupt handler calls YieldTo to preempt one thread and switch to another thread. The exact behavior of YieldTo depends on the state of the target stack that is being yielded to. Each stack may be in one of four states at any time: empty, interrupted, yielded, or running. When switching to an empty stack (a stack with no stack frames), YieldTo is required to switch the stack pointer to the empty stack and then return to KernelEntryPoint, which starts a new thread running in the empty stack. When switching to a stack that had been interrupted earlier, YieldTo is required to use return-from-interrupt to return control to the interrupted instruction running in the interrupted stack. To switch to a stack that had earlier voluntarily yielded, YieldTo uses an ordinary return instruction. Finally, switching to the currently running thread has no effect; in this case, YieldTo simply returns:

In the specification above, stacks are numbered starting from 0, where S contains the current running stack and s contains the target stack to which YieldTo is switching. The array StackState maps stacks to their states (empty, interrupted, yielded, or running). The Nucleus’s interrupt handler “InterruptHandler” has a specification similar to YieldTo‘s specification, except that its target stack is always stack number 0, which runs the kernel’s TAL code for scheduling and interrupt processing.

5. Kernel

The Verve Nucleus implements low-level primitives to manage memory, switch stacks, and access hardware, but by itself the Nucleus does not provide the high-level abstractions that applications expect from an operating system. Therefore Verve also includes a simple kernel on top of the Nucleus, written in C# and compiled to TAL. This kernel includes standard libraries for thread synchronization and scheduling. It also schedules garbage collections, bringing each thread to a safe point before invoking the Nucleus garbage collector.

The key novelty of Verve’s kernel is that it is written entirely in safe code, relying on the Nucleus to perform operations that require unsafe code in traditional operating systems, such as memory deallocation and context switching. In general, the kernel decides policies, such as when a particular thread should run or when a garbage collection should take place, while the Nucleus implements mechanisms like context switching and garbage collection. An incorrect policy decision might cause incorrect behavior; for example, an incorrect scheduler implementation might fail to schedule a thread. However, an incorrect kernel implementation cannot violate the system’s type safety.

As an example of a kernel policy, the current kernel implements round-robin preemptive threading, allowing threads to block on semaphores. The kernel manages the threads by keeping each thread in a queue, where one queue holds threads ready to run, one queue holds threads awaiting garbage collection, each semaphore contains a queue of blocked threads, and so on. Each queue is implemented as a linked list using standard C# objects, but queues could be implemented in other ways (e.g., using a tree or array to implement priority queues) without requiring changes to the Nucleus. Furthermore, other policies, such as priority-based scheduling or deadline-based scheduling, could be implemented with no changes to the Nucleus.

6. Measurements

This section summarizes Verve’s performance, quantifies the size of the Nucleus implementation (including annotations), and describes the time taken to verify Verve, for mechanical verification and in terms of person-months.

First, we wrote two simple micro-benchmarks to exercise the Nucleus’s stack management. We wrote the benchmarks in C#, compiled them to TAL, verified the TAL code, and linked them with the kernel and Nucleus. We then ran them on a 1.8 GHz AMD Athlon 64 3000+ with 1GB RAM, using the processor’s cycle counters to measure time and averaging over multiple iterations, after warming the caches. We measured for two configurations—Verve built with a copying collector and Verve built with a mark-sweep (MS) collector:

The YieldTo benchmark shows that the Verve Nucleus requires 98 cycles to switch from one stack to another and back (49 cycles per invocation of YieldTo). The kernel builds thread scheduling and semaphores on top of the raw Nucleus YieldTo operation. Using semaphore wait and signal operations, it takes 216 cycles to switch from one thread to another and back (108 cycles per thread switch). The wait/signal performance is comparable to the round-trip IPC performance of fast microkernels such as L4 (242 cycles on a 166 MHz Pentium⁹) and seL4 (448 cycles on an ARM processor⁸), although in fairness, IPC involves an address space switch as well as a thread switch.

We next present the size of various parts of the Nucleus specification and implementation. All measurements are lines of BoogiePL code, after removing blank lines and comment-only lines. The following table shows the size of various portions of the trusted specification:

Overall, 1185 lines of BoogiePL is fairly large, but most of this is devoted to definitions about the hardware platform and memory layout. The GC table and layout information, originally defined by the Bartok compiler, occupies a substantial fraction of the specification. The specifications for all the functions exported by the Nucleus total 239 + 215 = 454 lines.

We measured the size of the Nucleus implementation for two configurations of Verve, one with the copying collector and one with the mark-sweep collector (note that the trusted specifications are the same for both collectors); 1610 lines of BoogiePL are shared between the two configurations:

In total, each configuration contains about 4500 lines of BoogiePL. From these, BoogieAsm extracts about 1400 x86 instructions. This corresponds roughly to a 3-to-1 ratio (or 4-to-1 ratio, if the specification is included) of BoogiePL to x86 instructions (or, roughly, 2-to-1 or 3-to-1 ratio of nonexecutable annotation to executable code). This is about an order of magnitude fewer lines of annotation and script than related projects.^{5, 8} The choice of using Boogie/Z3 and first-order logic, rather than a less-automated interactive proof system designed for non-first-order logics, is the key reason for this relatively small amount of annotation: since Verve’s annotations are written in BoogiePL’s first-order logic, the Z3 first-order logic theorem prover is able to automatically prove properties that require thousands of lines of manual scripts in interactive proof systems.

It takes 272s for the Boogie/Z3 tools to verify all the Nucleus components, including both the mark-sweep and copying collectors, on a 2.4 GHz Intel Core2 with 4GB of memory. The vast majority of this time is spent on verifying the collectors; only 33s were required to verify the system’s other components.

This small verification time gave us the freedom to experiment with different designs. For example, mid-way through the project, we switched from an implementation based on blocking Nucleus calls to an implementation based on non-blocking Nucleus calls. We were able to make such changes in days rather than months, because we could make minor changes to large, Nucleus-wide invariants and then run the automated theorem prover to quickly re-verify the entire Nucleus. In the end, the Verve design, implementation, and verification described in this paper took just 9 person-months, spread between two people.

7. Related Work

The Verve project follows in a long line of operating system and runtime system verification efforts. More than 20 years ago, the Boyer-Moore mechanical theorem prover was used to verify a small operating system (Kit) and a small high-level language implementation,² although the Kit OS was too limited to run on commodity hardware and to support standard programming languages.

More recently, the seL4 project verified all of the C code for an entire microkernel.⁸ The seL4 microkernel contains 8700 lines of C code, substantially larger than earlier verified operating systems like Kit. This allows seL4 to implement realistic primitives for page table management, multithreading, capabilities, and message passing, so that it can securely run realistic user-mode applications, written in standard languages like C, on real hardware. The features supported by seL4 are comparable, though not identical, to those supported by Verve: seL4 pages are analogous to Verve objects, seL4 capabilities are analogous to Verve object references, seL4 messages are analogous to Verve method invocations, and seL4 threads are similar to Verve threads. The verified seL4 microkernel is substantially larger than the verified Verve Nucleus (8700 lines of C vs. 1400 x86 instructions). On the other hand, the verification effort required by seL4 was larger than the effort required by Verve: they report 20 person-years of research devoted to developing their proofs, including 11 person-years specifically for the seL4 code base. The proof required 200,000 lines of Isabelle scripts—a 20-to-1 script-to-code ratio. We hope that while seL4 demonstrates that realistic microkernels are within the reach of interactive theorem proving, Verve demonstrates that automated theorem proving can provide a less time-consuming alternative to interactive theorem proving for realistic systems software verification. The FLINT project sets an ambitious goal to build foundational certified machine code, where certification produces a proof about an executable program, and a very small proof checker can be used to verify the proof.⁵ Such a system would have a much smaller trusted computing base than Verve. So far, such a foundational approach has been labor-intensive, limiting the scope of the current implementation, which currently does not support standard programming languages and does not have more than a few hundred lines of application code. We hope that future advances in theorem proving technology will combine foundational proofs with more automated verification.

8. Conclusion and Future Work

Using a combination of TAL and automated theorem proving, we have completely verified the safety of Verve at the assembly-language level, and completely verified the correctness (excepting termination) of Verve’s Nucleus at the assembly-language level. So what happens when we boot and run Verve? Since it is verified, did it run perfectly (or at least safely?) every time we ran it? Almost—the good news is that as far as we know, every execution of Verve has run in accordance with its specification. The bad news is that the specification itself may contain bugs, or at least may contain unwarranted assumptions about the world that Verve interacts with. In fact, we did encounter two violations of type safety, due to violations of the assumptions made by the specification. First, Verve initially included an off-the-shelf, unverified debugger stub, written in C++. The presence of the C++ code undermined Verve’s assumptions about memory, so we eventually decided to banish the debugger stub from Verve. Second, a linking issue caused Bartok’s relocation information for GC tables to get dropped, resulting in incorrect return addresses in the GC tables at run time. After these two issues were resolved, Verve ran without incident.

Compared to the traditional operating system development, where vast numbers of bugs are discovered only at run time, the number of bugs encountered at run time in Verve has been extremely small. Nevertheless, the fact that any bugs existed is a good motivation for trying to reduce the size and complexity of the specification, for reducing the number of trusted components (such as the debugger stub and the linker), and for more systematically testing the specification.^{7, 8} It also motivates the idea of verifying the verification tools themselves: verifying the TAL checker (which currently contains over 12,000 lines of trusted code³) against a small specification would particularly help to reduce Verve’s trusted computing base.

Based on the core Verve features described in this paper, we have implemented or ported approximately 35,000 lines of safe C# code to run on top of the Verve kernel and Nucleus. This C# code, which Bartok compiles to verifiable TAL code, includes various System libraries from .NET and support for networking protocols like ARP, IP, and UDP. To enable running networking protocols safely on real networking hardware, we recently extended Verve’s Nucleus to support a large class of PCI devices, using an IO memory management unit (IO/MMU) to protect Verve’s memory from errant devices. These extensions added less than 400 assembly-language instructions to the Nucleus, and required less than 1000 lines of BoogiePL code and just a couple man-months to implement. While many challenges remain for future work (such as verified incremental garbage collection¹⁰ and support for multicore machines), our experience so far gives us confidence that Verve’s approach will scale to larger systems and more features without requiring excessive human effort and without sacrificing type safety.

Acknowledgments

We would like to thank Jeremy Condit, Galen Hunt, Ed Nightingale, Don Porter, Shaz Qadeer, Rustan Leino, Juan Chen, Gregory Malecha, and David Tarditi for their suggestions and assistance.

Figures

Figure 1. Verve structure, showing all 20 functions exported by the Nucleus.

Figure 2. Building the Verve system: trusted, untrusted components.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System

View in the ACM Digital Library

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI

10.1145/2043174.2043197

December 2011 Issue

Published: December 1, 2011

Vol. 54 No. 12

Pages: 123-131

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

BLOG@CACM Nov 7 2025

What Past Computing Breakthroughs Teach Us About AI

Richa Gupta

Architecture and Hardware

BLOG@CACM Nov 7 2025

Reimagining Analytics: The Missing Link in Streaming’s Battle for Audience Loyalty

Kuljesh Puri and Pawan Anand

Architecture and Hardware

News Nov 7 2025

The AI Deskilling Paradox

Samuel Greengard

Artificial Intelligence and Machine Learning

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More