Native Client is a sandbox for untrusted x86 native code. It aims to give browser-based applications the computational performance of native applications without compromising safety. Native Client uses software fault isolation and a secure runtime to direct system interaction and side effects through interfaces it controls. It further provides operating system portability for binary code while supporting performance-oriented features generally absent from Web application programming environments, such as thread support, instruction set extensions such as SSE, and use of compiler intrinsics and hand-coded assembler. We combine these properties in an open architecture that encourages community review and third-party tools.
1. Introduction
As an application platform, the modern Web browser brings together a remarkable combination of resources, including seamless access to Internet resources, high-productivity programming languages such as JavaScript, and the richness of the Document Object Model (DOM) for graphics presentation and user interaction. While these strengths put the browser in the forefront as a target for new application development, it remains handicapped in a critical dimension: computational performance. Thanks to Moore’s Law and the zeal with which it is observed by the hardware community, many interesting applications get adequate performance in a browser despite this handicap. But there remains a set of computations that are generally infeasible for browser-based applications due to performance constraints, for example, simulation of Newtonian physics, computational fluid-dynamics, and high-resolution scene rendering. The current environment also tends to preclude the use of large bodies of high-quality code developed in languages other than JavaScript.
Modern Web browsers provide extension mechanisms such as ActiveX7 and Netscape Plugin Application Programming Interface (NPAPI)19 allowing native code to be loaded and run as part of a Web application. Such architectures allow plug-ins to circumvent the security mechanisms otherwise applied to Web content, while giving them access to full native performance, perhaps as a secondary consideration. Given this organization, and the absence of effective technical measures to constrain these plug-ins, browser applications that wish to use native code must rely on nontechnical measures for security, for example, manual establishment of trust relationships through pop-up dialog boxes or manual installation of a console application. Historically, these nontechnical measures have been inadequate to prevent execution of malicious native code, leading to inconvenience and economic harm.3, 22 As a consequence we believe there is a prejudice against native code extensions for browser-based applications among experts and distrust among the larger population of computer users.
While acknowledging the insecurity of the current systems for incorporating native code into Web applications, we also observe that there is no fundamental reason why native code should be unsafe. In Native Client, we separate the problem of safe native execution from that of extending trust, allowing each to be managed independently. Conceptually, Native Client is organized in two parts: a constrained execution environment for native code to prevent unintended side effects and a runtime for hosting these native code extensions through which allowable side effects may occur safely.
The main contributions of this work are:
- An infrastructure for OS- and browser-portable sandboxed x86 binary modules
- Support for advanced performance capabilities such as threads, SSE instructions, compiler intrinsics, and hand-coded assembler
- An open system designed for easy retargeting of new compilers and languages
- Refinements to CISC software fault isolation, using x86 segments for improved simplicity and reduced overhead
We combine these features in an infrastructure that supports safe side effects and local communication, while preventing arbitrary file system and network access. Overall, Native Client provides sandboxed execution of native code and portability across operating systems, delivering native code performance for the browser.
The remainder of the paper is organized as follows. Section 1.1 describes our threat model. Section 2 develops some essential concepts for the NaCl system architecture and programming model. Section 3 gives additional implementation details, organized around major system components. Section 4 provides a quantitative evaluation of the system using more realistic applications and application components. In Section 5, we discuss some implications of this work. Section 6 discusses relevant prior and contemporary systems. Section 7 offers our conclusion.
1.1. Threat Model
Native Client should run untrusted modules from any Web site with safety comparable to systems such as JavaScript. When presented to the system, an untrusted NaCl module may contain arbitrary code and data. A consequence is that the NaCl runtime must be able to confirm that the module conforms to our validity rules (detailed below). Modules that do not conform to these rules are rejected by the system.
Once a conforming NaCl module is accepted for execution, the NaCl runtime must constrain its activity to prevent unintended side effects, such as might be achieved via unmoderated access to the native operating system’s system call interface. The NaCl module may arbitrarily combine the entire variety of behaviors permitted by the NaCl execution environment in attempting to compromise the system. It may execute any reachable instruction block in the validated text segment. It may exercise the NaCl application binary interface to access runtime services in any way: passing invalid arguments, etc. It may also send arbitrary data via our intermodule communication interface, with the communicating peer responsible for validating input. The NaCl module may allocate memory and spawn threads up to resource limits. It may attempt to exploit race conditions in subverting the system.
The next sections detail how our architecture and code validity rules create a sandbox that effectively contains NaCl modules.
2. System Architecture
A NaCl application is composed of a collection of trusted and untrusted components. Figure 1 shows the structure of a hypothetical NaCl-based application for managing and sharing photos. It consists of two components: a user interface, implemented in JavaScript and executing in the Web browser, and an image processing library (imglib.nexe), implemented as a NaCl module. In this hypothetical scenario, the user interface and image processing library are part of the application and therefore untrusted. The browser component is constrained by the browser execution environment and the image library is constrained by the NaCl container. Both components are portable across operating systems and browsers, with native code portability enabled by Native Client. Prior to running the photo application, the user has installed Native Client as a browser plug-in. Note that the NaCl browser plug-in itself is OS- and browser-specific. Also note it is trusted, that is, it has full access to the OS system call interface and the user trusts it to not be abusive.
When the user navigates to the Web site that hosts the photo application, the browser loads and executes the application JavaScript components. The JavaScript in turn invokes the NaCl browser plug-in to load the image processing library into a NaCl container. Observe that the native code module is loaded silently—no pop-up window asks for permission. Native Client is responsible for constraining the behavior of the untrusted module.
Each component runs in its own private address space. Inter-component communication is based on Native Client’s reliable datagram service, the IMC (Inter-Module Communications). For communications between the browser and a NaCl module, Native Client provides two options: a Simple Remote Procedure Call (SRPC) facility, and NPAPI, both implemented on top of the IMC. The IMC also provides shared memory segments and shared synchronization objects, intended to avoid messaging overhead for high-volume or high-frequency communications.
The NaCl module also has access to a “service runtime” interface, providing for memory management operations, thread creation, and other system services. This interface is analogous to the system call interface of a conventional operating system.
In this paper we use “NaCl module” to refer to untrusted native code. Note however that applications can use multiple NaCl modules, and that both trusted and untrusted components may use the IMC. For example, the user of the photo application might optionally be able to use a (hypothetical) trusted NaCl service for local storage of images, illustrated in Figure 2. Because it has access to local disk, the storage service must be installed as a native browser plug-in; it cannot be implemented as a NaCl module. Suppose the photo application has been designed to optionally use the stable storage service; the user interface would check for the stable storage plug-in during initialization. If it detected the storage service plug-in, the user interface would establish an IMC communications channel to it, and pass a descriptor for the channel to the image library, enabling the image library and the storage service to communicate directly via IMC-based services (SRPC, shared memory, etc.). In this case the NaCl module will typically be statically linked against a library that provides a procedural interface for accessing the storage service, hiding details of the IMC-level communications such as whether it uses SRPC or whether it uses shared memory. Note that the storage service must assume that the image library is untrusted. The service is responsible for ensuring that it only services requests consistent with the implied contract with the user. For example, it might enforce a limit on total disk used by the photo application and might further restrict operations to only reference a particular directory.
The Native Client architecture was designed to support pure computation. It is not appropriate for modules requiring process creation, direct file system access, or unrestricted access to the network. Trusted facilities such as storage should generally be implemented outside of Native Client, encouraging simplicity and robustness of the individual components and enforcing stricter isolation and scrutiny of all components. This design choice echoes microkernel operating system design.1,4,12
With this example in mind we will now describe the design of key NaCl system components in more detail.
Native Client is built around an x86-specific intra-process “inner sandbox.” We believe that the inner sandbox is robust; regardless, to provide defense in depth,5,8 we have also developed a second “outer sandbox” that mediates system calls at the process boundary. The outer sandbox is substantially similar to prior structures11,20 and we will not discuss it in detail here.
The inner sandbox uses static analysis to detect security defects in untrusted x86 code. Previously, such analysis has been challenging for arbitrary x86 code due to such practices as self-modifying code and overlapping instructions. In Native Client we disallow such practices through a set of alignment and structural rules that, when observed, ensure that the native code module can be disassembled reliably, such that all reachable instructions are identified during disassembly. With reliable disassembly as a tool, our validator can then ensure that the executable includes only the subset of legal instructions, disallowing unsafe machine instructions.
The inner sandbox further uses x86 segmented memory to constrain both data and instruction memory references. Leveraging existing hardware to implement these range checks greatly simplifies the runtime checks required to constrain memory references, in turn reducing the performance impact of safety mechanisms.
This inner sandbox is used to create a security sub-domain within a native operating system process. With this organization we can place a trusted service runtime subsystem within the same process as the untrusted application module, with a secure trampoline/springboard mechanism to allow safe transfer of control from trusted to untrusted code and vice versa. Although in some cases a process boundary could effectively contain memory and system-call side effects, we believe the inner sandbox can provide better security, as it effectively isolates the native system call interface from untrusted code, thereby removing it from the attack surface. We generally assume that the operating system is not defect free, such that this interface might have exploitable defects. The inner sandbox further isolates any resources that the native operating system might deliberately map into all processes, as commonly occurs in Microsoft Windows. In effect, our inner sandbox not only isolates the system from the native module, but also helps to isolate the native module from the operating system.
The sandboxes prevent unwanted side effects, but some side effects are often necessary to make a native module useful. For interprocess communications, Native Client provides a reliable datagram abstraction, the “Inter-Module Communications” service or IMC. The IMC allows trusted and untrusted modules to send/receive datagrams consisting of untyped byte arrays along with optional “NaCl Resource Descriptors” to facilitate sharing of files, shared memory objects, communication channels, etc., across process boundaries. The IMC can be used by trusted or untrusted modules, and is the basis for two higher-level abstractions. The first of these, the SRPC facility, provides convenient syntax for defining and using subroutines across NaCl module boundaries, including calls to NaCl code from JavaScript in the browser. The second, NPAPI, provides a familiar interface to interact with browser state, including opening URLs and accessing the DOM, that conforms to existing constraints for content safety. Either of these mechanisms can be used for general interaction with conventional browser content, including content modifications, handling mouse and keyboard activity, and fetching additional site content, substantially all the resources commonly available to JavaScript.
As indicated above, the service runtime is responsible for providing the container through which NaCl modules interact with each other and the browser. The service runtime provides a set of system services commonly associated with an application programming environment. It provides sysbrk() and mmap() system calls, primitives to support a malloc()/free() interface or other memory allocation abstractions. It provides a subset of the POSIX threads interface, with some NaCl extensions, for thread creation and destruction, condition variables, mutexes, semaphores, and thread-local storage. Our thread support is sufficiently complete to allow a port of Intel’s Threading Building Blocks21 to Native Client. The service runtime also provides the common POSIX file I/O interface, used for operations on communications channels as well as Web-based read-only content. As the name space of the local file system is not accessible to these interfaces, local side effects are not possible.
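To illustrate how a break-style primitive can underpin a malloc()/free() interface, here is a minimal C sketch. The function sim_sysbrk() is hypothetical: it simulates a program break over a static arena and is not the NaCl service-runtime ABI for sysbrk(), whose exact signature is not reproduced here. The allocator on top is the simplest possible policy, a bump allocator whose free() does nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a break primitive such as sysbrk(): the
 * break is simulated over a static arena, not obtained from NaCl. */
#define ARENA_SIZE (1u << 20) /* 1 MB simulated data segment */
_Alignas(16) static unsigned char arena[ARENA_SIZE];
static size_t current_break = 0; /* offset of the break within arena */

/* Move the break up by `increment` bytes and return the old break,
 * or NULL if the request would run past the end of the arena. */
static void *sim_sysbrk(size_t increment) {
    if (increment > ARENA_SIZE - current_break)
        return NULL;
    void *old = &arena[current_break];
    current_break += increment;
    return old;
}

/* Round n up to a multiple of 16 so every block stays 16-byte aligned. */
static size_t round_up16(size_t n) { return (n + 15u) & ~(size_t)15u; }

/* Minimal bump allocator layered on the break primitive. */
void *bump_malloc(size_t n) {
    if (n == 0 || n > ARENA_SIZE)
        return NULL;
    return sim_sysbrk(round_up16(n));
}

/* A bump allocator never reuses memory, so free() is a no-op. */
void bump_free(void *p) { (void)p; }
```

A real malloc() would keep free lists and only call the break primitive when they run dry; the sketch keeps just the layering.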
To prevent unintended network access, network system calls such as connect() and accept() are simply omitted. NaCl modules can access the network via JavaScript in the browser. This access is subject to the same constraints that apply to other JavaScript access, with no net effect on network security.
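One way to picture "simply omitted" is a service-runtime dispatch table with no entry at all for network calls, so that a request for one fails cleanly. This C sketch is hypothetical throughout: the call numbers, handler names, and the choice to return -ENOSYS are illustrative assumptions, not the NaCl ABI.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical call numbers, for this sketch only. */
enum { DEMO_SYS_READ, DEMO_SYS_WRITE, DEMO_SYS_MMAP,
       DEMO_SYS_CONNECT, DEMO_SYS_ACCEPT, DEMO_SYS_COUNT };

typedef int (*handler_fn)(int arg);

static int handle_read(int fd)  { return fd >= 0 ? 0 : -EBADF; }
static int handle_write(int fd) { return fd >= 0 ? 0 : -EBADF; }
static int handle_mmap(int len) { return len > 0 ? 0 : -EINVAL; }

/* connect and accept have no handler: their slots stay NULL. */
static const handler_fn dispatch_table[DEMO_SYS_COUNT] = {
    [DEMO_SYS_READ]  = handle_read,
    [DEMO_SYS_WRITE] = handle_write,
    [DEMO_SYS_MMAP]  = handle_mmap,
};

/* Route a request from untrusted code; unknown or omitted calls fail. */
int dispatch(int sysno, int arg) {
    if (sysno < 0 || sysno >= DEMO_SYS_COUNT || dispatch_table[sysno] == NULL)
        return -ENOSYS;
    return dispatch_table[sysno](arg);
}
```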
The NaCl development environment is largely based on Linux open source systems and will be familiar to most Linux and Unix developers. We have found that porting existing Linux libraries is generally straightforward, with large libraries often requiring no source changes.
Overall, we recognize the following as the system components that a would-be attacker might attempt to exploit:
- Browser integration interface
- Inner sandbox: binary validation
- Outer sandbox: OS system-call interception
- Service runtime binary module loader
- Service runtime trampoline interfaces
- IMC communications interface
- NPAPI interface
In addition to the inner and outer sandbox, the system design also incorporates CPU and content blacklists. These mechanisms will allow us to incorporate layers of protection based on our confidence in the robustness of the various components and our understanding of how to achieve the best balance between performance, flexibility, and security.
In the next section we argue that secure implementations of these facilities are possible and that the specific choices made in our own implementation are sound.
3. Native Client Implementation
In this section, we explain how NaCl implements software fault isolation. The design is limited to explicit control flow, expressed with calls and jumps in machine code. Other types of control flow (e.g. exceptions) are managed in the NaCl service runtime, external to the untrusted code, as described with the NaCl runtime implementation below.
Our inner sandbox uses a set of rules for reliable disassembly, a modified compilation tool chain that observes these rules, and a static analyzer that confirms that the rules have been followed. This design allows for a small trusted code base (TCB),26 with the compilation tools outside the TCB, and a validator that is small enough to permit thorough review and testing. Our validator implementation requires fewer than 600 C statements (semicolons), including an x86 decoder and cpuid decoding. This compiles into about 6000 bytes of executable code (Linux optimized build), of which about 900 bytes are the cpuid implementation, 1700 bytes the decoder, and 3400 bytes the validator logic.
To eliminate side effects the validator must address four subproblems:
- Data integrity: no loads or stores outside of data sandbox
- Reliable disassembly
- No unsafe instructions
- Control flow integrity
To solve these problems, NaCl builds on previous work on CISC fault isolation. Our system combines 80386 segmented memory6 with previous techniques for CISC software fault isolation.15 We use 80386 segments to constrain data references to a contiguous subrange of the virtual 32-bit address space. This allows us to effectively implement a data sandbox without requiring sandboxing of load and store instructions. The VX32 system10 implements its data sandbox in a similar fashion. Note that NaCl modules are 32-bit x86 executables. Support for the more recent 64-bit executable model is an area of our ongoing development.
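The effect of a segment on a data reference can be modeled in a few lines. This C sketch simulates, rather than uses, 80386 segmentation: an expand-up data segment has a base and a limit (the offset of its last addressable byte), and the hardware checks every access against the limit before adding the base, which is why untrusted loads and stores need no inserted checks. The struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified expand-up data segment: `limit` is the offset of the last
 * addressable byte, as in an 80386 segment descriptor. */
typedef struct {
    uint32_t base;
    uint32_t limit;
} segment_t;

/* Translate an access of `size` bytes at segment-relative `offset` to a
 * linear address. Returns false where the CPU would raise a fault. */
bool segment_translate(segment_t seg, uint32_t offset, uint32_t size,
                       uint32_t *linear) {
    /* Equivalent to offset + size - 1 <= limit, without overflow. */
    if (size == 0 || offset > seg.limit || size - 1 > seg.limit - offset)
        return false;
    *linear = seg.base + offset; /* wraps mod 2^32, as in hardware */
    return true;
}
```

For example, with a data segment limited to 256MB, a 4-byte store at offset 0x0FFFFFFC is allowed while one at 0x0FFFFFFD faults, with no check instructions in the untrusted code.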
Table 1 lists the constraints Native Client requires of untrusted binaries. Together, constraints C1 and C6 make disassembly reliable. With reliable disassembly as a tool, detection of unsafe instructions is straightforward. A partial list of opcodes disallowed by Native Client includes:
- syscall and int. Untrusted code cannot invoke the operating system directly.
- All instructions that modify x86 segment state, including lds, far calls, etc.
- ret. Returns are implemented with a sandboxing sequence that ends with a register-indirect jump.
Apart from facilitating control sandboxing, excluding ret also prevents a vulnerability due to a race condition if the return address were checked on the stack, since another thread could overwrite the return address after the check but before the ret executes. A similar argument requires that we disallow memory addressing modes on indirect jmp and call instructions. Native Client does allow the hlt instruction. It should never be executed by a correct instruction stream and will cause the module to be terminated immediately. As a matter of hygiene, we disallow all other privileged/ring-0 instructions, as they are never required in a correct user-mode instruction stream. We also constrain x86 prefix usage to only allow known useful instructions. Empirically we have found that this eliminates certain denial-of-service vulnerabilities related to CPU errata.
The fourth problem is control flow integrity, ensuring that all control transfers in the program text target an instruction identified during disassembly. For each direct branch, we statically compute the target and confirm it is a valid instruction as per constraint C6. Our technique for indirect branches combines 80386 segmented memory with a simplified sandboxing sequence. As per constraints C2 and C4, we use the CS segment to constrain executable text to a zero-based address range, sized to a multiple of 4KB. With the text range constrained by segmented memory, a simple constant mask is adequate to ensure that the target of an indirect branch is aligned mod 32, as per constraints C3 and C5: the register holding the target is masked immediately before a register-indirect jump through it, for example and $0xffffffe0, %eax followed by jmp *%eax.
We will refer to this special two-instruction sequence as a nacljmp. Encoded as a 3-byte and and a 2-byte jmp, it compares favorably to previous implementations of CISC sandboxing.16,23 Without segmented memory or zero-based text, sandboxed control flow typically requires two six-byte instructions (an and and an or) ahead of the 2-byte jmp, for a total of 14 bytes.
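The arithmetic behind a nacljmp can be checked in C. This sketch models only the mask and the CS limit check that the hardware performs; the function name and the limit values used with it are illustrative. Clearing the low five bits makes the target 0 mod 32, and because the text segment is zero-based, the masked value is also the offset the CPU compares against the CS limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask used by the `and` half of a nacljmp: clears the low 5 bits. */
#define NACLJMP_MASK 0xffffffe0u

/* Model of `and $0xffffffe0, %eax; jmp *%eax` in a zero-based code
 * segment whose last valid offset is `cs_limit`. Returns true and sets
 * *dest if the jump proceeds; false models the CPU's limit fault. */
bool nacljmp_target(uint32_t eax, uint32_t cs_limit, uint32_t *dest) {
    uint32_t target = eax & NACLJMP_MASK; /* now 0 mod 32 */
    if (target > cs_limit)
        return false; /* outside the text range: the jump faults */
    *dest = target;
    return true;
}
```

For instance, a computed target of 0x20025 is forced down to the block start 0x20020, so it can never land in the middle of an instruction that the validator saw.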
Note that this analysis covers explicit, synchronous control flow only. Exceptions are discussed in Section 3.2.
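The validator's checks can be illustrated over an already-decoded instruction stream. This is a toy model, not the NaCl validator: it assumes a hypothetical decoder has produced each instruction's offset, length, class, and direct-branch target, and it checks that fall-through disassembly covers the text contiguously, that no instruction crosses a 32-byte boundary, that no disallowed class appears, and that every direct branch targets an instruction start. Indirect jumps (nacljmp pairs) and data integrity, which segmentation handles, are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUNDLE 32u     /* alignment unit for instructions and targets */
#define MAX_TEXT 4096u /* size limit for this toy text segment */

/* Instruction classes produced by a hypothetical decoder. */
typedef enum { INSN_OK, INSN_DIRECT_BRANCH, INSN_SYSCALL, INSN_INT,
               INSN_SEGMENT_WRITE, INSN_RET } insn_class;

typedef struct {
    uint32_t offset;  /* start offset within the text segment */
    uint32_t length;  /* encoded length in bytes */
    insn_class cls;
    uint32_t target;  /* meaningful only for INSN_DIRECT_BRANCH */
} insn;

bool validate(const insn *code, size_t n, uint32_t text_size) {
    bool is_start[MAX_TEXT] = { false };
    if (text_size > MAX_TEXT) return false;

    uint32_t expected = 0;
    for (size_t i = 0; i < n; i++) {
        const insn *in = &code[i];
        /* Reliable disassembly: each instruction follows the previous. */
        if (in->offset != expected || in->length == 0 ||
            in->length > text_size - in->offset)
            return false;
        /* No instruction may straddle a 32-byte boundary. */
        if (in->offset / BUNDLE != (in->offset + in->length - 1) / BUNDLE)
            return false;
        /* Unsafe instruction classes are rejected outright. */
        if (in->cls == INSN_SYSCALL || in->cls == INSN_INT ||
            in->cls == INSN_SEGMENT_WRITE || in->cls == INSN_RET)
            return false;
        is_start[in->offset] = true;
        expected = in->offset + in->length;
    }
    if (expected != text_size) return false; /* text fully covered */

    /* Control flow integrity: direct branches hit instruction starts. */
    for (size_t i = 0; i < n; i++)
        if (code[i].cls == INSN_DIRECT_BRANCH &&
            (code[i].target >= text_size || !is_start[code[i].target]))
            return false;
    return true;
}
```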
If the validator were excessively slow it might discourage people from using the system. We find our validator can check code at approximately 30 MB/second (35.7 MB in 1.2 seconds, measured on a MacBook Pro with MacOS 10.5, 2.4 GHz Core 2 Duo CPU, warm file-system cache). At this speed, the compute time for validation will typically be small compared to download time, and so is not a performance issue.
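The throughput figure follows directly from the measurement, and a worked comparison makes the claim about download time concrete. The 2 MB module size and 1 MB/s link speed below are hypothetical values chosen for illustration, not measurements from this work:

```latex
\begin{aligned}
\text{validation rate} &= \frac{35.7\ \text{MB}}{1.2\ \text{s}} = 29.75\ \text{MB/s} \\
t_{\text{validate}}(2\ \text{MB}) &= \frac{2\ \text{MB}}{29.75\ \text{MB/s}} \approx 0.067\ \text{s} \\
t_{\text{download}}(2\ \text{MB at } 1\ \text{MB/s}) &= 2\ \text{s}
\end{aligned}
```

Under these assumed numbers, validation adds roughly 3% to the download time.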
We believe this inner sandbox needs to be extremely robust. We have tested it for decoding defects using random instruction generation as well as exhaustive enumeration of valid x86 instructions. We have also used “fuzzing” tests that randomly modify test executables. Initially these tests exposed critical implementation defects, but as testing has continued no defects have been found in the recent past. We have also tested on various x86 microprocessor implementations, concerned that processor errata might lead to exploitable defects.14 We did find evidence of CPU defects that lead to a system “hang” requiring a power-cycle to revive the machine. This occurred with an earlier version of the validator that allowed relatively unconstrained use of x86 prefix bytes; since we constrained it to allow only known useful prefixes, we have not been able to reproduce such problems.
Hardware exceptions (segmentation faults, floating point exceptions) and external interrupts are not allowed, due in part to distinct and incompatible exception models in Linux, MacOS, and Windows. Both Linux and Windows rely on the x86 stack via %esp for delivery of these events. Regrettably, since NaCl modifies the %ss segment register, the stack appears to be invalid to the operating system, such that it cannot deliver the event and the corresponding process is immediately terminated. The use of x86 segmentation for data sandboxing effectively precludes recovery from these types of exceptions. As a consequence, NaCl untrusted modules apply a failsafe policy to exceptions. Each NaCl module runs in its own OS process, for the purpose of exception isolation. NaCl modules cannot use exception handling to recover from hardware exceptions and must be correct with respect to such error conditions or risk abrupt termination. In a way, this is convenient, as there are very challenging security issues in delivering these events safely to untrusted code.
Although we cannot currently support hardware exceptions, Native Client does support C++ exceptions.24 As these are synchronous and can be implemented entirely at user level there are no implementation issues. Windows Structured Exception Handling18 requires nonportable operating system support and is therefore not supported.
Conceptually, the service runtime is a container for hosting Native Client modules. In our research system, the service runtime is implemented as an NPAPI plug-in, together with a native executable that corresponds to the process container for the untrusted module. It supports a variety of Web browsers on Windows, MacOS, and Linux. It implements the dynamic enforcement that maintains the integrity of the inner sandbox and provides resource abstractions to isolate the NaCl application from host resources and the operating system interface. It contains trusted code and data that, while sharing a process with the contained NaCl module, are accessible only through a controlled interface. The service runtime prevents untrusted code from inappropriate memory accesses through a combination of x86 memory segment and page protection.
When a NaCl module is loaded, it is placed in a segment-isolated 256MB region within the service runtime’s address space. The first 128KB of the NaCl module’s address space (NaCl “user” address space) is reserved for initialization by the service runtime. The first 64KB of this 128KB region is read and write protected to detect NULL pointers and to provide for defense-in-depth against unintended 16-bit address calculations. The remaining 64KB contains trusted code that implements our “trampoline” call gate and “springboard” return gate. Untrusted NaCl module text is loaded immediately after this reserved 128KB region. The %cs segment is set to constrain control transfers from the zero base to the end of the NaCl module text. The other segment registers are set to constrain data accesses to the 256MB NaCl module address space.
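The layout just described can be written down as constants plus a classifier. This C sketch uses only the sizes given above (a 256MB region, a 64KB guard, a 64KB trampoline area, and text starting at 128KB); the macro names, the enum, and the text_end parameter are illustrative, not taken from the NaCl sources, and everything between the end of text and 256MB is lumped together as data, heap, and stack.

```c
#include <stdint.h>

#define KB (1024u)
#define MB (1024u * KB)

#define NACL_REGION_SIZE    (256u * MB) /* whole segment-isolated region */
#define NACL_GUARD_END      (64u * KB)  /* [0, 64KB): no read or write */
#define NACL_TRAMPOLINE_END (128u * KB) /* [64KB, 128KB): trusted gates */
#define NACL_TEXT_START     NACL_TRAMPOLINE_END

typedef enum { REGION_GUARD, REGION_TRAMPOLINE, REGION_TEXT,
               REGION_DATA, REGION_OUTSIDE } region_kind;

/* Classify a module-relative address, given where untrusted text ends
 * (the end of text is also where the %cs limit stops control flow). */
region_kind classify(uint32_t addr, uint32_t text_end) {
    if (addr >= NACL_REGION_SIZE) return REGION_OUTSIDE;
    if (addr < NACL_GUARD_END) return REGION_GUARD;
    if (addr < NACL_TRAMPOLINE_END) return REGION_TRAMPOLINE;
    if (addr < text_end) return REGION_TEXT;
    return REGION_DATA; /* data, heap, stack: reachable via data segments */
}
```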
Because it originates from and is installed by the trusted service runtime, trampoline and springboard code is allowed to contain instructions that are forbidden elsewhere in untrusted executable text. This code, patched at runtime as part of the NaCl module loading process, uses segment register manipulation instructions and the far call instruction to enable control transfers between the untrusted user code and the trusted service runtime code. Since every 0 mod 32 address in the second 64KB of the NaCl user space is a potential computed control flow target, these are our entry points to a table of system-call trampolines. One of these entry points is blocked with a hlt instruction, so that the remaining space may be used for code that can only be invoked from the service runtime. This provides space for the springboard return gate.
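Because every 0 mod 32 address in the second 64KB is an entry point, a call number maps to a trampoline address by simple arithmetic: 64KB / 32 bytes gives 2048 slots. This C sketch assumes, for illustration only, that call number n uses the n-th slot and that the hlt-blocked slot is the last one; neither assignment is specified in the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRAMPOLINE_BASE  0x10000u /* start of the second 64KB */
#define TRAMPOLINE_BYTES 0x10000u /* 64KB of trampoline space */
#define SLOT_SIZE        32u      /* one entry per 0 mod 32 address */
#define SLOT_COUNT       (TRAMPOLINE_BYTES / SLOT_SIZE) /* 2048 */
/* Assumption: the hlt-blocked slot guarding the springboard is last. */
#define BLOCKED_SLOT     (SLOT_COUNT - 1u)

/* Entry address for call number n; false if n has no usable slot. */
bool trampoline_address(uint32_t n, uint32_t *addr) {
    if (n >= SLOT_COUNT || n == BLOCKED_SLOT)
        return false;
    *addr = TRAMPOLINE_BASE + n * SLOT_SIZE;
    return true;
}
```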
Invocation of a trampoline transfers control from untrusted code to trusted code. The trampoline sequence resets %ds and then uses a far call to reset the %cs segment register and transfer control to trusted service handlers, reestablishing the conventional flat addressing model expected by the code in the service runtime. Once outside the NaCl user address space, it resets other segment registers such as %fs, %gs, and %ss to reestablish the native code threading environment, fully disabling the inner sandbox for this thread, and loads the stack register %esp with the location of a trusted stack for use by the service runtime. Note that the per-thread trusted stack resides outside the untrusted address space, to protect it from attack by other threads in the untrusted NaCl module.
Just as trampolines permit crossing from untrusted to trusted code, the springboard enables crossing in the other direction. The springboard is used by the trusted runtime:
- To transfer control to an arbitrary untrusted address.
- To start a new POSIX-style thread.
- To start the main thread.
Alignment ensures that the springboard cannot be invoked directly by untrusted code. The ability to jump to an arbitrary untrusted address is used in returning from a service call. The return from a trampoline call requires popping an unused trampoline return address from the top of the stack, restoring the segment registers, and finally aligning and jumping to the return address in the NaCl module.
As a point of comparison, we measured the overhead of a “null” system call. On Linux, the NaCl null system call takes 156 ns, slightly more than the 138 ns taken on the same hardware by the Linux 2.6 getpid system call (implemented via the vsyscall table and using the sysenter instruction). We note that the user/kernel transfer has evolved continuously over the life of the x86 architecture. By comparison, the segment register operations and far calls used by the NaCl trampoline are somewhat less common, and may have received less consideration over the history of the x86 architecture.
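A rough way to measure the native side of such a comparison on Linux is to time many trips through a trivial system call. This C sketch is an assumption about method, not the benchmark used above: it calls getppid(), which C libraries do not cache (some have cached getpid()), so every iteration enters the kernel, and it reports average wall-clock nanoseconds per call from clock_gettime(CLOCK_MONOTONIC). Results depend heavily on the CPU, kernel version, and security mitigations in effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Average nanoseconds per getppid() system call over `iters` calls.
 * `iters` must be positive. */
double null_syscall_ns(long iters) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        (void)getppid(); /* a real kernel entry on every iteration */
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                     (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iters;
}
```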
The IMC is the basis of communications into and out of NaCl modules. The implementation is built around a NaCl socket, providing a bidirectional, reliable, in-order datagram service similar to Unix domain sockets.13 An untrusted NaCl module receives its first NaCl socket when it is created, accessible from JavaScript via the DOM object used to create it. The JavaScript uses the socket to send messages to the NaCl module, and can also share it with other NaCl modules. The JavaScript can also choose to connect the module to other services available to it by opening and sharing NaCl sockets as NaCl descriptors. NaCl descriptors can also be used to create shared memory regions.
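The IMC's closest POSIX relative, a Unix domain socket, shows the shape of the service: a connected pair, whole-datagram sends, and descriptors carried alongside the bytes. This is an analogy in plain POSIX C, not the IMC API; the helper names are illustrative, and descriptor passing uses the standard SCM_RIGHTS control message. On Linux, AF_UNIX datagram sockets created with socketpair() are reliable and preserve order.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send `len` bytes plus one file descriptor as a single datagram. */
int send_with_fd(int sock, const void *buf, size_t len, int fd) {
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union { struct cmsghdr hdr; char space[CMSG_SPACE(sizeof(int))]; } ctl;
    memset(&ctl, 0, sizeof ctl);
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.space,
                          .msg_controllen = sizeof ctl.space };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS; /* the ancillary payload is a descriptor */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == (ssize_t)len ? 0 : -1;
}

/* Receive one datagram; store any attached descriptor (else -1) in *fd. */
ssize_t recv_with_fd(int sock, void *buf, size_t cap, int *fd) {
    struct iovec iov = { .iov_base = buf, .iov_len = cap };
    union { struct cmsghdr hdr; char space[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.space,
                          .msg_controllen = sizeof ctl.space };
    ssize_t n = recvmsg(sock, &msg, 0);
    *fd = -1;
    struct cmsghdr *c = n >= 0 ? CMSG_FIRSTHDR(&msg) : NULL;
    if (c && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
        memcpy(fd, CMSG_DATA(c), sizeof(int));
    return n;
}
```

Usage: create a pair with socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), send a message with a pipe's write end attached, and the receiver can write into that pipe through the descriptor it received, much as a NaCl module can be handed a channel to another service.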
Using NaCl messages, Native Client’s SRPC abstraction is implemented entirely in untrusted code. SRPC provides a convenient syntax for declaring procedural interfaces between JavaScript and NaCl modules, or between two NaCl modules, supporting a few simple types (e.g. int, float, char), arrays of simple types, and NaCl descriptors. Pointers are not supported. Higher-level data representations can easily be layered on top of IMC messages or SRPC.
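A sketch of why an SRPC-style layer can carry simple types and arrays but not pointers: arguments are copied by value into an untyped byte array. The tag values and wire layout below are hypothetical, chosen only for illustration, and are not the actual SRPC encoding; values are written in host byte order, which assumes both ends run on the same machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags for a toy SRPC-like argument encoding. */
enum { TAG_INT = 'i', TAG_FLOAT = 'f', TAG_INT_ARRAY = 'I' };

/* Output buffer for one message: untyped bytes, as in an IMC datagram. */
typedef struct {
    uint8_t *buf;
    size_t len, cap;
} msg_writer;

/* Append raw bytes; 0 on success, -1 if the message would overflow. */
static int put(msg_writer *w, const void *p, size_t n) {
    if (n > w->cap - w->len) return -1;
    memcpy(w->buf + w->len, p, n);
    w->len += n;
    return 0;
}

/* Scalars are copied by value after a one-byte tag. */
int put_int(msg_writer *w, int32_t v) {
    uint8_t tag = TAG_INT;
    return (put(w, &tag, 1) || put(w, &v, sizeof v)) ? -1 : 0;
}

int put_float(msg_writer *w, float v) {
    uint8_t tag = TAG_FLOAT;
    return (put(w, &tag, 1) || put(w, &v, sizeof v)) ? -1 : 0;
}

/* Arrays travel as a count followed by copies of the elements; the
 * caller's pointer never appears in the message, since it would be
 * meaningless in the receiver's address space. */
int put_int_array(msg_writer *w, const int32_t *a, uint32_t count) {
    uint8_t tag = TAG_INT_ARRAY;
    if (put(w, &tag, 1) || put(w, &count, sizeof count)) return -1;
    return put(w, a, (size_t)count * sizeof *a);
}
```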
Our NPAPI implementation is also layered on top of the IMC and supports a subset of the common NPAPI interface. Specific requirements that shaped the current implementation are the ability to read, modify, and invoke properties and methods on the script objects in the browser, support for simple raster graphics, provision of the createArray() method, and the ability to open and use a URL like a file descriptor. We are currently studying some additional refinements to NPAPI for improved portability, performance, and safety.
Building NaCl Modules: We have modified the standard GNU tool chain, using version 4.2.2 of the gcc collection of compilers and version 2.18 of binutils, to generate NaCl-compliant binaries. We have built a reference binary from newlib using the resulting tool chain, rehosted to use the NaCl trampolines to implement system services (e.g., read(), brk(), gettimeofday(), imc_sendmsg()). Native Client supports an insecure “debug” mode that allows additional file-system interaction not otherwise allowed for secure code.
We modified gcc for Native Client by changing the alignment of function entries (-falign-functions) and of branch targets (-falign-jumps) to 32 bytes. We also changed gcc to use nacljmp for indirect control transfers, including indirect calls and all returns. We made more significant changes to the assembler, to implement Native Client’s block alignment requirements. To implement returns, the assembler ensures that call instructions always appear in the final bytes of a 32-byte block, so that the return address (the address of the instruction following the call) is 0 mod 32 and therefore a valid nacljmp target. We also modified the assembler to implement indirect control transfer sequences by expanding the nacljmp pseudo-instruction as a properly aligned consecutive block of bytes. To facilitate testing we added support to use a longer nacljmp sequence, align the text base, and use an and and an or that use relocations as masks. This permits testing applications by running them on the command line, and has been used to run the entire gcc C/C++ test suite. We also changed the linker to set the base address of the image as required by the NaCl loader (128KB today).
Apart from their direct use, the tool chain changes also serve to document by example how to modify an existing tool chain to generate NaCl modules. They required patches totaling fewer than 1000 lines in gcc and binutils, demonstrating the simplicity of porting a compiler to Native Client.
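The assembler's placement of calls reduces to one padding computation: emit enough no-op bytes that the call ends exactly on a 32-byte boundary, so its return address starts a new block. A second, related computation moves an instruction to the next block when it would otherwise straddle a boundary. This C sketch shows only that arithmetic; the function names are illustrative and are not taken from the modified binutils.

```c
#include <stdint.h>

#define BLOCK 32u

/* Padding bytes needed before an instruction of `len` bytes at `offset`
 * so that it ends exactly on a 32-byte boundary (used for calls, so the
 * return address is 0 mod 32). Assumes 0 < len <= BLOCK. */
uint32_t pad_call_to_block_end(uint32_t offset, uint32_t len) {
    return (BLOCK - (offset + len) % BLOCK) % BLOCK;
}

/* Padding needed so an instruction of `len` bytes at `offset` does not
 * straddle a 32-byte boundary: zero if it fits in the current block,
 * otherwise enough to start it at the next block. Assumes len <= BLOCK. */
uint32_t pad_to_avoid_crossing(uint32_t offset, uint32_t len) {
    uint32_t room = BLOCK - offset % BLOCK;
    return len <= room ? 0 : room;
}
```

For example, a 5-byte call at offset 0 receives 27 bytes of padding, so it occupies bytes 27 to 31 and the return address is 32.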
Profiling and Debugging: Native Client’s open source release includes a simple profiling framework that captures a complete call trace with minimal performance overhead. This support is based on gcc’s -finstrument-functions code generation option combined with the rdtsc timing instruction. The profiler is portable, implemented entirely as untrusted code. In our experience, optimized builds profiled in this framework have performance somewhere between that of -O0 and -O2 builds. Optionally, the application programmer can annotate the profiler output with printf-like methods, with output appearing in the trace rather than on stdout.
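A minimal sketch of this kind of profiler in C, assuming gcc’s documented -finstrument-functions hooks (__cyg_profile_func_enter and __cyg_profile_func_exit) and the __rdtsc intrinsic from <x86intrin.h>, with a clock_gettime fallback on non-x86 hosts. The fixed-size buffer, the trace_event layout, and the trace_printf annotation function are hypothetical stand-ins, not the interface of the released framework.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#endif

#define TRACE_CAP 1024

typedef struct {
    char kind;      /* 'E' function entry, 'X' function exit, 'A' annotation */
    void *fn;       /* instrumented function address (NULL for annotations) */
    uint64_t tsc;   /* timestamp counter value at the event */
    char note[48];  /* annotation text, empty for entry/exit events */
} trace_event;

static trace_event trace_buf[TRACE_CAP];
static size_t trace_len;

__attribute__((no_instrument_function))
static uint64_t read_timestamp(void) {
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

__attribute__((no_instrument_function))
static void trace_record(char kind, void *fn, const char *note) {
    if (trace_len >= TRACE_CAP)
        return; /* drop events once the buffer is full */
    trace_event *e = &trace_buf[trace_len++];
    e->kind = kind;
    e->fn = fn;
    e->tsc = read_timestamp();
    snprintf(e->note, sizeof e->note, "%s", note ? note : "");
}

/* Called by gcc-generated code on every instrumented function entry. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    trace_record('E', this_fn, NULL);
}

/* Called by gcc-generated code on every instrumented function exit. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    trace_record('X', this_fn, NULL);
}

/* printf-style annotation that goes into the trace instead of stdout. */
__attribute__((no_instrument_function))
void trace_printf(const char *fmt, ...) {
    char buf[48];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    trace_record('A', NULL, buf);
}
```

Compiling application code with gcc -finstrument-functions makes gcc emit calls to the two hooks at every function entry and exit; the hooks and helpers are marked no_instrument_function so they do not recurse into themselves.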
Our release also includes a modified version of gdb for Native Client debugging on Linux. The debugger recognizes the different addressing domains used by trusted and untrusted code, and maintains independent symbol tables for both domains. Even with this support, the additional complexities of Native Client can interfere with debugging. For this reason we maintain a set of libraries that facilitate building both standalone and Native Client versions of a project, and we commonly debug the standalone version first.
4. Experience
Performance measurements in this section were made without the Native Client outer sandbox. The outer sandbox implementations are platform-dependent and generally use standard kernel facilities (e.g., system call ACLs on Windows, user IDs on Linux) with inherently small incremental overhead.
A primary goal of Native Client is to deliver substantially all of the performance of native code execution. NaCl module performance is impacted by alignment constraints, extra instructions for indirect control flow transfers, and the incremental cost of NaCl communication abstractions.
We first consider the overhead of making native code side-effect free. To isolate the impact of the NaCl binary constraints (Table 1), we built the SPEC2000 CPU benchmarks using the NaCl compiler and linked them to run as standard Linux binaries. CPU-bound applications are the worst case for NaCl overhead, as they have the highest density of alignment and sandboxing overhead. Figure 3 shows the overhead of NaCl compilation for a set of benchmarks from SPEC2000. The worst-case overhead is for crafty, at about 12%, with an average of about 5% across all benchmarks. Hardware performance counter measurements indicate that the largest slowdowns are due to instruction cache misses. For crafty, the instruction fetch unit is stalled during 83% of cycles for the NaCl build, compared to 49% for the default build. The gcc and vortex benchmarks are also significantly affected by instruction cache misses.
As our current alignment implementation is conservative, aligning some instructions that are not indirect control flow targets, we hope to make incremental code size improvements as we refine it. “NaCl32” measurements use statically linked binaries, 32-byte alignment, and the nacljmp pseudo-instruction for indirect control flow transfers. To isolate the impact of the indirect control flow sequence, Figure 3 also shows “align32” results for static linking and 32-byte alignment only. These comparisons make it clear that alignment is a factor in some of the cases where overhead is significant. By comparison, the impact of static linking and of the sandboxing instructions is small.
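Because “align32” and “NaCl32” builds differ only in the indirect control flow sequence, the measurements can be decomposed. The following C sketch, with hypothetical function names, computes each build’s overhead as a percentage relative to a baseline run time (standard compilation with static linking, as in Figure 3) and splits the NaCl32 overhead into an alignment part and an incremental nacljmp part. Negative values indicate a speedup, which the text notes alignment sometimes produces.

```c
#include <assert.h>

/* Overhead of a build relative to a baseline run time, in percent:
   positive means slower, negative means faster. */
static double overhead_pct(double t_build, double t_base) {
    assert(t_base > 0.0);
    return (t_build / t_base - 1.0) * 100.0;
}

typedef struct {
    double align32_pct; /* align32 vs. baseline: 32-byte alignment alone */
    double nacljmp_pct; /* NaCl32 vs. align32: the nacljmp sequence */
    double total_pct;   /* NaCl32 vs. baseline */
} overhead_split;

/* Run times for one benchmark, all in the same units. */
static overhead_split split_overhead(double t_base, double t_align32, double t_nacl32) {
    overhead_split s;
    s.align32_pct = overhead_pct(t_align32, t_base);
    s.nacljmp_pct = overhead_pct(t_nacl32, t_align32);
    s.total_pct = overhead_pct(t_nacl32, t_base);
    /* The parts compose multiplicatively: (1 + a)(1 + j) == 1 + total. */
    return s;
}
```

Using the align32 build as the intermediate baseline keeps the two effects separate: a 10% alignment cost followed by a further 10% nacljmp cost gives a 21% total, not 20%.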
The impact of alignment is not consistent across the benchmark suite. In some cases alignment appears to improve performance, and in others it makes things worse. We hypothesize that aligning branch targets to 32-byte boundaries sometimes interacts favorably with caches, instruction prefetch buffers, and other facets of processor microarchitecture. These effects are curious but not large enough to justify further investigation. Where alignment makes performance worse, one possible factor is code size, as mentioned above. Alignment can increase NaCl code size by as much as 50%, especially in programs such as the gcc SPEC2000 benchmark that have a large number of static call sites. Similarly, benchmarks with a large amount of control flow branching (e.g., crafty, vortex) show higher code size growth due to branch target alignment. The incremental code size increase from sandboxing with nacljmp is consistently small.
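To see why programs with many branch targets grow, consider a toy layout model in C: instructions are placed back to back, and any instruction marked as an aligned target is preceded by enough padding to start on a 32-byte boundary. The insn type and layout_size function are hypothetical, and the model ignores every other layout rule, so it only illustrates the trend described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NACL_BLOCK 32u

/* One instruction in a toy layout: encoded length in bytes and whether
   it must start on a 32-byte boundary (e.g., a branch target). */
typedef struct {
    uint32_t len;
    int aligned_target;
} insn;

/* Places instructions back to back, padding before each aligned target.
   Returns the total size in bytes; if padding is non-NULL it receives
   the number of bytes spent on padding. */
static uint32_t layout_size(const insn *code, size_t n, uint32_t *padding) {
    uint32_t off = 0, pad_total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(code[i].len > 0);
        if (code[i].aligned_target) {
            uint32_t pad = (NACL_BLOCK - off % NACL_BLOCK) % NACL_BLOCK;
            off += pad;
            pad_total += pad;
        }
        off += code[i].len;
    }
    if (padding != NULL)
        *padding = pad_total;
    return off;
}
```

For example, instructions of 5, 3, 4, and 2 bytes, with the 3- and 2-byte instructions marked as targets, lay out in 66 bytes instead of 14, of which 52 are padding; when targets are dense, padding rather than instructions can dominate code size.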
Overall, the performance impact of Native Client on these benchmarks is on average less than 5%. At this level, overhead compares favorably to untrusted native execution.
We ported an internal implementation of H.264 video decoding to evaluate the difficulty of the porting effort. The original application, which converted H.264 video into a raw file format, was implemented in about 11K lines of C for the standard GCC environment on Linux. We modified it to play video. The port required about 20 lines of additional C code, more than half of which was for error checking. Apart from rewriting the Makefile, no other modifications were required. This experience is consistent with our general experience with Native Client: legacy Linux libraries that do not inherently require network and file access generally port with minimal effort. Performance of the original and NaCl versions was comparable and limited by video frame rate.
We profiled sdlquake-1.0.9f using the built-in “timedemo demo1” command. Quake was run at 640 × 480 resolution on an Ubuntu Dapper Drake Linux machine with a 2.4 GHz Intel Q6600 quad-core CPU. The video system’s vertical sync (VSYNC) was disabled. The Linux executable was built using gcc version 4.0.3, and the Native Client version with nacl-gcc version 4.2.2, both with -O2 optimization.
With Quake, the performance of the Native Client and normal executables is, for practical purposes, indistinguishable; see Table 2 for the comparison. We observed very little nondeterminism between runs. The test plays the same sequence of events regardless of frame rate. Slight variances in frame rate can still occur due to the OS thread scheduler and to pressure on the shared caches from other processes. Although Quake uses software rendering, the performance of the final bitmap transfer to the user’s desktop may depend on how busy the video device is.
5. Discussion
As described above, Native Client has inner and outer sandboxes, redundant barriers to protect native operating system interfaces. Additional measures such as a CPU blacklist and NaCl module blacklist will also be deployed.
We have developed and tested Native Client on Ubuntu Linux, MacOS, and Microsoft Windows XP. Overall we are satisfied with the interaction of Native Client with these operating systems. That said, there are areas where better operating system support would help. For example, popular operating systems require all threads to use a flat addressing model in order to deliver exceptions correctly. Use of segmented memory prevents these systems from interpreting the stack pointer and other essential thread state. Better operating system support for segments would let us resolve this problem and provide hardware exception support in untrusted code. Note, however, that because of our portability requirement we could not enable exception support for untrusted modules unless all native operating systems supported it. This least-common-denominator effect also arises in other parts of the system, such as the 256MB address space limit for NaCl modules.
With respect to programming languages and language implementations, we are encouraged by our initial experience with Native Client and the GNU tool chain, and are looking at porting other compilers. We have also ported two interpreters, Lua and awk. While it would be challenging to support JITted languages such as Java, we are hopeful that Native Client might someday allow developers to use their language of choice in the browser rather than being restricted to JavaScript.
6. Related Work
Techniques for safely executing third-party code generally fall into four categories: system request moderation, virtualization, fault isolation, and trust with authentication.
Kernel-based mechanisms such as user-id-based access controls, systrace20 and ptrace25 are familiar facilities on Unix-like systems. Many projects have applied such mechanisms to containing untrusted code, most recently Android2 from Google and Xax9 from Microsoft Research. While they can be very effective, these approaches require dependencies on the native operating system. These dependencies in turn can interfere with portability, and expose more of the native operating system in the attack surface. Our inner sandbox design was heavily influenced by goals of portability and operating system independence.
Many research and practical systems apply abstract virtual machines to constrain untrusted code. While they commonly support ISA portability, they also tend to create a performance obstacle that we avoid by working directly with machine code. A further advantage of expressing sandboxing directly in machine code is that it does not rely on a trusted compiler or interpreter. This greatly reduces the size of the trusted computing base,26 and has a further benefit in Native Client of opening the system to third-party tool chains.
Native Client applies concepts of software fault isolation that have been extensively discussed in the research literature. Our data integrity scheme is a straightforward application of segmented memory as implemented in the Intel 80386.6 Our control flow integrity technique builds on the seminal work by Wahbe, Lucco, Anderson, and Graham,27 also applying techniques described by McCamant and Morrisett.16
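The segment-based data integrity scheme can be modeled in a few lines of C. The sketch assumes a 32-bit linear address space and an expand-up data segment described by a base and an inclusive limit, as on the 80386: an untrusted address is an offset that the hardware adds to the base, and any access extending past the limit faults. The 256MB span is chosen to match the module address space limit mentioned in the Discussion; the segment type and segment_translate function are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed module span, matching the 256MB limit mentioned earlier. */
#define NACL_MODULE_SPAN (256u * 1024u * 1024u)

typedef struct {
    uint32_t base;  /* linear address where the module's segment starts */
    uint32_t limit; /* highest valid offset, inclusive */
} segment;

/* Model of a segment-relative data access of `size` bytes at `offset`.
   Returns 0 and stores the linear address on success, or -1 where the
   hardware would raise a fault. Written to avoid unsigned overflow. */
static int segment_translate(const segment *seg, uint32_t offset,
                             uint32_t size, uint32_t *linear) {
    assert(size > 0);
    if (offset > seg->limit || size - 1u > seg->limit - offset)
        return -1;
    *linear = seg->base + offset;
    return 0;
}
```

Because every data access is checked against the limit by the hardware, untrusted code cannot name memory outside its region regardless of which offsets it computes.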
Perhaps the most prevalent use of native code in Web content is via Microsoft’s ActiveX.7 ActiveX controls rely on a trust model to provide security, with controls cryptographically signed using Microsoft’s proprietary Authenticode system,17 and only permitted to run once a user has indicated they trust the publisher. This dependency on the user making prudent trust decisions is commonly exploited. ActiveX provides no guarantee that a trusted control is safe. Even when the control itself is not inherently malicious, defects in the control can be exploited, often permitting execution of arbitrary code. In contrast, Native Client is designed to prevent such exploitation, even for flawed NaCl modules.
7. Conclusion
This paper has described Native Client, a system for incorporating untrusted x86 native code into an application that runs in a Web browser. In addition to creating a barrier against undesirable side effects, Native Client enables modules that are portable both across operating systems and across Web browsers, and it supports performance-oriented features such as threading and vectorization instructions. We believe the NaCl inner sandbox is extremely robust; regardless, we provide additional redundant mechanisms to provide defense-in-depth.
In our experience, porting existing Linux/gcc code to Native Client is straightforward, and the performance penalty for the sandbox is small, particularly in the compute-bound scenarios for which the system is designed.
By describing Native Client here and making it available as open source, we hope to encourage community scrutiny and contributions. We believe this feedback together with our continued diligence will enable us to create a system that achieves improved safety over previous native code Web technologies.
Acknowledgments
Many people have contributed to the direction and the development of Native Client; we acknowledge a few of them here. The project was conceived based on an idea from Matt Papakipos. Jeremy Lau, Brad Nelson, John Grabowski, Kathy Walrath, and Geoff Pike have made valuable contributions to the implementation and evaluation of the system. Thanks also to Danny Berlin, Chris DiBona, and Rebecca Ward. Doug Evans is responsible for our GDB implementation. We thank Sundar Pichai and Henry Bridge for their role in shaping the project direction. We would also like to thank Dick Sites for his thoughtful feedback on an earlier version of this paper.
Figures
Figure 1. Hypothetical NaCl-based application. Untrusted modules have a gray background.
Figure 2. The hypothetical application of Figure 1 with a trusted storage service.
Figure 3. SPEC2000 performance. “Align32” results are for binaries with aligned 32-byte instruction blocks. “NaCl32” results are for NaCl binaries. Performance for both is presented relative to standard compilation with static linking.