Research and Advances
Architecture and Hardware / Contributed Articles

Trust Extension For Commodity Computers

A user's trust in a single device can be extended to many other devices.
  1. Introduction
  2. Key Insights
  3. Trust Extension
  4. Bootstrapping Trust
  5. Executing Code Securely
  6. Leveraging Secure Code Execution
  7. Verifiable Remote Computation
  8. Conclusion
  9. Acknowledgments
  10. Sidebar: Security Features in Commodity Computers
  11. References
  12. Author
  13. Footnotes
  14. Figures

As society rushes to digitize sensitive information and services, users and developers must adopt adequate security protections. However, such protections often conflict with the benefits expected from commodity computers. Consumers and businesses value commodity computers because they provide good performance and an abundance of features at relatively low cost, but attempts to construct secure systems from the ground up are expensive, time-consuming, and unable to keep up with the changing marketplace.2,8,11,12 For example, the VAX VMM security kernel was developed over nine years (1981–1990), but the kernel was never deployed. This failure was due, in part, to the absence of support for Ethernet, a feature considered crucial by the time the kernel was completed but not anticipated when originally designed.11

Back to Top

Key Insights

  • Improving software security is insufficient; also needed is the ability to securely verify whether a computer employs the new software.
  • Providing security on demand (such as via the Flicker architecture) helps balance security, performance, and features.
  • Verifiable computation allows a client to outsource the computation of a function and efficiently verify the results returned while keeping inputs and outputs private; constraining the way the worker/server computes the function enables such efficient verification.

Rather than build secure systems from scratch, the tension between security and features can be resolved by extending the trust users have in one device to enable them to use another commodity device or service securely without sacrificing the performance, features, or cost expected of commodity systems.21 Note this article focuses on average users and commodity systems rather than advanced users, special-purpose computers, or highly constrained environments (such as those in the military).

At a high level, this premise is supported by developing software, hardware, and cryptographic techniques to extend user trust in a small special-purpose hardware device to provide strong security protection for both local and remote computation on sensitive data while preserving the performance and features of commodity computers.

Included is an overview of hardware security technologies (see the sidebar “Security Features in Commodity Computers”), how to extend trust in a special-purpose mobile device to verify the security hardware in a user’s local machine, how to extend that trust in a meaningful way to software on the local machine, how to extend trust in that software to network elements, and finally how to extend that trust to remote computers where neither software nor hardware is trusted; for a detailed comparison with related work see other publications.21,23

Back to Top

Trust Extension

Creating trustworthy systems requires advances on many fronts, including programming languages, operating systems, network protocols, and definitions of security. More fundamentally, both computers and users must be able to make accurate, informed trust decisions. After all, even if software improves, users must be able to determine which systems employ the new, improved software. In particular, users must be able to judge whether a system (local or remote) should be trusted before handing over sensitive data.

This article describes techniques that provide firm evidence on which to base trust decisions. Once a user decides to trust a particular device (such as a cellphone), it shows how to use that device to verify, on the user’s behalf, the trustworthiness of other devices and services (such as laptop computers and cloud-based computation). The user’s trust in the original device inspires trust in its assessment of the new device or service. If the assessment is positive, the user has essentially extended trust in the first device to include the new device or service.

Informally, this is our definition of trust: To trust an entity X with private data (or a security-sensitive task), users must believe that at no point in the future will they have cause to regret having given their data (or entrusted the task) to X.

Back to Top

Bootstrapping Trust

The problem of allowing users to bootstrap trust in their own personal computers is fundamental and should be easier than other potential scenarios; if users cannot establish trust in their computers, they are unlikely to be able to trust a remote computer. When working with their own computers, users can be reasonably certain those computers are physically secure; that is, an attacker has not tampered with their hardware configurations. Such an assumption aligns naturally with standard human intuition about security; a resource (such as a physical key) physically controlled by an individual is typically more secure than a resource a user might give someone else. Fortunately, the physical protection of valuable items has been a major focus of human ingenuity for millennia.

If a user’s computer is physically secure, then special-purpose secure hardware can be used to monitor and report the software state of the platform. Given the software state, users (or their agents) can decide whether the platform should be trusted. While a full-blown secure coprocessor (such as the IBM 4758)27 might appeal, its cost limits deployment. However, as of 2010, more than 350 million30 commodity computers had been sold worldwide with a special-purpose security chip called the TPM, as described in the sidebar.28 With appropriate software support, the TPM can be used to report on the software executing on the computer. The resulting attestation can be verified by a user’s trusted device (such as a cellphone or special-purpose USB fob).29 Thus, the user’s trust in the device can be extended, via the TPM, to establish trust in the software on the machine. For example, Garriss et al. used a cellphone to verify a virtual machine monitor on a kiosk computer,5 though a kiosk might not satisfy the hardware security assumptions discussed earlier.
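The check the trusted device performs can be sketched as follows. This is a simplified model: a real TPM signs the quote with an RSA attestation identity key (AIK), whereas an HMAC stands in here, and all key and state names are illustrative.

```python
import hashlib
import hmac
import os

def make_quote(aik_key: bytes, pcr_digest: bytes, nonce: bytes) -> bytes:
    """TPM side: authenticate the PCR digest and the verifier's fresh nonce.
    (A real TPM would produce an RSA signature with its AIK.)"""
    return hmac.new(aik_key, pcr_digest + nonce, hashlib.sha256).digest()

def verify_quote(aik_key: bytes, expected_pcrs: bytes,
                 nonce: bytes, quote: bytes) -> bool:
    """Trusted-device side: recompute the expected quote and compare."""
    expected = hmac.new(aik_key, expected_pcrs + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, quote)

aik = b"stand-in-for-the-TPM's-attestation-key"
good_pcrs = hashlib.sha256(b"trusted software stack").digest()
nonce = os.urandom(16)                      # freshness against replay
quote = make_quote(aik, good_pcrs, nonce)

assert verify_quote(aik, good_pcrs, nonce, quote)       # state matches
bad_pcrs = hashlib.sha256(b"rootkitted stack").digest()
assert not verify_quote(aik, bad_pcrs, nonce, quote)    # mismatch detected
```

The nonce is what makes the attestation fresh; without it, a recorded quote from an earlier, clean boot could be replayed.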

However, given even a TPM-equipped machine, the question remains: How does one bootstrap trust in the TPM itself? Surprisingly, neither the TPM specifications nor the academic literature addresses this problem. They instead tend to assume the user magically possesses the TPM’s public key. Unfortunately, any straightforward approach to trusting the TPM falls victim to a cuckoo attacka,20,21 in which the adversary convinces the user that a TPM the adversary physically controls resides in the user’s own local computer; Figure 1a outlines a possible implementation of the cuckoo attack. Malware on the user’s local machine proxies the user’s TPM-related messages to a remote, TPM-enabled machine controlled by an attacker. Any credential the user requests from the local TPM (such as an endorsement certificate) can also be provided by the attacker’s TPM and relayed to the malware on the user’s machine.

To analyze a cuckoo attack, we developed a model20,21 using predicate logic for bootstrapping trust in a local computer with secure hardware. The model defines a set of predicates describing the trustworthiness of people, computers, and secure hardware (such as TPMs), as well as a set of trust axioms. For example, if a trusted person P says computer C is secure, C can be seen as physically secure, thus encoding our previous assumption that people are reasonably effective at assessing and maintaining physical security.

Trying to use this model to reason about the security of the user’s local machine brings us to a logical contradiction,20,21 namely that the local machine is both trusted and untrusted, thus capturing the essence of the cuckoo attack; the user cannot decide whether or not to trust the local machine.

From Figure 1a, it may seem a cuckoo attack can be prevented by severing the connection between the local malware and an adversary’s remote PC. The assumption is that, without a remote TPM to provide the correct responses, the infected machine must either refuse to respond or allow the true TPM to communicate with the user’s device, revealing the presence of the malware.

However, regardless of implementation, cutting off network access fails to prevent a cuckoo attack. As outlined in Figure 1b, a cuckoo attack is possible because the malware on the local machine has access to a “TPM oracle” that provides TPM-like answers without providing TPM security guarantees. Since the adversary has physical possession of his own TPM (call it TPM_M), he can extract its private keys and credentials and include them with the malware. Thus provisioned, the malware on the local machine can emulate TPM_M, even without network access.
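A minimal sketch of such an offline oracle follows. An HMAC again stands in for TPM signatures, and all keys and names are hypothetical; the point is only that a verifier with no authentic binding to the *local* TPM will accept whatever credential the malware hands it.

```python
import hashlib
import hmac

def quote(key: bytes, pcrs: bytes, nonce: bytes) -> bytes:
    """Stand-in for a TPM quote over PCR state and a nonce."""
    return hmac.new(key, pcrs + nonce, hashlib.sha256).digest()

# Keys the attacker extracted from a TPM he physically owns (TPM_M),
# then shipped inside the malware -- no network connection required.
extracted_key = b"private-key-pulled-from-TPM_M"
clean_state = hashlib.sha256(b"pristine software stack").digest()

def malware_tpm_oracle(nonce: bytes):
    """Respond with TPM_M's credential and a quote claiming a clean state."""
    return extracted_key, quote(extracted_key, clean_state, nonce)

# The verifier, lacking any authentic channel to the local TPM, checks the
# quote against the credential it was just handed -- and it verifies.
nonce = b"fresh-nonce"
cred, q = malware_tpm_oracle(nonce)
assert hmac.compare_digest(quote(cred, clean_state, nonce), q)
```

The final assertion is the attack in miniature: the quote is cryptographically valid, just not about the machine the user is sitting at.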

The right way to prevent a cuckoo attack is to introduce a secure channel to the local TPM, instantiated through either a physically hardwired channel allowing users to connect directly to the TPM on the local machine, or a mechanism to allow the user to learn authentic cryptographic information about the local TPM, and hence establish a cryptographic secure channel.

Both the formal model and a usability assessment were used to analyze a dozen instantiations of these approaches,20,21 including adding a special-purpose port or button to the computer, reusing existing interfaces (such as USB and FireWire), using software-only attestation, and encoding cryptographic material as a 2D barcode or serial number printed on the computer. Each involves advantages and disadvantages. In the short term, placing an alphanumeric hash of the TPM’s public key on the exterior of the computer seems to offer the best trade-off among security, usability, and deployability. In the long term, the strongest security would come from a special-purpose hardware interface wired directly to the machine’s secure hardware (such as the TPM), designed so the user is unable to inadvertently connect the verifying device to another interface. This solution removes almost any opportunity for user error and does not require preservation of secrets.

Back to Top

Executing Code Securely

Establishing a secure connection between a user and the security hardware on the user’s computer does not, unfortunately, provide a full-featured, trustworthy execution environment. Security hardware tends to be either resource-impoverished or special-purpose (or both). Hence, the user’s trust in security hardware must be extended to the user’s entire computer.

However, establishing truly secure functionality on a general-purpose computer raises this crucial question: How can secure code execution coexist with the untrustworthy mountain of buggy yet feature-rich software common on modern computers? For example, how can a user’s keystrokes be kept private if the operating system, the most privileged software on the computer, cannot be trusted to be free of vulnerabilities? This coexistence is even more challenging due to the need to preserve the system’s existing functionality and performance.

Previous work sought to deal with the problem by running a persistent security layer, dubbed a security kernel, VMM, or hypervisor, in the computer’s most privileged mode.8,11,12,26 The security layer was responsible for creating isolation domains for ordinary, untrusted code and for security-sensitive code. Unfortunately, the approach also involves several drawbacks; for example, the security layer’s need to interpose on hardware accesses leads to performance degradation for ordinary code and often requires eliminating access to devices too complicated to emulate (such as 3D graphics cards).4 Moreover, the need to run both untrusted and trusted code simultaneously can lead to security vulnerabilities (such as side-channel attacks), as well as code bloat in the security layer; for example, the initial implementation of the Xen VMM in 2003 required 42,000 lines of code4 but almost doubled to 83,000 lines within a few years.13

To avoid these drawbacks, the Flicker architecture14,16–18,21 was developed to satisfy the need for features and security. Indeed, Flicker shows these conflicting needs can be satisfied by constructing a secure execution environment on demand. When invoked for secure code execution (such as signing a certificate or authenticating to a Web site), Flicker creates an isolated environment such that none of the software executing before Flicker begins can monitor or interfere with Flicker code execution, while all traces of Flicker code execution can be eliminated before regular software execution resumes. For example, a certificate authority could sign certificates with its private key and keep the key secret, even from an adversary with control of the BIOS, OS, and DMA-enabled devices.

Secure code execution. This section describes Flicker as implemented on an AMD system (see Figure 2); the implementation is similar on an Intel system. To take advantage of Flicker, application developers must provide the security-sensitive code (called the piece of application logic, or PAL) selected for Flicker protection as an x86 binary and define its interface with the rest of their application; tools have been developed to aid in automating this process. To create a secure loader block, or SLB (supplied as an argument to the DRTM instruction, SKINIT), application developers link PAL against a code module developed to perform the steps necessary to set up and tear down the Flicker session.

Since SKINIT is a privileged instruction, the application passes the resulting SLB to a kernel module that allocates memory, initializes various values in the SLB, and handles untrusted setup and teardown operations. The kernel module is not included in the trusted computing base (TCB) of the application, since its actions are verified.

Flicker achieves its properties through the DRTM capabilities outlined earlier. Rather than launch a VMM, it pauses the current execution environment (such as the untrusted OS), saving information about the kernel’s page tables to memory, along with the content of key registers (such as CR0, CR3, GDTR, IDTR, and TR). On multicore machines, execution must be halted on all but the bootstrap processor, and the other cores must be sent an INIT inter-processor interrupt so they respond correctly to a handshaking synchronization step performed during execution of SKINIT.

The Flicker session begins with execution of the SKINIT instruction, which receives the SLB selected for Flicker protection as an argument. As described earlier, SKINIT resets the CPU state, adds entries to the device-exclusion vector to disable DMA to the memory region containing the SLB, disables interrupts (including system-management interrupts) to prevent the previously executing code from regaining control, disables debugging support, even for hardware debuggers, and extends a hash of the SLB into a PCR in the TPM. Finally, it hands control to Flicker’s initialization routine. To simplify execution of PAL code, Flicker loads the global descriptor table; loads the CS, DS, and SS registers; transitions to ring 3 (user space) execution; and calls the PAL, providing the address of PAL inputs as a parameter. When PAL execution terminates, Flicker cleans up any trace of the security-sensitive code’s execution by clearing out memory and register content. Finally, it resumes the previous execution environment. This resumption entails returning to ring 0 (kernel) execution, restoring the saved OS state, re-enabling paging and interrupts, and jumping back to kernel code.


Since Flicker’s protections are deployed only on demand, Flicker induces no performance overhead or feature reduction during regular computer use. Limiting Flicker’s persistence also strengthens its security guarantees, since it avoids the complexity (and potential vulnerability) of solutions based on a VMM or a security kernel. Hence, the core Flicker system was implemented with a TCB of only 250 lines of code.

However, non-persistence poses challenges of its own. For instance, to enable more complex applications, TPM-based secure storage must be leveraged to maintain state across Flicker sessions. Preventing a variety of subtle attacks on this saved state requires developing additional protocols.22 A platform using Flicker can convince remote parties a Flicker session executed with a particular PAL.18 To provide result integrity, when PAL execution terminates, Flicker extends PCR 17 with hashes of the PAL’s input and output parameters. It then extends PCR 17 with a fixed public constant, providing two powerful security properties. First, it prevents other software from extending values into PCR 17 and attributing them to the PAL; the fact that SKINIT resets PCR 17 to 0 prevents malicious software from extending values before the Flicker session. Second, it revokes access to any secrets kept in the TPM’s secure storage that might have been available during PAL execution. Combining these techniques enables a PAL to communicate securely, with both secrecy and integrity protections, with a remote party through a secure cryptographic channel.
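The PCR 17 mechanism relies on the one-way, order-dependent nature of the TPM extend operation. A sketch of that operation and of why the fixed-constant “cap” works (the constant’s value here is invented; TPM 1.2 extend actually takes a precomputed 20-byte digest, which this helper computes internally for brevity):

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: PCR_new = SHA-1(PCR_old || SHA-1(measurement))."""
    return hashlib.sha1(pcr + hashlib.sha1(measurement).digest()).digest()

SKINIT_RESET = b"\x00" * 20        # SKINIT resets PCR 17 to 20 zero bytes
CAP = b"flicker-session-end"       # fixed public constant (name assumed)

# During the session: measure the PAL, then its inputs and outputs.
pcr17 = extend(SKINIT_RESET, b"PAL binary")
pcr17 = extend(pcr17, b"PAL inputs || PAL outputs")

# Before resuming untrusted code: cap the PCR with the public constant.
capped = extend(pcr17, CAP)

# Untrusted software running afterward can only extend *further*; it can
# never recover the pre-cap value to which TPM secrets were sealed, nor
# attribute its own measurements to the PAL.
tampered = extend(capped, b"forged output")
assert tampered != pcr17 and tampered != capped
```

Because SKINIT is the only way to reset PCR 17 to zero, and resetting re-runs the measured launch, malware cannot reconstruct the uncapped value by any sequence of extends.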

Flicker was implemented for both 32-bit Windows and Linux running on AMD and Intel CPUs and includes useful code modules PALs can choose to incorporate (such as for memory management, TPM operations, cryptographic operations, and secure channels). Using them, Flicker has since been applied to several broad classes of applications. To illustrate stateless applications, Flicker provides verifiable isolated execution of a kernel rootkit detector on a remote machine. To illustrate applications for which integrity protection of application state suffices, Flicker helps verify execution of a distributed computing application based on the Berkeley Open Infrastructure for Network Computing framework.3 Finally, to illustrate applications with state requiring secrecy and integrity, Flicker helps protect computations performed with the private key for a certificate authority, the passwords for an SSH server and for a full-disk encryption utility, and the content of a differentially private database.

To summarize this evaluation, invoking the SKINIT instruction requires ~48ms, while secure storage operations take ~22ms, meaning a typical Flicker session requires ~70ms, plus the time needed for the application-specific PAL to execute. Generating a TPM quote for the Flicker session is by far the slowest operation; depending on the TPM, that quote can take ~362ms–756ms. Fortunately, it can be computed in parallel with untrusted code, since it is performed outside the Flicker environment. In general, Flicker has little effect on the performance of untrusted code, and the OS remains stable; frequent brief Flicker sessions leave system performance essentially unchanged. Longer-running Flicker sessions may produce input lag or dropped network packets, but our experiments indicate Flicker does not corrupt data transfers (such as between USB and disk).

Suggested architectural improvements. Overall, experiments identified two significant performance bottlenecks for minimal TCB execution on current CPU architectures: inability to execute PALs and untrusted code simultaneously on different cores of a multicore machine and slow TPM operations to protect PAL state during a context switch between secure and insecure execution.

Alleviating them involves a number of recommendations to hardware manufacturers14,17,21 to support an execution model in which multiple PALs can execute in parallel with legacy software. The key ones are a hardware mechanism that isolates the memory pages belonging to a PAL from all other code and a hardware context-switch mechanism that efficiently suspends and resumes PALs without exposing a PAL’s execution state to other code.

A secure execution control block (SECB) is defined as a structure that holds PAL state and resource allocations for launching a PAL and for storing its state when not executing. Figure 3 outlines the proposed life cycle of a PAL. To begin execution of a PAL described by a newly allocated SECB, a new CPU instruction, secure launch, or SLAUNCH, can be added that takes as its argument the starting physical address of an SECB. It combines the hardware virtual machine management data structures on AMD and Intel CPUs with the security functionality of a DRTM operation.

To isolate memory and facilitate rapid context switches, the memory controller can maintain an access-control table with one entry per physical page, where each entry specifies which CPUs (if any) have access to the physical page. Memory pages are by default marked ALL to indicate they are accessible by all CPUs and DMA-capable devices.

When PAL execution is started using SLAUNCH, the memory controller updates its access-control table so each page allocated to the PAL (specified by the list of memory pages in the SECB) is accessible to only the CPU executing the PAL. When the PAL is subsequently suspended, the state of its memory pages transitions to NONE, indicating nothing currently executing on the platform is allowed to read or write to those pages. Note that the memory allocated to a PAL includes space for data and is a superset of the pages containing the PAL binary. Since this data is ephemeral, the PAL must take steps (such as using the TPM’s secure storage facilities) to ensure its data persists across reboots.22

When PAL execution terminates, a well-behaved PAL calls SFREE, which clears the memory allocated to the PAL, as well as any microarchitectural state that might contain sensitive data, and marks the memory pages with ALL. Likewise, the untrusted OS may kill a PAL via SKILL, clearing the PAL’s state before freeing the associated memory pages.

Also recommended is a PAL preemption timer in the CPU that can be configured by the untrusted OS. When the timer expires or a PAL voluntarily yields (via a new CPU instruction, secure yield, or SYIELD), the PAL’s CPU state should automatically and securely be written to its SECB by hardware, all sensitive microarchitectural state should be cleared, and control should be transferred to an appropriate handler in the untrusted OS.

The untrusted OS can resume a PAL by executing SLAUNCH on the desired CPU, parameterized with the physical address of the PAL’s SECB. The PAL’s measured flag indicates to the CPU that the PAL has been measured and is only being resumed, not launched for the first time. During PAL resumption, the SLAUNCH instruction signals the memory controller that the PAL’s state should be accessible to the CPU on which the PAL is executing. Note that the PAL may execute on a different CPU each time it resumes. To enable multicore PALs, if a CPU attempts to resume an already executing PAL, that CPU will be added to the pool of CPUs available to the PAL.
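The proposed PAL life cycle can be modeled as transitions on the memory controller’s per-page access-control table. The following toy model tracks only those transitions (page clearing, CPU state save/restore, and the measured flag are elided; class and method names mirror the proposed instructions but are otherwise invented):

```python
ALL, NONE = "ALL", "NONE"

class MemController:
    """Toy model of the proposed per-physical-page access-control table."""

    def __init__(self, npages: int):
        self.table = {p: ALL for p in range(npages)}  # default: open to all

    def slaunch(self, cpu: int, pal_pages: list):
        """SLAUNCH: restrict the PAL's pages to the launching CPU."""
        for p in pal_pages:
            self.table[p] = {cpu}

    def suspend(self, pal_pages: list):
        """Timer expiry or SYIELD: nothing may touch suspended state."""
        for p in pal_pages:
            self.table[p] = NONE

    def resume(self, cpu: int, pal_pages: list):
        """SLAUNCH on a measured PAL: reopen pages to the resuming CPU."""
        for p in pal_pages:
            self.table[p] = {cpu}

    def sfree(self, pal_pages: list):
        """SFREE/SKILL: pages are cleared, then released back to ALL."""
        for p in pal_pages:
            self.table[p] = ALL

    def can_access(self, cpu: int, page: int) -> bool:
        entry = self.table[page]
        return entry == ALL or (entry != NONE and cpu in entry)

mc = MemController(8)
pal = [2, 3]
mc.slaunch(cpu=0, pal_pages=pal)
assert mc.can_access(0, 2) and not mc.can_access(1, 2)  # isolated to CPU 0
mc.suspend(pal)
assert not mc.can_access(0, 2)                          # NONE while suspended
mc.resume(cpu=1, pal_pages=pal)                         # may resume elsewhere
assert mc.can_access(1, 3) and not mc.can_access(0, 3)
mc.sfree(pal)
assert mc.can_access(0, 2)                              # released to ALL
```

The NONE state is what replaces the slow TPM sealing during context switches: suspended PAL state stays in RAM but is unreachable by any CPU or device until the PAL resumes.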

The recommendations outlined here eliminate use of TPM operations during context switches and require the TPM to measure the PAL only once (instead of on every context switch). An implementation of these recommendations would be expected to achieve PAL context switch times on the order of those possible using hardware virtualization support, or 0.6μs on current hardware. This performance reduces the overhead of context switches by orders of magnitude, making switching in and out of a PAL much more practical.

Flicker thus provides a solid foundation for constructing secure systems that operate in conjunction with standard software; developers of security-sensitive code modules need to trust only their own code, along with as few as 250 lines of Flicker code, to protect the secrecy and integrity of the code’s execution. Flicker guarantees these properties, even if the BIOS, OS, and DMA-enabled devices are all malicious. The current implementation offers reasonable performance for many applications; the hardware recommendations here would enable many more.

Back to Top

Leveraging Secure Code Execution

If secure code execution can be provided on endhosts, the next frontier would be to examine how such trust can be extended into the network to improve the performance and efficiency of network applications. That is, if endhosts (or at least portions of each endhost) can be trusted, then network infrastructure no longer must arduously and imprecisely reconstruct data already known by the endhosts. For instance, suppose a mail server wants to improve the accuracy of its spam identification using host-based information. A 2009 study found the average and standard deviation of the size of email messages sent in the past 24 hours are two of the best indicators of whether a given email message is spam or legitimate.9 This data is easy for an endhost to collect but difficult for a single mail recipient to obtain.

As an initial exploration of how endhost hardware security features can be used to improve the network, a general architecture called Assayer was designed for securely conveying endhost information to network elements.21,24 While Assayer may not represent the optimal way to convey this information, it is a valuable first step for highlighting the various issues. For example, is it possible to provide useful host-based information while also protecting user privacy? Which cryptographic primitives are needed to verify this information in a secure and efficient manner? Initial findings suggest improved endhost security can improve the security and efficiency of the network while reducing the complexity of in-network elements.21,24

Naively, hardware-based attestation, as discussed earlier, might be used for each piece of information sent to the network. Returning to the earlier email example, a mail server might ask each client to include an attestation in every email message. The mail server’s spam filter could verify the attestation, then incorporate the host-provided information into its classification algorithm. Any legacy traffic arriving without an attestation could simply be processed by the existing algorithms. Unfortunately, checking attestations is time-consuming and requires interaction with the client machine. Even if such checks are feasible for an email filter, they would be unacceptable for other applications (such as distributed denial-of-service, or DDoS, mitigation) requiring per-packet checks at line rates. Likewise, labeling outbound traffic with mandatory access control labels, as proposed by McCune et al.,15 works well in tightly controlled enterprise networks but is unsuited to the heterogeneity of the Internet.

So how can the average case be made fast and noninteractive within the diversity of the Internet? The natural approach is to cryptographically extend the trust established by a single hardware-based attestation over multiple outbound messages. The cost of the initial verification is thus amortized over subsequent messages. To achieve this amortization, the Assayer architecture employs two distinct phases: infrequent setup in which the relying party (such as the mail server) establishes trust in the client and more frequent usage in which the client generates authenticated annotations on outbound messages (see Figure 4).

The relying party delegates the task of inspecting clients to one or more off-path verifier machines. Every T days, the client convinces a verifier it has securely installed a trustworthy code module that keeps track of network-relevant information (such as number of email messages recently sent and amount of bandwidth recently used).

The various code modules execute atop a minimal hypervisor that implements Flicker-like functionality with performance closer to what Flicker would provide with the hardware changes recommended earlier. To protect user privacy, code modules lack visibility into the client machine’s software state; for example, a client module for Web browsing cannot determine which Web browser the client is using. Assayer instead employs application-specific incentives (such as giving priority to traffic carrying annotations) to convince the commodity software to submit outbound traffic to the Assayer modules on the client machine.

Having established the trustworthiness of the client, the verifier issues a limited-duration sender token that is bound to the client’s code module. During the usage phase, client code submits outbound messages to an Assayer code module that uses the sender token to authenticate the message annotations it generates. These annotations are then checked by one or more fast-path filter middleboxes that verify the annotations and react accordingly. For instance, a relying party trying to identify spam might feed the authenticated information from the filter into its existing spam classification algorithms. Alternatively, a Web service might contract with its ISP to deploy filters on its ingress links to mitigate DDoS attacks by prioritizing legitimate traffic. If the traffic does not contain annotations, then the filter treats it as legacy traffic; for example, a DDoS filter would give annotated traffic priority over legacy traffic. Two protocols have been designed to instantiate the sender tokens and message annotations: an efficient symmetric-key scheme and a less-efficient asymmetric-key scheme that offers additional security properties.21,24
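The symmetric-key flavor of this two-phase design can be sketched as follows. This is not the published protocol’s exact message format; the key shared between verifier and filters, the field names, and the statistics are all illustrative. The point is the amortization: one attestation buys a short-lived token, and every subsequent annotation is a cheap MAC the filter can check with no client interaction.

```python
import hashlib
import hmac
import json
import time

# Assumed to be shared between the off-path verifier and the filters.
VERIFIER_KEY = b"verifier-and-filter-shared-key"

def issue_sender_token(module_id: bytes, days: int = 7) -> dict:
    """Setup phase: after a successful attestation, derive a short-lived
    key bound to the client's code module and an expiry date."""
    expiry = int(time.time()) + days * 86400
    msg = module_id + str(expiry).encode()
    return {"module": module_id, "expiry": expiry,
            "key": hmac.new(VERIFIER_KEY, msg, hashlib.sha256).digest()}

def annotate(token: dict, stats: dict) -> dict:
    """Usage phase: the client's code module authenticates per-message
    statistics (such as recent email counts) with its token key."""
    payload = json.dumps(stats, sort_keys=True).encode()
    tag = hmac.new(token["key"], payload, hashlib.sha256).digest()
    return {"module": token["module"], "expiry": token["expiry"],
            "stats": stats, "tag": tag}

def filter_check(ann: dict, now: int) -> bool:
    """Fast path: rederive the sender key from the shared key and verify,
    with no round trip to the client."""
    if now > ann["expiry"]:
        return False
    msg = ann["module"] + str(ann["expiry"]).encode()
    key = hmac.new(VERIFIER_KEY, msg, hashlib.sha256).digest()
    payload = json.dumps(ann["stats"], sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, ann["tag"])

tok = issue_sender_token(b"mail-stats-module")
ann = annotate(tok, {"msgs_24h": 12, "stddev_size": 3411})
assert filter_check(ann, now=int(time.time()))       # annotation accepted
ann["stats"]["msgs_24h"] = 0                         # tampering is detected
assert not filter_check(ann, now=int(time.time()))
```

Deriving the sender key from the shared key and the token fields keeps the filter stateless: it holds one key yet can verify annotations from any attested client.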


To evaluate the usefulness of trustworthy host-based information, the application of Assayer is considered in three case studies: spam identification, DDoS mitigation, and super-spreader worm detection. Assayer is well suited to combating spam and can mitigate many network-level DDoS attacks. Surprisingly, while it is technically feasible to use Assayer to combat super-spreader worms, which contact many hosts quickly, such use would face challenges when it comes to deployment incentives.

To better understand the performance implications of conveying host-based information to the network, a full Assayer prototype has been implemented, including multiple protocol implementations. The size of the protection layer on the client that protects code modules from the endhost (and vice versa) is minuscule (841 lines of code), and the individual modules are even smaller. The verifier prototype can sustain 3,300 client verifications per second and bursts of up to 5,700 clients per second. Generating and verifying annotations for outbound traffic requires only a few microseconds for Assayer’s most efficient scheme. Even on a gigabit link, Assayer can check the annotations with a reasonable throughput cost of 3.7%–18.3%, depending on packet size.

Back to Top

Verifiable Remote Computation

With Flicker, a user’s computer is assumed to be physically secure. To generalize Flicker’s results, techniques are needed to establish trust in code execution when even the hardware is untrustworthy. This scenario is particularly compelling, as the growth of cloud computing and mobile devices contributes to the desire to outsource computing from a client device to an online service. In these applications, how can users be assured the secrecy of their data is protected? Equally important, how can users verify the results returned are correct, without redoing the computations?

While various forms of homomorphic encryption provide only data secrecy,7 the results of arbitrary tasks (abstracted as function evaluations) on a computational service (such as in the cloud) can be efficiently verified without trusting any hardware or software on the system.6,21 This result contrasts with previous approaches that were inefficient or that could verify only the results of restricted function families.

To formalize secure computational outsourcing we introduce the notion of verifiable computing.6,21 Abstractly, a client wishes to evaluate a function F (such as sign a document or manipulate a photograph) over various dynamically selected inputs x1,…,xk on one or more untrusted computers, then verify that the values returned are indeed the result of applying F to the given inputs. The critical requirement is that the client’s effort to generate and verify work instances must be substantially less than the work required to perform the computation.
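The interface implied by this definition can be sketched as four algorithms: a one-time key generation over F, per-instance input encoding, the worker's computation, and the client's verification. In the placeholder below (all names illustrative), Verify simply recomputes F, which defeats the efficiency goal; the real protocol replaces that step with a cryptographic check far cheaper than computing F.

```python
from typing import Any, Callable, Tuple

def keygen(F: Callable) -> dict:
    # One-time preprocessing over F (O(|C|) in the real protocol).
    return {"F": F}

def probgen(pk: dict, x: Any) -> Tuple[Any, Any]:
    # Encode the input for the worker; keep a private verification tag.
    sigma_x = x  # public encoding sent to the worker
    tau_x = x    # secret retained by the client
    return sigma_x, tau_x

def compute(pk: dict, sigma_x: Any) -> Any:
    # The untrusted worker evaluates F on the encoded input.
    return pk["F"](sigma_x)

def verify(pk: dict, tau_x: Any, sigma_y: Any) -> Any:
    # Placeholder check: recomputing F here is an insecure, inefficient
    # stand-in for the protocol's cryptographic verification, which must
    # cost far less than computing F.
    if sigma_y != pk["F"](tau_x):
        raise ValueError("worker returned an incorrect result")
    return sigma_y

# Usage: outsource F(x) = x**2 and verify the returned result.
pk = keygen(lambda x: x * x)
sigma_x, tau_x = probgen(pk, 7)
result = verify(pk, tau_x, compute(pk, sigma_x))
print(result)  # 49
```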

The definition of verifiable computation here is non-interactive: The client sends a single message to the worker, and the worker sends a single message back. The definition also uses an amortized notion of complexity for the client, which may perform expensive preprocessing but must run efficiently for every subsequent work instance.

Drawing on techniques from secure multiparty computation, the first protocol for verifiable computing, presented here, provably provides computational integrity for work done by an untrusted party; it also provides provable secrecy for the computation’s inputs and outputs.6,21 This feature is bundled into the protocol at no additional computational cost. Such privacy is critical for many real-world outsourcing scenarios in which a function is computed over highly sensitive data (such as medical records and trade secrets).

Moreover, the protocol provides asymptotically optimal performance (amortized over multiple inputs). Specifically, it requires a one-time preprocessing stage that takes O(|C|) time, where |C| is the size of the smallest known Boolean circuit computing F. For each work instance, the client performs O(m) work to prepare an m-bit input, the worker performs O(|C|) work to compute the results, and the client performs O(n) work to verify the n-bit result.


This result shows arbitrary computations can be outsourced to untrusted workers while preserving the secrecy of the data and efficiently verifying the computations were done correctly. Verifiable computing could thus be used to, say, outsource work to a cloud-based service or within a distributed computing project like Folding@home,19 which outsources protein-folding simulations to millions of Internet users worldwide. To prevent workers from cheating, these projects often assign the same work unit to multiple clients, then compare results; verifiable computing would eliminate these redundant computations and provide strong cryptographic protections against colluding workers, though some subtleties are involved when workers are caught cheating.6,21

The protocol is based on the crucial (and somewhat surprising) observation that Yao's Garbled Circuit protocol,31 in addition to providing secure two-party computation, also provides a "one-time" verifiable computation. That is, Yao's Construction can be adapted to allow a client to outsource the computation of a function on a single input.

In the preprocessing stage of the outsourcing protocol, the client garbles the circuit C according to Yao's Construction. Then, in the "input preparation" stage, the client reveals the random labels associated with the input bits of x in the garbling. These labels allow the worker to compute the random labels associated with the output bits and return them to the client, which reconstructs F(x) from them. If the bit labels are sufficiently long and random, the worker is unlikely to guess the labels for an incorrect output; the client is therefore assured the response, F(x), is correct.

Unfortunately, reusing the circuit for a second input x' is insecure: once the output labels of F(x) are revealed, nothing stops the worker from presenting the same labels as correct for F(x'). Creating a new garbled circuit requires as much work as if the client computed the function itself, so on its own, Yao's Construction does not provide an efficient method for outsourcing computation.
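The one-time mechanism can be made concrete with a toy garbled AND gate (an unoptimized sketch using the standard point-and-permute technique; real constructions garble entire circuits and include many engineering details omitted here). The worker can decrypt exactly one table row per input pair and learns only one output label; guessing the 16-byte label for the other output value is infeasible, which is what makes the returned label a proof of correct evaluation.

```python
import hashlib
import os

def H(*keys):
    # Hash function used to derive one-time pads from wire labels.
    h = hashlib.sha256()
    for k in keys:
        h.update(k)
    return h.digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and():
    # Each wire gets two random 16-byte labels; the label for value v also
    # carries the public "permute" bit p ^ v, which hides v itself.
    wires = {}
    for w in ("a", "b", "c"):
        p = os.urandom(1)[0] & 1
        wires[w] = [(os.urandom(16), p), (os.urandom(16), 1 ^ p)]
    table = [None] * 4
    for i in (0, 1):
        for j in (0, 1):
            ka, pa = wires["a"][i]
            kb, pb = wires["b"][j]
            kc, pc = wires["c"][i & j]
            # Row position is determined by the public permute bits.
            table[2 * pa + pb] = xor(H(ka, kb)[:17], kc + bytes([pc]))
    return wires, table

def evaluate(table, la, lb):
    # The worker decrypts exactly one row using the two input labels.
    (ka, pa), (kb, pb) = la, lb
    out = xor(H(ka, kb)[:17], table[2 * pa + pb])
    return out[:16], out[16]  # output label and its permute bit

wires, table = garble_and()
out_label, _ = evaluate(table, wires["a"][1], wires["b"][1])
# The client checks which of its two secret output labels came back;
# a cheating worker would have to guess the label for the other value.
result = 0 if out_label == wires["c"][0][0] else 1
print(result)  # 1, since 1 AND 1 = 1
```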

The second crucial idea is to combine Yao’s Garbled Circuit with a fully homomorphic encryption system (such as Gentry’s7) to be able to reuse the garbled circuit safely for multiple inputs. More specifically, instead of revealing the labels associated with the bits of input x, the client encrypts those labels under the public key of a fully homomorphic scheme. A new public key is generated for every input in order to prevent information from one execution from being useful for later executions. The worker then uses the homomorphic property to compute an encryption of the output labels and provide them to the client, which decrypts them and reconstructs F(x).

Even without secure hardware, these results demonstrate a user’s trust in one device can be leveraged to verify (and hence trust) the results of computations performed by any number of remote, untrusted commodity computers.

Back to Top


Conclusion

Motivated by the trend toward entrusting sensitive data and services to insecure computers, this article has described techniques that enable users to extend their trust in one device to other devices or services. Starting from trust in a simple USB device, users can verify the reports from secure hardware on their computer, then securely execute code on that computer, despite the presence of a mountain of untrustworthy code, while the untrusted code continues to enjoy the performance and features it already does. Efficiently extending the trust in this secure environment into the network improves the security of a range of network protocols. Finally, the user's trust is extended to the verification of computations performed by remote computers, even though they may be arbitrarily malicious. Many people using computers and online services today only imagine their data and privacy are protected. In the long run, the ability to bootstrap trust in these computers and services will help replace this illusion with the actual foundation of security users expect and deserve.

Back to Top


Acknowledgments

This article is based on my Ph.D. dissertation,21 though much of the research it describes was conducted in collaboration with my advisor Adrian Perrig at Carnegie Mellon University, as well as with Hiroshi Isozaki, Jonathan McCune, Mike Reiter, and Arvind Seshadri;16–18 Zongwei Zhou;24 and Craig Gentry and Rosario Gennaro.6 I am extremely grateful for their contributions. This article also benefited from feedback from Jay Lorch, Jonathan McCune, Diana Parno, and Adrian Perrig.

Back to Top

Sidebar: Security Features in Commodity Computers

While other instantiations are possible, the following techniques are described in terms of features of trusted platform modules, or TPMs, and recent CPUs from AMD and Intel:

Measurement. When TPM-equipped platforms first boot, platform hardware takes a measurement (SHA-1 hash) of the BIOS, recording it in one of the TPM’s Platform Configuration Registers (PCRs).25,28 The BIOS is then responsible for measuring the next software component (such as the bootloader) and associated data files. The BIOS records the measurement in a PCR before executing the software. If each subsequent software component performs these steps—measure, record, execute—the TPM holds a set of measurements of all code executed on the platform.25
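The measure-record-execute chain can be sketched as follows (the component contents are placeholder byte strings). A PCR starts at a known reset value and can only be extended, never written directly, so its final value commits to every measurement in the chain, in order:

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def extend(pcr: bytes, measurement: bytes) -> bytes:
    # TPM 1.2 extend operation: PCR_new = SHA-1(PCR_old || measurement)
    return sha1(pcr + measurement)

pcr = bytes(20)  # PCR reset value (all zeros)
boot_chain = [b"BIOS image", b"bootloader image", b"kernel image"]
for component in boot_chain:
    pcr = extend(pcr, sha1(component))  # measure, then record...
    # ...then execute the component

print(pcr.hex())
```

Because each extend hashes the previous PCR value together with the new measurement, substituting, reordering, or omitting any component yields a different final PCR, which a verifier will detect.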

Attestation. To securely convey measurements to an external verifier, the TPM creates attestations; with a verifier-supplied nonce, the TPM uses a private key never accessible outside the TPM to generate a TPM quote, or a digital signature over the nonce and the content of the PCRs. The nonce assures the verifier that the attestation is fresh and not copied from a previous boot cycle.

To ensure the attestation comes from a real hardware TPM, not a software emulation, the TPM includes an endorsement keypair, along with a certificate from the platform's manufacturer declaring that the endorsement public key belongs to a real TPM. To preserve user privacy, the TPM signs attestations with attestation-identity keys (bound anonymously to the endorsement key).
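A toy model of the quote-and-verify exchange appears below. A real TPM produces an RSA or ECC signature with an attestation-identity key; an HMAC is used here only as a stand-in so the sketch runs without a cryptographic library, and it lets both sides share one key, which a real deployment would not do.

```python
import hashlib
import hmac
import os

aik = os.urandom(32)  # stand-in for the attestation-identity key

def quote(key, nonce, pcrs):
    # A quote binds the verifier's fresh nonce to the PCR contents.
    return hmac.new(key, nonce + b"".join(pcrs), hashlib.sha256).digest()

def verify_quote(key, nonce, pcrs, sig):
    return hmac.compare_digest(sig, quote(key, nonce, pcrs))

pcrs = [hashlib.sha1(b"trusted software stack").digest()]
nonce = os.urandom(20)  # fresh verifier-supplied nonce prevents replay
sig = quote(aik, nonce, pcrs)
print(verify_quote(aik, nonce, pcrs, sig))           # True
print(verify_quote(aik, os.urandom(20), pcrs, sig))  # False: stale quote
```

The second check fails because a quote from an earlier boot cycle cannot match a newly chosen nonce, which is exactly the freshness guarantee described above.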

Secure storage. The TPM also includes a limited amount of nonvolatile RAM (NVRAM). Reading and writing to NVRAM can be restricted based on PCR content, so an NVRAM location can be made accessible only to a particular set of software. That software can use the protected NVRAM to store a symmetric key that can then be used to encrypt and integrity-protect bulk data.
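PCR-gated NVRAM access can be modeled as follows (the TPM-internal bookkeeping is reduced to a plain dictionary; slot indices and secrets are illustrative). A slot records the PCR values required at definition time and releases its secret only when the current PCRs match:

```python
class ToyNVRAM:
    # Minimal model of TPM NVRAM with PCR-based access control.
    def __init__(self):
        self.slots = {}

    def define(self, index, secret, required_pcrs):
        # Bind the slot to the PCR values that must hold at read time.
        self.slots[index] = (secret, tuple(required_pcrs))

    def read(self, index, current_pcrs):
        secret, required = self.slots[index]
        if tuple(current_pcrs) != required:
            raise PermissionError("PCR mismatch: access denied")
        return secret

nv = ToyNVRAM()
good_pcrs = [b"\x11" * 20]  # measurements of the authorized software
nv.define(0, b"bulk-encryption-key", good_pcrs)
print(nv.read(0, good_pcrs))  # released only to the measured software
```

Any software stack whose measurements differ from the recorded policy is refused, so only the particular software that sealed the key can later retrieve it to decrypt its bulk data.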

Dynamic root of trust for measurement. To protect the launch of a virtual machine monitor (VMM), recent commodity CPUs from AMD and Intel extend the x86 instruction set to support a dynamic root of trust for measurement (DRTM) operation.1,10 DRTM provides many of the security benefits of rebooting the computer (such as starting from a clean slate) while bypassing the overhead of a full reboot; that is, devices remain enabled, the BIOS and bootloader are not invoked, and memory contents remain intact. DRTM is implemented through a new SKINIT instruction on AMD CPUs or GETSEC[SENTER] on Intel CPUs, which can launch a VMM at an arbitrary time (hence the colloquialism late launch) with built-in protection against software-based attacks. When a DRTM is invoked, the CPU’s state is reset, and direct memory access (DMA) protections for a region of memory are enabled. The CPU hashes the content (such as data and executable code) of the memory region, extends the measurement into the TPM’s PCR 17 (and 18 on Intel), and begins executing the code.

Trust assumptions. Relying on a TPM’s guarantees involves trusting the TPM manufacturer, the platform vendor, and the PKI linking them to the TPM’s keys. The CPU, RAM, and chipset must also be trusted. The TPM is designed to withstand software-based attacks and limited hardware attacks (such as rebooting the machine) but is not expected to resist sophisticated physical attacks.

Back to Top


Figures

F1 Figure 1. Cuckoo attack.

F2 Figure 2. Timeline of steps needed to execute a PAL.

F3 Figure 3. PAL life cycle.

F4 Figure 4. Assayer components.

Back to Top

References

    1. Advanced Micro Devices. AMD64 Architecture Programmer's Manual. AMD Publication No. 24593, Rev. 3.14, 2007;

    2. Ames, Jr., S.R. Security kernels: A solution or a problem? In Proceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, Apr. 27–29). IEEE Computer Society. Los Alamitos, CA, 1981, 141.

    3. Anderson, D.P. BOINC: A system for public-resource computing and storage. In Proceedings of the IEEE/ACM Workshop on Grid Computing (Pittsburgh, PA, Nov. 8). IEEE Computer Society, Los Alamitos, CA, 2004, 4–10.

    4. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Kotsovinos, E., Madhavapeddy, A., Neugebauer R., Pratt, I., and Warfield, A. Xen 2002. Technical Report UCAM-CL-TR-553, University of Cambridge, Cambridge, U.K., Jan. 2003.

    5. Garriss, S., Cáceres, R., Berger, S., Sailer, R., van Doorn, L., and Zhang, X. Trustworthy and personalized computing on public kiosks. In Proceedings of the Conference on Mobile Systems, Applications, and Services (Breckenridge, CO, June 17–20). ACM Press, New York, 2008, 199–210.

    6. Gennaro, R., Gentry, C., and Parno, B. Non-interactive verifiable computation: Outsourcing computation to untrusted workers. In Advances in Cryptology: Proceedings of the 30th International Cryptology Conference (Santa Barbara, CA, Aug. 15–19, 2010), 465–482.

    7. Gentry, C. Fully homomorphic encryption using ideal lattices. In Proceedings of the 41st ACM Symposium on Theory of Computing (Bethesda, MD, May 31–June 2). ACM Press, New York, 2009, 169–178.

    8. Gold, B.D., Linde, R.R., and Cudney, P.F. KVM/370 in retrospect. In Proceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, Apr. 29–May 2). IEEE Computer Society, Los Alamitos, CA, 1984, 13–23.

    9. Hao, S., Syed, N.A., Feamster, N., Gray, A.G., and Krasser, S. Detecting spammers with SNARE: Spatio-temporal network-level automatic reputation engine. In Proceedings of the USENIX Security Symposium (Montréal, Aug. 10–14). USENIX Association, Berkeley, CA 2009, 101–118.

    10. Intel Corp. Intel Trusted Execution Technology: Measured Launched Environment Developer's Guide. Document no. 315168-005, Santa Clara, CA, June 2008.

    11. Karger, P.A., Zurko, M.E., Bonin, D.W., Mason, A.H., and Kahn, C.E. A retrospective on the VAX VMM security kernel. IEEE Transactions on Software Engineering 17, 11 (Nov. 1991), 1147–1165.

    12. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Norrish, M., Kolanski, R., Sewell, T., Tuch, H., and Winwood, S. seL4: Formal verification of an OS kernel. In Proceedings of the ACM Symposium on Operating Systems Principles (Big Sky, MT, Oct. 11–14). ACM Press, New York, 2009, 207–220.

    13. Magenheimer, D. Xen/IA64 Code Size Stats. Xen developer's mailing list 2005;

    14. McCune, J.M. Reducing the Trusted Computing Base for Applications on Commodity Systems. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, Jan. 2009.

    15. McCune, J.M., Berger, S., Cáceres, R., Jaeger, T., and Sailer, R. Shamon: A system for distributed mandatory access control. In Proceedings of the Annual Computer Security Applications Conference (Miami Beach, Dec. 11–15). IEEE Computer Society. Los Alamitos, CA, 2006, 23–32.

    16. McCune, J.M., Parno, B., Perrig, A., Reiter, M.K., and Isozaki, H. Flicker: An execution infrastructure for TCB minimization. In Proceedings of ACM EuroSys (Glasgow, Scotland, Mar. 31–Apr. 4). ACM Press, New York, 2008, 315–328.

    17. McCune, J.M., Parno, B., Perrig, A., Reiter, M.K., and Seshadri, A. How low can you go? Recommendations for hardware-supported minimal TCB code execution. In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Seattle, Mar. 1–5). ACM Press, New York, 2008, 14–25.

    18. McCune, J.M., Parno, B., Perrig, A., Reiter, M.K., and Seshadri, A. Minimal TCB code execution (extended abstract). In Proceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, May 20–23). IEEE Computer Society, Los Alamitos, CA, 2007, 267–272.

    19. Pande Lab. The folding@home project. Stanford University;

    20. Parno, B. Bootstrapping trust in a 'trusted' platform. In Proceedings of the USENIX Workshop on Hot Topics in Security (San Jose, CA, July 29). USENIX Association, Berkeley, CA, 2008, 9:1–9:6.

    21. Parno, B. Trust Extension as a Mechanism for Secure Code Execution on Commodity Computers. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, May 2010.

    22. Parno, B., Lorch, J.R., Douceur, J.R., Mickens, J., and McCune, J.M. Memoir: Practical state continuity for protected modules. In Proceedings of the IEEE Symposium on Security and Privacy (Oakland, CA, May 22–25). IEEE Computer Society. Los Alamitos, CA, 2011, 379–394.

    23. Parno, B., McCune, J.M., and Perrig, A. Bootstrapping Trust in Modern Computers. Springer, New York, 2011.

    24. Parno, B., Zhou, Z., and Perrig, A. Help Me Help You: Using Trustworthy Host-Based Information in the Network. Technical Report CMU-CyLab-09-016, Carnegie Mellon University, Cylab, Pittsburgh, PA, Nov. 2009.

    25. Sailer, R., Zhang, X., Jaeger, T., and van Doorn, L. Design and implementation of a TCG-based integrity measurement architecture. In Proceedings of the USENIX Security Symposium (San Diego, CA, Aug. 9–13). USENIX Association, Berkeley, CA, 2004, 16–32.

    26. Singaravelu, L., Pu, C., Haertig, H., and Helmuth, C. Reducing TCB complexity for security-sensitive applications: Three case studies. In Proceedings of ACM EuroSys (Leuven, Belgium, Apr. 18–21). ACM Press, New York, 2006, 161–174.

    27. Smith, S.W. and Weingart, S. Building a high-performance, programmable secure coprocessor. Computer Networks 31, 8 (Apr. 1999), 831–860.

    28. Trusted Computing Group. Trusted Platform Module Main Specification. Version 1.2, Revision 103, 2007;

    29. Vasudevan, A., Parno, B., Qu, N., Gligor, V.D., and Perrig, A. Lockdown: A Safe and Practical Environment for Security Applications. Technical Report CMU-CyLab-09-011, Carnegie Mellon University, Cylab, Pittsburgh, PA, July 2009.

    30. Wave Systems Corp. Trusted Computing: An Already Deployed, Cost-Effective, ISO Standard, Highly Secure Solution for Improving Cybersecurity, 2010;

    31. Yao, A. Protocols for secure computations. In Proceedings of the IEEE Symposium on Foundations of Computer Science (Chicago, Nov. 3–5). IEEE Computer Society. Los Alamitos, CA, 1982, 160–164.

Footnotes

    a. Cuckoos replace other birds' eggs with their own, tricking victim birds into feeding the cuckoo chick as if it were their own; likewise, the attacker "replaces" the user's trusted TPM with his own TPM, leading the victim to treat the attacker's TPM as the user's own.
