The boot sequence for a machine typically starts with the BMC (baseboard management controller) or PCH (platform controller hub). In the case of an Intel CPU, the Intel Management Engine runs in the PCH and starts before the CPU. After configuring the machine’s hardware, the BMC (or PCH, depending on the system) allows the CPU to come out of reset. The CPU then loads the boot (unified extensible firmware interface, UEFI) firmware from the SPI (serial peripheral interface) flash. The boot firmware then accesses the boot sector on the machine’s persistent storage and loads the bootloader into the system memory. The boot firmware then passes execution control to the bootloader, which loads the initial OS image from storage into system memory and passes execution control to the operating system. For example, in popular Linux distros, GRUB (derived from Grand Unified Bootloader) acts as the bootloader and loads the operating system image for the machine.
This is much like a relay race where one team member passes a baton to another to win the race. In a relay race, you hopefully know the members of your team and trust them to do their part for the team to get to the finish line. With machines, this chain of trust is a bit more complex. How can we verify that each step in the boot sequence is running software we know is secure? If our hardware or software has been compromised at any point in the boot sequence then the attacker has the most privilege on our system and likely can do anything they want.
The goal of a hardware root of trust is to verify that the software installed in every component of the hardware is the software that was intended. This way you can verify and know without a doubt whether a machine’s hardware or software has been hacked or overwritten by an adversary. In a world of modchips,16 supply chain attacks, evil maid attacks,7 cloud provider vulnerabilities in hardware components,2 and other attack vectors, ensuring hardware and software integrity has become increasingly necessary. This is an introduction to a complicated topic; some sections only touch the surface, but the intention is to provide a full picture of the world of secure booting mechanisms.
Trusted platform module (TPM). A TPM is a standard for a dedicated microchip designed to secure hardware through integrated cryptographic keys. TPM was standardized by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) in 2009 as ISO/IEC 11889.9 The TPM is typically installed on the motherboard of a computer, and it communicates with the remainder of the system by using a hardware bus.
A TPM has the following features:18
- A random number generator;
- A way to generate cryptographic keys;
- Integrity measurement;
- Attestation;
- Wrapping/binding keys; and,
- Sealing/unsealing keys.
Integrity measurement. Measurement is the process by which information about the software, hardware, and configuration of a system is collected and digested. At load-time, the TPM uses a hash function to fingerprint an executable and its configuration. These hash values are used in attestation to reliably establish code identity to remote or local verifiers. The hash values can also be used in conjunction with the sealed storage feature. A secret can be sealed along with a list of hash values of programs that are allowed to unseal the secret. This allows the creation of data files that can only be opened by specific applications.
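As a rough illustration of measurement, here is a minimal sketch in Go (not a real TPM interface): each component is hashed at load time, and the result is folded into a register the same way a TPM extends a PCR, with new value = hash(old value || measurement). The file names and the single in-memory register are hypothetical stand-ins.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// measure fingerprints a component at load time by hashing its contents.
func measure(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	return sum[:], nil
}

// extend folds a new measurement into the register the way a TPM extends a
// PCR: newPCR = H(oldPCR || measurement). Because order matters, the final
// value summarizes the entire sequence of loaded components.
func extend(pcr, measurement []byte) []byte {
	h := sha256.New()
	h.Write(pcr)
	h.Write(measurement)
	return h.Sum(nil)
}

func main() {
	pcr := make([]byte, sha256.Size) // PCRs start from a known value (all zeros)
	for _, path := range []string{"bootloader.bin", "kernel.img", "boot.cfg"} { // hypothetical files
		m, err := measure(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		pcr = extend(pcr, m)
	}
	fmt.Printf("final measurement register: %x\n", pcr)
}
```

A verifier that knows the expected hashes of those components can recompute the same chain and compare the final value.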
Attestation reports the state of the hardware and software configuration. The integrity measurement software that creates the hash of the configuration data determines the extent of the summary. The goal of attestation is to prove to a third party that your operating system and application software are intact and trustworthy. The verifier trusts that attestation data is accurate because it is signed by a TPM whose key is certified by a certificate authority (CA). TPMs are manufactured with a public/private key pair built into the hardware, known as the endorsement key. The endorsement key is unique to a specific TPM and is signed by a trusted CA. The trust for attestation data is dependent on the trust for the CA that originally signed the endorsement key.
Attestation can reliably tell a verifier what applications are running on a client machine, but the verifier must still make the judgment about whether each given piece of software is trustworthy.
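The shape of that exchange can be sketched as follows, with a locally generated RSA key standing in for the TPM’s CA-certified attestation key (which in reality never leaves the chip): the TPM signs the current measurements together with a verifier-supplied nonce, and the verifier checks the signature and compares the measurements against a value it already trusts.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// quote signs the current measurement register plus a verifier-supplied
// nonce, so the verifier knows the report is fresh and came from this key.
func quote(attestKey *rsa.PrivateKey, pcr, nonce []byte) ([]byte, error) {
	h := sha256.New()
	h.Write(pcr)
	h.Write(nonce)
	return rsa.SignPKCS1v15(rand.Reader, attestKey, crypto.SHA256, h.Sum(nil))
}

// verify checks the signature and then compares the reported measurement
// against a value the verifier trusts. Deciding whether that measurement
// corresponds to trustworthy software is still the verifier's job.
func verify(pub *rsa.PublicKey, pcr, nonce, sig, expectedPCR []byte) error {
	h := sha256.New()
	h.Write(pcr)
	h.Write(nonce)
	if err := rsa.VerifyPKCS1v15(pub, crypto.SHA256, h.Sum(nil), sig); err != nil {
		return fmt.Errorf("bad signature: %w", err)
	}
	if string(pcr) != string(expectedPCR) {
		return fmt.Errorf("measurement does not match expected value")
	}
	return nil
}

func main() {
	attestKey, _ := rsa.GenerateKey(rand.Reader, 2048) // stand-in for the TPM's key
	pcr := sha256.Sum256([]byte("measured boot chain"))
	nonce := []byte("verifier-chosen nonce")

	sig, _ := quote(attestKey, pcr[:], nonce)
	if err := verify(&attestKey.PublicKey, pcr[:], nonce, sig, pcr[:]); err != nil {
		fmt.Println("attestation failed:", err)
		return
	}
	fmt.Println("attestation verified")
}
```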
Wrapping/binding a key. Machines that use a TPM can create cryptographic keys and encrypt them so that they can only be decrypted by the TPM. This process, known as wrapping or binding a key, can help protect the key from disclosure. Each TPM has a master wrapping key, also known as the storage root key, which is stored within the TPM itself. The private portion of a storage root key, or endorsement key, that is created in a TPM is never exposed to any other device, process, application, software, or user.
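A minimal sketch of the idea, with a locally generated RSA key standing in for the storage root key (whose private half would never leave a real TPM): the data key is encrypted to the SRK’s public key, so only the TPM can unwrap it.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

func main() {
	// Stand-in for the storage root key; in a real TPM the private half
	// never leaves the chip.
	srk, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}

	// A symmetric data key generated by the OS or an application.
	dataKey := make([]byte, 32)
	rand.Read(dataKey)

	// Wrapping (binding) the key: anyone can encrypt to the SRK public key,
	// but only the holder of the private half can unwrap the result.
	wrapped, _ := rsa.EncryptOAEP(sha256.New(), rand.Reader, &srk.PublicKey, dataKey, nil)

	// Unwrapping would happen inside the TPM.
	unwrapped, _ := rsa.DecryptOAEP(sha256.New(), rand.Reader, srk, wrapped, nil)
	fmt.Println("round trip ok:", bytes.Equal(dataKey, unwrapped))
}
```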
Sealing/unsealing a key. Machines that use a TPM can also create a key that has not only been wrapped but is also tied to certain platform measurements. This type of key can be unwrapped only when those platform measurements have the same values that they had when the key was created. This process is known as sealing the key to the TPM. Decrypting the key is called unsealing. The TPM can also seal and unseal data that is generated outside the TPM. With this sealed key and software, you can lock data until specific hardware or software conditions are met.
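Building on the wrapping sketch above, sealing can be pictured as storing the wrapped secret together with the platform measurements that must hold at unseal time; the unseal step refuses if the current measurements differ. This shows only the shape of the idea, not a real TPM policy.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
)

// sealedBlob pairs a wrapped secret with the platform measurement that must
// be present before the secret will be unwrapped.
type sealedBlob struct {
	wrapped     []byte
	requiredPCR []byte
}

func seal(srk *rsa.PrivateKey, secret, currentPCR []byte) (sealedBlob, error) {
	wrapped, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, &srk.PublicKey, secret, nil)
	if err != nil {
		return sealedBlob{}, err
	}
	return sealedBlob{wrapped: wrapped, requiredPCR: append([]byte(nil), currentPCR...)}, nil
}

func unseal(srk *rsa.PrivateKey, blob sealedBlob, currentPCR []byte) ([]byte, error) {
	// The secret is released only when the platform is in the same measured
	// state it was in when the secret was sealed.
	if !bytes.Equal(blob.requiredPCR, currentPCR) {
		return nil, errors.New("platform measurements changed; refusing to unseal")
	}
	return rsa.DecryptOAEP(sha256.New(), rand.Reader, srk, blob.wrapped, nil)
}

func main() {
	srk, _ := rsa.GenerateKey(rand.Reader, 2048)
	goodState := sha256.Sum256([]byte("known-good boot chain"))
	tampered := sha256.Sum256([]byte("modified boot chain"))

	blob, _ := seal(srk, []byte("disk encryption key"), goodState[:])

	if _, err := unseal(srk, blob, tampered[:]); err != nil {
		fmt.Println("unseal after tampering:", err)
	}
	secret, _ := unseal(srk, blob, goodState[:])
	fmt.Println("unseal in good state:", string(secret))
}
```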
Custom silicon. It is important to note the limitations of TPMs and some solutions to them. TPMs can attest that the firmware running on a machine is the firmware we want to run, but a TPM has no mechanism for verifying that the code itself is secure. It is up to the user to verify the security of the firmware and to ensure it does not contain any backdoors, which is impossible if the code is proprietary.
When booting a machine securely, you want the first instruction run on that machine to be the one you would expect to run. A TPM is insufficient for verifying the actual bits of code to be executed are secure, so a few companies created their own silicon for expanding on the security of TPMs.
Google’s Titan
For Google’s infrastructure as well as Chromebooks, Google expanded on the security of the TPM with its own chip, Titan. In October 2019, Google open sourced5 a version of Titan14 (both specs and code), which is under active development. In creating Titan, Google added two features that TPMs lack: first-instruction integrity and remediation.
First-instruction integrity allows verification of the earliest code that runs on each machine’s startup cycle. Titan observes every byte of boot firmware by interposing itself between the boot firmware flash (BIOS) of the BMC (or PCH) and the main CPU via the SPI bus. Therefore, the boot sequence for a machine with a Titan chip is different from a normal boot sequence.
The boot sequence with Titan is as follows:
- Titan holds the machine in reset.
- Titan’s application processor executes code from its embedded read-only memory (boot ROM).
- Titan runs a memory built-in self-test to ensure all memory (including ROM) has not been tampered with.
- Titan verifies its own firmware using public key cryptography and mixes the identity of this verified code into Titan’s key hierarchy.
- Titan loads the verified firmware.
- Titan verifies the host’s boot firmware flash (BIOS/UEFI).
- Titan signals readiness to release the rest of the machine from reset.
- The CPU loads the basic firmware (BIOS/UEFI) from the boot firmware flash, which performs further hardware/software configuration.
- The rest of the standard boot sequence continues.
By holding the machine in reset while it cryptographically verifies the boot firmware, Titan enables verification of the first instruction. Titan knows what boot firmware and OS booted on the machine from the very first instruction. It even knows which microcode patches may have been fetched before the boot firmware’s first instruction.
Remediation. What happens when we need to patch bugs in Titan’s firmware? This is where remediation comes into play. In the event of patching bugs in the Titan firmware, trust can be reestablished through remediation. Remediation is based on a strong cryptographic identity. To provide a strong identity, the Titan chip manufacturing process generates unique keying material for each chip. The Titan-based identity system not only verifies the provenance of the chips creating the certificate signing requests (CSRs), but also verifies the firmware running on the chips, as the code identity of the firmware is hashed into the on-chip key hierarchy. This property allows Google to fix bugs in Titan firmware and issue certificates that can only be wielded by patched Titan chips.
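A loose sketch of why this works: if both the per-chip secret and the hash of the running firmware feed the key derivation, then credentials issued against one firmware version cannot be wielded by another. The HMAC construction below illustrates that property only; it is not Titan’s actual scheme, and the inputs are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// deriveIdentity mixes the chip's unique manufacturing secret with the
// measured identity of the firmware it is running. Patching the firmware
// changes the measurement, and with it the derived identity, so credentials
// issued to patched chips cannot be used by unpatched ones.
func deriveIdentity(deviceSecret, firmwareImage []byte) []byte {
	fwHash := sha256.Sum256(firmwareImage)
	mac := hmac.New(sha256.New, deviceSecret)
	mac.Write(fwHash[:])
	return mac.Sum(nil)
}

func main() {
	deviceSecret := []byte("unique per-chip keying material") // illustrative only
	v1 := deriveIdentity(deviceSecret, []byte("titan firmware v1"))
	v2 := deriveIdentity(deviceSecret, []byte("titan firmware v2, bug fixed"))
	fmt.Printf("identity on v1: %x\n", v1[:8])
	fmt.Printf("identity on v2: %x\n", v2[:8])
}
```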
The Titan-based identity system enables back-end systems to securely provision secrets and keys to individual Titan-enabled machines, or jobs running on those machines. Titan is also able to chain and sign critical audit logs, making those logs tamper evident. This ensures audit logs cannot be altered or deleted without detection, even by insiders with root access to the relevant machine.
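One common way to make a log tamper evident is to chain entries so that each record’s hash covers the previous one; altering or deleting an earlier record then breaks every link after it. The sketch below shows only that chaining idea, not Google’s implementation, and omits the signing step.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type entry struct {
	msg      string
	prevHash [32]byte // hash of the previous entry, linking the log together
	hash     [32]byte
}

// appendEntry links a new record to the tail of the log.
func appendEntry(log []entry, msg string) []entry {
	var prev [32]byte
	if len(log) > 0 {
		prev = log[len(log)-1].hash
	}
	e := entry{msg: msg, prevHash: prev}
	e.hash = sha256.Sum256(append(prev[:], []byte(msg)...))
	return append(log, e)
}

// verifyChain recomputes every link; any edited or deleted record breaks it.
func verifyChain(log []entry) bool {
	var prev [32]byte
	for _, e := range log {
		if e.prevHash != prev || e.hash != sha256.Sum256(append(prev[:], []byte(e.msg)...)) {
			return false
		}
		prev = e.hash
	}
	return true
}

func main() {
	var log []entry
	log = appendEntry(log, "machine released from reset")
	log = appendEntry(log, "boot firmware verified")
	fmt.Println("chain intact:", verifyChain(log))

	log[0].msg = "tampered" // an insider edits an old record
	fmt.Println("chain intact after tampering:", verifyChain(log))
}
```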
Microsoft’s Cerberus
Microsoft open sourced11 the specs for its chip, Cerberus. (At the time of writing, only the specs have been open sourced.) Like Titan, Cerberus interposes on the SPI bus where firmware is stored for the CPU. This allows Cerberus to continuously measure and attest these accesses to ensure firmware integrity and thereby protect against unauthorized access and malicious updates.
Apple’s T2
Apple is a poster child for secure booting devices. Most people remember when the FBI wanted a backdoor into iPhones and Tim Cook refused.10 Between Macs, iPhones, and Chromebooks, security by default has become an industry standard for consumer devices.
For Apple machines, secure boot is done with the T2 chip.1 Ivan Krstić of Apple gave a talk at Black Hat12 detailing the boot process for a Mac with the T2 chip. Unlike Titan and Cerberus, which interpose on the SPI flash, T2 provides the firmware and boots the CPU over an eSPI (enhanced serial peripheral interface) bus.
Apple’s requirements for T2 were the following:
- Signature verification of complete boot chain.
- System Software Authorization (server-side downgrade protection).
- Authorization “personalized” for the requesting device (not portable).
- User authentication required to downgrade secure boot policy.
- Secure boot policy protected against physical tamper.
- System can always be restored to known-good state.
The boot sequence for a machine using a T2 chip is as follows:
- The machine is powered on.
- T2 ROM is loaded and executed.
- T2 ROM passes off to iBoot, the bootloader.
- The bootloader executes the bridgeOS kernel, the kernel for the T2 chip.
- The bridgeOS kernel passes off to the UEFI firmware for the T2 chip.
- The T2 chip then allows the CPU out of reset and loads the UEFI firmware for the CPU.
- The UEFI firmware for the CPU then loads macOS booter, the bootloader.
- The macOS booter then executes the macOS kernel.
One important design element of the T2 chip is how Apple verifies the version of macOS running on a computer. T2 verifies the hash of macOS against a list of approved hashes before allowing it to run. Apple is in a unique position to provide this level of verification since it owns the entire stack and prevents users from running any other OS on its devices. If you would like to go deeper on the internals of the T2 chip, I would suggest reading the slides for Ivan Krstić’s Black Hat talk.12
Platform firmware resiliency. Chip vendors are investing in platform firmware resiliency (PFR) based on National Institute of Standards and Technology (NIST) guidelines.15 These guidelines focus on ensuring the firmware remains in a state of integrity, detecting when it has been corrupted, and recovering the pieces of firmware back to a state of integrity.
PFR addresses the vulnerability of enterprise servers that contain multiple processing components, each having its own firmware. This firmware can be attacked by hackers who may surreptitiously install malicious code in a component’s flash memory that hides from standard system-level detection methods and leaves the system permanently compromised.
The PFR specification is based on the following principles:
- Protection: Ensures firmware code and critical data remain in a state of integrity and are protected from corruption, such as the process for ensuring the authenticity and integrity of firmware updates.
- Detection: Detect when firmware code and critical data have been corrupted.
- Recovery: Restore firmware code and critical data to a state of integrity in the event that any such firmware code or critical data are detected to have been corrupted, or when forced to recover through an authorized mechanism.
Vendors have been building features around the NIST guidelines for PFR. Intel8 and Lattice Semiconductors13 each have a product.
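The detection and recovery principles can be pictured as a watchdog that periodically re-measures the active firmware against a known-good measurement and restores it from a protected golden copy on a mismatch. The sketch below is a simplified model of that loop under those assumptions, not any vendor’s product.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// device models a component with an active firmware region and a protected,
// read-only golden copy to recover from.
type device struct {
	active []byte
	golden []byte
}

// checkAndRecover re-measures the active firmware; on a mismatch with the
// known-good measurement it restores the golden copy (the recovery step).
func checkAndRecover(d *device, goodMeasurement [32]byte) bool {
	if sha256.Sum256(d.active) == goodMeasurement {
		return false // integrity intact, nothing to do
	}
	d.active = append([]byte(nil), d.golden...)
	return true
}

func main() {
	golden := []byte("vendor-signed firmware v3")
	d := &device{active: append([]byte(nil), golden...), golden: golden}
	good := sha256.Sum256(golden)

	fmt.Println("recovered on first check:", checkAndRecover(d, good))

	d.active = []byte("implanted malicious firmware") // corruption happens...
	fmt.Println("recovered after corruption:", checkAndRecover(d, good)) // ...and is detected
	fmt.Println("active matches golden:", bytes.Equal(d.active, d.golden))
}
```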
UEFI Secure Boot21 is designed to ensure that the EFI binaries executed during boot are verified, either through a checksum or a valid signature backed by a locally trusted certificate. When a machine using UEFI Secure Boot powers on, the UEFI firmware validates that each EFI binary either has a valid signature or has a checksum present on an allow list. A deny list is also checked to ensure that no binary’s checksum or signature appears on it. Users can configure the list of trusted certificates and checksums as EFI variables, which are stored in non-volatile memory used by the UEFI firmware environment to hold settings and configuration data.
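The per-binary decision can be sketched roughly as follows: the deny list always wins, after which either an allow-listed checksum or a valid signature from a trusted certificate admits the binary. This is a simplification of the actual allow/deny-list processing described in the specification, and the signature check is stubbed out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type policy struct {
	allowedHashes map[[32]byte]bool // checksums of binaries explicitly allowed
	deniedHashes  map[[32]byte]bool // checksums of binaries explicitly denied
}

// verifySignature stands in for checking the binary's signature against the
// locally trusted certificates; a real implementation parses and validates
// the binary's embedded signature.
func verifySignature(binary []byte) bool { return false }

// allowed mirrors the rough shape of the Secure Boot decision: the deny list
// always wins, then either an allow-listed checksum or a valid signature
// admits the binary.
func (p policy) allowed(binary []byte) bool {
	sum := sha256.Sum256(binary)
	if p.deniedHashes[sum] {
		return false
	}
	return p.allowedHashes[sum] || verifySignature(binary)
}

func main() {
	bootloader := []byte("grubx64.efi contents")
	p := policy{
		allowedHashes: map[[32]byte]bool{sha256.Sum256(bootloader): true},
		deniedHashes:  map[[32]byte]bool{},
	}
	fmt.Println("bootloader allowed:", p.allowed(bootloader))
	fmt.Println("unknown binary allowed:", p.allowed([]byte("unknown.efi")))
}
```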
The UEFI kernel is extremely complex, comprising millions of lines of code split into boot services and runtime services, and its specification19 is correspondingly verbose. Because much of the same proprietary UEFI code is shared across many different platforms, it is a common vector for vulnerabilities and an attractive target for attackers. Additionally, since only UEFI can rewrite itself, exploits can be made persistent: UEFI lives in firmware, typically stored in the SPI flash, so even if a user were to wipe the entire operating system or install a new hard drive, an attack would persist in the SPI flash.
Intel’s Boot Guard. Boot Guard is Intel’s solution to verify the firmware signatures for the processor. Boot Guard works by flashing the public key of the BIOS signature into the field programmable fuses (FPFs), a one-time programmable memory inside Intel Management Engine (ME), during the manufacturing process. The machine then has the public key of the BIOS and it can verify the correct signature during every subsequent boot. However, once Boot Guard is enabled by the manufacturer, it cannot be disabled.
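The effect of fusing the key hash can be modeled as a two-step check: confirm that the public key shipped with the firmware matches the fused hash, then verify the firmware’s signature with that key. This is a minimal model of the mechanism, not Intel’s implementation.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"crypto/x509"
	"fmt"
)

// verifyBoot models the Boot Guard check: the public key carried alongside
// the firmware must match the hash fused into one-time-programmable memory,
// and the firmware's signature must verify under that key.
func verifyBoot(fusedKeyHash [32]byte, pubDER, firmware, sig []byte) bool {
	if sha256.Sum256(pubDER) != fusedKeyHash {
		return false // key does not match the fuses, e.g. a user-built image
	}
	pub, err := x509.ParsePKCS1PublicKey(pubDER)
	if err != nil {
		return false
	}
	digest := sha256.Sum256(firmware)
	return rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig) == nil
}

func main() {
	// The manufacturer's signing key; its hash is fused at manufacturing time.
	vendorKey, _ := rsa.GenerateKey(rand.Reader, 2048)
	pubDER := x509.MarshalPKCS1PublicKey(&vendorKey.PublicKey)
	fused := sha256.Sum256(pubDER)

	firmware := []byte("vendor BIOS image")
	digest := sha256.Sum256(firmware)
	sig, _ := rsa.SignPKCS1v15(rand.Reader, vendorKey, crypto.SHA256, digest[:])

	fmt.Println("vendor firmware boots:", verifyBoot(fused, pubDER, firmware, sig))
	fmt.Println("unsigned image boots:", verifyBoot(fused, pubDER, []byte("coreboot.rom"), nil))
}
```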
The problem with Boot Guard is that only Intel or the manufacturer has the keys for signing firmware packages. This makes it impossible to use coreboot, LinuxBoot, or any other equivalents as firmware on those processors. If you tried, the firmware would not be signed with the correct key, and the failed attempt to boot would brick the board.
Matthew Garrett wrote a great post about Boot Guard that highlights the importance of user freedom when it comes to firmware.4 The owner of the hardware has a right to own the firmware as well; Boot Guard prevents this. In the security keynote at the 2018 Open Source Firmware Conference,6 Trammell Hudson described how he found a vulnerability to bypass Boot Guard, CVE-2018-12169.3 The bug20 allows an attacker to use unsigned firmware and boot normally, completely negating the purpose of Boot Guard. Because Boot Guard is tied to the CPU, it does not have the control that a custom-silicon hardware root of trust has over the firmware of other components in the system.
System transparency. Mullvad wrote a paper on what it calls system transparency (ST),17 which aims to facilitate trust in the components of a system by giving every server a unique identity, limiting the attack surface and mutable state in the firmware, and allowing both owners and users to verify all software running on a platform, starting from the first instruction executed after power-on.
ST accomplishes these goals by following seven principles:
- A key ceremony for each server that binds the server’s unique identity to a difficult-to-forge physical artifact, such as a video.
- Physical write-protection of the firmware. Writable code sections are a mutable state, so ST limits the possible changes to this critical piece of code. Read-only code also serves as a root of trust for all other software-enforced security mechanisms.
- Tamper detection. Attackers cannot be stopped from changing the content of the firmware flash by replacing the actual chip. So, violations of the physical integrity of the server hardware need to be detectable.
- Measured boot. ST has the goal to give all parties insight into what code was run as part of the system boot. A measured boot in combination with remote attestation allows third parties to acquire a cryptographic log of the boot.
- Reproducible builds. Ensures that if a binary artifact is built once, it can be built again and again and produce the same artifact. This establishes a verifiable link between the human-readable code and the binary that was attested using the measured boot mechanism.
- Immutable infrastructure. System transparency only works when changes to the operating system are limited. Allowing somebody to log into the system and make arbitrary changes invalidates all guarantees of a measured boot.
- Binary transparency log. All firmware and OS images that can be booted on a system are signed by the system’s owner and are inserted into a public, append-only log. Users of the system can monitor this log for new entries and catch malicious system owners booting backdoored firmware on new servers.
The Importance of Open Source Firmware
It is clear that securing the boot process with a hardware root of trust has various implementations throughout the industry. Without open source firmware, the proprietary bits of the boot process still lack the visibility and auditability needed to ensure our software is secure. Even if we can verify through a hardware root of trust that the hash of proprietary firmware is the hash we know to be true, we need visibility into the firmware’s source code for assurance that it does not contain any backdoors. That visibility also makes it easier to debug and fix problems without relying on a vendor.
Firmware is scattered throughout motherboards of machines and their components; it is in the CPU (central processing unit), NIC (network interface controller), SSD (solid-state drive), HDD (hard-disk drive), GPU (graphics processing unit), fans, and more. To ensure the integrity of a machine, all these components must be verified. In the future, these custom silicon chips will interpose not only on the SPI flash but also on every other device communicating with the BMC.
If you would like to help with the open source firmware movement, push back on the vendors and platforms you use to make their firmware open source.
Acknowledgments
Thank you to Ivan Krstić, Matthew Garrett, Kai Michaelis, Fredrik Strömberg, and Trammell Hudson for their research and work in this area, which helped me to write this article.
Related articles
on queue.acm.org
Security for the Modern Age
Jessie Frazelle
https://queue.acm.org/detail.cfm?id=3301253
Simulators: Virtual Machines of the Past (and Future)
Bob Supnik
https://queue.acm.org/detail.cfm?id=1017002
Automating Software Failure Reporting
Brendan Murphy
https://queue.acm.org/detail.cfm?id=1036498