doc/wg/opentitan/case_studies/datacenter_security_model.md
This document describes the security model for a Root-of-Trust (RoT) chip as currently deployed for datacenter use cases. The deployment described represents how the current Google proprietary root-of-trust chip (publicly known as Titan) and its firmware work. The exact nature of the security model continues to evolve along with Google's production security requirements and the capabilities available from Root-of-Trust chips.
The primary purpose of the current integrations of the Titan RoT into server products is to maintain first-instruction boot integrity and to grant the machine a valid identity in Google's production environment (colloquially known as prod). Titan RoTs also mitigate DoS attacks by enforcing secure firmware updates.
This document concludes with some alternative concepts for firmware delivery considering Tock's process isolation model and Application ID features.
Google's Platforms team designs and contracts out the manufacture of the servers and peripherals (NICs, SSDs, custom accelerators) that comprise its production environment. Within Google, customers (i.e., product teams such as websearch, gmail and cloud) purchase compute resources to execute their jobs.
In order for a machine to run customer jobs, it must pass a series of health checks and acquire a cryptographic identity. A machine which is able to present (or wield) its identity can interact with other production services such as the cluster scheduler, storage services or other internal services (such as the web index, image recognition, etc). A machine which cannot present its identity is only permitted to interact with a limited set of services geared towards automated or manual repair of broken machines.
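The gating described above can be illustrated with a minimal sketch. The service names, the `Machine` fields and the helper functions are all hypothetical; the document only states that a machine without a usable identity is limited to repair-oriented services. Whether a healthy machine can also reach the repair services is an assumption made here for simplicity.

```python
from dataclasses import dataclass

# Hypothetical service names; the document names only broad categories.
REPAIR_SERVICES = frozenset({"automated-repair", "manual-repair"})
PROD_SERVICES = frozenset({"cluster-scheduler", "storage", "web-index"})


@dataclass(frozen=True)
class Machine:
    name: str
    passed_health_checks: bool
    can_present_identity: bool


def may_run_jobs(machine: Machine) -> bool:
    """A machine runs customer jobs only if it is healthy and has an identity."""
    return machine.passed_health_checks and machine.can_present_identity


def allowed_services(machine: Machine) -> frozenset:
    """Return the services a machine may interact with in this toy model."""
    if machine.can_present_identity:
        # Assumption: identity holders may also reach the repair services.
        return PROD_SERVICES | REPAIR_SERVICES
    # Without an identity, only repair-oriented services are reachable.
    return REPAIR_SERVICES


healthy = Machine("m1", passed_health_checks=True, can_present_identity=True)
broken = Machine("m2", passed_health_checks=False, can_present_identity=False)
print(sorted(allowed_services(broken)))
```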
The Google Titan chip is a Google-designed Root-of-Trust chip. It features a 32-bit embedded ARM Cortex-M core, internal flash and RAM, and cryptographic acceleration hardware, including a SHA hashing block, a key derivation block and a bignum accelerator.
The Google Titan chip is integrated into server products such that it has control over the machine's reset process and boot firmware. Specifically, the Titan chip can both monitor and drive the reset signals of the Application Processor (AP), and it interposes on the SPI bus between the AP and its EEPROM. This design is applied to simple servers with a CPU, to servers with Baseboard Management Controllers (BMCs), and to peripherals like NICs and accelerators.
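One of the notes at the end of this document mentions that the flash part is twice the required size and that Titan chooses which half to show to the machine. A minimal sketch of that idea, assuming a simple base-offset address mapping, might look like the following. The class name, method names and the mapping itself are hypothetical illustrations, not the actual Titan implementation.

```python
class SpiFlashInterposer:
    """Toy model of an RoT sitting between the AP and a double-sized flash."""

    def __init__(self, flash: bytearray):
        if len(flash) % 2 != 0:
            raise ValueError("flash size must be even")
        self.flash = flash
        self.half_size = len(flash) // 2
        self.active_half = 0  # 0 = lower half, 1 = upper half

    def select_half(self, half: int) -> None:
        """Choose which half of the flash the AP will see."""
        if half not in (0, 1):
            raise ValueError("half must be 0 or 1")
        self.active_half = half

    def ap_read(self, addr: int, length: int) -> bytes:
        """Serve an AP read from the active half only."""
        if addr < 0 or length < 0 or addr + length > self.half_size:
            raise ValueError("read outside the visible half")
        base = self.active_half * self.half_size
        return bytes(self.flash[base + addr : base + addr + length])


# The AP always reads from address 0; the RoT decides what is behind it.
rot = SpiFlashInterposer(bytearray(b"AAAA" + b"BBBB"))
first = rot.ap_read(0, 4)
rot.select_half(1)
second = rot.ap_read(0, 4)
```

In this model the AP never addresses the hidden half directly, which is what would let the RoT stage a new image in one half while the machine continues to boot from the other.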
Titan integrations into server peripherals are very similar to the integration into a server product: Titan is given low-level control over the peripheral's reset signals and is positioned such that it has control over the peripheral's boot firmware. There are often customizations to the integration to meet certain requirements of the peripheral (such as boot timing), but for the purpose of this document, the integrations are basically the same.
Titan has 3 distinct code images. The first, called the ROM image, never changes. The second, called the bootloader, changes rarely. The third, called the application firmware, is a monolithic image including the kernel, hardware support code and application code. Generally, new application firmware is pushed to production every few months.
Google's production infrastructure makes the following assumptions:
Requirements:
The following sequence details the per-power-cycle machine lifecycle:
The following are notable features of the application firmware for the datacenter use case:
The migration to OpenTitan represents a radical change from how the current datacenter Titan firmware is developed and delivered. This change allows us to re-examine the assumptions and requirements of the current system and make improvements.
The current datacenter Titan application firmware is a monolithic software image. The implementation consists of three main components: a kernel & drivers component, a hardware integration component (referring to the specifics of Titan's control over the server system) and a cryptographic & identity services component. The boundaries between these components are somewhat blurry and there is no strong separation between the kernel and application components.
The components of the monolithic image are maintained by different teams within Google. The platforms team maintains the kernel, drivers and hardware integration components. The prod-identity team maintains the cryptographic & identity services component. As one might expect, having multiple teams contributing to a single monolithic firmware image has been a source of complexity that has occasionally led to confusion or delays.
The following sections of the document describe hypothetical use cases or hypothetical modifications to existing use cases. They assume OpenTitan is the Root-of-Trust and adopt OpenTitan terminology. It is assumed that the OpenTitan chip boots securely and configures itself appropriately before booting the Tock kernel.
Tock provides a kernel and userspace boundary and process isolation. Tock also permits applications to be delivered and loaded separately from the kernel payload. It will be possible for the datacenter firmware to be delivered as individual components with different access controls on each component.
Google's platforms team, as a *`silicon_owner`* (in OpenTitan terminology, the *silicon_owner* is the purchaser of the OpenTitan chip or of devices containing an OpenTitan chip), can sign and deliver the Tock kernel for datacenter integrations. The platforms team can also sign and deliver the hardware integration application, which provides the lowest level of machine control services for an OpenTitan integration.
Google's production identity team can sign and deliver the cryptographic services application which will be responsible for guarding and maintaining the machine or subsystem identity and attesting to the validity of its boot firmware.
These applications may be signed by distinct keys, allowing independent operational security and release processes for the different applications. The application signing keys are distinct from the kernel signing keys, and each of these keys may have different key storage requirements. For example, the kernel key may be considered a high-value resource restricted to an offline Hardware Security Module (HSM), whereas application signing keys could be considered safe enough to be stored in Google's online key management service (because they can be rotated as part of a new kernel release). This separation of authority and permissions can allow the platforms and prod-identity teams to develop, sign and deliver their respective applications independently, without the need for an offline ceremony. Kernel upgrades and the deployment of new application signing keys would still require an offline ceremony. The result is a lower cost for feature additions and other maintenance work.
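The division of signing authority described above can be sketched as a small policy table. This is only an illustration under stated assumptions: the key identifiers, component names and storage labels are hypothetical, and no real signing or verification is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigningKey:
    key_id: str          # hypothetical identifier
    owner: str           # team that controls the key
    storage: str         # "offline-hsm" or "online-kms" (assumed labels)
    may_sign: frozenset  # components this key is authorized to sign


KERNEL_KEY = SigningKey(
    "kernel-key", "platforms", "offline-hsm", frozenset({"tock-kernel"}))
HW_APP_KEY = SigningKey(
    "hw-integration-key", "platforms", "online-kms",
    frozenset({"hw-integration-app"}))
CRYPTO_APP_KEY = SigningKey(
    "crypto-services-key", "prod-identity", "online-kms",
    frozenset({"crypto-services-app"}))


def release_allowed(component: str, key: SigningKey) -> bool:
    """A release is acceptable only if the key is authorized for the component."""
    return component in key.may_sign


def needs_offline_ceremony(key: SigningKey) -> bool:
    """Signing with an offline-held key requires an offline ceremony."""
    return key.storage == "offline-hsm"


# An application release by the prod-identity team needs no offline ceremony.
ok = release_allowed("crypto-services-app", CRYPTO_APP_KEY)
ceremony = needs_offline_ceremony(CRYPTO_APP_KEY)
```

The sketch does not model the rotation of application signing keys, which per the text above is tied to a new kernel release and so would go through the offline path.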
The separation of kernel versus application signing authority also allows for new modes of service delivery for OpenTitan-enabled platforms.
Examples:
The exact method of securing the keys is beyond the scope of this document. No engineer has unilateral access; access is granted via an M-of-N quorum authenticated through multiple factors.
There are some variations or exceptions. In all cases, Titan has control over the machine or peripheral's boot process.
TL;DR: the flash part is twice the required size and Titan can choose which half to show to the machine.
In the event that an attacker could force-downgrade Titan to the known-bad firmware, attempts to access or wield the newer keys will be denied by the hardware.
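The downgrade protection in the last note can be illustrated with a toy model in which each key records the lowest firmware version allowed to use it. The integer version comparison and all names below are assumptions made for illustration; the document does not describe how the hardware enforces the restriction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedKey:
    name: str
    min_fw_version: int  # lowest firmware version allowed to use this key


class KeyHardware:
    """Toy stand-in for hardware that gates key use on the running firmware."""

    def __init__(self, running_fw_version: int):
        self.running_fw_version = running_fw_version

    def can_wield(self, key: SealedKey) -> bool:
        # Older firmware is denied access to keys bound to newer versions.
        return self.running_fw_version >= key.min_fw_version


old_key = SealedKey("identity-v1", min_fw_version=1)
new_key = SealedKey("identity-v2", min_fw_version=2)

# A chip forced back to the known-bad firmware (version 1 in this example).
downgraded = KeyHardware(running_fw_version=1)
current = KeyHardware(running_fw_version=2)
```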