doc/developer/design/20220303_secrets.md
This document fleshes out the design for secrets, which securely store sensitive values.
Four new SQL statements will be introduced with the following grammar to create, alter, drop, and list secrets, respectively:
```
create_secret_stmt ::=
  CREATE SECRET [IF NOT EXISTS] <name> AS <value>

alter_secret_stmt ::=
  ALTER SECRET [IF EXISTS] <name> AS <value>

alter_secret_rename_stmt ::=
  ALTER SECRET [IF EXISTS] <name> RENAME TO <name>

drop_secret_stmt ::=
  DROP SECRET [IF EXISTS] <name> [{ RESTRICT | CASCADE }]

show_secrets_stmt ::=
  SHOW SECRETS [FROM <schema>] [{ LIKE 'pattern' | WHERE <expr> }]

<name> ::= <qualified-identifier>
<value> ::= <scalar-expression>
```
Initially, all of these statements will be available in experimental mode only.
The name of a secret is a qualified identifier. Secrets live in the standard database and schema hierarchy alongside tables, views, sources, etc.
The CREATE SECRET statement creates a new secret with the specified contents.
It returns an error if a secret with the specified name already exists unless
the IF NOT EXISTS clause is present. The <value> expression may not
reference any relations and must be implicitly castable to bytea.
The first form of the ALTER SECRET statement changes the contents of an
existing secret. The second form changes the name of an existing secret.
The DROP SECRET statement drops an existing secret. It returns an error
if a secret with the specified name does not exist unless the IF EXISTS
clause is present. Dropping a secret that is in use by a CONNECTOR is not
permitted unless CASCADE is present.
A SQL transaction can usually contain only one secret DDL statement. As a
special case, if the secrets controller in use supports
multi-operation atomicity, a SQL transaction can contain any number of
non-renaming ALTER SECRET statements, but no other statements, in which case
the secrets will be updated atomically.
Note that while catalog updates to a secret apply immediately, changes to the contents of the secret are not guaranteed to propagate to downstream consumers of the secret within any particular timeframe.
The SHOW SECRETS statement lists the names of the secrets in the specified
schema, or the first schema in the search path if no schema is explicitly
specified, optionally filtered by the provided LIKE pattern or WHERE
expression. The filter clauses work analogously to the same clauses in the
SHOW DATABASES statement.
Note that there is intentionally no way to view the contents of secrets.
A new mz_secrets table with the following structure will describe the
available secrets in the system:
| Field | Type | Meaning |
|---|---|---|
| id | text | The ID of the secret. |
| schema_id | bigint | The ID of the schema to which the secret belongs. |
| name | text | The name of the secret. |
Note that, again, the contents of the secrets are not exposed.
There is presently no way to use a secret. The existing CREATE SOURCE and
CREATE SINK commands will be extended to support referencing secrets, but
that work is left to a future design document.
The CatalogItem enum will be extended with an additional Secret variant:
```rust
pub enum CatalogItem {
    // ...
    Secret,
}
```
Unlike the other catalog items, there is no data associated with a secret. The
name and ID of the secret are stored in the CatalogEntry struct, and the
actual data for the secret will be stored in a secret backend (see next
section). Therefore there is presently no additional metadata that needs to be
recorded about a secret.
When serializing a secret, its create_sql will be synthesized as:
```sql
CREATE SECRET <name> AS '******'
```
When deserializing a secret on bootstrap, the '******' will be ignored.
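The redaction rule above can be sketched as a small helper. The function name is hypothetical, for illustration only:

```rust
/// Synthesizes the redacted `create_sql` for a secret. The secret's actual
/// contents never appear in the catalog; they live only in the secrets
/// controller. (Hypothetical helper name, for illustration.)
fn synthesize_create_sql(name: &str) -> String {
    format!("CREATE SECRET {} AS '******'", name)
}

fn main() {
    // The '******' literal is a placeholder; it is ignored on bootstrap.
    println!("{}", synthesize_create_sql("materialize.public.kafka_ca_cert"));
}
```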
The Coordinator will host a SecretsController that securely handles the
storage of secrets. It will interact with the controller via the following
interface.
```rust
/// Securely stores secrets.
pub trait SecretsController {
    /// Applies the specified secret operations in bulk.
    ///
    /// Implementations must apply the operations atomically. If the method
    /// returns `Ok(())`, then all operations have been applied successfully;
    /// if the method returns `Err(_)`, then none of the operations have been
    /// applied.
    ///
    /// Implementations are permitted to reject combinations of operations
    /// which they cannot apply atomically.
    fn apply(&mut self, ops: Vec<SecretOp>) -> Result<(), anyhow::Error>;

    /// Returns the IDs of all known secrets.
    fn list(&self) -> Result<Vec<GlobalId>, anyhow::Error>;
}

/// An operation on a [`SecretsController`].
pub enum SecretOp {
    /// Create or update the contents of a secret.
    Ensure {
        /// The ID of the secret to create or update.
        id: GlobalId,
        /// The binary contents of the secret.
        contents: Vec<u8>,
    },
    /// Delete a secret.
    Delete {
        /// The ID of the secret to delete.
        id: GlobalId,
    },
}
```
There will be two concrete implementations of the SecretsController:
The FilesystemSecretsController will store secrets on the local filesystem.
It will not support multi-operation atomicity, but will support atomically
updating or removing one secret at a time.
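A minimal sketch of the filesystem-backed controller, assuming string IDs in place of GlobalId and a simplified, single-operation SecretOp; per-secret atomicity comes from writing to a temporary file and renaming it into place. This is an illustration, not the actual implementation:

```rust
use std::fs;
use std::path::PathBuf;

// Simplified stand-ins for `GlobalId` and `SecretOp`, for illustration only.
enum SecretOp {
    Ensure { id: String, contents: Vec<u8> },
    Delete { id: String },
}

struct FilesystemSecretsController {
    dir: PathBuf,
}

impl FilesystemSecretsController {
    /// Applies a single operation. Updates are atomic per secret: the new
    /// contents are written to a temporary file and then renamed into place,
    /// so consumers never observe a partially written secret.
    fn apply_one(&mut self, op: SecretOp) -> std::io::Result<()> {
        match op {
            SecretOp::Ensure { id, contents } => {
                let tmp = self.dir.join(format!("{}.tmp", id));
                fs::write(&tmp, &contents)?;
                fs::rename(&tmp, self.dir.join(id))
            }
            SecretOp::Delete { id } => fs::remove_file(self.dir.join(id)),
        }
    }

    /// Lists the IDs of all stored secrets (i.e., the file names).
    fn list(&self) -> std::io::Result<Vec<String>> {
        let mut ids = vec![];
        for entry in fs::read_dir(&self.dir)? {
            ids.push(entry?.file_name().to_string_lossy().into_owned());
        }
        Ok(ids)
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("mz-secrets-demo");
    fs::create_dir_all(&dir)?;
    let mut ctl = FilesystemSecretsController { dir };
    ctl.apply_one(SecretOp::Ensure {
        id: "u12".into(),
        contents: b"hunter2".to_vec(),
    })?;
    println!("{:?}", ctl.list()?);
    Ok(())
}
```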
The KubernetesSecretsController will store secrets in Kubernetes. It
will support multi-operation atomicity.
All secrets controllers expose the contents of secrets to the STORAGE
and COMPUTE layer by way of a directory that contains one file per secret,
where the name of the file is the secret's ID and the contents of the file are
the secret's contents. For example, if IDs u12 and u36 are secrets, the
directory would look like this:
```
secrets/
  u12
  u36
```
This bears repeating: even the KubernetesSecretsController will expose the
contents of the secrets via the filesystem. The filesystem is the lingua
franca for exposing secrets to applications. Kubernetes supports exposing the
contents of secrets as files on the filesystem, as does HashiCorp Vault.
The STORAGE and COMPUTE layers will need to be taught to periodically
re-read the contents of the secrets that they consume from disk, in case the
contents have been updated by an ALTER SECRET. We may wish to build a helper
"secret consumer" library that uses filesystem events to notice when the secrets
change, rather than polling periodically.
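A minimal sketch of such a consumer, using modification-time polling rather than filesystem events; the type and method names are hypothetical:

```rust
use std::fs;
use std::path::PathBuf;
use std::time::SystemTime;

/// Caches a secret's contents and re-reads the backing file only when its
/// modification time changes. Hypothetical helper, for illustration; a real
/// implementation might use filesystem events (e.g. inotify) instead.
struct SecretReader {
    path: PathBuf,
    cached: Vec<u8>,
    mtime: Option<SystemTime>,
}

impl SecretReader {
    fn new(path: PathBuf) -> Self {
        SecretReader { path, cached: vec![], mtime: None }
    }

    /// Returns the current contents, re-reading from disk if the file has
    /// changed since the last call (e.g. after an ALTER SECRET).
    fn contents(&mut self) -> std::io::Result<&[u8]> {
        let mtime = fs::metadata(&self.path)?.modified()?;
        if self.mtime != Some(mtime) {
            self.cached = fs::read(&self.path)?;
            self.mtime = Some(mtime);
        }
        Ok(&self.cached)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("mz-secret-reader-demo");
    fs::write(&path, b"old")?;
    let mut reader = SecretReader::new(path);
    assert_eq!(reader.contents()?, &b"old"[..]);
    Ok(())
}
```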
Providing proper SQL transaction semantics requires a complicated integration between catalog transactions and secret controller transactions.
Let's consider each of the cases in turn:
The SQL transaction contains exactly one CREATE SECRET statement.
The coordinator calls SecretsController::apply with a single
SecretOp::Ensure. If the apply fails, the coordinator reports the SQL
transaction as failed. If the apply succeeds, the coordinator calls
Catalog::transact to create the secret object. If the catalog transaction
succeeds, the coordinator reports the SQL transaction as successful. If
the catalog transaction fails, the coordinator makes a best-effort attempt
to remove the secret by calling SecretsController::apply with a single
SecretOp::Delete, then reports the SQL transaction as failed.
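The ordering in this case, including the best-effort compensation, can be sketched with the controller and catalog modeled as fallible closures. The names are illustrative, not the actual coordinator API:

```rust
use std::cell::Cell;

/// Sketch of the CREATE SECRET orchestration. The secret's contents are
/// written first; only if the catalog transaction then fails does the
/// coordinator make a best-effort attempt to delete the orphaned secret.
fn create_secret(
    controller_apply: impl Fn(&str) -> Result<(), String>,
    catalog_transact: impl Fn() -> Result<(), String>,
    controller_delete: impl Fn(),
) -> Result<(), String> {
    // 1. Write the secret's contents first (SecretOp::Ensure).
    controller_apply("ensure")?;
    // 2. Then record the secret in the catalog.
    if let Err(e) = catalog_transact() {
        // 3. Best effort: clean up the now-orphaned secret (SecretOp::Delete).
        controller_delete();
        return Err(e);
    }
    Ok(())
}

fn main() {
    let deleted = Cell::new(false);
    let result = create_secret(
        |_| Ok(()),                      // controller write succeeds
        || Err("catalog failed".into()), // catalog transaction fails
        || deleted.set(true),            // compensation runs
    );
    assert!(result.is_err() && deleted.get());
}
```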
The SQL transaction contains exactly one DROP SECRET statement.
The coordinator calls Catalog::transact to remove the secret from the
catalog. If the catalog transaction fails, the coordinator reports the SQL
transaction as failed. If the catalog transaction succeeds, the coordinator
calls SecretsController::apply with a single SecretOp::Delete to remove
the secret from disk. If the apply succeeds, the coordinator reports the SQL
transaction as successful. If the apply fails, the coordinator logs an
error, but still reports the SQL transaction as successful.
The SQL transaction contains one or more ALTER SECRET ... AS statements.
The coordinator calls SecretsController::apply with multiple
SecretOp::Ensure operations. If the apply fails, the coordinator reports the
SQL transaction as failed; if the apply succeeds, the coordinator reports
the SQL transaction as successful.
SQL transactions that contain any other combination of secret DDL statements are prohibited.
The above integration is designed so that failures result in orphaned secrets in the secrets controller; i.e., secrets that exist in the secrets controller but have no corresponding entry in the catalog. (The inverse would not be permissible: a secret that exists in the catalog must be known to the secrets controller.) The coordinator will periodically attempt to garbage collect orphaned secrets, either at startup or in a background task.
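The garbage collection step reduces to a set difference between the IDs the controller knows and the IDs in the catalog. A sketch, with string IDs standing in for GlobalId and a hypothetical function name:

```rust
use std::collections::HashSet;

/// Sketch of orphaned-secret garbage collection: any ID known to the
/// secrets controller but absent from the catalog is an orphan and may be
/// deleted. (Illustrative only; IDs are strings instead of `GlobalId`.)
fn collect_orphans(controller_ids: &[String], catalog_ids: &[String]) -> Vec<String> {
    let known: HashSet<&String> = catalog_ids.iter().collect();
    controller_ids
        .iter()
        .filter(|id| !known.contains(id))
        .cloned()
        .collect()
}

fn main() {
    let on_disk = vec!["u12".to_string(), "u36".to_string(), "u99".to_string()];
    let in_catalog = vec!["u12".to_string(), "u36".to_string()];
    // u99 exists in the controller but not the catalog: it is an orphan.
    assert_eq!(collect_orphans(&on_disk, &in_catalog), vec!["u99".to_string()]);
}
```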
The ADAPTER layer must be configured with a SecretsController
implementation. The STORAGE and COMPUTE layers must be configured
with the path on the local filesystem in which the secrets controller has made
the contents of the secrets available.
This means that, for now, materialized will create a
FilesystemSecretsController that stores data in mzdata/secrets. For
simplicity, this directory will not initially be configurable. The controller
will be provided to the Coordinator, while the path to mzdata/secrets will
be provided to the dataflow layer.
In the future, coordd will learn to instantiate a
KubernetesSecretsController, and dataflowd will learn to expect the secrets
to be mounted at a known location.
Consider the addition of a --secrets-dir command-line flag to control where
secrets are stored.
Consider the addition of a VaultSecretsController, if we need to use
HashiCorp Vault in Materialize Platform.