docs/design/security/03-tls-key-material-reload.md
Allow a configurable timeout for caching keyStores and trustStores in the TLS configuration (<ssl/>).
Goal: an ssl config property for configuring the timeout (duration) for caching the keyStore and trustStore.
Non-goal: the same timeout for keyFile, keyCertChainFile, and trustCertCollectionFile in the OpenSSLEngineFactory.
Hazelcast Enterprise allows using the TLS protocol to protect data in transit. We want to enable rotation of key material (keyStores and trustStores) without restarting the Hazelcast instance.
The current process for updating key material is described in the "Updating Certificates in the Running Cluster" section of the official documentation. It includes:
These steps might need to be repeated (1-3 times), depending on the certificate trust configuration.
A new optional property, keyMaterialDuration, will be introduced into the <ssl> properties config. The value will be a duration expression in the ISO 8601 format, as supported by the Java java.time.Duration.parse() method.
The keyMaterialDuration property value is a string such as PnDTnHnMn.nS.
The Duration.parse() JavaDoc describes the format as:
The string starts with an optional sign, denoted by the ASCII negative or positive symbol. If negative, the whole period is negated. The ASCII letter "P" is next in upper or lower case. There are then four sections, each consisting of a number and a suffix. The sections have suffixes in ASCII of "D", "H", "M" and "S" for days, hours, minutes and seconds, accepted in upper or lower case. The suffixes must occur in order. The ASCII letter "T" must occur before the first occurrence, if any, of an hour, minute or second section. At least one of the four sections must be present, and if "T" is present there must be at least one section after the "T". The number part of each section must consist of one or more ASCII digits. The number may be prefixed by the ASCII negative or positive symbol. The number of days, hours and minutes must parse to an long. The number of seconds must parse to an long with optional fraction. The decimal point may be either a dot or a comma. The fractional part may have from zero to 9 digits.
The leading plus/minus sign, and negative values for other units are not part of the ISO-8601 standard.
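To make the accepted format concrete, the following minimal Java sketch runs a few sample strings through java.time.Duration.parse(). The class name DurationFormatDemo and the sample values are illustrative only; the parsing behavior itself is that of the JDK method quoted above.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;

/** Runs sample keyMaterialDuration strings through java.time.Duration.parse(). */
public class DurationFormatDemo {

    /** Returns the parsed duration, or null if the text is not accepted by Duration.parse(). */
    static Duration tryParse(String text) {
        try {
            return Duration.parse(text);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Accepted: upper or lower case letters, signed values, fractional seconds.
        // Rejected: a missing leading "P", and years ("Y"), which Duration does not support.
        String[] samples = {"PT10M", "PT1H", "P1DT2H", "PT0s", "PT-1s", "-PT1S", "pt1.5s", "10m", "P1Y"};
        for (String sample : samples) {
            Duration parsed = tryParse(sample);
            String result = parsed == null ? "invalid" : parsed + " (" + parsed.toMillis() + " ms)";
            System.out.println(sample + " -> " + result);
        }
    }
}
```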
A positive keyMaterialDuration value (e.g. PT1H) specifies how long the key material is cached before it is reloaded.
A negative keyMaterialDuration value (e.g. PT-1s) means the key material will be cached indefinitely.
A zero-value duration expression (e.g. PT0s) means the key material will not be cached and will be reloaded for each new TLS-protected connection.
If the new property is not specified, the key material is cached indefinitely (the default value), which keeps the behavior backward-compatible.
If the value has an invalid format, the Hazelcast instance won't start.
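The semantics above could be implemented roughly as follows. This is a hypothetical sketch, not the Hazelcast implementation: the KeyMaterialCache class, its constructor, and the injectable nanosecond clock are assumptions made for illustration. A missing or negative value means "cache indefinitely", a zero value means "reload on every lookup", and an invalid value throws DateTimeParseException from the constructor so that startup can fail.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical cache illustrating the keyMaterialDuration semantics.
 * Not the Hazelcast implementation; names are illustrative.
 */
public class KeyMaterialCache<T> {

    private final Supplier<T> loader;      // e.g. reads a keyStore or trustStore file
    private final Duration duration;       // null means "cache indefinitely"
    private final LongSupplier nanoClock;  // e.g. System::nanoTime; injectable for tests
    private T cached;
    private long loadedAtNanos;

    /**
     * @param keyMaterialDuration ISO 8601 duration, or null when the property is not set
     * @throws java.time.format.DateTimeParseException if the value has an invalid format
     */
    public KeyMaterialCache(String keyMaterialDuration, Supplier<T> loader, LongSupplier nanoClock) {
        this.loader = loader;
        this.nanoClock = nanoClock;
        Duration parsed = keyMaterialDuration == null ? null : Duration.parse(keyMaterialDuration);
        // Not set or negative: load once and keep it (the backward-compatible default).
        this.duration = (parsed == null || parsed.isNegative()) ? null : parsed;
    }

    /** Returns the cached material, reloading it once the configured duration has elapsed. */
    public synchronized T get() {
        long now = nanoClock.getAsLong();
        // A zero duration always satisfies the comparison, so every call reloads.
        boolean reload = cached == null
                || (duration != null && Duration.ofNanos(now - loadedAtNanos).compareTo(duration) >= 0);
        if (reload) {
            cached = loader.get();
            loadedAtNanos = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        long[] fakeNanos = {0};
        int[] loads = {0};
        KeyMaterialCache<Integer> cache =
                new KeyMaterialCache<Integer>("PT10M", () -> ++loads[0], () -> fakeNanos[0]);
        cache.get();                                      // first call loads
        fakeNanos[0] = Duration.ofMinutes(9).toNanos();
        cache.get();                                      // 9 minutes: still cached
        fakeNanos[0] = Duration.ofMinutes(10).toNanos();
        cache.get();                                      // 10 minutes: reloaded
        System.out.println("loads = " + loads[0]);        // prints "loads = 2"
    }
}
```

In this sketch the cache is refreshed lazily, on the first lookup after expiry, rather than by a background timer. Comparing the elapsed time as a Duration, instead of converting keyMaterialDuration to nanoseconds, avoids overflow for very long durations.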
The OpenSSLEngineFactory doesn't cache the key material when native key and certificate files are used (keyFile, keyCertChainFile, and trustCertCollectionFile). This behavior won't change.
The following configuration example caches the key material for 10 minutes before it is reloaded.
<network>
    <ssl enabled="true">
        <properties>
            <property name="keyMaterialDuration">PT10M</property>
            <property name="keyStore">${keyStore.path}</property>
            <property name="keyStorePassword">${keyStore.password}</property>
            <property name="trustStore">${trustStore.path}</property>
            <property name="trustStorePassword">${trustStore.password}</property>
            <property name="protocol">TLSv1.3</property>
            <property name="mutualAuthentication">REQUIRED</property>
        </properties>
    </ssl>
</network>
The process of replacing key material described in the Motivation section will be simplified to the following steps when a non-negative keyMaterialDuration value is used.
This is usually the case when self-signed certificates are used on the members.
Before we can deploy new member certificates, we have to update the list of trusted certificates on all members:
keyMaterialDuration parameter.
Switch certificates/keys on all members:
At the latest, after the specified duration, all new connections will use the new key material.
When mutual TLS authentication is enabled and a key has leaked, or the old certificates must no longer be used for any other reason, the trustStores have to be updated once more.
This step is not described in the documentation, but it works the same way as point 1: prepare trustStores without the old certificates and update the trustStores on all members one by one.
Again, this is simplified by setting a non-negative keyMaterialDuration.
At the latest, after the specified duration, new connections with the old certificates (used for mutual authentication) won't be allowed.
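The "at the latest" bounds in both rotation steps can be turned into a small worked example. Assuming each member's cache expires exactly keyMaterialDuration after its last load, and the duration is non-negative, a member whose files were replaced at time t uses the new key material (or stops trusting the removed certificates) for new connections from t + keyMaterialDuration at the latest, because it may have cached the old files an instant before they were replaced. With a member-by-member rollout, the cluster-wide bound therefore counts from the last replacement. RotationWindow and newMaterialDeadline are hypothetical names, and the timestamps are made-up example values.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Worked example of the worst-case rotation window; names and timestamps are hypothetical. */
public class RotationWindow {

    /**
     * Latest instant from which every member uses the replaced files for new connections,
     * assuming a non-negative keyMaterialDuration and expiry exactly that long after the last load.
     */
    static Instant newMaterialDeadline(Collection<Instant> filesReplacedAt, Duration keyMaterialDuration) {
        // A member may have cached the old files just before they were replaced,
        // so the bound is measured from the last replacement in the rollout.
        return Collections.max(filesReplacedAt).plus(keyMaterialDuration);
    }

    public static void main(String[] args) {
        Duration keyMaterialDuration = Duration.parse("PT10M");
        List<Instant> filesReplacedAt = Arrays.asList(
                Instant.parse("2024-01-01T12:00:00Z"),   // member 1
                Instant.parse("2024-01-01T12:03:00Z"),   // member 2
                Instant.parse("2024-01-01T12:05:00Z"));  // member 3
        System.out.println(newMaterialDeadline(filesReplacedAt, keyMaterialDuration)); // prints 2024-01-01T12:15:00Z
    }
}
```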
Alternative approaches would be to reload the material for every connection, or to introduce only a true/false flag that enables or disables a non-expiring cache altogether. Because these approaches could impact performance (without a cache, the key material is reloaded for every new connection), we won't implement them.
The abstract class com.hazelcast.internal.nio.ssl.SSLEngineFactorySupport in the hazelcast-enterprise repository will be extended to support the new property. We will also align its implementations (i.e., its child classes).
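A minimal sketch of how a base class such as SSLEngineFactorySupport might read the new property, assuming the <ssl> properties are available as a java.util.Properties instance. KeyMaterialSupportSketch, its load() method, and the keyMaterialDuration field are hypothetical; the real class and its subclasses may be structured differently. Wrapping DateTimeParseException in an IllegalArgumentException that names the property gives the failed startup a readable error.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;
import java.util.Properties;

/**
 * Hypothetical sketch of reading keyMaterialDuration in a factory base class.
 * Not the actual SSLEngineFactorySupport code.
 */
public abstract class KeyMaterialSupportSketch {

    static final String KEY_MATERIAL_DURATION = "keyMaterialDuration";

    /** null means "cache indefinitely"; otherwise a zero or positive duration. */
    protected Duration keyMaterialDuration;

    protected void load(Properties properties) {
        String value = properties.getProperty(KEY_MATERIAL_DURATION);
        if (value == null) {
            keyMaterialDuration = null;   // property not set: backward-compatible default
            return;
        }
        Duration parsed;
        try {
            parsed = Duration.parse(value.trim());
        } catch (DateTimeParseException e) {
            // Assumption: the caller lets this propagate, so the instance fails to start.
            throw new IllegalArgumentException(
                    "Invalid " + KEY_MATERIAL_DURATION + " value '" + value + "', expected e.g. PT10M", e);
        }
        keyMaterialDuration = parsed.isNegative() ? null : parsed;   // negative: cache indefinitely
    }
}
```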
The Hazelcast full sample configurations will be extended: the property will be added to the hazelcast-full-example and hazelcast-client-full-example files.
New tests covering the new functionality will be added to the hazelcast-enterprise repository.