.. _CVE-2021-20288:

`NIST information page <https://nvd.nist.gov/vuln/detail/CVE-2021-20288>`_

Ceph was not ensuring that reconnecting or renewing clients presented an existing ticket when reclaiming their global_id value. An attacker who was able to authenticate could claim a global_id in use by a different client and potentially disrupt other cluster services.

Each authenticated client or daemon in Ceph is assigned a numeric global_id identifier. That value is assumed to be unique across the cluster. When clients reconnect to the monitor (e.g., due to a network disconnection) or renew their ticket, they are supposed to present their old ticket to prove prior possession of their global_id so that it can be reclaimed and thus remain constant over the lifetime of that client instance.
Ceph was not correctly checking that the old ticket was valid, allowing an arbitrary global_id to be reclaimed, even if it was in use by another active client in the system.
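
The difference between the flawed check and the corrected one can be illustrated with a toy model. The sketch below is not Ceph code: the Monitor class, its authenticate and reclaim methods, and the random string tickets are hypothetical stand-ins, chosen only to show why a reclaim request has to be backed by proof of the previously issued ticket.

```python
import secrets
from typing import Dict, Optional, Tuple


class Monitor:
    """Toy monitor that hands out global_ids and tickets (hypothetical model)."""

    def __init__(self, require_valid_ticket: bool) -> None:
        self.require_valid_ticket = require_valid_ticket
        self._next_id = 1000
        self._tickets: Dict[int, str] = {}  # global_id -> ticket issued for it

    def authenticate(self) -> Tuple[int, str]:
        """New session: assign a fresh global_id and issue a ticket for it."""
        global_id = self._next_id
        self._next_id += 1
        ticket = secrets.token_hex(8)
        self._tickets[global_id] = ticket
        return global_id, ticket

    def reclaim(self, global_id: int, old_ticket: Optional[str]) -> bool:
        """Reconnect or renew: the client asks to keep using global_id."""
        if self.require_valid_ticket:
            # Corrected behavior: the presented ticket must match the one
            # that was issued for this global_id.
            expected = self._tickets.get(global_id)
            if expected is None or old_ticket is None:
                return False
            if not secrets.compare_digest(expected, old_ticket):
                return False
        # Flawed behavior reaches this point without any ticket check,
        # so any authenticated caller can take over any global_id.
        self._tickets[global_id] = secrets.token_hex(8)
        return True


if __name__ == "__main__":
    for strict in (False, True):
        mon = Monitor(require_valid_ticket=strict)
        victim_id, _victim_ticket = mon.authenticate()
        # The "attacker" knows the victim's global_id but not its ticket.
        print(strict, mon.reclaim(victim_id, old_ticket=None))
```

In this model the only thing separating the two behaviors is whether reclaim compares the presented ticket with the one on record, which mirrors the description above at a conceptual level only.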
Any potential attacker must:

Confidentiality Impact
    None

Integrity Impact
    Partial. An attacker could potentially exploit assumptions around global_id uniqueness to disrupt other clients' access or disrupt Ceph daemons.

Availability Impact
    High. An attacker could potentially exploit assumptions around global_id uniqueness to disrupt other clients' access or disrupt Ceph daemons.

Access Complexity
    High. The client must make use of modified client code in order to exploit specific assumptions in the behavior of other Ceph daemons.

Authentication
    Yes. The attacker must also be authenticated and have access to the same services as the client it wishes to impersonate or disrupt.

Gained Access
    Partial. An attacker can partially impersonate another client.

All prior versions of Ceph monitors fail to ensure that global_id reclaim attempts are authentic.
In addition, all user-space daemons and clients starting from Luminous v12.2.0 were failing to securely reclaim their global_id following commit a2eb6ae3fb57 ("mon/monclient: hunt for multiple monitor in parallel").
All versions of the Linux kernel client properly authenticate.

#. Patched monitors now properly require that clients securely reclaim
   their global_id when the auth_allow_insecure_global_id_reclaim
   option is false. Initially, by default, this option is set to
   true so that existing clients can continue to function without
   disruption until all clients have been upgraded. When this option
   is set to false, an unpatched client will not be able to reconnect
   to the cluster after an intermittent network disruption breaks
   its connection to a monitor, or to renew its authentication
   ticket when it times out (by default, after 72 hours).

   Patched monitors raise the AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
   health alert if auth_allow_insecure_global_id_reclaim is enabled.
   This health alert can be muted with::

      ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w

   Although it is not recommended, the alert can also be disabled with::

      ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

#. Patched monitors can disconnect new clients right after they have
   authenticated (forcing them to reconnect and reclaim) in order to
   determine whether they securely reclaim global_ids. This allows
   the cluster and users to discover quickly whether clients would be
   affected by requiring secure global_id reclaim: most clients will
   report an authentication error immediately. This behavior can be
   disabled by setting auth_expose_insecure_global_id_reclaim to
   false::

      ceph config set mon auth_expose_insecure_global_id_reclaim false

#. Patched monitors will raise the AUTH_INSECURE_GLOBAL_ID_RECLAIM health
   alert for any clients or daemons that are not securely reclaiming their
   global_id. These clients should be upgraded before disabling the
   auth_allow_insecure_global_id_reclaim option to avoid disrupting
   client access.

   By default (if auth_expose_insecure_global_id_reclaim has not
   been disabled), clients' failure to securely reclaim global_id will
   immediately be exposed and raise this health alert.
   However, if auth_expose_insecure_global_id_reclaim has been
   disabled, this alert will not be triggered for a client until it is
   forced to reconnect to a monitor (e.g., due to a network disruption)
   or the client renews its authentication ticket (by default, after
   72 hours).

#. The default time-to-live (TTL) for authentication tickets has been increased from 12 hours to 72 hours. Because we previously were not ensuring that a client's prior ticket was valid when reclaiming their global_id, a client could tolerate a network outage that lasted longer than the ticket TTL and still reclaim its global_id. Once the cluster starts requiring secure global_id reclaim, a client that is disconnected for longer than the TTL may fail to reclaim its global_id, fail to reauthenticate, and be unable to continue communicating with the cluster until it is restarted. The default TTL was increased to minimize the impact of this change on users.
#. Users should upgrade to a patched version of Ceph at their earliest convenience.
#. Users should upgrade any unpatched clients at their earliest
   convenience. By default, these clients can be easily identified by
   checking the ceph health detail output for the
   AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.
#. If not all clients can be upgraded immediately, the health alerts can be temporarily muted with::

      ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w   # 1 week
      ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w   # 1 week

#. After all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM
   alert is no longer present, the cluster should be set to prevent insecure
   global_id reclaim with::

      ceph config set mon auth_allow_insecure_global_id_reclaim false

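
The check that precedes the last step can be scripted. The following is a minimal sketch that assumes the health report is captured with ceph health detail --format json, and that the report carries a top-level "checks" object keyed by alert name, each entry holding a "detail" list of "message" items. That layout is an assumption and should be verified against the release in use. The function name reclaim_status and the sample report are illustrative only.

```python
import json
from typing import Any, Dict

INSECURE_CLIENTS = "AUTH_INSECURE_GLOBAL_ID_RECLAIM"
INSECURE_ALLOWED = "AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED"


def reclaim_status(health_json: str) -> Dict[str, Any]:
    """Summarize the two global_id reclaim alerts found in a health report.

    Assumes (unverified) a report shaped like:
    {"checks": {"<ALERT_NAME>": {"detail": [{"message": "..."}]}}}
    """
    checks = json.loads(health_json).get("checks", {})
    client_check = checks.get(INSECURE_CLIENTS)
    messages = []
    if client_check is not None:
        messages = [d.get("message", "") for d in client_check.get("detail", [])]
    return {
        # Detail lines for clients that still reclaim insecurely.
        "insecure_clients": messages,
        # True while the monitors still permit insecure reclaim.
        "insecure_reclaim_allowed": INSECURE_ALLOWED in checks,
        # True only when the client alert is absent from the report.
        "no_insecure_clients_reported": client_check is None,
    }


if __name__ == "__main__":
    # Illustrative report, not captured from a real cluster.
    sample = json.dumps({
        "status": "HEALTH_WARN",
        "checks": {
            INSECURE_ALLOWED: {"detail": []},
            INSECURE_CLIENTS: {
                "detail": [{"message": "placeholder client entry (illustrative)"}]
            },
        },
    })
    print(reclaim_status(sample))
```

An absent alert is not proof that every client is patched: as noted above, if auth_expose_insecure_global_id_reclaim has been disabled, an unpatched client may not be reported until it reconnects or renews its ticket.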