design/ceph/object/store.md
An object store is a collection of resources and services that work together to serve HTTP requests to PUT and GET objects. Rook will automate the configuration of the Ceph resources and services that are necessary to start and maintain a highly available, durable, and performant object store.
The Ceph object store supports S3 and Swift APIs and a multitude of features such as replication of object stores between different zones. The Rook object store is designed to support all of these features, though it will take some time to implement them. We welcome contributions! In the meantime, features that are not yet implemented can be configured by using the Rook toolbox to run the radosgw-admin and other tools for advanced configuration.
A Rook storage cluster must be configured and running in Kubernetes. In this example, it is assumed the cluster is in the rook-ceph namespace.
When the storage admin is ready to create an object store, the admin will specify the desired configuration settings in a yaml file such as the following object-store.yaml. This example is a simple object store with metadata that is replicated across different hosts, and data that is erasure coded across multiple devices in the cluster.
```yaml
apiVersion: ceph.rook.io/v1alpha1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: device
    erasureCoded:
      dataChunks: 6
      codingChunks: 2
  gateway:
    port: 80
    securePort: 443
    instances: 3
  hosting:
    dnsNames:
      - "my-ingress.mydomain.com"
      - "rook-ceph-rgw-my-store.rook-ceph.svc"
```
Now create the object store.
```console
kubectl create -f object-store.yaml
```
At this point the Rook operator recognizes that a new object store resource needs to be configured. The operator will create all of the resources to start the object store.
- The metadata pools (`.rgw.root`, `my-store.rgw.control`, `my-store.rgw.meta`, `my-store.rgw.log`, `my-store.rgw.buckets.index`)
- The data pool (`my-store.rgw.buckets.data`)

When the RGW pods start, the object store is ready to receive the http or https requests as configured.
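Once reconciled, a quick health check might look like the following (a sketch; the pod label and service name assume the my-store example above):

```console
kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
kubectl -n rook-ceph get svc rook-ceph-rgw-my-store
```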
The object store settings are exposed to Rook as a Custom Resource Definition (CRD). The CRD is the Kubernetes-native means by which the Rook operator can watch for new resources. The operator stays in a control loop to watch for a new object store, changes to an existing object store, or requests to delete an object store.
The pools are the backing data store for the object store and are created with specific names to be private to an object store. Pools can be configured with all of the settings that can be specified in the Pool CRD. The underlying schema for pools defined by a pool CRD is the same as the schema under the metadataPool and dataPool elements of the object store CRD. All metadata pools are created with the same settings, while the data pool can be created with independent settings. The metadata pools must use replication, while the data pool can use replication or erasure coding.
If preservePoolsOnDelete is set to 'true', the pools used to support the object store will remain when the object store is deleted. This is a security measure to avoid accidental loss of data. It is set to 'false' by default; if not specified, it is also treated as 'false'.
```yaml
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: device
    erasureCoded:
      dataChunks: 6
      codingChunks: 2
  preservePoolsOnDelete: true
```
To use existing pools for metadata and data, the pools must be created before the object store is created. This is useful when multiple object stores share the same pools. The pool details are provided via the sharedPools settings in the object store CRD. The object stores then consume the same pools, isolated from each other by different RADOS namespaces (the RGW server itself creates the namespaces on the pools). The pools can be created via the Pool CRD and must exist before the object store is created. Similar to the preservePoolsOnDelete setting, preserveRadosNamespaceDataOnDelete is used to preserve the data in the RADOS namespaces when the object store is deleted. It is set to 'false' by default.
```yaml
spec:
  sharedPools:
    metadataPoolName: rgw-meta-pool
    dataPoolName: rgw-data-pool
    preserveRadosNamespaceDataOnDelete: true
```
To create the pools that will be shared by multiple object stores, create the following CephBlockPool CRs:
```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-meta-pool
spec:
  failureDomain: host
  replicated:
    size: 3
  parameters:
    pg_num: "8"
  application: rgw
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-data-pool
spec:
  failureDomain: osd
  erasureCoded:
    dataChunks: 6
    codingChunks: 2
  application: rgw
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-root-pool
spec:
  name: .rgw.root
  failureDomain: host
  replicated:
    size: 3
  parameters:
    pg_num: "8"
  application: rgw
```
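These pool CRs must exist before the object store is created; apply them first (the file name here is illustrative):

```console
kubectl create -f shared-pools.yaml
```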
The pools for this configuration will be created as below:
```console
# ceph osd pool ls
.rgw.root
rgw-meta-pool
rgw-data-pool
```
And the pool configuration in the zone is as below:
```
# radosgw-admin zone get --rgw-zone=my-store
{
    "id": "2220eb5f-2751-4a51-9c7d-da4ce1b0e4e1",
    "name": "my-store",
    "domain_root": "rgw-meta-pool:my-store.meta.root",
    "control_pool": "rgw-meta-pool:my-store.control",
    "gc_pool": "rgw-meta-pool:my-store.log.gc",
    "lc_pool": "rgw-meta-pool:my-store.log.lc",
    "log_pool": "rgw-meta-pool:my-store.log",
    "intent_log_pool": "rgw-meta-pool:my-store.log.intent",
    "usage_log_pool": "rgw-meta-pool:my-store.log.usage",
    "roles_pool": "rgw-meta-pool:my-store.meta.roles",
    "reshard_pool": "rgw-meta-pool:my-store.log.reshard",
    "user_keys_pool": "rgw-meta-pool:my-store.meta.users.keys",
    "user_email_pool": "rgw-meta-pool:my-store.meta.users.email",
    "user_swift_pool": "rgw-meta-pool:my-store.meta.users.swift",
    "user_uid_pool": "rgw-meta-pool:my-store.meta.users.uid",
    "otp_pool": "rgw-meta-pool:my-store.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "rgw-meta-pool:my-store.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "rgw-data-pool:my-store.buckets.data"
                    }
                },
                "data_extra_pool": "rgw-data-pool:my-store.buckets.non-ec", # same pool only if it is not erasure coded; otherwise use a different pool
                "index_type": 0,
                "inline_data": true
            }
        }
    ],
    "realm_id": "65a7bf34-42d3-4344-ac40-035e160d7f9e",
    "notif_pool": "rgw-meta-pool:my-store.notif"
}
```
The following steps need to be implemented internally in the Rook operator to add this feature, assuming the zone and pools are already created:
```console
# radosgw-admin zone get --rgw-zone=my-store > zone.json
# modify the zone.json to add the pools as above
# radosgw-admin zone set --rgw-zone=my-store --infile=zone.json
# radosgw-admin period update --commit
```
After the object store is deleted, the data in the RADOS namespaces can be deleted by the Rook operator using the following commands:
```console
# rados -p rgw-meta-pool ls --all | grep my-store
# rados -p rgw-data-pool ls --all | grep my-store
# rados -p <rgw-meta-pool | rgw-data-pool> rm <each object> -N <namespace> (from the list above)
```
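As a sketch of how the operator could automate this cleanup, assuming `rados ls --all` prints a `<namespace>\t<object>` pair per line (verify the output format against the Ceph release in use):

```console
rados -p rgw-data-pool ls --all | grep '^my-store' | \
  while IFS=$'\t' read -r ns obj; do
    rados -p rgw-data-pool rm "$obj" -N "$ns"
  done
```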
If there is a zone section in the object store configuration, then pool creation is configured by the CephObjectZone CR. The CephObjectStore CR will include the section below to specify the zone name.
```yaml
spec:
  zone:
    name: zone1
```
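For reference, a minimal CephObjectZone providing the pool configuration might look like the following sketch (the zone group name zonegroup1 is an assumed example):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectZone
metadata:
  name: zone1
  namespace: rook-ceph
spec:
  zoneGroup: zonegroup1 # assumed CephObjectZoneGroup name
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: osd
    erasureCoded:
      dataChunks: 6
      codingChunks: 2
```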
The object storage APIs allow users to specify where bucket data will be stored at bucket creation time, with the LocationConstraint parameter in the S3 API and the X-Storage-Policy header in Swift. Similarly, users can override where object data will be stored by setting X-Amz-Storage-Class (S3) or X-Object-Storage-Class (Swift) during object creation.
In Ceph, these data locations are represented by pools. Placement targets control which pools are associated with a particular bucket. A bucket's placement target is selected on creation and cannot be modified (see the Ceph doc). A placement target can also define custom storage classes to override the data pool where objects will be stored.
The Ceph administrator creates a set of index_pool, data_pool, and an optional data_extra_pool per placement target, and selects arbitrary names for the placement targets, for example fast and cold:
> [!NOTE]
> `data_extra_pool` is for data that cannot use erasure coding. For example, multi-part uploads allow uploading a large object such as a movie in multiple parts. These parts must first be stored without erasure coding. So if `data_pool` is not erasure coded, then there is no need for `data_extra_pool`.
```yaml
# Default index (metadata) pool
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-meta-pool
  namespace: rook-ceph
spec:
  ...
---
# Default data pool
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-data-pool
  namespace: rook-ceph
spec:
  ...
---
# Bucket data pool for fast storage
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-fast-data-pool
  namespace: rook-ceph
spec:
  ...
---
# Bucket index pool for fast storage
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-fast-meta-pool
  namespace: rook-ceph
spec:
  ...
---
# Bucket data pool for cold storage
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-cold-data-pool
  namespace: rook-ceph
spec:
  ...
---
# Bucket index pool for cold storage
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rgw-cold-meta-pool
  namespace: rook-ceph
spec:
  ...
```
If needed, create an additional pool to use in a custom storage class, for example GLACIER:
> [!NOTE]
> Ceph allows arbitrary names for storage classes; however, some clients/libraries insist on AWS names, so it is recommended to use one of the valid x-amz-storage-class values for better compatibility:
> STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | OUTPOSTS | GLACIER_IR | SNOW | EXPRESS_ONEZONE
```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: glacier-data-pool
  namespace: rook-ceph
spec:
  ...
```
Then update the CephObjectStore CRD's sharedPools section with placement target and storage class definitions:
```yaml
spec:
  # [OPTIONAL] - sharedPools is mutually exclusive with spec.metadataPool and spec.dataPool.
  # It is the place to tell RGW which pools can be used to store bucket and object data and metadata.
  # All pools under this section should be created beforehand.
  # Rook will append a rados namespace to each of the pools, so it is safe to use the same pool across multiple RGW instances.
  # It is also safe to use the same pool in different target placements.
  sharedPools:
    # [EXISTING,OPTIONAL] - metadata pool. Will be used by default to store the bucket index.
    # The default poolPlacement's metadataPoolName will override this value, so it does not make sense to use it together with poolPlacements.
    metadataPoolName: rgw-meta-pool # results in rgw-meta-pool:<CephObjectStore.name>.index
    # [EXISTING,OPTIONAL] - data pool. Will be used by default to store bucket objects.
    # The default poolPlacement's dataPoolName will override this value, so it does not make sense to use it together with poolPlacements.
    dataPoolName: rgw-data-pool # results in rgw-data-pool:<CephObjectStore.name>.data
    # New settings:
    # [OPTIONAL] - defines pool placement targets.
    # Placement targets control which pools are associated with a particular bucket.
    # See: https://docs.ceph.com/en/latest/radosgw/placement/#placement-targets
    poolPlacements:
      # Default pool placement:
      - name: "default" # [REQUIRED] - placement id. The placement named "default" will be used as the default,
        # and will therefore override sharedPools.metadataPoolName and sharedPools.dataPoolName if present.
        # This means that the default poolPlacement is also mutually exclusive with spec.metadataPool and spec.dataPool.
        metadataPoolName: rgw-meta-pool # [REQUIRED] results in rgw-meta-pool:<CephObjectStore.name>.index
        dataPoolName: rgw-data-pool # [REQUIRED] results in rgw-data-pool:<CephObjectStore.name>.data
        dataNonECPoolName: "" # [OPTIONAL] results in rgw-data-pool:<CephObjectStore.name>.data.nonec
        # [OPTIONAL] - Storage classes for the default placement target.
        # Every placement has a default storage class named STANDARD pointing to dataPoolName.
        # So storageClasses allows defining additional storage classes on top of STANDARD for a given placement.
        storageClasses: []
      # Pool placement target named "fast"
      - name: "fast"
        metadataPoolName: rgw-fast-meta-pool # [REQUIRED] results in rgw-fast-meta-pool:<CephObjectStore.name>.fast.index
        dataPoolName: rgw-fast-data-pool # [REQUIRED] results in rgw-fast-data-pool:<CephObjectStore.name>.fast.data
        dataNonECPoolName: "" # [OPTIONAL] results in rgw-fast-data-pool:<CephObjectStore.name>.fast.data.nonec
        # [OPTIONAL] - Additional storage classes for the fast placement target. These provide alternatives to the default STANDARD storage class.
        # The STANDARD storage class cannot be changed or removed and always points to the placement's dataPoolName to store object data.
        storageClasses: []
      # Pool placement target named "cold"
      - name: "cold" # [REQUIRED] - placement id. Can be arbitrary.
        metadataPoolName: rgw-cold-meta-pool # [REQUIRED] results in rgw-cold-meta-pool:<CephObjectStore.name>.cold.index
        dataPoolName: rgw-cold-data-pool # [REQUIRED] results in rgw-cold-data-pool:<CephObjectStore.name>.cold.data
        dataNonECPoolName: "" # [OPTIONAL] results in rgw-cold-data-pool:<CephObjectStore.name>.cold.data.nonec
        # [OPTIONAL] - Additional storage classes for the cold placement target. These provide alternatives to the default STANDARD storage class.
        # The STANDARD storage class cannot be changed or removed and always points to the placement's dataPoolName to store object data.
        storageClasses:
          - name: GLACIER # [REQUIRED] StorageClass name. Can be arbitrary, but it is better to use one of the AWS x-amz-storage-class names.
            dataPoolName: glacier-data-pool # [REQUIRED] results in glacier-data-pool:<CephObjectStore.name>.glacier
```
Here we specified that the pools rgw-meta-pool and rgw-data-pool will be used by default. We also added extra placements named fast and cold with corresponding data and metadata pools, and introduced an extra storage class GLACIER for the cold placement, which can be used to redirect object data to glacier-data-pool within the cold placement.
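To illustrate how a client would consume these placements, a hypothetical aws CLI session might look like the following (the endpoint, bucket name, and my-store zonegroup API name are assumptions; RGW accepts a LocationConstraint of the form `<zonegroup-api-name>:<placement-id>`):

```console
# create a bucket pinned to the "cold" placement target
aws --endpoint-url http://rook-ceph-rgw-my-store.rook-ceph.svc s3api create-bucket \
  --bucket archive --create-bucket-configuration LocationConstraint=my-store:cold

# upload an object into the GLACIER storage class of that placement
aws --endpoint-url http://rook-ceph-rgw-my-store.rook-ceph.svc s3 cp \
  ./backup.tar s3://archive/ --storage-class GLACIER
```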
The Rook CephObjectStore controller reconciliation loop for target placements works from spec.sharedPools. The zone name is resolved as CephObjectStore.spec.zone.name | CephObjectStore.metadata.name - equal to the zone name for multisite, otherwise equal to the RGW name. The controller first fetches the zonegroup with `radosgw-admin zonegroup get --rgw-zonegroup=<name>`, which initially contains only the default placement target:
```json
{
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    ...
}
```
After the placement targets from poolPlacements are added, the zonegroup becomes:

```json
{
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        },
        {
            "name": "fast",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        },
        {
            "name": "cold",
            "tags": [],
            "storage_classes": [
                "STANDARD", "GLACIER"
            ]
        }
    ],
    "default_placement": "default-placement",
    ...
}
```
The controller then applies the updated zonegroup with `radosgw-admin zonegroup set --rgw-zonegroup=<name> --infile zonegroup.json` and fetches the zone with `radosgw-admin zone get --rgw-zone=<name>`, which initially contains only the default placement pools:
```json
{
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "<CephObjectStore name>.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "<CephObjectStore name>.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "<CephObjectStore name>.rgw.buckets.non-ec",
                "index_type": 0,
                "inline_data": true
            }
        }
    ],
    ...
}
```
After updating the placement pools to reference the shared pools, the zone becomes:

```json
{
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "rgw-meta-pool:<CephObjectStore.name>.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "rgw-data-pool:<CephObjectStore.name>.data"
                    }
                },
                "data_extra_pool": "rgw-data-pool:<CephObjectStore.name>.data.nonec",
                "index_type": 0,
                "inline_data": true
            }
        },
        {
            "key": "fast",
            "val": {
                "index_pool": "rgw-fast-meta-pool:<CephObjectStore.name>.fast.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "rgw-fast-data-pool:<CephObjectStore.name>.fast.data"
                    }
                },
                "data_extra_pool": "rgw-fast-data-pool:<CephObjectStore.name>.fast.data.nonec",
                "index_type": 0,
                "inline_data": true
            }
        },
        {
            "key": "cold",
            "val": {
                "index_pool": "rgw-cold-meta-pool:<CephObjectStore.name>.cold.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "rgw-cold-data-pool:<CephObjectStore.name>.cold.data"
                    },
                    "GLACIER": {
                        "data_pool": "glacier-data-pool:<CephObjectStore.name>.glacier"
                    }
                },
                "data_extra_pool": "rgw-cold-data-pool:<CephObjectStore.name>.cold.data.nonec",
                "index_type": 0,
                "inline_data": true
            }
        }
    ],
    ...
}
```
Finally, the controller applies the zone with `radosgw-admin zone set --rgw-zone=<name> --infile zone.json` and commits the configuration with `radosgw-admin period update --commit`.

The gateway settings correspond to the RGW service.
- `type`: Can be s3. In the future, support for swift can be added.
- `sslCertificateRef`: If specified, this is the name of the Kubernetes secret that contains the SSL certificate to be used for secure connections to the object store. The secret must be in the same namespace as the Rook cluster. If it is an opaque Kubernetes Secret, Rook will look in the secret provided at the `cert` key name. The value of the `cert` key must be in the format expected by the RGW service: "The server key, server certificate, and any other CA or intermediate certificates be supplied in one file. Each of these items must be in pem form." If the certificate is not specified, SSL will not be configured.
- `caBundleRef`: If specified, this is the name of the Kubernetes secret (type opaque) that contains the ca-bundle to use. The secret must be in the same namespace as the Rook cluster. Rook will look in the secret provided at the `cabundle` key name.
- `port`: The service port where the RGW service will be listening (http).
- `securePort`: The service port where the RGW service will be listening (https).
- `instances`: The number of RGW pods that will be started for this object store.
- `placement`: The RGW pods can be given standard Kubernetes placement restrictions with `nodeAffinity`, `tolerations`, `podAffinity`, `podAntiAffinity`, and `topologySpreadConstraints`, similar to placement defined for daemons configured by the cluster CRD.

The RGW service can be configured to listen on both http and https by specifying both `port` and `securePort`.
```yaml
gateway:
  sslCertificateRef: my-ssl-cert-secret
  securePort: 443
  instances: 1
```
By default, the object store will be created independently from any other object stores, and replication to another object store will not be configured. This is done by creating a new Ceph realm, zone group, and zone, all with the name of the new object store.
To configure the object store to replicate and sync data amongst object stores or Ceph clusters, the zone section is required.
This section enables the object store to be part of a specified ceph-object-zone.
Specifying this section also ensures that the pool section in the ceph-object-zone is used for the object-store. If pools are specified for the object-store they are neither created nor deleted.
- `name`: the name of the ceph-object-zone the object store is in. This must be the name of a ceph-object-zone resource, not just a zone that has already been created.

```yaml
zone:
  name: "name"
```
The Ceph Object Gateway supports integrating with LDAP for authenticating and creating users; please refer here. This means that the RGW backend user is also required to be part of groups in the LDAP server; otherwise, authentication will fail. The RGW backend user can be generated from the CephObjectStoreUser or ObjectBucketClaim CRDs. For both resources, credentials are saved in Kubernetes Secrets, which may not be valid with the LDAP server; the user needs to follow the steps mentioned here. The following settings need to be configured in the RGW server:
```ini
rgw ldap binddn =
rgw ldap secret = /etc/ceph/ldap/bindpass.secret
rgw ldap uri =
rgw ldap searchdn =
rgw ldap dnattr =
rgw ldap searchfilter =
rgw s3 auth use ldap = true
```
So the CRD for the Ceph Object Store will be modified to include the above changes:
```yaml
spec:
  security:
    ldap:
      config:
        uri: ldaps://ldap-server:636
        binddn: "uid=ceph,cn=users,cn=accounts,dc=example,dc=com"
        searchdn: "cn=users,cn=accounts,dc=example,dc=com"
        dnattr: "uid"
        searchfilter: "memberof=cn=s3,cn=groups,cn=accounts,dc=example,dc=com"
      credential:
        volumeSource:
          secret:
            secretName: object-my-store-ldap-creds
            defaultMode: 0600 # required
```
The config section includes options used by RGW with respect to the LDAP server. These options are strongly typed rather than a string map, since there is very little chance they will be modified in the future.

- `uri`: Specifies the address of the LDAP server to use.
- `binddn`: The bind domain for the service account used by the RGW server.
- `searchdn`: The search domain where it can look for the user details.
- `dnattr`: The attribute used in the constructed search filter to match a username; this can be either `uid` or `cn`.
- `searchfilter`: A generic search filter. If `dnattr` is set, this filter is `&()`'d together with the automatically constructed filter.

The credential section defines where the password for accessing the LDAP server should be sourced from:
- `volumeSource`: this is a standard Kubernetes VolumeSource for the LDAP bind password file, like what is normally used to configure Volumes for a Pod. For example, a Secret or HostPath. There are two requirements for the source's content:
  1. The file must be mountable via `subPath: password`. For example, in a Secret, the data item must be named `password`, or `items` must be defined to select the key and give it path `password`. A HostPath directory must have the `password` file.
  2. The file must have mode 0600, which is why `defaultMode: 0600` is required in the example above.

The CA bundle for LDAP can be added to the `caBundleRef` option in the gateway settings:
```yaml
spec:
  gateway:
    caBundleRef: ldaps-cabundle
```
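For completeness, a Secret satisfying the credential requirements above might look like this sketch (the password value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: object-my-store-ldap-creds
  namespace: rook-ceph
stringData:
  password: "<ldap-bind-password>" # data item must be named "password"
```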
The Ceph Object Gateway supports accessing buckets using virtual host-style addressing, which allows accessing buckets using the bucket name as a subdomain in the endpoint. This is important because AWS (the primary definer of the S3 interface) has deprecated the alternative, path-style access, which is the Ceph (and Rook) default. AWS has extended support for path-style access, but Rook/OBC/COSI users have begun to identify applications that do not support the long-deprecated path-style access.
Virtual host-style (vhost-style) addressing requires two things:

1. A wildcard-enabled DNS record that resolves to the RGW endpoint.
2. The endpoint(s) added to the RGW's `rgw_dns_names` list.

The user can configure this option manually in Rook like below for a wildcard-enabled endpoint:
```console
# ceph config set client.rgw.mystore.a rgw_dns_name my-ingress.mydomain.com,rook-ceph-rgw-my-store.rook-ceph.svc
```
In this example, if the object store contains a bucket named sample, then the S3 vhost-style
access URL would be sample.my-ingress.mydomain.com. More details about the feature can be
found in Ceph Documentation.
This is supported from the Ceph Reef release (v18.0) onwards.
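For example, the same sample bucket under both addressing modes, using the endpoints from the example above:

```
# path-style (default):   http://rook-ceph-rgw-my-store.rook-ceph.svc/sample
# virtual host-style:     http://sample.my-ingress.mydomain.com/
```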
Wildcard DNS addressing can be configured in myriad ways, and the choice of mechanism is left to the user.
Rook will implement the following API to allow for user configuration, with detailed design notes to follow.
- `hosting` (optional) - configures hosting settings for the CephObjectStore
  - `advertiseEndpoint` (optional) - allows users to definitively specify which endpoint applications should use by default for internal S3 connections.
  - `dnsNames` (optional) - RGW will reject any S3 connections to unknown endpoints. Users should add any additional endpoints RGW should accept here.

The below example illustrates a commonly-anticipated configuration, and one that Rook can recommend to users. The user is using an internal, wildcard-enabled K8s service as the advertised endpoint for OBCs. The user also has an externally-available ingress that the RGW needs to accept connections from, which serves S3 applications outside the Kubernetes cluster. Both endpoints are wildcard-enabled, but the internal service is preferable to the external ingress to prevent the ingress router from becoming a cluster-wide S3 bottleneck for Kubernetes applications.
```yaml
spec:
  hosting:
    advertiseEndpoint:
      dnsName: my-internal-wildcard-service.mydomain.com
      port: 8443
      useTLS: true
    dnsNames:
      - my-external-ingress.mydomain.com
```
These proposed configurations are necessary for users to enable vhost-style addressing, but they are not only useful when DNS wildcarding is enabled. The configurations are independent and may provide value in other user scenarios. Documentation should be clear that wildcarding is enabled by these features but not required.
When the user configures DNS wildcarding on an endpoint other than the CephObjectStore service
endpoint, Rook should advertise that endpoint to CephObjectStores, OBCs, and COSI Buckets/Accesses
as a priority. However, Rook cannot know which endpoints support wildcarding in order to prioritize
advertising them. Therefore, allow users to disambiguate for Rook which endpoint should be
advertised via advertiseEndpoint.
S3 clients (and therefore OBCs and COSI) generally assume a single endpoint for an object store. Multiple advertised endpoints will not be supported to avoid user and internal/developer confusion.
By default, Rook will advertise the CephObjectStore service endpoint with a priority on advertising
the HTTPS (securePort) endpoint. Because the advertised endpoint is primarily relevant for
resources internal to the Kubernetes cluster, this default should be sufficient for most users, and
this is the behavior expected by users when dnsNames is not configured, so it should be familiar.
When this feature is enabled, there should be no ambiguity about which endpoint Rook will use for Admin Ops API communication. As an HTTP server, RGW is only able to return a single TLS certificate to S3 clients (more detail). For maximum compatibility while TLS is enabled, Rook should connect to the same endpoint that users do. Internally, Rook will use the advertise endpoint as configured.
Rook documentation will inform users that if TLS is enabled, they must give Rook a certificate that
accepts the service endpoint. Alternately, if that is not possible, Rook will add an
insecureSkipTlsVerification option to the CephObjectStore to allow users to provision a healthy
CephObjectStore. This opens users up to machine-in-the-middle attacks, so users should be advised to
only use it for test/proof-of-concept clusters, or to work around bugs temporarily.
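A sketch of how the proposed opt-out might be expressed (the field placement within the spec is illustrative, not a shipped API):

```yaml
spec:
  gateway:
    # proposed option: skip TLS verification for Rook's own Admin Ops API connections
    insecureSkipTlsVerification: true
```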
Some users have reported issues with Rook using a dnsNames endpoint
(or advertiseEndpoint) when they wish to set up ingress certificates after Rook deployment. The
obvious alternative is to have Rook always use the CephObjectStore service, but other users have
expressed troubles creating certificates or CAs that allow the service endpoint in the past.
Each `rgw_dns_name` entry must be a valid RFC 1123 hostname, and the Rook operator will perform input validation using the k8s apimachinery `IsDNS1123Subdomain()` function.
When rgw_dns_name is changed for an RGW cluster, all RGWs need to be restarted. To enforce this in Rook, we can apply the --rgw-dns-name flag, which will restart RGWs with no user action needed.
This is different from CephObjectZone customEndpoints which is used for configuring the object store to replicate and sync data amongst Multisite.
When endpoints are configured in rgw_dns_names, the RGW will reject any incoming connections
intended for endpoints not in the list. Therefore, all endpoints that might be used must be added.
For convenience, and to ensure other CephObjectStore configurations are not rendered unusable when users add additional endpoints to dnsNames, the following should be added to the `rgw_dns_names` list automatically by Rook:

- the `advertiseEndpoint.dnsName`
- the CephObjectStore service endpoint (e.g., rook-ceph-rgw-my-store.rook-ceph.svc)
- CephObjectZone `customEndpoints`

When Rook builds the `rgw_dns_names` list internally, Rook should remove any duplicate entries. While Rook adds endpoints to the list for safety and convenience, users might add the same endpoints, which Rook should not treat as a configuration bug. Rook should also ensure the list ordering is consistent between reconciles.
Rook can refer users to this Kubernetes doc for a suggested way that they can manage certificates in a Kubernetes cluster that work with Kubernetes services like the CephObjectStore service: https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/
For external CephObjectStores (i.e., when spec.gateway.externalRgwEndpoints are set), vhost-style addressing should be configured on the host cluster, and hosting.dnsNames is irrelevant. The default advertiseEndpoint for external CephObjectStores is the first entry in the spec.gateway.externalRgwEndpoints list, which users should be able to override if desired.
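For illustration, an external configuration might look like the following sketch (the IP and hostname values are placeholders):

```yaml
spec:
  gateway:
    port: 80
    externalRgwEndpoints:
      - ip: 192.168.39.182 # first entry is used as the default advertiseEndpoint
      - hostname: external-rgw.mydomain.com
```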
RGW is a very active project with many existing features and regular new feature additions. Rook maintainers have a hard time catching up with RGW's latest features. In an effort to help users who want access to rare and new features, Rook can allow users to set arbitrary Ceph configs in their CephObjectStore CR. RGW config reference: https://docs.ceph.com/en/reef/radosgw/config-ref/
This is an advanced feature. It allows users to modify many untested aspects of the RGW and object store. Any user configs that break the object store should be documented to be a responsibility of the user.
```yaml
spec:
  metadataPool: # ...
  dataPool: # ...
  gateway:
    rgwConfig:
      conf-name: value
      other_conf: value
      debug_rgw: "20"
    rgwCommandFlags:
      flag-name: value
      other_flag: value
      rgw_enable_apis: "s3, s3website, swift, swift_auth, sts, iam, notifications"
```
CephCluster has a configuration section called config that sets mon config store values. config
may be too vague in the context of the CephObjectStore CR. rgwConfig is a clearer naming for users
and gives context that Ceph RGW configs are expected. A search engine search for "rgw config" also
yields the correct Ceph documentation for users, which suggests the naming is apt.
CephObjectStore will also add an rgwCommandFlags variable to allow users to set CLI flags to the
radosgw deployment. Changes to these parameters will result in RGW daemon restarts, which can be
desired behavior for config values like rgw_enable_apis. In testing, the radosgw command process
supports specifying the same flag multiple times, and the last occurrence is used as the effective value.
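To illustrate the observed flag precedence (a sketch, not actual Rook output):

```console
# when the same flag is repeated, the last occurrence takes effect
radosgw --foreground --name client.rook-ceph-rgw-my-store.a \
  --rgw-enable-apis="s3" --rgw-enable-apis="s3, sts" # effective value: "s3, sts"
```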
Rook users can already use the CephCluster.spec.config section to control individual
CephObjectStore configurations today. However, this requires users to know the name of the Ceph
client user for each CephObjectStore, and it only allows setting object store fields after the
object store is provisioned. Adding this feature improves config override usability for object store
users, and it keeps config overrides in the same place as the resource. This is a clear quality of
life improvement for users managing object stores.
Rook will apply rgwConfig values to the mon config store after its own values are set. Similarly,
Rook will append rgwCommandFlags to the CLI flags it sets normally. This allows users to override
anything Rook is doing to configure the RGW. This tradeoff favors user flexibility at the risk of
less advanced users having more ability to break their RGW deployments. Rook can't make guarantees
about data security when configs are overridden.
For debugging, Rook should log when it overrides its own ceph mon store config with a user setting. This is not easily feasible when Rook applies CLI flags to the RGW deployment, but users can see when their values conflict by inspecting the RGW deployment; this is sufficient for debugging.
Rook will apply each configuration under the client scope of the shared RGW client,
i.e., client.rook-ceph-rgw-<store-name>.a. This will also help ensure that changed configs don't
affect other Rook-managed Ceph daemons, critically mons, OSDs, and MDSes.
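In terms of plain Ceph commands, the application is roughly equivalent to the following (illustrative; Rook performs this through its own client, and the store name is assumed):

```console
ceph config set client.rook-ceph-rgw-my-store.a debug_rgw 20
ceph config get client.rook-ceph-rgw-my-store.a debug_rgw
```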
Configs set via rgwConfig will be equivalent to a Ceph config section like the following ini.
Config keys can use spaces, hyphens, or underscores. CLI flags must use hyphens, and Rook can
auto-convert spaces/underscores for ease of use.
```ini
[client.rook-ceph-rgw-<store-name>.a]
conf_name = value
other_conf = value
debug_rgw = "20"
```
Because the subject of the configs is limited to RGWs using the
client.rook-ceph-rgw-<store-name>.a ID, the risk of affecting other Ceph components is low. Rook has
the option to allow-/deny-list the configs users are able to set, but this limits user flexibility.
Since it will be documented that any breakage or bugs related to options set via this mechanism are
the user's responsibility, we can be further sure Rook maintainers won't be obligated to stress
themselves over abuse or misuse of this feature.
Rook will also not perform validation to check if CephObjectStore configs conflict with CephCluster
config options. Users should refrain from setting client.rook-ceph-rgw-* configs in the
CephCluster when this feature is used. Documentation should note this for users. Rook can revisit
this decision in the future to add cross-CRD validation if it becomes a strong user desire/need.