design/Implemented/backup-repo-cache-volume.md
Backup Storage: The storage that holds the backup data. See the Unified Repository design for details.
Backup Repository: The backup repository is layered between the BR data movers and the Backup Storage to provide the BR-related features that are introduced in the Unified Repository design.
Velero Generic Data Path (VGDP): VGDP is the collection of modules that are introduced in the Unified Repository design. Velero uses these modules to finish data transfer for various purposes (e.g., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository.
Data Mover Pods: Intermediate pods which hold VGDP and complete the data transfer. See VGDP Micro Service for Volume Snapshot Data Movement and VGDP Micro Service For fs-backup for details.
Repository Maintenance Pods: Pods for Repository Maintenance Jobs, which hold VGDP to run repository maintenance.
According to the Unified Repository design, Velero uses selectable backup repositories for various backup/restore methods, e.g., fs-backup and volume snapshot data movement. Some backup repositories may need to cache data on the client side to accelerate various repository operations.
In the existing Backup Repository Configuration, we allow users to configure the cache data size (cacheLimitMB). However, the cache data is still stored in the root file system of the data mover pods/repository maintenance pods, and therefore in the root file system of the node. This is not good enough, for the following reasons:
We need to allow users to prepare a dedicated location, e.g., a dedicated volume, for the cache.
Not all backup repositories or backup repository operations require a cache, so we need to define when and how the cache is used.
Depending on the backup repository, cache data may include payload data or repository metadata, e.g., indexes to the payload data chunks.
Payload data is highly related to the backup data, and normally takes up the majority of the repository data as well as the cache data.
Repository metadata is related to the backup repository's chunking algorithm, data chunk mapping method, etc., so its size is not proportional to the backup data size.
On the other hand, for some backup repositories, in extreme cases, the repository metadata may be significantly large. E.g., Kopia's indexes are per chunk, so if there is a huge number of small files in the repository, Kopia's index data may be at the same level as, or even larger than, the payload data.
However, in cases where repository metadata becomes the majority, other bottlenecks may emerge and the concurrency of data movers may be significantly constrained, so the requirement for cache volumes may go away.
Therefore, for now we only consider the cache volume requirement for payload data, and leave the consideration for metadata as a future enhancement.
The backup repository cache varies with the backup repository and with the backup repository operations performed during VGDP runs. Below are the scenarios when VGDP runs:
The above analyses are based on the common behavior of backup repositories and do not consider the case where backup repository metadata takes up the majority or a significant proportion of the cache data.
As a conclusion of the analyses, we will create dedicated cache volumes for restore scenarios.
Other scenarios can be added in response to future changes/requirements. The mechanism to expose and connect the cache volumes should work for all scenarios. E.g., if we need to consider the backup repository metadata case, we may need cache volumes for backup and repository maintenance as well; we can then reuse the same cache volume provisioning and connection mechanism for those scenarios.
If available, one cache volume is dedicated to one data mover pod. That is, the cached data is destroyed when the data mover pod completes, at which point the backup repository instance also closes.
Cache data is fully managed by the specific backup repository, so the backup repository may also have its own way to GC the cache data.
That is to say, cache data GC may be launched by the backup repository instance while the data mover pod is running; the remaining data is then automatically destroyed when the data mover pod and the cache PVC are destroyed (the cache PVC's reclaimPolicy is always Delete, so once the cache PVC is destroyed, the volume is also destroyed). So no special logic is needed for cache data GC.
Cache volumes take storage space and cluster resources (PVC, PV); therefore, cache volumes should be created only when necessary and should be of a reasonable size based on the cache data size:
cacheLimitMB is used for this purpose. E.g., it could be set to 1024 for a 1TB backup, which means 1GB of data is cached and the old cache data exceeding this size will be cleared. Therefore, it is meaningless to set the cache volume size much larger than cacheLimitMB.
The cache volume size is calculated from the factors below (for Restore scenarios):
cacheLimitMB, the default value is 5GB
A formula is as below:
cacheVolumeSize = ((backupSize != 0 ? (backupSize > residentThreshold ? limit : 0) : limit) * (100 + inflationPercentage)) / 100
Finally, cacheVolumeSize is rounded up to a whole number of GiB for the sake of UX, storage, and management friendliness.
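The formula and the rounding step can be sketched in Go as follows. This is only an illustration, not Velero's implementation: the function name is hypothetical, all sizes are assumed to be in bytes, and the inflation percentage is passed in as a parameter (the value 20 in the usage is an arbitrary example).

```go
package main

import "fmt"

const (
	mib = int64(1) << 20
	gib = int64(1) << 30
)

// cacheVolumeSize is a hypothetical helper that mirrors the formula above.
// All sizes are assumed to be in bytes.
func cacheVolumeSize(backupSize, limit, residentThreshold, inflationPercentage int64) int64 {
	base := limit
	if backupSize != 0 && backupSize <= residentThreshold {
		// The backup size is known and does not exceed the threshold:
		// the cache stays in the resident location, so no volume is needed.
		base = 0
	}
	size := base * (100 + inflationPercentage) / 100
	// Round up to a whole number of GiB.
	return (size + gib - 1) / gib * gib
}

func main() {
	limit := 5 * 1024 * mib // cacheLimitMB = 5120
	threshold := 1024 * mib // residentThresholdMB = 1024

	fmt.Println(cacheVolumeSize(10*gib, limit, threshold, 20) / gib)  // 6
	fmt.Println(cacheVolumeSize(512*mib, limit, threshold, 20) / gib) // 0
	fmt.Println(cacheVolumeSize(0, limit, threshold, 20) / gib)       // 6
}
```

A result of 0 means that no cache volume is created. When the backup size is unknown (0), the threshold is skipped and the size is derived from the limit alone.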
The PVC for a cache volume is created in the Velero namespace, and a storage class is required for the cache PVC. The PVC's accessMode is ReadWriteOnce and its volumeMode is Filesystem, so the provided storage class should support this specification. If the storage class doesn't support either of them, the data mover pod may hang in the Pending state until a data movement timeout (e.g., prepareTimeout) expires, and the data movement will eventually fail.
The cache volume is not expected to be retained after the data mover pod is deleted, so the reclaimPolicy of the storage class must be Delete.
To detect problems in the storage class and fail earlier, a validation is applied to the storage class. If the validation fails, the cache configuration is ignored and the data mover pod is created without a cache volume.
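The validate-then-ignore behavior could look roughly like the sketch below. The storageClassInfo type and the function name are hypothetical stand-ins, not Velero's code; a real check would read the StorageClass object from the cluster. Only the reclaimPolicy requirement stated above is checked, and an empty policy is treated as Delete, which is the Kubernetes default for a StorageClass.

```go
package main

import (
	"errors"
	"fmt"
)

// storageClassInfo is a hypothetical, simplified view of a StorageClass.
type storageClassInfo struct {
	Name          string
	ReclaimPolicy string // "Delete", "Retain", or "" (Kubernetes defaults to Delete)
}

// validateCacheStorageClass reports why a storage class cannot back a cache volume.
func validateCacheStorageClass(sc *storageClassInfo) error {
	if sc == nil {
		return errors.New("storage class not found")
	}
	if sc.ReclaimPolicy != "" && sc.ReclaimPolicy != "Delete" {
		return fmt.Errorf("storage class %s has reclaimPolicy %s, want Delete", sc.Name, sc.ReclaimPolicy)
	}
	return nil
}

func main() {
	classes := []*storageClassInfo{
		{Name: "sc-1", ReclaimPolicy: "Delete"},
		{Name: "sc-2", ReclaimPolicy: "Retain"},
	}
	for _, sc := range classes {
		if err := validateCacheStorageClass(sc); err != nil {
			// A validation failure is not fatal: the pod is created without a cache volume.
			fmt.Printf("ignoring cache volume configuration: %v\n", err)
			continue
		}
		fmt.Printf("cache volume will use %s\n", sc.Name)
	}
}
```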
Below configurations are introduced:
Unlike cacheLimitMB, which is set on and affects the backup repository, the above two configurations are data mover configurations that control how cache volumes are created for data mover pods, and they don't need to be per backup repository. So we add them to the node-agent Configuration.
Below are some examples of the node-agent configMap with the configurations:
Sample-1:
{
    "cacheVolume": {
        "storageClass": "sc-1",
        "residentThresholdMB": 1024
    }
}
Sample-2:
{
    "cacheVolume": {
        "storageClass": "sc-1"
    }
}
Sample-3:
{
    "cacheVolume": {
        "residentThresholdMB": 1024
    }
}
Sample-1: This is a valid configuration. Restores with a backup data size larger than 1GB will be assigned a cache volume using storage class sc-1.
Sample-2: This is a valid configuration. Data mover pods are always assigned a cache volume using storage class sc-1.
Sample-3: This is not a valid configuration because the storage class is absent. Velero gives up creating a cache volume.
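The sketch below shows one way the cacheVolume section could be decoded and checked in Go. The type and function names are hypothetical and are not taken from Velero's code; only the JSON field names come from the samples.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheVolumeConfig is a hypothetical type mirroring the "cacheVolume" object in the samples.
type cacheVolumeConfig struct {
	StorageClass        string `json:"storageClass"`
	ResidentThresholdMB int64  `json:"residentThresholdMB"`
}

// nodeAgentConfig holds only the part of the node-agent configuration used here.
type nodeAgentConfig struct {
	CacheVolume *cacheVolumeConfig `json:"cacheVolume"`
}

// parseCacheVolume decodes the configuration and reports whether cache volumes
// can be created, that is, whether a storage class is present.
func parseCacheVolume(data []byte) (cfg cacheVolumeConfig, enabled bool, err error) {
	var c nodeAgentConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return cacheVolumeConfig{}, false, fmt.Errorf("invalid node-agent configuration: %w", err)
	}
	if c.CacheVolume == nil || c.CacheVolume.StorageClass == "" {
		// Without a storage class, no cache volume is created.
		return cacheVolumeConfig{}, false, nil
	}
	return *c.CacheVolume, true, nil
}

func main() {
	samples := []string{
		`{"cacheVolume": {"storageClass": "sc-1", "residentThresholdMB": 1024}}`,
		`{"cacheVolume": {"storageClass": "sc-1"}}`,
		`{"cacheVolume": {"residentThresholdMB": 1024}}`,
	}
	for i, s := range samples {
		cfg, enabled, err := parseCacheVolume([]byte(s))
		fmt.Printf("sample-%d: enabled=%v config=%+v err=%v\n", i+1, enabled, cfg, err)
	}
}
```

In this sketch, a missing residentThresholdMB decodes to 0, which corresponds to always assigning a cache volume.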
To create the configMap, users need to save something like the above samples to a JSON file and then run the command below:
kubectl create cm <ConfigMap name> -n velero --from-file=<json file name>
The cache volume configurations are read by the node-agent server, so users also need to specify the configMap through the --node-agent-configmap parameter of the velero node-agent.
The restore needs to know the backup size in order to calculate the cache volume size, so some new fields are added to the DataDownload and PodVolumeRestore CRDs.
A snapshotSize field is also added to the spec of DataDownload and PodVolumeRestore:
spec:
  snapshotID:
    description: SnapshotID is the ID of the Velero backup snapshot to
      be restored from.
    type: string
  snapshotSize:
    description: SnapshotSize is the logical size of the snapshot.
    format: int64
    type: integer
snapshotSize represents the total size of the backup; during restore, the value is transferred from DataUpload/PodVolumeBackup's Status.Progress.TotalBytes to DataDownload/PodVolumeRestore.
It is unlikely that Status.Progress.TotalBytes from DataUpload/PodVolumeBackup is unavailable. If that happens, according to the above formula, residentThresholdMB is ignored and the cache volume size is calculated directly from the cache limit of the corresponding backup repository.
Cache volume configurations are retrieved by the node-agent and passed through DataDownload/PodVolumeRestore to the GenericRestore exposer/PodVolume exposer.
The exposers are responsible for calculating the cache volume size, creating the cache PVCs, and mounting them to the restorePods.
If the calculated cache volume size is 0, or any of the critical parameters is missing (e.g., the cache volume storage class), the exposers ignore the cache volume configuration and continue creating the restorePods without cache volumes, so there is no impact on the result of the restore.
Exposers mount the cache volume to a predefined directory and pass the directory to the data mover pods through the cache-volume-path parameter.
The data structures below are added to the exposers' expose parameters:
type GenericRestoreExposeParam struct {
	// RestoreSize specifies the data size for the volume to be restored
	RestoreSize int64
	// CacheVolume specifies the info for cache volumes
	CacheVolume *CacheVolumeInfo
}
type PodVolumeExposeParam struct {
	// RestoreSize specifies the data size for the volume to be restored
	RestoreSize int64
	// CacheVolume specifies the info for cache volumes
	CacheVolume *repocache.CacheConfigs
}
type CacheConfigs struct {
	// StorageClass specifies the storage class for cache volumes
	StorageClass string
	// Limit specifies the maximum size of the cache data
	Limit int64
	// ResidentThreshold specifies the minimum size of the cache data to create a cache volume
	ResidentThreshold int64
}
Data mover pods retrieve the cache volume directory from the cache-volume-path parameter and pass it to the Unified Repository.
If the directory is empty, the Unified Repository uses the resident location for the data cache, that is, the root file system.
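The fallback can be sketched as below. The function name and the residentCacheDir value are hypothetical placeholders, and the sketch uses Go's standard flag package rather than whatever command-line library the data mover actually uses; only the cache-volume-path parameter name comes from the design.

```go
package main

import (
	"flag"
	"fmt"
)

// residentCacheDir is a hypothetical placeholder for the repository's
// default cache location in the pod's root file system.
const residentCacheDir = "/tmp/repo-cache"

// cacheDirFromArgs returns the directory the repository should use for its cache.
func cacheDirFromArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("data-mover", flag.ContinueOnError)
	path := fs.String("cache-volume-path", "", "directory where the cache volume is mounted")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *path == "" {
		// No cache volume was mounted: fall back to the resident location.
		return residentCacheDir, nil
	}
	return *path, nil
}

func main() {
	dir, err := cacheDirFromArgs([]string{"--cache-volume-path=/cache"})
	fmt.Println(dir, err)

	dir, err = cacheDirFromArgs(nil)
	fmt.Println(dir, err)
}
```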
Kopia repository supports cache directory configuration for both metadata and data. The existing SetupConnectOptions is modified to customize the CacheDirectory:
func SetupConnectOptions(ctx context.Context, repoOptions udmrepo.RepoOptions) repo.ConnectOptions {
	...
	return repo.ConnectOptions{
		CachingOptions: content.CachingOptions{
			CacheDirectory: cacheDir,
			...
		},
		...
	}
}