design/Implemented/priority-class-name-support_design.md
This design document outlines the implementation of priority class name support for Velero components, including the Velero server deployment, the node agent daemonset, maintenance jobs, and data mover pods. The feature lets users assign a priority class name to each component in order to influence its scheduling and eviction behavior.
Kubernetes allows users to define priority classes, which can be used to influence the scheduling and eviction behavior of pods. Priority classes are defined as cluster-wide resources, and pods can reference them by name. When a pod is created, the priority admission controller uses the priority class name to populate the priority value for the pod. The scheduler then uses this priority value to determine the order in which pods are scheduled.
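For reference, a priority class is a small cluster-scoped resource; a minimal illustrative manifest (the name and value here are examples, not Velero defaults) looks like:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: example-priority   # referenced by pods via spec.priorityClassName
value: 1000                # higher values schedule first and are evicted last
globalDefault: false       # do not apply to pods that omit priorityClassName
description: "Illustrative priority class"
```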
Currently, Velero does not provide a way for users to specify a priority class name for its components. This can be problematic in clusters where resource contention is high, as Velero components may be evicted or not scheduled in a timely manner, potentially impacting backup and restore operations.
The implementation will add new fields to the Velero options struct to store the priority class names for the server deployment and node agent daemonset. The Velero CLI will be updated to accept new flags for these components. For data mover pods and maintenance jobs, priority class names will be configured through existing ConfigMap mechanisms (node-agent-configmap for data movers and repo-maintenance-job-configmap for maintenance jobs). The Velero deployment, node agent daemonset, maintenance jobs, and data mover pods will be updated to use their respective priority class names.
New flags will be added to the velero install command to specify priority class names for different components:
```go
flags.StringVar(
	&o.ServerPriorityClassName,
	"server-priority-class-name",
	o.ServerPriorityClassName,
	"Priority class name for the Velero server deployment. Optional.",
)
flags.StringVar(
	&o.NodeAgentPriorityClassName,
	"node-agent-priority-class-name",
	o.NodeAgentPriorityClassName,
	"Priority class name for the node agent daemonset. Optional.",
)
```
Note: Priority class names for data mover pods and maintenance jobs will be configured through their respective ConfigMaps (--node-agent-configmap for data movers and --repo-maintenance-job-configmap for maintenance jobs).
The VeleroOptions struct in pkg/install/resources.go will be updated to include new fields for priority class names:
```go
type VeleroOptions struct {
	// ... existing fields ...
	ServerPriorityClassName    string
	NodeAgentPriorityClassName string
}
```
The podTemplateConfig struct in pkg/install/deployment.go will be updated to include a new field for the priority class name:
```go
type podTemplateConfig struct {
	// ... existing fields ...
	priorityClassName string
}
```
A new function, WithPriorityClassName, will be added to set this field:
```go
func WithPriorityClassName(priorityClassName string) podTemplateOption {
	return func(c *podTemplateConfig) {
		c.priorityClassName = priorityClassName
	}
}
```
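The option pattern above can be exercised in isolation. The following self-contained sketch reproduces the (unexported) struct and option type purely for illustration; the real definitions live in pkg/install/deployment.go:

```go
package main

import "fmt"

// podTemplateConfig mirrors the unexported struct in pkg/install/deployment.go;
// reproduced here only to demonstrate the functional-option pattern.
type podTemplateConfig struct {
	priorityClassName string
}

type podTemplateOption func(*podTemplateConfig)

// WithPriorityClassName returns an option that records the priority class name
// to be copied into the pod template's spec.priorityClassName.
func WithPriorityClassName(priorityClassName string) podTemplateOption {
	return func(c *podTemplateConfig) {
		c.priorityClassName = priorityClassName
	}
}

func main() {
	c := &podTemplateConfig{}
	// Options are applied in order; later options can override earlier ones.
	for _, opt := range []podTemplateOption{WithPriorityClassName("velero-critical")} {
		opt(c)
	}
	fmt.Println(c.priorityClassName) // velero-critical
}
```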
The Deployment function will be updated to use the priority class name:
```go
deployment := &appsv1api.Deployment{
	// ... existing fields ...
	Spec: appsv1api.DeploymentSpec{
		// ... existing fields ...
		Template: corev1api.PodTemplateSpec{
			// ... existing fields ...
			Spec: corev1api.PodSpec{
				// ... existing fields ...
				PriorityClassName: c.priorityClassName,
			},
		},
	},
}
```
The DaemonSet function will use the priority class name passed via the podTemplateConfig (from the CLI flag):
```go
daemonSet := &appsv1api.DaemonSet{
	// ... existing fields ...
	Spec: appsv1api.DaemonSetSpec{
		// ... existing fields ...
		Template: corev1api.PodTemplateSpec{
			// ... existing fields ...
			Spec: corev1api.PodSpec{
				// ... existing fields ...
				PriorityClassName: c.priorityClassName,
			},
		},
	},
}
```
The JobConfigs struct in pkg/repository/maintenance/maintenance.go will be updated to include a field for the priority class name:
```go
type JobConfigs struct {
	// LoadAffinities is the config for repository maintenance job load affinity.
	LoadAffinities []*kube.LoadAffinity `json:"loadAffinity,omitempty"`

	// PodResources is the config for the CPU and memory resources setting.
	PodResources *kube.PodResources `json:"podResources,omitempty"`

	// PriorityClassName is the priority class name for the maintenance job pod.
	// Note: This is only read from the global configuration, not per-repository.
	PriorityClassName string `json:"priorityClassName,omitempty"`
}
```
The buildJob function will be updated to use the priority class name from the global job configuration:
```go
func buildJob(cli client.Client, ctx context.Context, repo *velerov1api.BackupRepository, bslName string, config *JobConfigs,
	podResources kube.PodResources, logLevel logrus.Level, logFormat *logging.FormatFlag) (*batchv1.Job, error) {
	// ... existing code ...

	// Use the priority class name from the global job configuration if available.
	// Note: Priority class is only read from global config, not per-repository.
	priorityClassName := ""
	if config != nil && config.PriorityClassName != "" {
		priorityClassName = config.PriorityClassName
	}

	// ... existing code ...

	job := &batchv1.Job{
		// ... existing fields ...
		Spec: batchv1.JobSpec{
			// ... existing fields ...
			Template: corev1api.PodTemplateSpec{
				// ... existing fields ...
				Spec: corev1api.PodSpec{
					// ... existing fields ...
					PriorityClassName: priorityClassName,
				},
			},
		},
	}

	// ... existing code ...
}
```
Users will be able to configure the priority class name for all maintenance jobs by creating the repository maintenance job ConfigMap before installation. For example:
```bash
# Create the ConfigMap before running velero install
cat <<EOF | kubectl create configmap repo-maintenance-job-config -n velero --from-file=config.json=/dev/stdin
{
  "global": {
    "priorityClassName": "low-priority",
    "podResources": {
      "cpuRequest": "100m",
      "memoryRequest": "128Mi"
    }
  }
}
EOF

# Then install Velero referencing this ConfigMap
velero install --provider aws \
  --repo-maintenance-job-configmap repo-maintenance-job-config \
  # ... other flags
```
The ConfigMap can be updated after installation to change the priority class for future maintenance jobs. Note that only the "global" configuration is used for priority class - all maintenance jobs will use the same priority class regardless of which repository they are maintaining.
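The global-only lookup can be illustrated with a short, self-contained sketch. Here, `jobConfigs` and `globalPriorityClass` are simplified stand-ins for the real `JobConfigs` parsing in pkg/repository/maintenance, not the actual implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobConfigs is a simplified stand-in for the JobConfigs struct; only the
// priority class field matters for this sketch, unknown JSON fields are ignored.
type jobConfigs struct {
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// globalPriorityClass parses the repo-maintenance-job ConfigMap payload, which
// maps "global" and per-repository keys to job configs, and returns only the
// global priority class. Per-repository entries are ignored on purpose.
func globalPriorityClass(raw string) string {
	var data map[string]jobConfigs
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		return "" // invalid JSON falls back to default priority
	}
	// A missing "global" key yields the zero struct, i.e. an empty string.
	return data["global"].PriorityClassName
}

func main() {
	raw := `{"global": {"priorityClassName": "low-priority"}, "repo-1": {"priorityClassName": "ignored"}}`
	fmt.Println(globalPriorityClass(raw)) // low-priority
}
```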
We'll update the Configs struct in pkg/nodeagent/node_agent.go to include a field for the priority class name in the node-agent-configmap:
```go
type Configs struct {
	// ... existing fields ...

	// PriorityClassName is the priority class name for the data mover pods
	// created by the node agent.
	PriorityClassName string `json:"priorityClassName,omitempty"`
}
```
This will allow users to configure the priority class name for data mover pods through the node-agent-configmap. Note that the node agent daemonset itself gets its priority class from the --node-agent-priority-class-name CLI flag during installation, not from this configmap. For example:
```bash
# Create the ConfigMap before running velero install
cat <<EOF | kubectl create configmap node-agent-config -n velero --from-file=config.json=/dev/stdin
{
  "priorityClassName": "low-priority",
  "loadAffinity": [
    {
      "nodeSelector": {
        "matchLabels": {
          "node-role.kubernetes.io/worker": "true"
        }
      }
    }
  ]
}
EOF

# Then install Velero referencing this ConfigMap
velero install --provider aws \
  --node-agent-configmap node-agent-config \
  --use-node-agent \
  # ... other flags
```
The createBackupPod function in pkg/exposer/csi_snapshot.go will be updated to accept and use the priority class name:
```go
func (e *csiSnapshotExposer) createBackupPod(
	ctx context.Context,
	ownerObject corev1api.ObjectReference,
	backupPVC *corev1api.PersistentVolumeClaim,
	operationTimeout time.Duration,
	label map[string]string,
	annotation map[string]string,
	affinity *kube.LoadAffinity,
	resources corev1api.ResourceRequirements,
	backupPVCReadOnly bool,
	spcNoRelabeling bool,
	nodeOS string,
	priorityClassName string, // New parameter
) (*corev1api.Pod, error) {
	// ... existing code ...

	pod := &corev1api.Pod{
		// ... existing fields ...
		Spec: corev1api.PodSpec{
			// ... existing fields ...
			PriorityClassName: priorityClassName,
			// ... existing fields ...
		},
	}

	// ... existing code ...
}
```
The call to createBackupPod in the Expose method will be updated to pass the priority class name retrieved from the node-agent-configmap:
```go
priorityClassName, _ := kube.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName)

backupPod, err := e.createBackupPod(
	ctx,
	ownerObject,
	backupPVC,
	csiExposeParam.OperationTimeout,
	csiExposeParam.HostingPodLabels,
	csiExposeParam.HostingPodAnnotations,
	csiExposeParam.Affinity,
	csiExposeParam.Resources,
	backupPVCReadOnly,
	spcNoRelabeling,
	csiExposeParam.NodeOS,
	priorityClassName, // Priority class name from node-agent-configmap
)
```
A new function, GetDataMoverPriorityClassName, will be added to the pkg/util/kube package (in the same file as ValidatePriorityClass) to retrieve the priority class name for data mover pods:
```go
// In pkg/util/kube/priority_class.go

// GetDataMoverPriorityClassName retrieves the priority class name for data
// mover pods from the node-agent-configmap.
func GetDataMoverPriorityClassName(ctx context.Context, namespace string, kubeClient kubernetes.Interface, configName string) (string, error) {
	// configData is a minimal struct to parse only the priority class name from the ConfigMap.
	type configData struct {
		PriorityClassName string `json:"priorityClassName,omitempty"`
	}

	// Get the ConfigMap
	cm, err := kubeClient.CoreV1().ConfigMaps(namespace).Get(ctx, configName, metav1.GetOptions{})
	if err != nil {
		if apierrors.IsNotFound(err) {
			// ConfigMap not found is not an error, just return an empty string.
			return "", nil
		}
		return "", errors.Wrapf(err, "error getting node agent config map %s", configName)
	}

	if cm.Data == nil {
		// No data in the ConfigMap, return an empty string.
		return "", nil
	}

	// Extract the first value from the ConfigMap data.
	jsonString := ""
	for _, v := range cm.Data {
		jsonString = v
		break // Use the first value found
	}

	if jsonString == "" {
		// No data to parse, return an empty string.
		return "", nil
	}

	// Parse the JSON to extract the priority class name.
	var config configData
	if err := json.Unmarshal([]byte(jsonString), &config); err != nil {
		// Invalid JSON is not a critical error for priority class;
		// just return an empty string to use the default behavior.
		return "", nil
	}

	return config.PriorityClassName, nil
}
```
This function will get the priority class name from the node-agent-configmap. If it's not found, it will return an empty string.
To improve observability and help with troubleshooting, the implementation will include priority class validation and debug logging. A new ValidatePriorityClass function will be added to the pkg/util/kube package (in pkg/util/kube/priority_class.go) alongside other Kubernetes utility functions:
```go
// ValidatePriorityClass checks if the specified priority class exists in the cluster.
// Returns true if the priority class exists or if priorityClassName is empty.
// Returns false if the priority class doesn't exist or validation fails.
// Logs warnings when the priority class doesn't exist.
func ValidatePriorityClass(ctx context.Context, kubeClient kubernetes.Interface, priorityClassName string, logger logrus.FieldLogger) bool {
	if priorityClassName == "" {
		return true
	}

	_, err := kubeClient.SchedulingV1().PriorityClasses().Get(ctx, priorityClassName, metav1.GetOptions{})
	if err != nil {
		if apierrors.IsNotFound(err) {
			logger.Warnf("Priority class %q not found in cluster. Pod creation may fail if the priority class doesn't exist when pods are scheduled.", priorityClassName)
		} else {
			logger.WithError(err).Warnf("Failed to validate priority class %q", priorityClassName)
		}
		return false
	}

	logger.Infof("Validated priority class %q exists in cluster", priorityClassName)
	return true
}
```
Debug logging will be added at each point where a priority class is applied:

```go
// In deployment creation
if c.priorityClassName != "" {
	logger.Debugf("Setting priority class %q for Velero server deployment", c.priorityClassName)
}

// In daemonset creation
if c.priorityClassName != "" {
	logger.Debugf("Setting priority class %q for node agent daemonset", c.priorityClassName)
}

// In maintenance job creation
if priorityClassName != "" {
	logger.Debugf("Setting priority class %q for maintenance job %s", priorityClassName, job.Name)
}

// In data mover pod creation
if priorityClassName != "" {
	logger.Debugf("Setting priority class %q for data mover pod %s", priorityClassName, pod.Name)
}
```
These validation and logging features will help administrators verify that the configured priority classes exist in the cluster and trace, through the debug logs, which priority class each component's pods are using.
The ValidatePriorityClass function should be called at the following points:

1. During velero install: validate the priority classes specified via the --server-priority-class-name and --node-agent-priority-class-name CLI flags.
2. When reading from ConfigMaps: validate priority classes when loading configurations, for example after GetDataMoverPriorityClassName reads the node-agent-configmap.
3. During pod/job creation (optional, for runtime validation).
Example usage:
```go
// During velero install
if o.ServerPriorityClassName != "" {
	_ = kube.ValidatePriorityClass(ctx, kubeClient, o.ServerPriorityClassName, logger.WithField("component", "server"))
	// For install command, we continue even if validation fails (warnings are logged)
}

// When reading from ConfigMap in node-agent server
priorityClassName, err := kube.GetDataMoverPriorityClassName(ctx, namespace, kubeClient, configMapName)
if err == nil && priorityClassName != "" {
	// Validate the priority class exists in the cluster
	if kube.ValidatePriorityClass(ctx, kubeClient, priorityClassName, logger.WithField("component", "data-mover")) {
		dataMovePriorityClass = priorityClassName
		logger.WithField("priorityClassName", priorityClassName).Info("Using priority class for data mover pods")
	} else {
		logger.WithField("priorityClassName", priorityClassName).Warn("Priority class not found in cluster, data mover pods will use default priority")
		// Clear the priority class to prevent pod creation failures
		priorityClassName = ""
	}
}
```
Note: The validation function returns a boolean to allow callers to decide how to handle missing priority classes. For the install command, validation failures are ignored (only warnings are logged) to allow for scenarios where priority classes might be created after Velero installation. For runtime components like the node-agent server, the priority class is cleared if validation fails to prevent pod creation failures.
- Using a single flag for all components: We could have used a single flag for all components, but this would not allow for different priority classes for different components. Since maintenance jobs and data movers typically require lower priority than the Velero server, separate flags provide more flexibility.
- Using a configuration file: We could have added support for specifying the priority class names in a configuration file. However, this would have required additional changes to the Velero CLI and would have been more complex to implement.
- Inheriting priority class from parent components: We initially considered having maintenance jobs inherit their priority class from the Velero server, and data movers inherit from the node agent. However, this approach doesn't allow for the appropriate prioritization of different components based on their importance and resource requirements.
There are no security considerations for this feature.
This feature is compatible with all Kubernetes versions that support priority classes. The PodPriority feature became stable in Kubernetes 1.14. For more information, see the Kubernetes documentation on Pod Priority and Preemption.
The node-agent server reads and parses the ConfigMap once during initialization and passes configurations (such as podResources, loadAffinity, and priorityClassName) directly to controllers as parameters. This keeps the configuration consistent for the lifetime of the process; ConfigMap changes require a restart of the node-agent to take effect.
The implementation will involve the following steps:
1. Update the VeleroOptions struct
2. Update the podTemplateConfig struct
3. Add the WithPriorityClassName function for the server deployment and daemonset
4. Update the Deployment function to use the server priority class name
5. Update the DaemonSet function to use the node agent priority class name
6. Update the JobConfigs struct to include the PriorityClassName field
7. Update the buildJob function in maintenance job to use the priority class name from JobConfigs (global config only)
8. Update the Configs struct in node agent to include the PriorityClassName field for data mover pods
9. Add the GetDataMoverPriorityClassName utility function to retrieve the priority class from the configmap
10. Add the new flags to the velero install command
11. Add unit tests for the WithPriorityClassName function
12. Add unit tests for the GetDataMoverPriorityClassName function
13. Document the new flags (--server-priority-class-name and --node-agent-priority-class-name)

Note: The server deployment and node agent daemonset will have CLI flags for priority class. Data mover pods and maintenance jobs will use their respective ConfigMaps for priority class configuration.
This approach ensures that different Velero components can use different priority class names based on their importance and resource requirements:
Priority class names are configured through different mechanisms:
- Server Deployment: Uses the --server-priority-class-name CLI flag during installation.
- Node Agent DaemonSet: Uses the --node-agent-priority-class-name CLI flag during installation.
- Data Mover Pods: Will use the node-agent-configmap (specified via the --node-agent-configmap flag). This ConfigMap controls priority class for all data mover pods (including PVB and PVR) created by the node agent.
- Maintenance Jobs: Will use the repository maintenance job ConfigMap (specified via the --repo-maintenance-job-configmap flag). Users should create this ConfigMap before running velero install with the desired priority class configuration. The ConfigMap can be updated after installation to change priority classes for future maintenance jobs. While the ConfigMap structure supports per-repository configuration for resources and affinity, priority class is intentionally only read from the global configuration to ensure all maintenance jobs have the same priority.
For components that use ConfigMaps for priority class configuration, the ConfigMaps must be created before running velero install. Here's the recommended workflow:
```bash
# Step 1: Create priority classes in your cluster (if not already existing)
kubectl apply -f - <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-critical
value: 100
globalDefault: false
description: "Critical priority for Velero server"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-standard
value: 50
globalDefault: false
description: "Standard priority for Velero node agent"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-low
value: 10
globalDefault: false
description: "Low priority for Velero data movers and maintenance jobs"
EOF

# Step 2: Create the namespace
kubectl create namespace velero

# Step 3: Create ConfigMaps for data movers and maintenance jobs
kubectl create configmap node-agent-config -n velero --from-file=config.json=/dev/stdin <<EOF
{
  "priorityClassName": "velero-low"
}
EOF

kubectl create configmap repo-maintenance-job-config -n velero --from-file=config.json=/dev/stdin <<EOF
{
  "global": {
    "priorityClassName": "velero-low"
  }
}
EOF

# Step 4: Install Velero with priority class configuration
velero install \
  --provider aws \
  --server-priority-class-name velero-critical \
  --node-agent-priority-class-name velero-standard \
  --node-agent-configmap node-agent-config \
  --repo-maintenance-job-configmap repo-maintenance-job-config \
  --use-node-agent
```
When configuring priority classes for Velero components, consider the following hierarchy based on component criticality:
- Velero Server (Highest Priority): e.g., velero-critical with value 100
- Node Agent DaemonSet (Medium Priority): e.g., velero-standard with value 50
- Data Mover Pods & Maintenance Jobs (Lower Priority): e.g., velero-low with value 10

This hierarchy ensures that core Velero components remain operational even under resource pressure, while allowing less critical workloads to be preempted if necessary.
The priority class name for data mover pods will be determined by checking the node-agent-configmap. This approach has the advantage of providing a centralized way to configure priority class names for all data mover pods. The same approach will be used for PVB (PodVolumeBackup) and PVR (PodVolumeRestore) pods, which will also retrieve their priority class name from the node-agent-configmap.
For PVB and PVR pods specifically, the implementation follows this approach:
```go
// In node-agent server startup (pkg/cmd/cli/nodeagent/server.go)
dataMovePriorityClass := ""
if s.config.nodeAgentConfig != "" {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*30)
	defer cancel()

	priorityClass, err := kube.GetDataMoverPriorityClassName(ctx, s.namespace, s.kubeClient, s.config.nodeAgentConfig)
	if err != nil {
		s.logger.WithError(err).Warn("Failed to get priority class name from node-agent-configmap, using empty value")
	} else if priorityClass != "" {
		// Validate the priority class exists in the cluster
		if kube.ValidatePriorityClass(ctx, s.kubeClient, priorityClass, s.logger.WithField("component", "data-mover")) {
			dataMovePriorityClass = priorityClass
			s.logger.WithField("priorityClassName", priorityClass).Info("Using priority class for data mover pods")
		} else {
			s.logger.WithField("priorityClassName", priorityClass).Warn("Priority class not found in cluster, data mover pods will use default priority")
		}
	}
}

// Pass priority class to controllers
pvbReconciler := controller.NewPodVolumeBackupReconciler(
	s.mgr.GetClient(), s.mgr, s.kubeClient, ..., dataMovePriorityClass)
pvrReconciler := controller.NewPodVolumeRestoreReconciler(
	s.mgr.GetClient(), s.mgr, s.kubeClient, ..., dataMovePriorityClass)

type PodVolumeBackupReconciler struct {
	// ... existing fields ...
	dataMovePriorityClass string
}
```
With the introduction of VGDP micro-services (as described in the VGDP micro-service design), data mover pods are created as dedicated pods for volume snapshot data movement. These pods will also inherit the priority class configuration from the node-agent-configmap. Since VGDP-MS pods (backupPod/restorePod) inherit their configurations from the node-agent, they will automatically use the priority class name specified in the node-agent-configmap.
This ensures that all pods created by Velero for data movement operations (CSI snapshot data movement, PVB, and PVR) use a consistent approach for priority class name configuration through the node-agent-configmap.
CSI Snapshot Exposer and Generic Restore Exposer do not directly watch or read ConfigMaps. Instead, they receive configuration through their parent controllers:
1. Controller Initialization: Controllers receive the priority class name as a parameter during initialization from the node-agent server.
2. Configuration Propagation: During reconciliation of resources:
   - The controller calls setupExposeParam(), which includes the dataMovePriorityClass value
   - For CSI snapshot backups, CSISnapshotExposeParam.PriorityClassName is set
   - For generic restores, GenericRestoreExposeParam.PriorityClassName is set
   - The parameters are passed to the exposer's Expose() method
3. Pod Creation: The exposer creates pods with the priority class name provided by the controller.
This design keeps exposers stateless and ensures that configuration flows in a single direction: from the node-agent server, through the controllers, to the pods the exposers create.
None.