Velero File System Backup Performance Guide

Velero supports two uploaders for file system backup and restore: the Restic uploader and the Kopia uploader. The two differ considerably in resource usage and time consumption.

We ran several rounds of tests against the Restic uploader and the Kopia uploader through Velero, and the results may offer some guidance. Results will vary across infrastructures, and our tests are limited and cannot cover every data scenario, so treat the results and analysis as a reference only.

Infrastructure

MinIO is used as the Velero backend storage, and Network File System (NFS) is used to provision the Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). The MinIO and NFS servers are deployed independently on separate virtual machines (VMs), with write and read throughput of 300 MB/s and 175 MB/s, respectively.

The environment details are as follows:

### KUBERNETES VERSION
root@velero-host-01:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4"
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14"

### DOCKER VERSION
root@velero-host-01:~# docker version
Client:
 Version:           20.10.12
 API version:       1.41

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.2
 containerd:
  Version:          1.5.9-0ubuntu1~20.04.4
 runc:
  Version:          1.1.0-0ubuntu1~20.04.1
 docker-init:
  Version:          0.19.0

### NODES
root@velero-host-01:~# kubectl get nodes |wc -l 
6 // one master with 6 work nodes

### DISK INFO
root@velero-host-01:~# smartctl -a /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-126-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               VMware
Product:              Virtual disk
Revision:             1.0
Logical block size:   512 bytes
Rotation Rate:        Solid State Device
Device type:          disk

### MEMORY INFO
root@velero-host-01:~# free -h
              total        used        free      shared  buff/cache   available
Mem:          3.8Gi       328Mi       3.1Gi       1.0Mi       469Mi       3.3Gi
Swap:            0B          0B          0B

### CPU INFO
root@velero-host-01:~# cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
      4  Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz

### SYSTEM INFO
root@velero-host-01:~# cat /proc/version
Linux version 5.4.0-126-generic (build@lcy02-amd64-072) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022

### VELERO VERSION
root@velero-host-01:~# velero version
Client:
	Version: main ###v1.10 pre-release version
	Git commit: 9b22ca6100646523876b18a491d881561b4dbcf3-dirty
Server:
	Version: main ###v1.10 pre-release version

Test

We ran 6 groups of tests. In each group, we ran Velero file system backup through both the Restic path and the Kopia path under limited resources (1 CPU core with 2 GB memory, or 4 CPU cores with 4 GB memory, shown as 1c2g and 4c4g in the tables) and compared the results.

We recorded the time consumption, maximum CPU usage, and maximum memory usage of the node-agent daemonset, together with the MinIO storage usage. Metrics for the Velero deployment are not included because they do not differ noticeably between the Restic uploader and the Kopia uploader.

Compression is either disabled or unavailable for both uploaders.

Case 1: 4194304 (4M) files, 2396745 (2M) directories, 0 B per file, total 0 B content

result:

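| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c2g | 24m54s | 65% | 1530 MB | 80 MB |
| Restic | 1c2g | 52m31s | 55% | 1708 MB | 3.3 GB |
| Kopia | 4c4g | 24m52s | 63% | 2216 MB | 80 MB |
| Restic | 4c4g | 52m28s | 54% | 2329 MB | 3.3 GB |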

conclusion:

  • With a massive number of empty files, memory usage exceeds Velero's default memory limit (1 GB) for both Kopia and Restic.
  • For both the Kopia uploader and the Restic uploader, increasing resources from 1c2g to 4c4g brings no significant time reduction.
  • Under the same resource specification, the Restic uploader takes roughly twice as long as the Kopia uploader.
  • Restic produces an unreasonably large repository (3.3 GB) even though the files are empty.
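
The "roughly twice as long" comparison can be checked directly from the Case 1 timings. The snippet below is a minimal sketch: `parse_duration` is a hypothetical helper written for this illustration, and the timing strings are copied from the Case 1 table.

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '24m54s' or '2h30m' into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

# Case 1 timings per resource setting: (Kopia, Restic).
case1 = {"1c2g": ("24m54s", "52m31s"), "4c4g": ("24m52s", "52m28s")}

for resources, (kopia, restic) in case1.items():
    ratio = parse_duration(restic) / parse_duration(kopia)
    print(f"{resources}: Restic takes {ratio:.2f}x as long as Kopia")
```

Both resource settings give a ratio of about 2.11, which also reflects the second bullet: adding CPU and memory changes neither uploader's time noticeably in this case.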

Case 2: Using the same file size (100 B) and Velero's default resource configuration, the number of files ranges from about 200 thousand to 2 million. This group of cases mainly tests behavior as the number of files increases.
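
The guide does not say how the test data was produced. As an illustration only, a dataset with a fixed file size and a configurable number of files and directories could be generated along these lines. The function name `generate_dataset`, the flat directory layout, and the file naming are all assumptions made for this sketch, not the procedure used in the tests.

```python
import os
import tempfile

def generate_dataset(root: str, dirs: int, files_per_dir: int, file_size: int) -> int:
    """Create `dirs` directories under `root`, each holding `files_per_dir`
    files of `file_size` random bytes. Returns the total bytes written."""
    total = 0
    for d in range(dirs):
        dir_path = os.path.join(root, f"dir-{d:06d}")
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            # Fresh random bytes per file so files do not share content.
            payload = os.urandom(file_size)
            with open(os.path.join(dir_path, f"file-{f:06d}"), "wb") as fh:
                fh.write(payload)
            total += file_size
    return total

if __name__ == "__main__":
    # Small demo: 10 directories x 20 files x 100 B in a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        written = generate_dataset(root, dirs=10, files_per_dir=20, file_size=100)
        print(f"wrote {written} bytes")
```

To approximate one of the cases, `root` would point at the mounted volume and the counts would be scaled up. Random content per file is chosen here so that repository size is not dominated by identical files.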

Case 2.1: 235298 (235K) files, 137257 (137K) directories, 100 B per file, total 22.440 MB content

result:

| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c1g | 2m34s | 70% | 692 MB | 108 MB |
| Restic | 1c1g | 3m9s | 54% | 714 MB | 275 MB |

Case 2.2: 470596 (470K) files, 137257 (137K) directories, 100 B per file, total 44.880 MB content

result:

| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c1g | 3m45s | 68% | 831 MB | 108 MB |
| Restic | 1c1g | 4m53s | 57% | 788 MB | 275 MB |

Case 2.3: 705894 (705K) files, 137257 (137K) directories, 100 B per file, total 67.319 MB content

result:

| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c1g | 5m06s | 71% | 861 MB | 108 MB |
| Restic | 1c1g | 6m23s | 56% | 810 MB | 275 MB |

Case 2.4: 2097152 (2M) files, 2396745 (2M) directories, 100 B per file, total 200.000 MB content

result:

| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c1g | OOM | 74% | N/A | N/A |
| Restic | 1c1g | 41m47s | 52% | 904 MB | 3.2 GB |

conclusion:

  • As the number of files increases, memory usage shows no abnormal surge. It grows steadily for both the Kopia uploader and the Restic uploader, until the Kopia uploader exceeds the 1 GB memory limit in Case 2.4 and is OOM-killed.
  • The Kopia uploader's time advantage over the Restic uploader grows as the number of files increases.
  • The Restic uploader's repository is still much larger than the Kopia uploader's repository.

Case 3: 10625 (10k) files, 781 directories, 1.000 MB per file, total 10.376 GB content

result:

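| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c2g | 1m37s | 75% | 251 MB | 10 GB |
| Restic | 1c2g | 5m25s | 100% | 153 MB | 10 GB |
| Kopia | 4c4g | 1m35s | 75% | 248 MB | 10 GB |
| Restic | 4c4g | 3m17s | 171% | 126 MB | 10 GB |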

conclusion:

  • This case involves a relatively large backup size. For the Kopia uploader, increasing resources from 1c2g to 4c4g brings no significant time reduction. For the Restic uploader, increasing CPU from 1 core to 4 cores shortens the backup time by more than one-third, so in this scenario the Restic uploader should be given more CPU resources.
  • With a large backup size, the Restic uploader's repository size returns to normal.
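
To put the Case 3 timings in perspective, they can be converted into average throughput. The sketch below assumes 1 GB = 1024 MB, which matches the case description (10625 files of 1 MB each giving 10.376 GB); the function name `throughput_mb_per_s` is introduced here for illustration only.

```python
def throughput_mb_per_s(content_gb: float, minutes: int, seconds: int) -> float:
    """Average throughput in MB/s, treating 1 GB as 1024 MB."""
    return content_gb * 1024 / (minutes * 60 + seconds)

CONTENT_GB = 10.376  # total content in Case 3

# (uploader, resources, minutes, seconds) copied from the Case 3 table.
runs = [
    ("Kopia", "1c2g", 1, 37),
    ("Restic", "1c2g", 5, 25),
    ("Kopia", "4c4g", 1, 35),
    ("Restic", "4c4g", 3, 17),
]

for uploader, resources, m, s in runs:
    rate = throughput_mb_per_s(CONTENT_GB, m, s)
    print(f"{uploader:6} {resources}: {rate:6.1f} MB/s")
```

By this estimate Kopia averages around 110 MB/s in both configurations, while Restic moves from about 33 MB/s to about 54 MB/s when given 4 cores. All of these figures sit below the 300 MB/s write throughput listed under Infrastructure.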

Case 4: 900 files, 1 directory, 1.000 GB per file, total 900.000 GB content

result:

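| Uploader | Resources | Time | Max CPU | Max Memory | Repo Usage |
| --- | --- | --- | --- | --- | --- |
| Kopia | 1c2g | 2h30m | 100% | 714 MB | 900 GB |
| Restic | 1c2g | Timeout | 100% | 416 MB | N/A |
| Kopia | 4c4g | 1h42m | 138% | 786 MB | 900 GB |
| Restic | 4c4g | 2h15m | 351% | 606 MB | 900 GB |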

conclusion:

  • When the target backup data is relatively large, the Restic uploader times out under 1c2g, so it is better to allocate more resources to the Restic uploader when backing up large amounts of data.
  • When backing up large amounts of data, the Kopia uploader takes less time and uses less CPU than the Restic uploader.

Summary

  • With the same resource specification, the Kopia uploader takes less time to back up.
  • The Kopia uploader performs better when backing up large amounts of data or massive numbers of small files.
  • Set a resource configuration that suits your scenario instead of relying on the default. With the default resource configuration, the Restic uploader can easily time out when backing up large amounts of data, and both the Kopia uploader and the Restic uploader can easily run out of memory (OOM) when backing up massive numbers of small files.
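
As a rough sketch of the last point, the resource limits of the node-agent pods could be raised by patching the daemonset. The snippet below only builds the JSON patch body. It assumes the daemonset and its container are both named `node-agent` and live in the `velero` namespace, which should be verified against your own installation.

```python
import json

def node_agent_resource_patch(cpu: str, memory: str) -> str:
    """Build a JSON patch body that sets CPU and memory limits on the
    node-agent container (container name is an assumption)."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "node-agent",  # assumed container name
                            "resources": {
                                "limits": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch)

# 4 cores / 4 GB, matching the larger configuration used in the tests.
print(node_agent_resource_patch(cpu="4", memory="4Gi"))
```

The printed JSON could then be applied with something like `kubectl patch daemonset node-agent -n velero --patch '<json>'`, again assuming those names. The right values depend on the data profile: the cases here suggest more memory for massive numbers of small files and more CPU for Restic with large data.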