docs/how-to/how-to-pull-images-in-guest-with-kata.md
Kata Containers 3.3.0 introduces the guest image management feature, which enables the guest VM to pull images directly using nydus snapshotter. This feature is designed for confidential containers: it protects the integrity of container images and guards against tampering by the host. Please refer to kata-guest-image-management-design for details.
yq is installed on the host and its directory is included in the PATH environment variable (optional, for DaemonSet only).

nydus snapshotter for guest image management

To pull images in the guest, we need to do the following steps:
- Install nydus snapshotter:
  - Install nydus snapshotter by k8s DaemonSet (recommended)
  - Install nydus snapshotter manually

The CRI Runtime Specific Snapshotter is still an experimental feature in containerd, and containerd does not support managing the same image in different snapshotters (the default snapshotter in containerd is overlayfs). To avoid errors caused by this, it is recommended to delete from containerd any images (including the pause image) that will later be pulled in the guest, before configuring nydus snapshotter in containerd.
Install nydus snapshotter

Install nydus snapshotter by k8s DaemonSet (recommended)

To install nydus snapshotter with a DaemonSet, we need to ensure that yq exists on the host.
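A quick way to confirm this prerequisite before touching the DaemonSet manifest is a small PATH check. The sketch below is only illustrative: the helper name `require_cmd` is hypothetical and is not part of Kata Containers or nydus snapshotter, and the list of tools is inferred from the commands used in this guide.

```bash
# Hypothetical helper: succeed only if every named command is found in PATH.
require_cmd() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The DaemonSet steps in this guide call yq, git and kubectl.
if require_cmd yq git kubectl; then
    echo "all prerequisites found"
else
    echo "install the missing tools before continuing"
fi
```

The function reports every missing tool rather than stopping at the first one, so a single run shows everything that still needs to be installed.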
Clone the nydus snapshotter repo:
$ nydus_snapshotter_install_dir="/tmp/nydus-snapshotter"
$ nydus_snapshotter_url=https://github.com/containerd/nydus-snapshotter
$ nydus_snapshotter_version="v0.13.11"
$ git clone -b "${nydus_snapshotter_version}" "${nydus_snapshotter_url}" "${nydus_snapshotter_install_dir}"
$ pushd "$nydus_snapshotter_install_dir"
# Set the snapshotter FS driver to "proxy"
$ yq -i \
> '.data.FS_DRIVER = "proxy"' -P \
> misc/snapshotter/base/nydus-snapshotter.yaml
# Disable reading the snapshotter config from the configmap
$ yq -i \
> '.data.ENABLE_CONFIG_FROM_VOLUME = "false"' -P \
> misc/snapshotter/base/nydus-snapshotter.yaml
# Run the snapshotter as a systemd service
# (skip if you want to run nydus snapshotter as a standalone process)
$ yq -i \
> '.data.ENABLE_SYSTEMD_SERVICE = "true"' -P \
> misc/snapshotter/base/nydus-snapshotter.yaml
# Enable the "runtime specific snapshotter" feature in containerd when configuring containerd for the snapshotter
# (skip if you want to configure nydus snapshotter as a global snapshotter in containerd)
$ yq -i \
> '.data.ENABLE_RUNTIME_SPECIFIC_SNAPSHOTTER = "true"' -P \
> misc/snapshotter/base/nydus-snapshotter.yaml
Install nydus snapshotter as a DaemonSet:
$ kubectl create -f "misc/snapshotter/nydus-snapshotter-rbac.yaml"
$ kubectl apply -f "misc/snapshotter/base/nydus-snapshotter.yaml"
$ kubectl rollout status DaemonSet nydus-snapshotter -n nydus-system --timeout 5m
Verify that nydus snapshotter is running as a DaemonSet:
$ pods_name=$(kubectl get pods --selector=app=nydus-snapshotter -n nydus-system -o=jsonpath='{.items[*].metadata.name}')
$ kubectl logs "${pods_name}" -n nydus-system
deploying snapshotter
install nydus snapshotter artifacts
configuring snapshotter
Not found nydus proxy plugin!
running snapshotter as systemd service
Created symlink /etc/systemd/system/multi-user.target.wants/nydus-snapshotter.service → /etc/systemd/system/nydus-snapshotter.service.
Install nydus snapshotter manually

Download the nydus snapshotter binary from the release:
$ ARCH=$(uname -m)
$ golang_arch=$(case "$ARCH" in
    aarch64) echo "arm64" ;;
    ppc64le) echo "ppc64le" ;;
    x86_64) echo "amd64" ;;
    s390x) echo "s390x" ;;
  esac)
$ release_tarball="nydus-snapshotter-${nydus_snapshotter_version}-linux-${golang_arch}.tar.gz"
$ curl -OL ${nydus_snapshotter_url}/releases/download/${nydus_snapshotter_version}/${release_tarball}
$ sudo tar -xzf "${release_tarball}" -C /usr/local/bin --strip-components=1
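The `case` statement in the commands above leaves `golang_arch` empty on any architecture it does not list, which would produce a malformed tarball name. A minimal sketch of the same mapping with an explicit failure branch is shown here. The function name `golang_arch_for` is hypothetical; the URL and version are the values used earlier in this guide.

```bash
# Hypothetical helper: map `uname -m` output to the Go-style architecture
# name used in the release tarball file name.
golang_arch_for() {
    case "$1" in
        aarch64) echo "arm64" ;;
        ppc64le) echo "ppc64le" ;;
        x86_64)  echo "amd64" ;;
        s390x)   echo "s390x" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

nydus_snapshotter_url="https://github.com/containerd/nydus-snapshotter"
nydus_snapshotter_version="v0.13.11"

# Print the download URL only when the architecture is known.
if golang_arch=$(golang_arch_for "$(uname -m)"); then
    release_tarball="nydus-snapshotter-${nydus_snapshotter_version}-linux-${golang_arch}.tar.gz"
    echo "${nydus_snapshotter_url}/releases/download/${nydus_snapshotter_version}/${release_tarball}"
fi
```

Failing early on an unknown architecture avoids a confusing 404 from the download step.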
Download the nydus snapshotter configuration file for pulling images in the guest (use the raw file URL, since the blob URL returns an HTML page):
$ curl -OL https://raw.githubusercontent.com/containerd/nydus-snapshotter/main/misc/snapshotter/config-proxy.toml
$ sudo install -D -m 644 config-proxy.toml /etc/nydus/config-proxy.toml
Run nydus snapshotter as a standalone process:
$ /usr/local/bin/containerd-nydus-grpc --config /etc/nydus/config-proxy.toml --log-to-stdout
level=info msg="Start nydus-snapshotter. Version: v0.13.11-308-g106a6cb, PID: 1100169, FsDriver: proxy, DaemonMode: none"
level=info msg="Run daemons monitor..."
Configure containerd for nydus snapshotter

Configure nydus snapshotter to enable the CRI Runtime Specific Snapshotter in containerd. This ensures that Kata Containers run with nydus snapshotter. Below, the steps are illustrated using kata-qemu as an example.
# Modify containerd configuration to ensure that the following lines appear in the containerd configuration
# (Assume that the containerd config is located in /etc/containerd/config.toml)
[plugins."io.containerd.grpc.v1.cri".containerd]
disable_snapshot_annotations = false
discard_unpacked_layers = false
[proxy_plugins.nydus]
type = "snapshot"
address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
snapshotter = "nydus"
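After editing the file, a rough textual check can confirm that the settings are present. The sketch below is an assumption-laden convenience, not a TOML parser: the helper name `check_nydus_config` is hypothetical, and it only greps for the proxy plugin table, the runtime table and a `snapshotter = "nydus"` line without verifying that they are nested correctly.

```bash
# Hypothetical helper: report whether a containerd config file mentions the
# nydus proxy plugin and a nydus snapshotter setting for the given runtime.
# This is a plain text search, so it cannot detect misplaced keys.
check_nydus_config() {
    config_file="$1"
    runtime="${2:-kata-qemu}"
    grep -Fq '[proxy_plugins.nydus]' "$config_file" || return 1
    grep -Fq "runtimes.${runtime}]" "$config_file" || return 1
    grep -Eq '^[[:space:]]*snapshotter[[:space:]]*=[[:space:]]*"nydus"' "$config_file" || return 1
}

# Assumes the config path used in this guide.
if [ -r /etc/containerd/config.toml ] && check_nydus_config /etc/containerd/config.toml; then
    echo "containerd config references the nydus snapshotter"
else
    echo "nydus snapshotter settings not found in /etc/containerd/config.toml"
fi
```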
Note: The CRI Runtime Specific Snapshotter feature only works with containerd v1.7.0 and above. For containerd versions below v1.7.0, in addition to the above settings, we need to set the global snapshotter to nydus in the containerd config. For example:
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "nydus"
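The version rule can be expressed as a small comparison. This is a sketch under stated assumptions: the helper name `supports_runtime_snapshotter` is hypothetical, and it expects a plain version string such as `v1.7.0` or `1.6.28` that you supply yourself.

```bash
# Hypothetical helper: succeed if the given containerd version is >= 1.7.0.
supports_runtime_snapshotter() {
    version="${1#v}"        # drop a leading "v", e.g. v1.7.2 -> 1.7.2
    major="${version%%.*}"  # text before the first dot
    rest="${version#*.}"    # text after the first dot
    minor="${rest%%.*}"     # text before the next dot
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }
}

for v in v1.6.28 v1.7.0 v2.0.1; do
    if supports_runtime_snapshotter "$v"; then
        echo "$v: the per-runtime snapshotter setting is enough"
    else
        echo "$v: also set the global snapshotter to nydus"
    fi
done
```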
$ sudo systemctl restart containerd
To verify pulling images in a guest VM, please refer to the following commands:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: busybox
    image: quay.io/prometheus/busybox:latest
    imagePullPolicy: Always
EOF
pod/busybox created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 10s
When the images are pulled in the guest through nydus snapshotter, the root filesystems required for the pod's images (including the pause image and the container image) should not be present on the host:
$ sandbox_id=$(ps -ef| grep containerd-shim-kata-v2| grep -oP '(?<=-id\s)[a-f0-9]+'| tail -1)
$ rootfs_count=$(find /run/kata-containers/shared/sandboxes/$sandbox_id -name rootfs -type d| grep -o "rootfs" | wc -l)
$ echo $rootfs_count
0
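The check above can be wrapped so that it fails loudly instead of relying on reading the printed number. This is a minimal sketch; the helper names `count_host_rootfs` and `assert_guest_pull` are hypothetical, and the directory argument is expected to be the sandbox's shared directory used in the commands above.

```bash
# Hypothetical helper: count directories named "rootfs" below a directory.
count_host_rootfs() {
    find "$1" -name rootfs -type d 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only when the directory exists and holds no
# rootfs directory, i.e. nothing was unpacked on the host.
assert_guest_pull() {
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 2
    fi
    count=$(count_host_rootfs "$1")
    if [ "$count" -eq 0 ]; then
        echo "no rootfs on the host: images were pulled in the guest"
    else
        echo "found $count rootfs directories on the host" >&2
        return 1
    fi
}
```

A typical call would be `assert_guest_pull "/run/kata-containers/shared/sandboxes/$sandbox_id"`. The explicit directory test matters because a missing path would otherwise also yield a count of 0.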
Currently, an image pulled in the guest is downloaded and unpacked in the /run/kata-containers/image directory. However, in the rootfs-confidential image, systemd by default allocates 50% of the available physical RAM to the /run directory as a tmpfs filesystem. If we run a Kata container with the default configuration (where the default memory assigned to a VM is 2048 MiB), /run is allocated around 1024 MiB, so we can only pull images up to about 1024 MiB in the guest. Raising the guest memory just to hold images is costly, especially for confidential containers. Instead, we can use a block volume from the host and use dm-crypt and dm-integrity to encrypt the block volume in the guest, providing a secure place to store downloaded container images.
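The sizing argument above is simple arithmetic, sketched here for convenience. The helper names `run_tmpfs_mib` and `image_fits_in_run` are hypothetical, and the result is only an upper bound, since anything else stored under /run also consumes the tmpfs.

```bash
# Size of the /run tmpfs under the default described above: 50% of guest RAM.
run_tmpfs_mib() {
    echo $(( $1 / 2 ))
}

# Hypothetical helper: succeed if an unpacked image of the given size (MiB)
# could fit in /run for a guest with the given memory (MiB).
image_fits_in_run() {
    image_mib="$1"
    guest_mem_mib="$2"
    [ "$image_mib" -le "$(run_tmpfs_mib "$guest_mem_mib")" ]
}

echo "default guest (2048 MiB): /run is about $(run_tmpfs_mib 2048) MiB"

if image_fits_in_run 2150 2048; then
    echo "a 2150 MiB image fits in /run"
else
    echo "a 2150 MiB image does not fit in /run; use trusted block storage"
fi
```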
Many CSI plugins support block volumes, for example AWS EBS, Azure Disk, and Open-Local. As an example, we use Local Persistent Volumes to expose a local disk as block storage in the k8s cluster.
Set up /dev/loop0 as a loop device backed by a 2500 MiB file:
$ loop_file="/tmp/trusted-image-storage.img"
$ sudo dd if=/dev/zero of=$loop_file bs=1M count=2500
$ sudo losetup /dev/loop0 $loop_file
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
apiVersion: v1
kind: PersistentVolume
metadata:
  name: trusted-block-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Block
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /dev/loop0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - NODE_NAME
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trusted-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  volumeMode: Block
  storageClassName: local-storage
apiVersion: v1
kind: Pod
metadata:
  name: large-image-pod
spec:
  runtimeClassName: kata-qemu
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - NODE_NAME
  volumes:
    - name: trusted-image-storage
      persistentVolumeClaim:
        claimName: trusted-pvc
  containers:
    - name: app-container
      image: quay.io/confidential-containers/test-images:largeimage
      command: ["/bin/sh", "-c"]
      args:
        - sleep 6000
      volumeDevices:
        - devicePath: /dev/trusted_store
          name: trusted-image-storage
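The manifests above use `NODE_NAME` as a stand-in for the value of the node's `kubernetes.io/hostname` label, so it has to be replaced before they are applied. One possible way to do that is a small stream filter. The helper name `fill_node_name` is hypothetical, and the sketch assumes the node name contains only characters valid in a Kubernetes node name (lowercase letters, digits, `-` and `.`).

```bash
# Hypothetical helper: replace the NODE_NAME placeholder in a manifest read
# from stdin with the given node name and write the result to stdout.
fill_node_name() {
    sed "s/NODE_NAME/$1/g"
}

# Example on a two-line fragment of the nodeAffinity section.
printf '%s\n' 'values:' '- NODE_NAME' | fill_node_name "worker-1"
```

With the manifest saved to a file (for instance a hypothetical `pv.yaml`), the filter can be combined with kubectl as `fill_node_name "worker-1" < pv.yaml | kubectl apply -f -`.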
$ docker image ls|grep "largeimage"
quay.io/confidential-containers/test-images largeimage 00bc1f6c893a 4 months ago 2.15GB
$ lsblk --fs
NAME                   FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda
└─encrypted_disk_GsLDt                      178M    87% /run/kata-containers/image
$ cryptsetup status encrypted_disk_GsLDt
/dev/mapper/encrypted_disk_GsLDt is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
key location: keyring
device: /dev/sda
sector size: 4096
offset: 32768 sectors
size: 5087232 sectors
mode: read/write
$ mount|grep "encrypted_disk_GsLDt"
/dev/mapper/encrypted_disk_GsLDt on /run/kata-containers/image type ext4
$ du -h --max-depth=1 /run/kata-containers/image/
16K /run/kata-containers/image/lost+found
2.1G /run/kata-containers/image/layers
60K /run/kata-containers/image/overlay
2.1G /run/kata-containers/image/
$ free -m
total used free shared buff/cache available
Mem: 1989 52 43 0 1893 1904
Swap: 0 0 0