
Virtualization in Kata Containers

docs/design/virtualization.md


Overview

Kata Containers creates a second layer of isolation on top of traditional namespace-based containers using hardware virtualization. Kata launches a lightweight virtual machine (VM) and uses the guest Linux kernel to create container workloads. In Kubernetes, the sandbox is implemented at the pod level using VMs.

This document describes:

  • How Kata Containers maps container technologies to virtualization technologies
  • The multiple hypervisors and Virtual Machine Monitors (VMMs) supported by Kata
  • Guidance for selecting the appropriate hypervisor for your use case

Architecture

A typical Kata Containers deployment integrates with Kubernetes through a Container Runtime Interface (CRI) implementation:

Kubelet → CRI (containerd/CRI-O) → Kata Containers (OCI runtime) → VM → Containers
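In practice, the Kubelet reaches Kata through a Kubernetes RuntimeClass. A minimal sketch, assuming containerd (or CRI-O) has been configured with a runtime handler named `kata` (the handler name is an assumption and must match your runtime configuration):

```shell
# Define a RuntimeClass pointing at the Kata runtime handler.
# "kata" is a placeholder; it must match the handler registered in
# the containerd or CRI-O configuration.
cat > kata-runtimeclass.yaml <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
EOF

# A pod opts in via runtimeClassName; the whole pod then runs inside
# a single Kata VM sandbox rather than a plain namespace container.
cat > kata-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  runtimeClassName: kata
  containers:
  - name: nginx
    image: nginx
EOF
```

Applying both manifests with `kubectl apply -f` schedules the pod into a VM-backed sandbox.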

The CRI API requires Kata to support the following constructs:

| CRI Construct | VM Equivalent | Virtualization Technology |
|---------------|---------------|---------------------------|
| Pod Sandbox | VM | Hypervisor/VMM |
| Container | Process in VM | Namespace/cgroup in guest |
| Network | Network interface | virtio-net, vhost-net, physical, etc. |
| Storage | Block/file device | virtio-block, virtio-scsi, virtio-fs |
| Compute | vCPU/memory | KVM, ACPI hotplug |

Mapping Container Concepts to Virtualization Technologies

Kata Containers implements the Kubernetes Container Runtime Interface (CRI) to provide pod and container lifecycle management. The CRI API defines abstractions that Kata must translate into virtualization primitives.

The mapping from CRI constructs to virtualization technologies follows a three-layer model:

CRI API Constructs → VM Abstractions → Para-virtualized Devices

Layer 1: CRI API Constructs

The CRI API (kubernetes/cri-api) defines the following abstractions that Kata must implement:

| Construct | Description |
|-----------|-------------|
| Pod Sandbox | Isolated execution environment for containers |
| Container | Process workload within a sandbox |
| Network | Pod and container networking interfaces |
| Storage | Volume mounts and image storage |
| RuntimeConfig | Resource constraints (CPU, memory, cgroups) |

Layer 2: VM Abstractions

Kata translates CRI constructs into VM-level concepts:

| CRI Construct | VM Equivalent |
|---------------|---------------|
| Pod Sandbox | Virtual machine |
| Container | Process/namespace in guest OS |
| Network | Virtual NIC (vNIC) |
| Storage | Virtual block device or filesystem |
| RuntimeConfig | VM resources (vCPU, memory) |

Layer 3: Para-virtualized Devices

VM abstractions are realized through para-virtualized drivers for optimal performance:

| VM Concept | Device Technology |
|------------|-------------------|
| vNIC | virtio-net, vhost-net, macvtap |
| Block storage | virtio-block, virtio-scsi |
| Shared filesystem | virtio-fs |
| Agent communication | virtio-vsock |
| Device passthrough | VFIO with IOMMU |

Note: Each hypervisor implements these mappings differently based on its device model and feature set. See the Hypervisor Details section for specific implementations.

Device Mapping

Container constructs map to para-virtualized devices:

| Construct | Device Type | Technology |
|-----------|-------------|------------|
| Network | Network interface | virtio-net, vhost-net |
| Storage (ephemeral) | Block device | virtio-block, virtio-scsi |
| Storage (shared) | Filesystem | virtio-fs |
| Communication | Socket | virtio-vsock |
| GPU/passthrough | PCI device | VFIO, IOMMU |
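To make the device mapping concrete, here is an illustrative, hand-written QEMU command line wiring the three core para-virtualized devices (network, block storage, vsock). It is a sketch, not the invocation Kata actually generates; `rootfs.img` and the guest CID are placeholders:

```shell
# Minimal QEMU invocation sketch: one virtio NIC backed by a tap device,
# one virtio block device for the rootfs, and a vsock device for
# host-guest (agent) communication.
cmd="qemu-system-x86_64 -machine q35,accel=kvm -cpu host -m 2048M \
 -device virtio-net-pci,netdev=net0 -netdev tap,id=net0 \
 -device virtio-blk-pci,drive=image0 \
 -drive id=image0,file=rootfs.img,if=none,format=raw \
 -device vhost-vsock-pci,guest-cid=3"
echo "$cmd"
```

Each `-device` flag here corresponds to one row of the table above: `virtio-net-pci` for Network, `virtio-blk-pci` for ephemeral storage, and `vhost-vsock-pci` for agent communication.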

Supported Hypervisors and VMMs

Kata Containers supports multiple hypervisors, each with different characteristics:

| Hypervisor | Language | Architectures | Type |
|------------|----------|---------------|------|
| QEMU | C | x86_64, aarch64, ppc64le, s390x, riscv64 | Type 2 (KVM) |
| Cloud Hypervisor | Rust | x86_64, aarch64 | Type 2 (KVM) |
| Firecracker | Rust | x86_64, aarch64 | Type 2 (KVM) |
| Dragonball | Rust | x86_64, aarch64 | Type 2 (KVM), built-in |

Note: All supported hypervisors use KVM (Kernel-based Virtual Machine) as the underlying hardware virtualization interface on Linux.
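Since every option relies on KVM, a quick way to confirm that a host can run any of them is to check for the `/dev/kvm` device node:

```shell
# All Kata hypervisors need KVM, exposed to userspace as /dev/kvm.
# If the node is missing, check that virtualization is enabled in
# firmware and that the kvm/kvm_intel/kvm_amd modules are loaded.
if [ -e /dev/kvm ]; then
  echo "KVM available"
else
  echo "KVM not available"
fi
```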

Hypervisor Details

QEMU/KVM

QEMU is the most mature and feature-complete hypervisor option for Kata Containers.

Machine Types:

  • q35 (x86_64, default)
  • s390-ccw-virtio (s390x)
  • virt (aarch64)
  • pseries (ppc64le)
  • virt (riscv64, experimental)

Devices and Features:

  • virtio-vsock (agent communication)
  • virtio-block or virtio-scsi (storage)
  • virtio-net/vhost-net/vhost-user-net (networking)
  • virtio-fs (shared filesystem)
  • VFIO (device passthrough)
  • CPU and memory hotplug
  • NVDIMM (x86_64, for rootfs as persistent memory)

Use Cases:

  • Production workloads requiring full CRI API compatibility
  • Scenarios requiring device passthrough (VFIO)
  • Multi-architecture deployments

Configuration: See configuration-qemu.toml

Dragonball (Built-in VMM)

Dragonball is a Rust-based VMM integrated directly into the Kata Containers Rust runtime as a library.

Advantages:

  • Zero IPC overhead: VMM runs in the same process as the runtime
  • Unified lifecycle: Simplified resource management and error handling
  • Optimized for containers: Purpose-built for container workloads
  • Upcall support: Direct VMM-to-Guest communication for efficient hotplug operations
  • Low resource overhead: Minimal CPU and memory footprint

Architecture:

┌─────────────────────────────────────────┐
│     Kata Containers Runtime (Rust)      │
│  ┌─────────────────────────────────┐    │
│  │      Dragonball VMM Library     │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Features:

  • Built-in virtio-fs/nydus support
  • Async I/O via Tokio
  • Single binary deployment
  • Optimized startup latency

Use Cases:

  • Default choice for most container workloads
  • High-density container deployments and low resource overhead scenarios
  • Scenarios requiring optimal startup performance

Configuration: See configuration-dragonball.toml

Cloud Hypervisor/KVM

Cloud Hypervisor is a Rust-based VMM designed for modern cloud workloads with a focus on performance and security.

Features:

  • CPU and memory resize
  • Device hotplug (disk, VFIO)
  • virtio-fs (shared filesystem)
  • virtio-pmem (persistent memory)
  • virtio-block (block storage)
  • virtio-vsock (agent communication)
  • Fine-grained seccomp filters per VMM thread
  • HTTP OpenAPI for management

Use Cases:

  • High-performance cloud-native workloads
  • Applications requiring memory/CPU resizing
  • Security-sensitive deployments (seccomp isolation)

Configuration: See configuration-clh-runtime-rs.toml

Firecracker/KVM

Firecracker is a minimalist VMM built on rust-vmm crates, optimized for serverless and FaaS workloads.

Devices:

  • virtio-vsock (agent communication)
  • virtio-block (block storage)
  • virtio-net (networking)

Limitations:

  • No filesystem sharing (virtio-fs not supported)
  • No device hotplug
  • No VFIO/passthrough support
  • No CPU/memory hotplug
  • Limited CRI API support

Use Cases:

  • Serverless/FaaS workloads
  • Single-tenant microVMs
  • Scenarios prioritizing minimal attack surface

Configuration: See configuration-fc.toml

Hypervisor Comparison Summary

| Feature | QEMU | Cloud Hypervisor | Firecracker | Dragonball |
|---------|------|------------------|-------------|------------|
| Maturity | Excellent | Good | Good | Good |
| CRI compatibility | Full | Full | Partial | Full |
| Filesystem sharing | Yes (virtio-fs) | Yes (virtio-fs) | No | Yes (virtio-fs) |
| Device hotplug | Yes | Yes | No | Yes |
| VFIO/passthrough | Yes | Yes | No | Yes |
| CPU/memory hotplug | Yes | Yes | No | Yes |
| Security isolation | Good | Excellent (seccomp) | Excellent | Excellent |
| Startup latency | Good | Excellent | Excellent | Best |
| Resource overhead | Medium | Low | Lowest | Lowest |

Choosing a Hypervisor

Decision Matrix

| Requirement | Recommended Hypervisor |
|-------------|------------------------|
| Full CRI API compatibility | QEMU, Cloud Hypervisor, Dragonball |
| Device passthrough (VFIO) | QEMU, Cloud Hypervisor, Dragonball |
| Minimal resource overhead | Dragonball, Firecracker |
| Fastest startup time | Dragonball, Firecracker |
| Serverless/FaaS | Dragonball, Firecracker |
| Production workloads | Dragonball, QEMU |
| Memory/CPU resizing | Dragonball, Cloud Hypervisor, QEMU |
| Maximum security isolation | Cloud Hypervisor (seccomp), Firecracker, Dragonball |
| Multi-architecture | QEMU |

Recommendations

For Most Users: Use the default Dragonball VMM with the Kata Containers Rust runtime. It provides the best balance of performance, security, and container density.

For Device Passthrough: Use QEMU, Cloud Hypervisor, or Dragonball if you require VFIO device assignment.

For Serverless: Use Dragonball or Firecracker for ultra-lightweight, single-tenant microVMs.

For Legacy/Ecosystem Compatibility: Use QEMU for its extensive hardware emulation and multi-architecture support.

Hypervisor Configuration

Configuration Files

Each hypervisor has a dedicated configuration file:

| Hypervisor | Rust Runtime Configuration | Go Runtime Configuration |
|------------|----------------------------|--------------------------|
| QEMU | configuration-qemu-runtime-rs.toml | configuration-qemu.toml |
| Cloud Hypervisor | configuration-clh-runtime-rs.toml | configuration-clh.toml |
| Firecracker | configuration-rs-fc.toml | configuration-fc.toml |
| Dragonball | configuration-dragonball.toml (default) | N/A |

Note: Configuration files are typically installed under /opt/kata/share/defaults/kata-containers/, /opt/kata/share/defaults/kata-containers/runtime-rs/, or /usr/share/defaults/kata-containers/.
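A small sketch to see which of these default locations exist on a given host and which per-hypervisor configuration files they contain:

```shell
# Check each default Kata configuration directory and list the
# per-hypervisor TOML files it holds. Paths are the defaults noted
# above; packaged installs may use only one of them.
found=0
for dir in /opt/kata/share/defaults/kata-containers \
           /opt/kata/share/defaults/kata-containers/runtime-rs \
           /usr/share/defaults/kata-containers; do
  if [ -d "$dir" ]; then
    found=1
    echo "== $dir"
    ls "$dir"/configuration-*.toml 2>/dev/null
  fi
done
[ "$found" -eq 1 ] || echo "no default Kata configuration directory found"
```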

Switching Hypervisors

Use the kata-manager tool to switch the configured hypervisor:

```bash
# List available hypervisors
$ kata-manager -L

# Switch to a different hypervisor
$ sudo kata-manager -S <hypervisor-name>
```

For detailed instructions, see the kata-manager documentation.

Hypervisor Versions

The following versions are used in this release (from versions.yaml):

| Hypervisor | Version | Repository |
|------------|---------|------------|
| Cloud Hypervisor | v51.1 | https://github.com/cloud-hypervisor/cloud-hypervisor |
| Firecracker | v1.12.1 | https://github.com/firecracker-microvm/firecracker |
| QEMU | v10.2.1 | https://github.com/qemu/qemu |
| Dragonball | built-in | https://github.com/kata-containers/kata-containers/tree/main/src/dragonball |

Note: Dragonball is integrated into the Kata Containers Rust runtime and does not have a separate version number. For the latest hypervisor versions, see the versions.yaml file in the Kata Containers repository.

References