docs/devel/vfio-iommufd.rst
In this document, "backend", "container" and "BE" have the same meaning.

With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate their DMA mappings to the
kernel for assigned devices. While the legacy kernel interface is
group-centric, the new iommufd interface is device-centric, relying on a
device fd and an iommufd.

To support both interfaces in the QEMU VFIO device, a base container
abstracts the common part of the VFIO legacy and iommufd containers, so that
the generic VFIO code can use either one.

The base container implements generic functions such as the memory_listener
and address space management, whereas each derived container implements the
callbacks specific to either the legacy or the iommufd interface. Each
container has its own way to set up the secure context and its own DMA
management interface. The diagram below shows how both containers fit
together.
::

                          VFIO                           AddressSpace/Memory
          +-------+  +----------+  +-----+  +-----+
          |  pci  |  | platform |  |  ap |  | ccw |
          +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
              |           |           |        |        |   AddressSpace       |
              |           |           |        |        +------------+---------+
          +---V-----------V-----------V--------V----+               /
          |           VFIOAddressSpace              | <------------+
          |                  |                      |  MemoryListener
          |        VFIOContainerBase list           |
          +-------+----------------------------+----+
                  |                            |
                  |                            |
          +-------V------+            +--------V----------+
          |   iommufd    |            |    vfio legacy    |
          |  container   |            |     container     |
          +-------+------+            +--------+----------+
                  |                            |
                  | /dev/iommu                 | /dev/vfio/vfio
                  | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
      Userspace   |                            |
      ============+============================+===========================
      Kernel      |  device fd                 |
                  +---------------+            | group/container fd
                  | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
                  |  ATTACH_IOAS) |            | device fd
                  |               |            |
                  |       +-------V------------V-----------------+
          iommufd |       |                vfio                  |
      (map/unmap  |       +---------+--------------------+-------+
      ioas_copy)  |                 |                    | map/unmap
                  |                 |                    |
          +-------V------+    +-----V------+      +------V--------+
          | iommufd core |    |  device    |      |  vfio iommu   |
          +--------------+    +------------+      +---------------+
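* Secure Context setup

  - iommufd BE: uses the device fd and iommufd to set up the secure context
    (BIND_IOMMUFD, ATTACH_IOAS)
  - vfio legacy BE: uses the group fd and container fd to set up the secure
    context (SET_CONTAINER, SET_IOMMU)

* Device access

  - iommufd BE: the device fd is opened through /dev/vfio/devices/vfioX
  - vfio legacy BE: the device fd is obtained through the group fd

* DMA Mapping flow

  1. VFIOAddressSpace receives memory region updates from the AddressSpace
     via its MemoryListener
  2. VFIO issues the DMA map/unmap through the container BEs

     - iommufd BE: uses iommufd (map/unmap, ioas_copy)
     - vfio legacy BE: uses the container fd (map/unmap)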
Configuring the host device is exactly the same as for a VFIO device with the
legacy VFIO container.
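The host-side preparation for VFIO usually amounts to binding the device to
the vfio-pci host driver. The following is a minimal sketch of that step
through sysfs, assuming the vfio-pci module is already loaded and the script
runs as root. The function name ``bind_to_vfio_pci`` and its optional sysfs
root argument are illustrative only and not part of QEMU or the kernel.

.. code-block:: bash

    # Bind a PCI device to the vfio-pci host driver.
    # $1: PCI address (e.g. 0000:02:00.0)
    # $2: sysfs root (optional, defaults to /sys; handy for dry runs)
    bind_to_vfio_pci() {
        bdf=$1
        sysfs=${2:-/sys}
        dev="$sysfs/bus/pci/devices/$bdf"

        [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }

        # Detach the device from its current host driver, if it has one.
        if [ -e "$dev/driver/unbind" ]; then
            echo "$bdf" > "$dev/driver/unbind"
        fi

        # Tell the PCI core to prefer vfio-pci for this device, then re-probe.
        echo vfio-pci > "$dev/driver_override"
        echo "$bdf" > "$sysfs/bus/pci/drivers_probe"
    }

    # Example use (modifies host driver bindings, so left commented out):
    #   bind_to_vfio_pci 0000:02:00.0

Writing to ``driver_override`` rather than to the driver's ``new_id`` file
keeps the change limited to the one device being assigned.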
Interactions with /dev/iommu are abstracted by a new iommufd object
(compiled in with the CONFIG_IOMMUFD option).

Any QEMU device (e.g. a VFIO device) wishing to use /dev/iommu must be
linked with an iommufd object. Such a device gets a new optional property
named iommufd, through which an iommufd object can be passed. Take the
vfio-pci device for example:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
Note that /dev/iommu and the VFIO cdev can be opened externally by a
management layer, in which case the file descriptors are passed to QEMU.
The fd property accepts either a file descriptor number or a string naming
the fd, for example:

.. code-block:: bash

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23
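A management layer that opens the nodes itself first needs the VFIO cdev path
of the host device and then the two QEMU options. The sketch below shows one
way to do this from a POSIX shell. It assumes the kernel lists the cdev name
in the device's ``vfio-dev`` sysfs directory. The helper names
``vfio_cdev_path`` and ``iommufd_fd_args``, and the descriptor numbers 8 and 9
in the usage comment, are illustrative choices.

.. code-block:: bash

    # Print the VFIO cdev node of a PCI device, e.g. /dev/vfio/devices/vfio0.
    # $1: PCI address, $2: sysfs root (optional, defaults to /sys).
    vfio_cdev_path() {
        for entry in "${2:-/sys}/bus/pci/devices/$1/vfio-dev"/vfio*; do
            # An unmatched glob stays literal, so skip anything not present.
            [ -e "$entry" ] || continue
            echo "/dev/vfio/devices/${entry##*/}"
            return 0
        done
        echo "no VFIO cdev found for $1" >&2
        return 1
    }

    # Print the QEMU options for FD passing, one option pair per line.
    # $1: fd of the opened /dev/iommu, $2: fd of the opened VFIO cdev.
    iommufd_fd_args() {
        echo "-object iommufd,id=iommufd0,fd=$1"
        echo "-device vfio-pci,iommufd=iommufd0,fd=$2"
    }

    # Example use (needs real devices, so left commented out):
    #   cdev=$(vfio_cdev_path 0000:02:00.0)
    #   exec 8<>/dev/iommu 9<>"$cdev"
    #   qemu-system-x86_64 ... $(iommufd_fd_args 8 9)

Descriptors opened with ``exec`` stay open in the shell and are inherited by
the QEMU process started from it, which is what makes the numbers meaningful
on the QEMU command line.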
If the fd property is not passed, the fd is opened by QEMU.

If no iommufd object is passed to the vfio-pci device, iommufd
is not used and the user gets the behavior based on the legacy VFIO
container:

.. code-block:: bash

    -device vfio-pci,host=0000:02:00.0
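Since the backend is selected purely by whether the iommufd property is
present, a launcher script can switch between the two with one optional
argument. The helper below is a hypothetical sketch of that idea; the name
``vfio_pci_args`` is not a QEMU interface.

.. code-block:: bash

    # Print the options for one assigned PCI device.
    # $1: host PCI address, $2: id of an iommufd object (optional).
    vfio_pci_args() {
        if [ -n "${2:-}" ]; then
            # iommufd backend: create the object and link the device to it.
            echo "-object iommufd,id=$2"
            echo "-device vfio-pci,host=$1,iommufd=$2"
        else
            # No iommufd object: QEMU uses the legacy VFIO container.
            echo "-device vfio-pci,host=$1"
        fi
    }

    # Example use:
    #   qemu-system-x86_64 ... $(vfio_pci_args 0000:02:00.0 iommufd0)
    #   qemu-system-x86_64 ... $(vfio_pci_args 0000:02:00.0)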
The iommufd backend is currently supported on x86, Arm and s390x.
PCI p2p DMA is unsupported because IOMMUFD does not yet support mapping
hardware PCI BAR regions. The warning below is shown for an assigned PCI
device; it is expected and not a bug.

.. code-block:: none

    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
The vfio-pci device checks the sysfsdev property to decide whether the
backend is an mdev. If FD passing is used, there is no way to determine this,
so the mdev is treated like a real PCI device. The error below is reported if
the user tries to enable RAM discarding for such an mdev.

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices
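To illustrate why the sysfs path matters here, the sketch below approximates
an mdev check in shell. It assumes that an mdev's sysfs directory has a
``subsystem`` link resolving to the mdev bus; QEMU's own check is written in C
and may differ in detail. The function name ``is_mdev`` is illustrative. With
only a bare fd there is no such path to inspect.

.. code-block:: bash

    # Return 0 if the sysfs device directory $1 sits on the mdev bus.
    is_mdev() {
        # Resolve the subsystem link and compare its last path component.
        subsys=$(readlink -f "$1/subsystem") || return 1
        [ "${subsys##*/}" = "mdev" ]
    }

    # Example use ($UUID stands for the mdev's UUID on the host):
    #   is_mdev "/sys/bus/mdev/devices/$UUID" && echo "backend is an mdev"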
The vfio-ap and vfio-ccw devices do not have this issue because their
backend devices are always mdevs and RAM discarding is force-enabled.
Only IOMMUFD-backed VFIO devices are supported when intel_iommu is configured
with x-flts=on. For a VFIO device backed by the legacy container, the error
below is shown:

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,host=0000:02:00.0: vfio 0000:02:00.0: Failed to set vIOMMU: Need IOMMUFD backend when x-flts=on
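QEMU reports this error itself, but a launcher script may want to catch the
combination before starting the guest. The following is a hypothetical
pre-flight check over a QEMU argument list. The function name
``check_flts_backend`` and its message are made up for illustration, and the
matching is deliberately simple (it only looks at ``-device`` arguments).

.. code-block:: bash

    # Return 1 if the arguments combine intel-iommu x-flts=on with a
    # vfio-pci device that has no iommufd= property.
    check_flts_backend() {
        flts=0
        legacy=0
        prev=
        for arg in "$@"; do
            if [ "$prev" = "-device" ]; then
                case $arg in
                    intel-iommu*x-flts=on*) flts=1 ;;
                    vfio-pci*iommufd=*) ;;      # IOMMUFD-backed, fine
                    vfio-pci*) legacy=1 ;;      # legacy container
                esac
            fi
            prev=$arg
        done
        if [ "$flts" -eq 1 ] && [ "$legacy" -eq 1 ]; then
            echo "vfio-pci needs iommufd= when intel-iommu has x-flts=on" >&2
            return 1
        fi
        return 0
    }

    # Example use:
    #   check_flts_backend "$@" && exec qemu-system-x86_64 "$@"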
A VFIO device under a PCI bridge is unsupported; use a PCIe bridge if
necessary. Otherwise the error below is shown:

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,host=0000:02:00.0,bus=bridge1,iommufd=iommufd0: vfio 0000:02:00.0: Failed to set vIOMMU: Host device downstream to a PCI bridge is unsupported when x-flts=on
If the host IOMMU has ERRATA_772415_SPR17, running the guest with
"intel_iommu=on,sm_off" is unsupported, as is a kexec or reboot of the guest
from "intel_iommu=on,sm_on" to "intel_iommu=on,sm_off". If the guest does not
need scalable mode, turn it off as below:

.. code-block:: bash

    -device intel-iommu,x-scalable-mode=off