Documentation/admin-guide/device-mapper/dm-integrity.rst
The dm-integrity target emulates a block device that has additional per-sector tags that can be used for storing integrity information.
A general problem with storing integrity tags with every sector is that writing the sector and the integrity tag must be atomic - i.e. in case of a crash, either both the sector and its integrity tag are written, or neither of them is.
To guarantee write atomicity, the dm-integrity target uses a journal: it writes sector data and integrity tags into the journal, commits the journal, and then copies the data and integrity tags to their respective locations.
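The write sequence described above (write to journal, commit, copy out) can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the kernel's data structures; the class and method names are hypothetical.

```python
class ToyJournaledDevice:
    """Toy model of journaled writes (hypothetical, not the kernel layout)."""

    def __init__(self):
        self.data = {}        # sector number -> data at its final location
        self.tags = {}        # sector number -> integrity tag at its final location
        self.pending = []     # journal entries written but not yet committed
        self.committed = []   # journal entries that have been committed

    def write(self, sector, data, tag):
        # Step 1: data and tag enter the journal together.
        self.pending.append((sector, data, tag))

    def commit(self):
        # Step 2: commit the journal; committed entries may be replayed.
        self.committed.extend(self.pending)
        self.pending.clear()

    def copy_out(self):
        # Step 3: copy data and tags to their respective locations.
        for sector, data, tag in self.committed:
            self.data[sector] = data
            self.tags[sector] = tag
        self.committed.clear()

    def recover_after_crash(self):
        # Uncommitted entries are dropped, committed entries are replayed,
        # so a sector and its tag are either both written or both absent.
        self.pending.clear()
        self.copy_out()
```

In this model a crash before commit() leaves neither the data nor the tag at the final location, while a crash after commit() results in both being written during recovery.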
The dm-integrity target can be used with the dm-crypt target - in this situation the dm-crypt target creates the integrity data and passes them to the dm-integrity target via bio_integrity_payload attached to the bio. In this mode, the dm-crypt and dm-integrity targets provide authenticated disk encryption - if the attacker modifies the encrypted device, an I/O error is returned instead of random data.
The dm-integrity target can also be used as a standalone target. In this mode it calculates and verifies the integrity tag internally, and can be used to detect silent data corruption on the disk or in the I/O path.
There's an alternate mode of operation where dm-integrity uses a bitmap instead of a journal. If a bit in the bitmap is 1, the corresponding region's data and integrity tags are not synchronized - if the machine crashes, the unsynchronized regions will be recalculated. The bitmap mode is faster than the journal mode, because we don't have to write the data twice, but it is also less reliable, because if data corruption happens when the machine crashes, it may not be detected.
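The dirty-region bookkeeping of the bitmap mode can be sketched as follows. This is a simplified illustration that assumes one bit covers a fixed number of 512-byte sectors (the sectors_per_bit argument); the class and method names are hypothetical and the real on-disk bitmap handling is more involved.

```python
class DirtyBitmap:
    """Toy dirty-region bitmap: bit == 1 means data and tags may not match."""

    def __init__(self, device_sectors, sectors_per_bit):
        self.device_sectors = device_sectors
        self.sectors_per_bit = sectors_per_bit
        nbits = -(-device_sectors // sectors_per_bit)  # ceiling division
        self.bits = [0] * nbits

    def mark_write(self, sector, count):
        # Set the bit of every region touched by the write before writing.
        first = sector // self.sectors_per_bit
        last = (sector + count - 1) // self.sectors_per_bit
        for bit in range(first, last + 1):
            self.bits[bit] = 1

    def mark_synchronized(self, bit):
        # Clear the bit once data and tags of the region are both on disk.
        self.bits[bit] = 0

    def regions_to_recalculate(self):
        # After a crash, every region whose bit is still 1 is recalculated.
        regions = []
        for bit, dirty in enumerate(self.bits):
            if dirty:
                start = bit * self.sectors_per_bit
                length = min(self.sectors_per_bit, self.device_sectors - start)
                regions.append((start, length))
        return regions
```

Because recalculation regenerates the tags from whatever data is on disk, corruption that occurred in a dirty region at crash time is not detected, which is the reliability trade-off mentioned above.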
When loading the target for the first time, the kernel driver will format the device. But it will only format the device if the superblock contains zeroes. If the superblock is neither valid nor zeroed, the dm-integrity target can't be loaded.
Accesses to the on-disk metadata area containing checksums (aka tags) are buffered using dm-bufio. When an access to any given metadata area occurs, each unique metadata area gets its own buffer(s). The buffer size is capped at the size of the metadata area, but may be smaller, thereby requiring multiple buffers to represent the full metadata area. A smaller buffer size will produce a smaller resulting read/write operation to the metadata area for small reads/writes. The metadata is still read even in a full write to the data covered by a single buffer.
To use the target for the first time:
Target arguments:
the underlying block device
the number of reserved sectors at the beginning of the device - the dm-integrity target won't read or write these sectors
the size of the integrity tag (if "-" is used, the size is taken from the internal-hash algorithm)
mode:
D - direct writes (without journal): in this mode, journaling is not used and data sectors and integrity tags are written separately. In case of a crash, it is possible that the data and integrity tag don't match.
J - journaled writes: data and integrity tags are written to the journal and atomicity is guaranteed. In case of a crash, either both data and tag or none of them are written. The journaled mode roughly halves write throughput because the data have to be written twice.
B - bitmap mode: data and metadata are written without any synchronization; the driver maintains a bitmap of dirty regions where data and metadata don't match. This mode can only be used with internal hash.
R - recovery mode: in this mode, the journal is not replayed, checksums are not checked and writes to the device are not allowed. This mode is useful for data recovery if the device cannot be activated in any of the other standard modes.
the number of additional arguments
Additional arguments:
journal_sectors:number The size of the journal. This argument is used only when formatting the device. If the device is already formatted, the value from the superblock is used.
interleave_sectors:number (default 32768) The number of interleaved sectors. This value is rounded down to a power of two. If the device is already formatted, the value from the superblock is used.
meta_device:device Don't interleave the data and metadata on the device. Use a separate device for metadata.
buffer_sectors:number (default 128) The number of sectors in one metadata buffer. The value is rounded down to a power of two.
journal_watermark:number (default 50) The journal watermark in percent. When the size of the journal exceeds this watermark, the thread that flushes the journal will be started.
commit_time:number (default 10000) Commit time in milliseconds. When this time passes, the journal is written. The journal is also written immediately if a FLUSH request is received.
internal_hash:algorithm(:key) (the key is optional) Use internal hash or crc. When this argument is used, the dm-integrity target won't accept integrity tags from the upper target, but it will automatically generate and verify the integrity tags.
You can use a crc algorithm (such as crc32), in which case the
integrity target will protect the data against accidental corruption.
You can also use a hmac algorithm (for example
"hmac(sha256):0123456789abcdef"), in which case it will provide
cryptographic authentication of the data without encryption.
When this argument is not used, the integrity tags are accepted
from an upper layer target, such as dm-crypt. The upper layer
target should check the validity of the integrity tags.
recalculate Recalculate the integrity tags automatically. It is only valid when using internal hash.
journal_crypt:algorithm(:key) (the key is optional) Encrypt the journal using given algorithm to make sure that the attacker can't read the journal. You can use a block cipher here (such as "cbc(aes)") or a stream cipher (for example "chacha20" or "ctr(aes)").
The journal contains a history of the last writes to the block device,
so an attacker reading the journal could see the last sector numbers
that were written. From the sector numbers, the attacker can infer
the size of files that were written. To protect against this
situation, you can encrypt the journal.
journal_mac:algorithm(:key) (the key is optional) Protect sector numbers in the journal from accidental or malicious modification. To protect against accidental modification, use a crc algorithm; to protect against malicious modification, use a hmac algorithm with a key.
This option is not needed when using internal-hash because in this
mode, the integrity of journal entries is checked when replaying
the journal. Thus, a modified sector number would be detected at
this stage.
block_size:number (default 512) The size of a data block in bytes. The larger the block size the less overhead there is for per-block integrity metadata. Supported values are 512, 1024, 2048 and 4096 bytes.
sectors_per_bit:number In the bitmap mode, this parameter specifies the number of 512-byte sectors that correspond to one bitmap bit.
bitmap_flush_interval:number The bitmap flush interval in milliseconds. The metadata buffers are synchronized when this interval expires.
allow_discards Allow block discard requests (a.k.a. TRIM) for the integrity device. Discards are only allowed to devices using internal hash.
fix_padding Use a smaller padding of the tag area that is more space-efficient. If this option is not present, large padding is used - that is for compatibility with older kernels.
fix_hmac Improve security of internal_hash and journal_mac:
- the section number is mixed into the mac, so that an attacker can't
copy sectors from one journal section to another journal section
- the superblock is protected by journal_mac
- a 16-byte salt stored in the superblock is mixed into the mac, so
that the attacker can't detect that two disks have the same hmac
key, and also to prevent the attacker from moving sectors from one
disk to another
legacy_recalculate Allow recalculating of volumes with HMAC keys. This is disabled by default for security reasons - an attacker could modify the volume, set recalc_sector to zero, and the kernel would not detect the modification.
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and allow_discards can be changed when reloading the target (load an inactive table and swap the tables with suspend and resume). The other arguments should not be changed when reloading the target because the layout of disk data depends on them and the reloaded target would be non-functional.
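As an illustration of how the target arguments and additional arguments described above fit together, the following sketch assembles a device-mapper table line in the usual "start length target arguments" form. The helper name, the device path and the chosen values are hypothetical; it only builds a string and does not load anything.

```python
def integrity_table_line(length_sectors, device, reserved_sectors,
                         tag_size, mode, extra_args=()):
    """Build a table line for the integrity target (illustrative helper)."""
    if mode not in ("D", "J", "B", "R"):
        raise ValueError("mode must be one of D, J, B, R")
    fields = [
        0,                  # start sector of the mapping
        length_sectors,     # length of the mapping in 512-byte sectors
        "integrity",        # target name
        device,             # the underlying block device
        reserved_sectors,   # reserved sectors at the beginning of the device
        tag_size,           # tag size, or "-" to take it from internal_hash
        mode,               # D, J, B or R
        len(extra_args),    # the number of additional arguments
        *extra_args,        # the additional arguments themselves
    ]
    return " ".join(str(field) for field in fields)


# Hypothetical example: journaled mode with an internal crc32c hash.
line = integrity_table_line(1, "/dev/sdX", 0, "-", "J",
                            ["internal_hash:crc32c"])
print(line)
```

Note that the count of additional arguments is derived from the list itself, which avoids a mismatch between the declared count and the arguments actually passed.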
For example, on a device using the default interleave_sectors of 32768, a block_size of 512, and an internal_hash of crc32c with a tag size of 4 bytes, it will take 128 KiB of tags to track a full data area, requiring 256 sectors of metadata per data area. With the default buffer_sectors of 128, that means there will be 2 buffers per metadata area, or 2 buffers per 16 MiB of data.
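The arithmetic in this example can be reproduced with a short calculation. This is a sketch under the assumptions stated in the text (512-byte sectors, one tag per data block, power-of-two rounding of interleave_sectors and buffer_sectors); the function names are hypothetical and any extra padding applied by the driver is ignored.

```python
SECTOR_SIZE = 512  # bytes per sector


def round_down_pow2(n):
    """Round a positive integer down to the nearest power of two."""
    return 1 << (n.bit_length() - 1)


def metadata_layout(interleave_sectors=32768, block_size=512,
                    tag_size=4, buffer_sectors=128):
    """Estimate tag bytes, metadata sectors and buffers per data area."""
    interleave_sectors = round_down_pow2(interleave_sectors)
    buffer_sectors = round_down_pow2(buffer_sectors)
    data_bytes = interleave_sectors * SECTOR_SIZE
    blocks = data_bytes // block_size                     # data blocks per area
    tag_bytes = blocks * tag_size                         # one tag per block
    metadata_sectors = -(-tag_bytes // SECTOR_SIZE)       # ceiling division
    buffers = -(-metadata_sectors // buffer_sectors)      # ceiling division
    return {
        "data_bytes": data_bytes,
        "tag_bytes": tag_bytes,
        "metadata_sectors": metadata_sectors,
        "buffers_per_area": buffers,
    }


layout = metadata_layout()
print(layout)
```

With the defaults this gives 16 MiB of data per area, 128 KiB of tags, 256 metadata sectors and 2 buffers per metadata area, matching the figures above. Raising block_size to 4096 with the same tag size reduces the tags to 16 KiB per area.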
Status line:
The layout of the formatted block device:
reserved sectors (they are not used by this target; they can be used for storing LUKS metadata or for other purposes). The size of the reserved area is specified in the target arguments
superblock (4 KiB)
journal The journal is divided into sections, each section contains:
metadata area (4 KiB), it contains journal entries
every journal entry contains:
every metadata sector ends with
data area (the size is variable; it depends on how many journal entries fit into the metadata area)
To test if the whole journal section was written correctly, every 512-byte sector of the journal ends with an 8-byte commit id. If the commit id matches on all sectors in a journal section, then it is assumed that the section was written correctly. If the commit id doesn't match, the section was written partially and it should not be replayed.
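The commit id check described above can be sketched as a simple scan over the sectors of a journal section. This is an illustration only: the function name is hypothetical, and how the expected commit id is obtained is not covered here, so it is passed in as a parameter.

```python
SECTOR_SIZE = 512      # every journal sector is 512 bytes
COMMIT_ID_SIZE = 8     # the last 8 bytes of each sector hold the commit id


def section_fully_written(section, expected_commit_id):
    """Return True if every sector of the section ends with the commit id."""
    if len(section) == 0 or len(section) % SECTOR_SIZE != 0:
        raise ValueError("section must be a whole number of sectors")
    if len(expected_commit_id) != COMMIT_ID_SIZE:
        raise ValueError("commit id must be 8 bytes")
    for offset in range(0, len(section), SECTOR_SIZE):
        sector = section[offset:offset + SECTOR_SIZE]
        if sector[-COMMIT_ID_SIZE:] != expected_commit_id:
            # A mismatch means the section was only partially written
            # and should not be replayed.
            return False
    return True
```

A single mismatching sector is enough to reject the whole section, which is what makes a partially written section safe to skip during replay.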
one or more runs of interleaved tags and data. Each run contains: