This directory contains SystemZ deflate hardware acceleration support. It can be enabled using the following build commands:
$ ./configure --with-dfltcc-deflate --with-dfltcc-inflate
$ make
or
$ cmake -DWITH_DFLTCC_DEFLATE=1 -DWITH_DFLTCC_INFLATE=1 .
$ make
When built like this, zlib-ng compresses in hardware at level 1 and in
software at all other levels. Decompression always happens in hardware.
To enable hardware compression for levels 1-6 (so that it is used by
default), add -DDFLTCC_LEVEL_MASK=0x7e to CFLAGS when building zlib-ng.
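The value 0x7e reads most naturally as a bit mask. The following is a minimal sketch, assuming that bit N of the mask corresponds to compression level N (which is consistent with 0x7e covering levels 1-6). The helper name demo_level_in_mask() is hypothetical and not part of zlib-ng.

```c
#include <stdbool.h>

/* Hypothetical helper, not a zlib-ng function.
 * Assumption: bit N of the mask enables hardware compression for level N. */
static bool demo_level_in_mask(unsigned mask, int level) {
    if (level < 0 || level > 9)
        return false;                    /* only levels 0..9 are considered */
    return (mask & (1u << level)) != 0;  /* test the bit for this level */
}
```

Under this assumption, demo_level_in_mask(0x7e, 6) is true and demo_level_in_mask(0x7e, 7) is false, while a mask of 0x02 would select level 1 only.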
SystemZ deflate hardware acceleration is available on IBM z15 and newer machines under the name "Integrated Accelerator for zEnterprise Data Compression". The programming interface to it is a machine instruction called DEFLATE CONVERSION CALL (DFLTCC). It is documented in Chapter 26 of Principles of Operation. Both the code and the rest of this document refer to this feature simply as "DFLTCC".
Performance figures are published here. The compression speed-up can be as high as 110x and the decompression speed-up can be as high as 15x.
Two DFLTCC compression calls with identical inputs are not guaranteed to
produce identical outputs. Therefore, care should be taken when using
hardware compression if reproducible results are desired. For this
purpose, the zlib-ng-specific zng_deflateSetParams() call allows setting
the Z_DEFLATE_REPRODUCIBLE parameter, which disables DFLTCC support for
a particular stream.
DFLTCC does not support every single zlib-ng feature, in particular:

* inflate(Z_BLOCK) and inflate(Z_TREES)
* inflateMark()
* inflatePrime()
* inflateSyncPoint()

When used, these functions will either switch to software, or, in case this is not possible, gracefully fail.
All SystemZ-specific code lives in the arch/s390 directory and is
integrated with the rest of zlib-ng using hook macros.
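The general shape of such a hook can be sketched as follows. This is an illustration only: every demo_* and DEMO_* name is hypothetical, and the sketch assumes a hook that evaluates to nonzero when the accelerated path handled the call and to zero when the generic code should proceed.

```c
#include <stddef.h>

/* Hypothetical stream state used only for this illustration. */
typedef struct {
    int    hw_enabled;  /* stands in for "the accelerator can be used" */
    size_t hw_calls;    /* how often the accelerated path ran */
    size_t sw_calls;    /* how often the generic path ran */
} demo_stream;

/* Arch-specific function behind the hook: returns 1 if it did the work. */
static int demo_hw_deflate(demo_stream *s) {
    if (!s->hw_enabled)
        return 0;       /* decline, the caller falls back to software */
    s->hw_calls++;
    return 1;
}

/* Without accelerator support the hook collapses to a constant,
 * so the generic code carries no extra cost. */
#ifdef DEMO_NO_HW
#define DEMO_DEFLATE_HOOK(s) 0
#else
#define DEMO_DEFLATE_HOOK(s) demo_hw_deflate(s)
#endif

/* Generic code: try the hook first, otherwise do the work in software. */
static void demo_deflate(demo_stream *s) {
    if (DEMO_DEFLATE_HOOK(s))
        return;
    s->sw_calls++;
}
```

The point of the design is that the generic code names only the macro, so all platform knowledge stays in the arch-specific directory.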
DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. ZALLOC_STATE(), ZFREE_STATE(), ZCOPY_STATE(),
ZALLOC_WINDOW() and TRY_FREE_WINDOW() macros encapsulate allocation
details for the parameter block (which is allocated alongside zlib-ng
state) and the window (which must be page-aligned).
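As an illustration of the page-alignment requirement, here is a minimal sketch using C11 aligned_alloc(). It is not how zlib-ng allocates the window (the real macros presumably go through the stream's allocator); the demo_* names and the 4096-byte page size are assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical helper: allocate at least `size` bytes on a page boundary.
 * The size is rounded up because aligned_alloc() traditionally expects a
 * multiple of the alignment. Release the result with free(). */
static void *demo_alloc_window(size_t size) {
    if (size > SIZE_MAX - DEMO_PAGE_SIZE)
        return NULL;                      /* rounding up would overflow */
    size_t rounded = (size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE * DEMO_PAGE_SIZE;
    if (rounded == 0)
        rounded = DEMO_PAGE_SIZE;         /* never request zero bytes */
    return aligned_alloc(DEMO_PAGE_SIZE, rounded);
}

/* Returns nonzero if `p` lies on a page boundary. */
static int demo_is_page_aligned(const void *p) {
    return ((uintptr_t)p % DEMO_PAGE_SIZE) == 0;
}
```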
While inflate software and hardware window formats match, this is not
the case for deflate. Therefore, deflateSetDictionary() and
deflateGetDictionary() need special handling, which is triggered using
DEFLATE_SET_DICTIONARY_HOOK() and DEFLATE_GET_DICTIONARY_HOOK()
macros.
deflateResetKeep() and inflateResetKeep() update the DFLTCC
parameter block using DEFLATE_RESET_KEEP_HOOK() and
INFLATE_RESET_KEEP_HOOK() macros.
INFLATE_PRIME_HOOK(), INFLATE_MARK_HOOK() and
INFLATE_SYNC_POINT_HOOK() macros make the respective unsupported
calls gracefully fail.
DEFLATE_PARAMS_HOOK() implements switching between hardware and
software compression mid-stream using deflateParams(). Switching
normally entails flushing the current block, which might not be possible
in low memory situations. deflateParams() uses DEFLATE_DONE() hook
in order to detect and gracefully handle such situations.
The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. DEFLATE_BOUND_ADJUST_COMPLEN()
and DEFLATE_NEED_CONSERVATIVE_BOUND() macros make deflateBound()
return the correct results for the hardware implementation.
Actual compression and decompression are handled by DEFLATE_HOOK() and
INFLATE_TYPEDO_HOOK() macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using
INFLATE_NEED_UPDATEWINDOW() macro.
In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming is
suppressed using DEFLATE_NEED_CHECKSUM() and INFLATE_NEED_CHECKSUM()
macros.
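To show the kind of per-byte work that is skipped when the hardware supplies the checksum, here is a plain, unoptimized Adler-32 as defined in RFC 1950. It is a sketch for illustration, not zlib-ng's implementation, and the name demo_adler32() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward Adler-32 (RFC 1950). Start with adler = 1 and feed the
 * previous result back in to checksum data incrementally. */
static uint32_t demo_adler32(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xffffu;          /* low half: running byte sum */
    uint32_t b = (adler >> 16) & 0xffffu;  /* high half: sum of the sums */
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;         /* 65521 is the largest prime < 2^16 */
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

For example, demo_adler32(1, (const unsigned char *)"Wikipedia", 9) yields 0x11E60398.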
While software always produces reproducible compression results, this
is not the case for DFLTCC. Therefore, zlib-ng users are given the
ability to specify whether or not reproducible compression results
are required. While it is always possible to specify this setting
before the compression begins, it is not always possible to do so in
the middle of a deflate stream - the exact conditions for that are
determined by DEFLATE_CAN_SET_REPRODUCIBLE() macro.
When zlib-ng is built with DFLTCC, the hooks described above are
converted to calls to functions, which are implemented in
arch/s390/dfltcc_* files. The functions can be grouped in three broad
categories:

* [...] dfltcc() and allocating aligned memory - dfltcc_alloc_state().
* [...] dfltcc_deflate_set_dictionary().
* [...] dfltcc_deflate() and dfltcc_inflate().

The functions from the first two categories are fairly simple, however, various quirks in both software and hardware state machines make the functions from the third category quite complicated.

dfltcc_deflate() function

This function is called by deflate() and has the following
responsibilities:

* [...] 0, making deflate() use some
  other function in order to compress in software. Otherwise it returns
  1.
* [...] DFLTCC_FIRST_FHT_BLOCK_SIZE bytes are placed into a fixed
  block, and every next DFLTCC_BLOCK_SIZE bytes are placed into
  dynamic blocks.
* [...] soft_bcc variable.
* [...] deflate()
  must perform various additional actions when a block or a stream ends.
  dfltcc_deflate() informs deflate() about this using
  block_state *result parameter.
* [...] wrap and Check Value Type or bi_valid
  and Sub-Byte Boundary. Certain fields cannot be translated and must
  persist untouched in the parameter block between calls, for example,
  Continuation Flag or Continuation State Buffer.
* [...] send_eobs(), or implicitly - by returning to deflate()
  with certain return and *result values, when Continuation Flag is
  set.
* [...] Z_FINISH, Block Header Final parameter block bit is used to mark
  this block as final. However, sometimes an empty final block is
  needed, and, unfortunately, just like with EOBS, DFLTCC will silently
  refuse to do this. The general idea of DFLTCC implementation is to
  rely as much as possible on the existing code. Here in order to do
  this, the code pretends that it does not support DFLTCC, which makes
  deflate() call a software compression function, which writes an
  empty final block. Whether this is required is controlled by
  need_empty_block variable.
* [...] deflate() return code.

dfltcc_inflate() function

This function is called by inflate() from the TYPEDO state (that is,
when all the metadata is parsed and the stream is positioned at the type
bits of deflate block header) and it's responsible for the following:

* [...] Z_BLOCK or Z_TREES.
  Unfortunately, there is no way to ask DFLTCC to stop decompressing on
  block or tree boundary.
* inflate() decompression loop management. This is controlled using
  the return value, which can be either DFLTCC_INFLATE_BREAK or
  DFLTCC_INFLATE_CONTINUE.
* [...] whave and History Length or wnext and
  History Offset.
* [...] inflate() to return Z_STREAM_END
  and is controlled by last state field.
* [...] inflate() by setting mode field to MEM or BAD.

Given complexity of DFLTCC machine instruction, it is not clear whether QEMU TCG will ever support it. At the time of writing, one has to have access to an IBM z15+ VM or LPAR in order to test DFLTCC support. Since DFLTCC is a non-privileged instruction, neither special VM/LPAR configuration nor root are required.
zlib-ng CI uses an IBM-provided z15 self-hosted builder for the DFLTCC testing. There are no IBM Z builds of GitHub Actions runner, and stable qemu-user has problems with .NET apps, so the builder runs the x86_64 runner version with qemu-user built from the master branch.
$ sudo dnf install docker
$ sudo cp self-hosted-builder/*.service /etc/systemd/system/
$ sudo systemctl daemon-reload
$ sudo tee /etc/actions-runner
repo=<owner>/<name>
access_token=<ghp_***>
The access token should have the repo scope; consult https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository for details.
$ sudo systemctl enable --now qemu-user-static
$ sudo systemctl enable --now actions-runner
In order to update the iiilinuxibmcom/actions-runner image, e.g. to get the
latest OS security fixes, use the following commands:
$ sudo docker build \
      --pull \
      -f self-hosted-builder/actions-runner.Dockerfile \
      -t iiilinuxibmcom/actions-runner \
      .
$ sudo systemctl restart actions-runner
The actions-runner service stores various temporary data, such as runner
registration information, work directories and logs, in the actions-runner
volume. In order to remove it and start from scratch, e.g. when switching the
runner to a different repository, use the following commands:
$ sudo systemctl stop actions-runner
$ sudo docker rm -f actions-runner
$ sudo docker volume rm actions-runner