# llama.cpp for IBM zDNN Accelerator

> [!WARNING]
> Note: zDNN is not the same as ZenDNN.
>
> - **zDNN** (this page): IBM's Deep Neural Network acceleration library for IBM Z & LinuxONE Mainframes
> - **ZenDNN**: AMD's deep learning library for AMD EPYC CPUs (see the ZenDNN documentation)

## Background

IBM zDNN (Z Deep Neural Network) is a hardware acceleration library designed specifically to leverage the IBM NNPA (Neural Network Processor Assist) accelerator located within IBM Telum I and II processors. It provides significant performance improvements for neural network inference operations.

## Llama.cpp + IBM zDNN

The llama.cpp zDNN backend is designed to enable llama.cpp on IBM z17 and later systems via the IBM zDNN hardware acceleration library.

## Software & Hardware Support

| Hardware Level       | Status        | Verified                   |
| -------------------- | ------------- | -------------------------- |
| IBM z17 / LinuxONE 5 | Supported     | RHEL 9.6, IBM z17, 40 IFLs |
| IBM z16 / LinuxONE 4 | Not Supported |                            |

## Data Types Supported

| Data Type | Status    |
| --------- | --------- |
| F32       | Supported |
| F16       | Supported |
| BF16      | Supported |

## CMake Options

The IBM zDNN backend provides the following CMake options to control how the backend is built.

| CMake Option | Default Value | Description                         |
| ------------ | ------------- | ----------------------------------- |
| GGML_ZDNN    | OFF           | Compile llama.cpp with zDNN support |
| ZDNN_ROOT    | ""            | Override zDNN library lookup        |

## 1. Install zDNN Library

> [!NOTE]
> The zDNN library provided via apt or yum may not work correctly, as reported in #15772. It is recommended to compile zDNN from source instead.

```sh
git clone --recurse-submodules https://github.com/IBM/zDNN
cd zDNN

autoreconf .
./configure --prefix=/opt/zdnn-libs

make build
sudo make install
```
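
As a quick sanity check after installation, you can confirm that the headers and libraries landed under the install prefix. This is a minimal sketch assuming the `/opt/zdnn-libs` prefix used above; adjust the paths if you installed zDNN elsewhere.

```sh
# Check that the zDNN header and library directory exist under the install prefix
ls /opt/zdnn-libs/include/zdnn.h
ls /opt/zdnn-libs/lib
```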

## 2. Build llama.cpp

```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

cmake -S . -G Ninja -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_ZDNN=ON \
    -DZDNN_ROOT=/opt/zdnn-libs
cmake --build build --config Release -j$(nproc)
```
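
Once the build completes, a short test run can confirm that the binaries work. This is a minimal sketch: the model path below is a placeholder for any GGUF model you have locally, and the startup log should list the backends that were detected.

```sh
# Run a short generation; replace the model path with a GGUF file you have available
./build/bin/llama-cli -m /path/to/model.gguf -p "Hello" -n 32
```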