src/plugins/intel_cpu/tools/dump_jit_disassm/README.md
This tool generates a disassembly of JIT dumps (Intel syntax) annotated with file:line_no references (see the note on line numbers below) that show the call stack of the C++ JIT source code that generated each instruction. Reading the annotated disassembly alongside the JIT source code helps developers understand and debug JIT code.
python3 -m pip install argparse
python3 -m pip install colorama
Build OpenVINO with -DENABLE_DEBUG_CAPS=ON -DCMAKE_BUILD_TYPE=Debug and install it.
Launch the OpenVINO CPU plugin through your application with the following environment variables set:
export ONEDNN_JIT_DUMP=1
export ONEDNN_VERBOSE=2
export OV_CPU_DEBUG_LOG=-
In the resulting log, look for sequences of dump_debug_traces / register_jit_code / dump_jit_code entries like the ones shown below:
[ oneDNN ] dump_debug_traces: dnnl_traces_cpu_jit_avx512_core_amx_compute_zp_pbuff_t.121.txt
[ oneDNN ] register_jit_code: /home/dev/tingqian/openvino/src/plugins/intel_cpu/thirdparty/onednn/src/cpu/x64/jit_avx512_core_amx_conv_kernel.hpp, jit_avx512_core_amx_compute_zp_pbuff_t
[ oneDNN ] dump_jit_code: dnnl_dump_cpu_jit_avx512_core_amx_compute_zp_pbuff_t.121.bin
[ DEBUG ] graph.cpp:856 CreatePrimitives() LOADTIME_createPrimitive tl_unet/outD4/Conv2D_1 jit_avx512_amx_I8 [+ 13177.633/88862.552 ms]
onednn_verbose,create:cache_miss,convolution,jit:avx512_core_amx_int8,forward_inference,src_u8::blocked:acdb:f0 wei_s8:p:blocked:ABcd16b16a4b:f8:zpm1 bia_f32::blocked:a:f0 dst_f32::blocked:acdb:f0,attr-zero-points:src0:0:167 attr-post-ops:depthwise_scale_shift+eltwise_tanh+eltwise_linear:296.41:227+eltwise_round_half_to_even+eltwise_clip:-0:255+eltwise_linear:0.964648:-218.975+sum:1:0:f32+eltwise_linear:0.581796:104+eltwise_round_half_to_even+eltwise_clip:0:255+eltwise_linear:0.00601129:-0.625175 ,alg:convolution_direct,mb1_ic96oc3_ih128oh128kh3sh1dh0ph1_iw128ow128kw3sw1dw0pw1,0.685791
[ oneDNN ] dump_debug_traces shows the name of the file into which offsets and backtraces are dumped.
[ oneDNN ] register_jit_code shows the corresponding JIT source file and kernel name.
[ oneDNN ] dump_jit_code shows the name of the file into which the JIT-generated binary code is dumped.
[ DEBUG ] shows the layer for which these dumps were generated.
onednn_verbose,create shows the full description of the primitive that generated the JIT kernel and the dumps above.
To explore the JIT code dumped here, use the following command:
python ~/openvino/src/plugins/intel_cpu/tools/dump_jit_disassm/ dnnl_traces_cpu_jit_avx512_core_amx_compute_zp_pbuff_t.121.txt dnnl_dump_cpu_jit_avx512_core_amx_compute_zp_pbuff_t.121.bin
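When a log contains many kernels, the three [ oneDNN ] entries described above can be paired up with a short script before invoking the tool. The following is a minimal sketch, not part of the tool itself. It assumes the entries appear in the same order and format as in the sample log, and the names `JitDump` and `collect_jit_dumps` are hypothetical.

```python
import re
from collections import namedtuple

# Hypothetical record holding one group of related log entries.
JitDump = namedtuple("JitDump", ["traces", "source", "kernel", "binary"])

# Matches lines of the form "[ oneDNN ] <tag>: <value>" as seen in the sample log.
_TAG_RE = re.compile(
    r"\[\s*oneDNN\s*\]\s*(dump_debug_traces|register_jit_code|dump_jit_code):\s*(.+)"
)


def collect_jit_dumps(log_lines):
    """Group dump_debug_traces / register_jit_code / dump_jit_code entries.

    Assumes the three entries for one kernel appear in that order.
    """
    dumps = []
    current = {}
    for line in log_lines:
        match = _TAG_RE.search(line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "dump_debug_traces":
            # A traces entry starts a new group.
            current = {"traces": value}
        elif tag == "register_jit_code":
            # The value has the form "<source path>, <kernel name>".
            source, _, kernel = value.rpartition(",")
            current["source"] = source.strip()
            current["kernel"] = kernel.strip()
        elif tag == "dump_jit_code":
            current["binary"] = value
            # Keep only complete groups; incomplete ones are dropped.
            if {"traces", "source", "kernel", "binary"} <= current.keys():
                dumps.append(JitDump(**current))
            current = {}
    return dumps
```

Each returned record carries the traces file and the binary file, which are the two arguments passed to the tool in the command above.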
The tool extracts line-number debug information with the standard Linux command addr2line and disassembles the JIT binary dump with objdump, so make sure both are installed on your system.
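For reference, a raw JIT dump can also be disassembled by hand. The sketch below shows one way to call GNU objdump on a headerless x86-64 binary with Intel syntax. The exact options the tool passes are not stated here, so treat this option set as an assumption; the function names are hypothetical.

```python
import subprocess


def objdump_command(bin_path, objdump="objdump"):
    """Build an objdump command line for a raw (headerless) x86-64 code dump."""
    return [
        objdump,
        "-D",                 # disassemble all sections
        "-b", "binary",       # input is a raw binary, not an ELF file
        "-m", "i386:x86-64",  # target architecture
        "-M", "intel",        # Intel syntax
        bin_path,
    ]


def disassemble(bin_path):
    """Run objdump and return its text output; raises if objdump fails."""
    result = subprocess.run(
        objdump_command(bin_path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `disassemble("dnnl_dump_cpu_jit_avx512_core_amx_compute_zp_pbuff_t.121.bin")` would return plain disassembly without the file:line_no annotations, which come from the traces file and addr2line.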
The final output looks like this:
0000000000000000 <.data>:
0: 53 push rbx jit_avx512_core_amx_1x1_conv_kernel.cpp:834
1: 55 push rbp jit_avx512_core_amx_1x1_conv_kernel.cpp:834
2: 41 54 push r12 jit_avx512_core_amx_1x1_conv_kernel.cpp:834
4: 41 55 push r13 jit_avx512_core_amx_1x1_conv_kernel.cpp:834
6: 41 56 push r14 jit_avx512_core_amx_1x1_conv_kernel.cpp:834
8: 41 57 push r15 jit_avx512_core_amx_1x1_conv_kernel.cpp:834
a: bd 00 04 00 00 mov ebp,0x400
f: 48 83 ec 08 sub rsp,0x8
13: 4c 8b bf 88 00 00 00 mov r15,QWORD PTR [rdi+0x88] jit_uni_postops_injector.cpp:387 / jit_avx512_core_amx_1x1_conv_kernel.cpp:837
1a: 4d 8b 37 mov r14,QWORD PTR [r15] jit_uni_postops_injector.cpp:389 / jit_avx512_core_amx_1x1_conv_kernel.cpp:837
1d: 4c 89 34 24 mov QWORD PTR [rsp],r14 jit_uni_postops_injector.cpp:387 / jit_avx512_core_amx_1x1_conv_kernel.cpp:837
Please note that each line number shown is derived from the return address of a function in the call stack, so it points to the line of code immediately after the JIT source line that generated the instruction. For an exact mapping, look at the preceding valid line of source code.
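If the annotated output needs further processing, the file:line_no pairs can be pulled out of each line with a regular expression. This is a minimal sketch that assumes annotations look like the sample output above (source files ending in .cpp, .hpp or .h, joined by " / "); `source_locations` is a hypothetical helper, not part of the tool.

```python
import re

# Matches "<file>.cpp:<line>", "<file>.hpp:<line>" or "<file>.h:<line>".
_LOC_RE = re.compile(r"([\w./-]+\.(?:cpp|hpp|h)):(\d+)")


def source_locations(disasm_line):
    """Return (file, line) pairs found in one annotated line, in printed order."""
    return [(name, int(number)) for name, number in _LOC_RE.findall(disasm_line)]
```

Because the reported numbers come from return addresses, the call that emitted the instruction sits on the preceding valid source line rather than on the reported line itself.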
In VSCode, if the final output is displayed in the TERMINAL panel, you can hold the Ctrl key and click a file:line_no entry to navigate directly to the corresponding source code.
llvm-addr2line (llvm-symbolizer) may run significantly faster than the default addr2line. Consider using the addr2line tool from the LLVM toolchain if it is available. The tool to use can be set with the --addr2line=<path-to-addr2line-tool> flag.
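One way to choose a value for --addr2line is to prefer the LLVM tool when it is on PATH and fall back to the default otherwise. The helper below is a hypothetical sketch; the executable name `llvm-addr2line` is an assumption, and some distributions install it with a version suffix.

```python
import shutil


def pick_addr2line(candidates=("llvm-addr2line", "addr2line")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

The returned path can then be passed as `--addr2line=<path>` when running the tool.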