docs/tutorial_one_liners.md
This teaches you bpftrace for Linux in 12 easy lessons, where each lesson is a one-liner you can try running. This series of one-liners introduces concepts which are summarized as bullet points. Read more about the language, the standard library, and CLI options.
Contributed by Brendan Gregg, Netflix (2018), based on his FreeBSD DTrace Tutorial.
Note: bpftrace 0.19 changed the way probe arguments are accessed (using
args.xxx instead of args->xxx). If you are using an older version of
bpftrace, you will need to use args->xxx in the below examples.
bpftrace -l 'tracepoint:syscalls:sys_enter_*'
"bpftrace -l" lists all probes, and a search term can be added.
* and ?)bpftrace -e 'BEGIN { printf("hello world\n"); }'
Expected Output:
Attached 1 probe
hello world
^C
This prints a welcome message. Run it, then hit Ctrl-C to end.
BEGIN is a special probe that fires at the start of the program (like awk's BEGIN). You can use it to set variables and print headers.bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args.filename)); }'
Expected Output:
Attached 1 probe
snmp-pass /proc/cpuinfo
snmp-pass /proc/stat
snmpd /proc/net/dev
snmpd /proc/net/if_inet6
^C
This traces file opens as they happen, and we're printing the process name and pathname.
tracepoint:syscalls:sys_enter_openat: this is the tracepoint probe type (kernel static tracing), and is instrumenting when the openat() syscall begins (is entered). Tracepoints are preferred over kprobes (kernel dynamic tracing, introduced in lesson 6), since tracepoints have stable API. Note: In modern Linux systems (glibc >= 2.26) the open wrapper always calls the openat syscall.comm is a builtin variable that has the current process's name. Other similar builtins include pid and tid.args is a struct containing all the tracepoint arguments. This
struct is automatically generated by bpftrace based tracepoint information. The
members of this struct can be found with: bpftrace -vl tracepoint:syscalls:sys_enter_openat.args.filename accesses the args struct and gets the value of the
filename member.str() turns a pointer into the string it points to.bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Expected Output:
Attached 1 probe
^C
@[bpftrace]: 6
@[systemd]: 24
@[snmp-pass]: 96
@[sshd]: 125
This summarizes syscalls by process name, printing a report on Ctrl-C.
Maps are automatically printed when bpftrace ends (eg, via Ctrl-C).
bpftrace -e 'tracepoint:syscalls:sys_exit_read /pid == 18644/ { @bytes = hist(args.ret); }'
Expected Output:
Attached 1 probe
^C
@bytes:
[0, 1] 12 |@@@@@@@@@@@@@@@@@@@@ |
[2, 4) 18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128) 19 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[128, 256) 1 |@
This summarizes the return value of the sys_read() kernel function for PID 18644, printing it as a histogram.
[128, 256) means that the value is: 128 <= value < 256. The next number is the count of occurrences, and then an ASCII histogram is printed to visualize that count. The histogram can be used to study multi-modal distributions.bpftrace -e 'kretprobe:vfs_read { @bytes = lhist(retval, 0, 2000, 200); }'
Expected Output:
Attached 1 probe
^C
@bytes:
(...,0] 0 | |
[0, 200) 66 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[200, 400) 2 |@ |
[400, 600) 3 |@@ |
[600, 800) 0 | |
[800, 1000) 5 |@@@ |
[1000, 1200) 0 | |
[1200, 1400) 0 | |
[1400, 1600) 0 | |
[1600, 1800) 0 | |
[1800, 2000) 0 | |
[2000,...) 39 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
Summarize read() bytes as a linear histogram, and traced using kernel dynamic tracing.
kretprobe:vfs_read: this is the kretprobe probe type (kernel dynamic tracing of function returns) instrumenting the vfs_read() kernel function. There is also the kprobe probe type (shown in the next lesson), to instrument when functions begin execution (are entered). These are powerful probe types, letting you trace tens of thousands of different kernel functions. However, these are "unstable" probe types: since they can trace any kernel function, there is no guarantee that your kprobe/kretprobe will work between kernel versions, as the function names, arguments, return values, and roles may change. Also, since it is tracing the raw kernel, you'll need to browse the kernel source to understand what these probes, arguments, and return values, mean.retval) of vfs_read() is the return value: the number of bytes read.bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start, tid); }'
Expected Output:
Attached 2 probes
[...]
@ns[snmp-pass]:
[0, 1] 0 | |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 27 |@@@@@@@@@ |
[512, 1k) 125 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1k, 2k) 22 |@@@@@@@ |
[2k, 4k) 1 | |
[4k, 8k) 10 |@@@ |
[8k, 16k) 1 | |
[16k, 32k) 3 |@ |
[32k, 64k) 144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64k, 128k) 7 |@@ |
[128k, 256k) 28 |@@@@@@@@@@ |
[256k, 512k) 2 | |
[512k, 1M) 3 |@ |
[1M, 2M) 1 | |
Summarize the time spent in read(), in nanoseconds, as a histogram, by process name.
@start[tid]: This uses the thread ID as a key. There may be many reads in-flight, and we want to store a start timestamp to each. How? We could construct a unique identifier for each read, and use that as the key. But because kernel threads can only be executing one syscall at a time, we can use the thread ID as the unique identifier, as each thread cannot be executing more than one.
nsecs: Nanoseconds since boot. This is a high resolution timestamp counter than can be used to time events.
/@start[tid]/: This filter checks that the start time was seen and recorded. Without this filter, this program may be launched during a read and only catch the end, resulting in a time calculation of now - zero, instead of now - start.
delete(@start, tid): this frees the variable.
bpftrace -e 'tracepoint:sched:sched* { @[probe] = count(); } interval:s:5 { exit(); }'
Expected Output:
Attached 25 probes
@[tracepoint:sched:sched_wakeup_new]: 1
@[tracepoint:sched:sched_process_fork]: 1
@[tracepoint:sched:sched_process_exec]: 1
@[tracepoint:sched:sched_process_exit]: 1
@[tracepoint:sched:sched_process_free]: 2
@[tracepoint:sched:sched_process_wait]: 7
@[tracepoint:sched:sched_wake_idle_without_ipi]: 53
@[tracepoint:sched:sched_stat_runtime]: 212
@[tracepoint:sched:sched_wakeup]: 253
@[tracepoint:sched:sched_waking]: 253
@[tracepoint:sched:sched_switch]: 510
Count process-level events for five seconds, printing a summary.
sched probe category has high-level scheduler and process events, such as fork, exec, and context switch.bpftrace -e 'profile:hz:99 { @[kstack] = count(); }'
Expected Output:
Attached 1 probe
^C
[...]
@[
filemap_map_pages+181
__handle_mm_fault+2905
handle_mm_fault+250
__do_page_fault+599
async_page_fault+69
]: 12
[...]
@[
cpuidle_enter_state+164
do_idle+390
cpu_startup_entry+111
start_secondary+423
secondary_startup_64+165
]: 22122
Profile kernel stacks at 99 Hertz, printing a frequency count.
ustack for the user-level stack trace.bpftrace -e 'tracepoint:sched:sched_switch { @[kstack] = count(); }'
Expected Output:
^C
[...]
@[
__schedule+697
__schedule+697
schedule+50
schedule_timeout+365
xfsaild+274
kthread+248
ret_from_fork+53
]: 73
@[
__schedule+697
__schedule+697
schedule_idle+40
do_idle+356
cpu_startup_entry+111
start_secondary+423
secondary_startup_64+165
]: 305
This counts stack traces that led to context switching (off-CPU) events. The above output has been truncated to show the last two only.
bpftrace -e 'tracepoint:block:block_rq_issue { @ = hist(args.bytes); }'
Expected Output:
Attached 1 probe
^C
@:
[0, 1] 1 |@@ |
[2, 4) 0 | |
[4, 8) 0 | |
[8, 16) 0 | |
[16, 32) 0 | |
[32, 64) 0 | |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 0 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 24 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K) 2 |@@@@ |
[16K, 32K) 6 |@@@@@@@@@@@@@ |
[32K, 64K) 5 |@@@@@@@@@@ |
[64K, 128K) 0 | |
[128K, 256K) 1 |@@ |
Block I/O requests by size in bytes, as a histogram.
The context of this probe is important: this fires when the I/O is issued to the device. This often happens in process context, where builtins like comm will show you the process name, but it can also happen from kernel context (eg, readahead) when the pid and comm will not show the application you expect.
# cat path.bt
#ifndef BPFTRACE_HAVE_BTF
#include <linux/path.h>
#include <linux/dcache.h>
#endif
kprobe:vfs_open
{
printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name));
}
# bpftrace path.bt
Attached 1 probe
open path: dev
open path: if_inet6
open path: retrans_time_ms
[...]
This uses kernel dynamic tracing of the vfs_open() function, which has a (struct path *) as the first argument.
arg0 is a builtin variable containing the first probe argument, the meaning of which is defined by the probe type. For kprobe, it is the first argument to the function. Other arguments can be accessed as arg1, ..., argN.((struct path *)arg0)->dentry->d_name.name: this casts arg0 as struct path *, then dereferences dentry, etc.The kernel struct support is the same as bcc, making use of kernel headers. This means that many structs are available, but not everything, and sometimes it might be necessary to manually include a struct. For an example of this, see the dcsnoop tool, which includes a portion of struct nameidata manually as it wasn't in the available headers. If the kernel has BTF data, all kernel structs are always available.
At this point you understand much of bpftrace, and can begin to use and write powerful one-liners. See the Manual for more capabilities.