.. SPDX-License-Identifier: GPL-2.0

=================================== Running BPF programs from userspace

This document describes the BPF_PROG_RUN facility for running BPF programs from userspace.

.. contents:: :local: :depth: 2

Overview

The BPF_PROG_RUN command can be used through the bpf() syscall to execute a BPF program in the kernel and return the results to userspace. This can be used to unit test BPF programs against user-supplied context objects, and as way to explicitly execute programs in the kernel for their side effects. The command was previously named BPF_PROG_TEST_RUN, and both constants continue to be defined in the UAPI header, aliased to the same value.

The BPF_PROG_RUN command can be used to execute BPF programs of the following types:

BPF_PROG_TYPE_SOCKET_FILTER
BPF_PROG_TYPE_SCHED_CLS
BPF_PROG_TYPE_SCHED_ACT
BPF_PROG_TYPE_XDP
BPF_PROG_TYPE_SK_LOOKUP
BPF_PROG_TYPE_CGROUP_SKB
BPF_PROG_TYPE_LWT_IN
BPF_PROG_TYPE_LWT_OUT
BPF_PROG_TYPE_LWT_XMIT
BPF_PROG_TYPE_FLOW_DISSECTOR
BPF_PROG_TYPE_STRUCT_OPS
BPF_PROG_TYPE_RAW_TRACEPOINT
BPF_PROG_TYPE_SYSCALL
BPF_PROG_TYPE_TRACING
BPF_PROG_TYPE_NETFILTER

When using the BPF_PROG_RUN command, userspace supplies an input context object and (for program types operating on network packets) a buffer containing the packet data that the BPF program will operate on. The kernel will then execute the program and return the results to userspace. Note that programs will not have any side effects while being run in this mode; in particular, packets will not actually be redirected or dropped, the program return code will just be returned to userspace. A separate mode for live execution of XDP programs is provided, documented separately below.

Running XDP programs in "live frame mode"

The BPF_PROG_RUN command has a separate mode for running live XDP programs, which can be used to execute XDP programs in a way where packets will actually be processed by the kernel after the execution of the XDP program as if they arrived on a physical interface. This mode is activated by setting the BPF_F_TEST_XDP_LIVE_FRAMES flag when supplying an XDP program to BPF_PROG_RUN.

The live packet mode is optimised for high performance execution of the supplied XDP program many times (suitable for, e.g., running as a traffic generator), which means the semantics are not quite as straight-forward as the regular test run mode. Specifically:

When executing an XDP program in live frame mode, the result of the execution will not be returned to userspace; instead, the kernel will perform the operation indicated by the program's return code (drop the packet, redirect it, etc). For this reason, setting the data_out or ctx_out attributes in the syscall parameters when running in this mode will be rejected. In addition, not all failures will be reported back to userspace directly; specifically, only fatal errors in setup or during execution (like memory allocation errors) will halt execution and return an error. If an error occurs in packet processing, like a failure to redirect to a given interface, execution will continue with the next repetition; these errors can be detected via the same trace points as for regular XDP programs.
Userspace can supply an ifindex as part of the context object, just like in the regular (non-live) mode. The XDP program will be executed as though the packet arrived on this interface; i.e., the ingress_ifindex of the context object will point to that interface. Furthermore, if the XDP program returns XDP_PASS, the packet will be injected into the kernel networking stack as though it arrived on that ifindex, and if it returns XDP_TX, the packet will be transmitted out of that same interface. Do note, though, that because the program execution is not happening in driver context, an XDP_TX is actually turned into the same action as an XDP_REDIRECT to that same interface (i.e., it will only work if the driver has support for the ndo_xdp_xmit driver op).
When running the program with multiple repetitions, the execution will happen in batches. The batch size defaults to 64 packets (which is same as the maximum NAPI receive batch size), but can be specified by userspace through the batch_size parameter, up to a maximum of 256 packets. For each batch, the kernel executes the XDP program repeatedly, each invocation getting a separate copy of the packet data. For each repetition, if the program drops the packet, the data page is immediately recycled (see below). Otherwise, the packet is buffered until the end of the batch, at which point all packets buffered this way during the batch are transmitted at once.
When setting up the test run, the kernel will initialise a pool of memory pages of the same size as the batch size. Each memory page will be initialised with the initial packet data supplied by userspace at BPF_PROG_RUN invocation. When possible, the pages will be recycled on future program invocations, to improve performance. Pages will generally be recycled a full batch at a time, except when a packet is dropped (by return code or because of, say, a redirection error), in which case that page will be recycled immediately. If a packet ends up being passed to the regular networking stack (because the XDP program returns XDP_PASS, or because it ends up being redirected to an interface that injects it into the stack), the page will be released and a new one will be allocated when the pool is empty.

When recycling, the page content is not rewritten; only the packet boundary pointers (data, data_end and data_meta) in the context object will be reset to the original values. This means that if a program rewrites the packet contents, it has to be prepared to see either the original content or the modified version on subsequent invocations.