Hyperfine is a powerful command-line benchmarking tool that allows you to measure and compare execution times of commands with statistical rigor.
When evaluating performance improvements, always set up your benchmarks to compare:

- the GNU implementation (e.g. `/usr/bin/ls`)
- the previous uutils build (`coreutils.prev`)
- the current uutils build containing your changes

This three-way comparison provides clear insight into whether your change actually improves on the previous uutils build, and how the result compares with the GNU implementation.
First, you will need to build the binary in release mode. Debug builds are significantly slower:

```shell
cargo build --features unix --profile profiling
```
```shell
# Three-way comparison benchmark
hyperfine \
    --warmup 3 \
    "/usr/bin/ls -R ." \
    "./target/profiling/coreutils.prev ls -R ." \
    "./target/profiling/coreutils ls -R ."

# The same comparison, simplified with a parameter list:
hyperfine \
    --warmup 3 \
    -L ls /usr/bin/ls,"./target/profiling/coreutils.prev ls","./target/profiling/coreutils ls" \
    "{ls} -R ."
```
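The `-L` flag makes hyperfine substitute each comma-separated value for `{ls}` in the command template, so the second form expands to the same three commands as the first. A rough illustration of that substitution in plain shell (this is not hyperfine itself, just a sketch of the expansion):

```shell
# Expand the {ls} placeholder the way hyperfine's -L parameter list does
template='{ls} -R .'
for ls in "/usr/bin/ls" "./target/profiling/coreutils.prev ls" "./target/profiling/coreutils ls"; do
    printf '%s\n' "$template" | sed "s|{ls}|$ls|"
done
# → /usr/bin/ls -R .
# → ./target/profiling/coreutils.prev ls -R .
# → ./target/profiling/coreutils ls -R .
```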
For Ubuntu 25.10 and other distributions that use uutils by default, replace `/usr/bin/ls` with `/usr/bin/gnuls`. Also, pin the benchmark to a single CPU core to improve the reproducibility of the results:

```shell
# Prefix the benchmark command with taskset to pin it to one core:
taskset -c 0 hyperfine [...]
```
Hyperfine provides summary statistics for each command, including the mean, standard deviation, minimum, maximum, and a relative comparison between the benchmarked commands. Look for consistent patterns rather than focusing on individual runs, and be aware of system noise that might affect results.
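For instance, two reported means can be turned into a relative speedup with a one-liner (the times below are made-up placeholders, not real measurements):

```shell
# Hypothetical mean wall-clock times, in seconds, from a hyperfine run
gnu_mean=0.042
uutils_mean=0.035
# Relative speedup: GNU time divided by uutils time
awk -v a="$gnu_mean" -v b="$uutils_mean" 'BEGIN { printf "%.2fx\n", a / b }'
# → 1.20x
```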
Utilities include integrated benchmarks in `src/uu/*/benches/*` using CodSpeed and Divan.
Important: Before starting performance optimization work, you should add a benchmark for the utility. This provides a baseline for measuring improvements and ensures changes have measurable impact.
```shell
# Build and run benchmarks for a specific utility
cargo codspeed build -p uu_expand
cargo codspeed run -p uu_expand
```
Use common functions from `src/uucore/src/lib/features/benchmark.rs`:
```rust
use divan::{Bencher, black_box};
use uu_expand::uumain;
use uucore::benchmark::{create_test_file, run_util_function, text_data};

#[divan::bench(args = [10_000, 100_000])]
fn bench_expand(bencher: Bencher, num_lines: usize) {
    let data = text_data::generate_ascii_data(num_lines);
    let temp_dir = tempfile::tempdir().unwrap();
    let file_path = create_test_file(&data, temp_dir.path());
    bencher.bench(|| {
        black_box(run_util_function(uumain, &[file_path.to_str().unwrap()]));
    });
}

fn main() {
    divan::main();
}
```
Common helpers include `text_data::generate_*()` for test data and `fs_tree::create_*()` for directory structures.
Samply is a sampling profiler that helps you identify performance bottlenecks in your code.
```shell
# Generate a flame graph for your application
samply record ./target/debug/coreutils ls -R

# Profile with a higher sampling frequency
samply record --rate 1000 ./target/debug/coreutils seq 1 1000
```
The output from a debug build might be easier to understand, but its performance characteristics may differ somewhat from the release profile that we actually care about. Consider using the `profiling` profile, which compiles in release mode but keeps debug symbols. For example:

```shell
cargo build --profile profiling -p uu_ls
samply record -r 10000 target/profiling/ls -lR /var .git .git .git > /dev/null
```
1. Establish baselines:

   ```shell
   hyperfine --warmup 3 \
       "/usr/bin/sort large_file.txt" \
       "our-sort-v1 large_file.txt"
   ```

2. Identify bottlenecks:

   ```shell
   samply record ./our-sort-v1 large_file.txt
   ```

3. Make targeted improvements based on the profiling data.

4. Verify improvements:

   ```shell
   hyperfine --warmup 3 \
       "/usr/bin/sort large_file.txt" \
       "our-sort-v1 large_file.txt" \
       "our-sort-v2 large_file.txt"
   ```

5. Document performance changes with concrete numbers:

   ```shell
   hyperfine --export-markdown file.md [...]
   ```
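The `large_file.txt` input used above is assumed to already exist; one quick way to create a shuffled numeric input for `sort` benchmarks (the line count here is arbitrary, scale it to taste):

```shell
# Generate one million shuffled lines as sort input
seq 1 1000000 | shuf > large_file.txt
wc -l < large_file.txt
# → 1000000
```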