src/uu/split/BENCHMARKING.md
To compare the performance of the uutils version of split with the
GNU version of split, you can use a benchmarking tool like
hyperfine. On Ubuntu 18.04 or later, you can install hyperfine by
running
sudo apt-get install hyperfine
Next, build the split binary under the release profile:
cargo build --release -p uu_split
Now, get a text file to test split on. The split program has three
main modes of operation: chunk by lines, chunk by bytes, and chunk by
lines with a byte limit. You may want to test the performance of split
with various shapes and sizes of input files and under various modes of
operation. For example, to test chunking by bytes on a large input file,
you can create a file named testfile.txt containing one million null
bytes like this:
printf "%0.s\0" {1..1000000} > testfile.txt
For another example, to test chunking by bytes on a large real-world input file, you could download a database dump of Wikidata or some related files that the Wikimedia project provides. For example, this file contains about 130 million lines.
Finally, you can compare the performance of the two versions of split
by running, for example,
cd /tmp && hyperfine \
--prepare 'rm x* || true' \
"split -b 1000 testfile.txt" \
"target/release/split -b 1000 testfile.txt"
Since split creates a lot of files on the filesystem, I recommend
changing to the /tmp directory before running the benchmark. The
--prepare argument to hyperfine runs a specified command before each
timing run. We specify rm x* || true so that the output files from the
previous run of split are removed before each run begins.