docs/src/reference-main-compressed-data.md
As of Miller 6, Miller can read GZIP, BZIP2, ZLIB, and ZSTD formats
transparently and in-process. As before Miller 6, you also have the
more general --prepipe option to support other decompression programs.
If your files end in .gz, .bz2, .z, or .zst then Miller will autodetect the compression format by file extension.
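As a minimal sketch of extension-based autodetection: the file name sample.csv.gz and its contents are made up for illustration, and the `command -v` guard is only there so the snippet still runs on a machine without Miller installed.

```shell
# Build a tiny gzipped CSV (hypothetical sample data).
printf 'color,shape\nred,square\nyellow,circle\n' > sample.csv
gzip -c sample.csv > sample.csv.gz

# The .gz extension alone tells Miller to decompress; no extra flag is given.
if command -v mlr >/dev/null 2>&1; then
  mlr --csv sort -f color sample.csv.gz
fi

# The compressed file on disk is still an intact gzip file afterward.
gzip -t sample.csv.gz && echo "sample.csv.gz is still a valid gzip file"
```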
This will decompress the input data on the fly, while leaving the disk file unmodified. This helps you save disk space, at the cost of some additional runtime CPU usage to decompress the data.
If the filename doesn't end in .gz, .bz2, .z, or .zst then you can use the flags --gzin, --bz2in, --zin, or --zstdin to tell Miller which format to expect.
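For instance, a gzipped file whose name gives no hint of its format might be read as sketched here. The name sample-data.bin is hypothetical, and the guard simply skips the Miller call when mlr is not on the $PATH.

```shell
# Gzipped CSV stored under a name with no recognizable extension.
printf 'color,shape\nred,square\nyellow,circle\n' | gzip -c > sample-data.bin

# --gzin tells Miller the input is gzip-compressed despite the file name.
if command -v mlr >/dev/null 2>&1; then
  mlr --csv --gzin sort -f color sample-data.bin
fi
```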
Using the --prepipe flag, you can provide the name of any decompression
program in your $PATH and Miller will run it on each input file, effectively
piping the standard output of that program to Miller's standard input.
You can, of course, already do without this for single input files, for example:
<pre class="pre-highlight-in-pair">
<b>gunzip &lt; gz-example.csv.gz | mlr --csv sort -f color</b>
</pre>
<pre class="pre-non-highlight-in-pair">
color,shape,flag,k,index,quantity,rate
purple,triangle,false,5,51,81.2290,8.5910
purple,triangle,false,7,65,80.1405,5.8240
purple,square,false,10,91,72.3735,8.2430
red,square,true,2,15,79.2778,0.0130
red,circle,true,3,16,13.8103,2.9010
red,square,false,4,48,77.5542,7.4670
red,square,false,6,64,77.1991,9.5310
yellow,triangle,true,1,11,43.6498,9.8870
yellow,circle,true,8,73,63.9785,4.2370
yellow,circle,true,9,87,63.5058,8.3350
</pre>

The benefit of --prepipe is that Miller will run the specified program once per
file, respecting file boundaries.
The prepipe command can be anything which reads from standard input and produces data acceptable to Miller. Nominally this allows you to use whichever decompression utilities you have installed on your system, on a per-file basis.
If the command has flags, quote them: e.g. mlr --prepipe 'zcat -cf'.
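The per-file behavior can be sketched with two small gzipped files. The file names and data are invented for this example, and the loop is only a plain-shell picture of what --prepipe is described as doing for each input, not Miller's actual implementation.

```shell
# Two hypothetical gzipped inputs with the same header.
printf 'color,shape\nred,square\n' | gzip -c > part1.csv.gz
printf 'color,shape\nyellow,circle\n' | gzip -c > part2.csv.gz

# Shell picture of the per-file step: the program reads each file on stdin.
for f in part1.csv.gz part2.csv.gz; do
  gunzip -c < "$f"
done

# The Miller call; the command has a flag, so it is quoted as a single argument.
if command -v mlr >/dev/null 2>&1; then
  mlr --csv --prepipe 'gunzip -c' sort -f color part1.csv.gz part2.csv.gz
fi
```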
In your .mlrrc file, --prepipe and --prepipex are not
allowed as they could be used for unexpected code execution. You can use
--prepipe-bz2, --prepipe-gunzip, --prepipe-zcat, and --prepipe-zstdcat in .mlrrc, though.
Note that this feature is quite general and is not limited to decompression
utilities. You can use it to apply per-file filters of your choice: e.g. mlr --prepipe 'head -n 10' ..., if you like.
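A small sketch of a non-decompression filter, assuming two made-up CSV files: `head -n 3` keeps the header line and the first two data lines of each file separately, which a single `cat first.csv second.csv | head -n 3` would not do.

```shell
# Two hypothetical uncompressed inputs.
printf 'n\n1\n2\n3\n4\n' > first.csv
printf 'n\n10\n20\n30\n40\n' > second.csv

# What each file looks like after the filter, run by hand.
head -n 3 < first.csv
head -n 3 < second.csv

# The same filter applied per file by Miller (skipped if mlr is not installed).
if command -v mlr >/dev/null 2>&1; then
  mlr --csv --prepipe 'head -n 3' cat first.csv second.csv
fi
```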
There is a --prepipe and a --prepipex:
* If the command should be run as nameofprogram < filename.ext (such as gunzip or zcat -cf or xz -cd) then use --prepipe.
* If the command should be run as nameofprogram filename.ext (such as unzip -qc) then use --prepipex.

Lastly, note that if --prepipe or --prepipex is specified on the Miller
command line, it replaces any autodetect decisions that might have been made
based on the filename extension. Likewise, --gzin/--bz2in/--zin/--zstdin are ignored if
--prepipe or --prepipex is also specified.
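The two invocation shapes can be compared side by side. This sketch uses gunzip for both, since `gunzip -c` accepts either standard input or a file name; shapes.csv.gz is a hypothetical file created just for the example.

```shell
printf 'color,shape\nred,square\n' | gzip -c > shapes.csv.gz

# --prepipe shape: the program reads the file on standard input.
gunzip < shapes.csv.gz
# --prepipex shape: the program is handed the file name as an argument.
gunzip -c shapes.csv.gz

# Corresponding Miller calls (skipped if mlr is not installed).
if command -v mlr >/dev/null 2>&1; then
  mlr --csv --prepipe 'gunzip' cat shapes.csv.gz
  mlr --csv --prepipex 'gunzip -c' cat shapes.csv.gz
fi
```

Because one of the two flags is given, the .gz extension plays no part in how the file is decompressed here.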
Everything said so far on this page has to do with compressed input.
For compressed output:

* Normally Miller output is to stdout, so you can pipe the output: mlr sort -n quantity foo.csv | gzip > sorted.csv.gz.
* For tee statements, which write output to files rather than stdout, use tee's redirect syntax.
* If you use in-place mode, -I, the overwritten file will
be compressed when possible. See the page on in-place mode for details.
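The first two output options might look like the following sketch. The data in foo.csv is made up, `-nf` is used for the numeric sort, and the tee line assumes the DSL's pipe-redirect form (`tee | "command", $*`), so check it against the output-statements reference before relying on it.

```shell
# Hypothetical input data.
printf 'shape,quantity\nsquare,72\ncircle,13\nsquare,77\n' > foo.csv

if command -v mlr >/dev/null 2>&1; then
  # Miller writes to stdout, so any compressor can sit at the end of the pipe.
  mlr --csv sort -nf quantity foo.csv | gzip > sorted.csv.gz

  # tee with a pipe redirect: one gzipped file per distinct value of shape.
  mlr --csv --from foo.csv put -q 'tee | "gzip > ".$shape.".csv.gz", $*'
fi
```

With the sample data above, the tee statement would be expected to write square.csv.gz and circle.csv.gz, each readable again by Miller through extension autodetection.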