DSL user-defined functions - Miller


As of Miller 5.0.0, you can define your own functions, as well as subroutines.

User-defined functions

Here's the obligatory example of a recursive function to compute the factorial:

<pre class="pre-highlight-in-pair">
mlr --opprint --from data/small put '
  func f(n) {
    if (is_numeric(n)) {
      if (n > 0) {
        return n * f(n-1);
      } else {
        return 1;
      }
    }
    # implicitly return absent-null if non-numeric
  }
  $ox = f($x + NR);
  $oi = f($i);
'
</pre>
<pre class="pre-non-highlight-in-pair">
a   b   i x        y        ox                 oi
pan pan 1 0.346791 0.726802 0.4670549976810001 1
eks pan 2 0.758679 0.522151 3.6808304227112796 2
wye wye 3 0.204603 0.338318 1.7412477437471126 6
eks wye 4 0.381399 0.134188 18.588317372151177 24
wye pan 5 0.573288 0.863624 211.38663947090302 120
</pre>

Properties of user-defined functions:

Function bodies start with func and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e., the Miller DSL has no nested functions.)
A function (identified by its name) may not be redefined, whether it is a user-defined function or a built-in one. However, functions and subroutines have separate namespaces: you can define a subroutine log (for logging messages to stderr, say) which does not clash with the mathematical log (logarithm) function.
Functions may be defined either before or after use: there is an object-binding/linkage step at startup. In particular, functions may be recursive or mutually recursive.
Functions may be defined and called either within mlr filter or mlr put.
Argument values may be reassigned: they are not read-only.
When a function does not explicitly return a value, its return value is absent-null. (In the example above, if there were records for which the argument to f is non-numeric, the assignments would be skipped.) See also the null-data reference page.
See the section on Local variables for information on the scope and extent of arguments, as well as for information on the use of local variables within functions.
See the section on Expressions from files for information on the use of -f and -e flags.

User-defined subroutines

Example:

<pre class="pre-highlight-in-pair">
mlr --opprint --from data/small put -q '
  begin {
    @call_count = 0;
  }
  subr s(n) {
    @call_count += 1;
    if (is_numeric(n)) {
      if (n > 1) {
        call s(n-1);
      } else {
        print "numcalls=" . @call_count;
      }
    }
  }
  print "NR=" . NR;
  call s(NR);
'
</pre>
<pre class="pre-non-highlight-in-pair">
NR=1
numcalls=1
NR=2
numcalls=3
NR=3
numcalls=6
NR=4
numcalls=10
NR=5
numcalls=15
</pre>

Properties of user-defined subroutines:

Subroutine bodies start with subr and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e., the Miller DSL has no nested subroutines.)
A subroutine (identified by its name) may not be redefined. However, functions and subroutines have separate namespaces: you can define a subroutine log which does not clash with the mathematical log function.
Subroutines may be defined either before or after use: there is an object-binding/linkage step at startup. In particular, subroutines may be recursive or mutually recursive. Subroutines may call functions.
Subroutines may be defined and called either within mlr filter or mlr put.
Subroutines have read/write access to $-variables and @-variables.
Argument values may be reassigned: they are not read-only.
See the section on Local variables for information on the scope and extent of arguments, as well as for information on the use of local variables within subroutines.
See the section on Expressions from files for information on the use of -f and -e flags.

Differences between functions and subroutines

Subroutines cannot return values, and they are invoked by the keyword call.

In hindsight, subroutines needn't have been invented. If foo is a function, then you can write foo(1,2,3) while ignoring its return value, and that plays the role of a subroutine quite well.

Loading a library of functions

If you have a file of user-defined functions (UDFs) that you use frequently, say my-udfs.mlr, you can use --load or --mload to make them available to your Miller scripts. For example, in your shell:

<pre class="pre-highlight-non-pair"> alias mlr='mlr --load ~/my-udfs.mlr' </pre>

If the argument to --load is a directory, Miller loads the .mlr files in that directory:

<pre class="pre-highlight-non-pair"> alias mlr='mlr --load /u/miller-udfs/' </pre>

See the miscellaneous-flags page for more information.

Function literals

You can define unnamed functions and assign them to variables, or pass them to functions.

See also the page on higher-order functions for more information on select, apply, reduce, fold, and sort.

For example:

<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv put '
  f = func(s, t) {
    return s . ":" . t;
  };
  $z = f($color, $shape);
'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k  index quantity rate   z
yellow triangle true  1  11    43.6498  9.8870 yellow:triangle
red    square   true  2  15    79.2778  0.0130 red:square
red    circle   true  3  16    13.8103  2.9010 red:circle
red    square   false 4  48    77.5542  7.4670 red:square
purple triangle false 5  51    81.2290  8.5910 purple:triangle
red    square   false 6  64    77.1991  9.5310 red:square
purple triangle false 7  65    80.1405  5.8240 purple:triangle
yellow circle   true  8  73    63.9785  4.2370 yellow:circle
yellow circle   true  9  87    63.5058  8.3350 yellow:circle
purple square   false 10 91    72.3735  8.2430 purple:square
</pre>
<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv put '
  a = func(s, t) {
    return s . ":" . t . " above";
  };
  b = func(s, t) {
    return s . ":" . t . " below";
  };
  f = $index >= 50 ? a : b;
  $z = f($color, $shape);
'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k  index quantity rate   z
yellow triangle true  1  11    43.6498  9.8870 yellow:triangle below
red    square   true  2  15    79.2778  0.0130 red:square below
red    circle   true  3  16    13.8103  2.9010 red:circle below
red    square   false 4  48    77.5542  7.4670 red:square below
purple triangle false 5  51    81.2290  8.5910 purple:triangle above
red    square   false 6  64    77.1991  9.5310 red:square above
purple triangle false 7  65    80.1405  5.8240 purple:triangle above
yellow circle   true  8  73    63.9785  4.2370 yellow:circle above
yellow circle   true  9  87    63.5058  8.3350 yellow:circle above
purple square   false 10 91    72.3735  8.2430 purple:square above
</pre>

Note that you need a semicolon after the closing curly brace of the function literal.

Unlike named functions, function literals (also known as unnamed functions) have access to local variables defined in their enclosing scope. That's so you can do things like this:

<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv put '
  f = func(s, t, i) {
    if (i >= cap) {
      return s . ":" . t . " above";
    } else {
      return s . ":" . t . " below";
    }
  };
  cap = 10;
  $z = f($color, $shape, $index);
'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k  index quantity rate   z
yellow triangle true  1  11    43.6498  9.8870 yellow:triangle above
red    square   true  2  15    79.2778  0.0130 red:square above
red    circle   true  3  16    13.8103  2.9010 red:circle above
red    square   false 4  48    77.5542  7.4670 red:square above
purple triangle false 5  51    81.2290  8.5910 purple:triangle above
red    square   false 6  64    77.1991  9.5310 red:square above
purple triangle false 7  65    80.1405  5.8240 purple:triangle above
yellow circle   true  8  73    63.9785  4.2370 yellow:circle above
yellow circle   true  9  87    63.5058  8.3350 yellow:circle above
purple square   false 10 91    72.3735  8.2430 purple:square above
</pre>

See the page on higher-order functions for more.