DSL higher-order functions - Miller

# DSL higher-order functions

A higher-order function is one which takes another function as an argument. As of Miller 6, you can use select, apply, reduce, fold, sort, any, and every to express flexible, intuitive operations on arrays and maps, as an alternative to writing for-loops.

See also the get_keys and get_values functions, which, when given a map, return an array of its keys or an array of its values, respectively.

select

The select function takes a map or array as its first argument and a function as its second argument. It returns a map or array containing only those input elements for which the function returns true.

For arrays, the function should take one argument, the array element; for maps, it should take two, the map element's key and value. In either case, it should return a boolean.

A perhaps helpful analogy: the select function is to arrays and maps as the filter verb is to records.

Array examples:

Map examples:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "Keys with an 'o' in them:";
    print select(my_map, func (k,v) { return k =~ "o"});

    print;
    print "Values with last digit >= 5:";
    print select(my_map, func (k,v) { return v % 10 >= 5});
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

Keys with an o in them:
{
  "bottle": 107
}

Values with last digit >= 5:
{
  "apple": 199,
  "bottle": 107
}
</pre>

apply

The apply function takes a map or array as its first argument and a function as its second argument. It returns a map or array made up of the results of applying the function to each input element.

For arrays, the function should take one argument, representing an array element, and return a new element. For maps, it should take two, for the map element key and value. It should return a new key-value pair (i.e., a single-entry map).

A perhaps helpful analogy: the apply function is to arrays and maps as the put verb is to records.

Array examples:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
    print "Original:";
    print my_array;

    print;
    print "Squares:";
    print apply(my_array, func(e) { return e**2 });

    print;
    print "Cubes:";
    print apply(my_array, func(e) { return e**3 });

    print;
    print "Sorted cubes:";
    print sort(apply(my_array, func(e) { return e**3 }));
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

Squares:
[4, 81, 100, 9, 1, 16, 25, 64, 49, 36]

Cubes:
[8, 729, 1000, 27, 1, 64, 125, 512, 343, 216]

Sorted cubes:
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
</pre>
<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "Squared values:";
    print apply(my_map, func(k,v) { return {k: v**2} });

    print;
    print "Cubed values, sorted by key:";
    print sort(apply(my_map, func(k,v) { return {k: v**3} }));

    print;
    print "Same, with upcased keys:";
    print sort(apply(my_map, func(k,v) { return {toupper(k): v**3} }));
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

Squared values:
{
  "cubit": 677329,
  "dale": 169,
  "apple": 39601,
  "ember": 36481,
  "bottle": 11449
}

Cubed values, sorted by key:
{
  "apple": 7880599,
  "bottle": 1225043,
  "cubit": 557441767,
  "dale": 2197,
  "ember": 6967871
}

Same, with upcased keys:
{
  "APPLE": 7880599,
  "BOTTLE": 1225043,
  "CUBIT": 557441767,
  "DALE": 2197,
  "EMBER": 6967871
}
</pre>

reduce

The reduce function takes a map or array as its first argument and a function as its second argument. It accumulates entries into a final output, such as a sum or product.

For arrays, the function should take two arguments: the accumulated value and the array element. For maps, it should take four: the accumulated key and value, and the map element's key and value. In either case, it should return the updated accumulator, which for maps is a single-entry map.

The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps.

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
    print "Original:";
    print my_array;

    print;
    print "First element:";
    print reduce(my_array, func (acc,e) { return acc });

    print;
    print "Last element:";
    print reduce(my_array, func (acc,e) { return e });

    print;
    print "Sum of values:";
    print reduce(my_array, func (acc,e) { return acc + e });

    print;
    print "Product of values:";
    print reduce(my_array, func (acc,e) { return acc * e });

    print;
    print "Concatenation of values:";
    print reduce(my_array, func (acc,e) { return acc . "," . e });
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

First element:
2

Last element:
6

Sum of values:
55

Product of values:
3628800

Concatenation of values:
2,9,10,3,1,4,5,8,7,6
</pre>
<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "First key-value pair:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {acck: accv}});

    print;
    print "Last key-value pair:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {ek: ev}});

    print;
    print "Concatenate keys and values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {acck . "," . ek: accv . "," . ev}});

    print;
    print "Sum of values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev }});

    print;
    print "Product of values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {"product": accv * ev }});

    print;
    print "String-join of values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {"joined": accv . "," . ev }});
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

First key-value pair:
{
  "cubit": 823
}

Last key-value pair:
{
  "bottle": 107
}

Concatenate keys and values:
{
  "cubit,dale,apple,ember,bottle": "823,13,199,191,107"
}

Sum of values:
{
  "sum": 1333
}

Product of values:
{
  "product": 43512437137
}

String-join of values:
{
  "joined": "823,13,199,191,107"
}
</pre>

fold

The fold function is the same as reduce, except that the starting value for the accumulator is not taken from the first entry of the array or map: you specify it as the third argument, and the function is then applied to every entry. For maps, the starting value is a single-entry map.

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
    print "Original:";
    print my_array;

    print;
    print "Sum with reduce:";
    print reduce(my_array, func (acc,e) { return acc + e });

    print;
    print "Sum with fold and 0 initial value:";
    print fold(my_array, func (acc,e) { return acc + e }, 0);

    print;
    print "Sum with fold and 1000000 initial value:";
    print fold(my_array, func (acc,e) { return acc + e }, 1000000);
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

Sum with reduce:
55

Sum with fold and 0 initial value:
55

Sum with fold and 1000000 initial value:
1000055
</pre>
<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "First key-value pair -- note this is the starting accumulator:";
    print fold(my_map, func (acck,accv,ek,ev) { return {acck: accv}}, {"start": 999});

    print;
    print "Last key-value pair:";
    print fold(my_map, func (acck,accv,ek,ev) { return {ek: ev}}, {"start": 999});

    print;
    print "Sum of values with fold and 0 initial value:";
    print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 0});

    print;
    print "Sum of values with fold and 1000000 initial value:";
    print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 1000000});
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

First key-value pair -- note this is the starting accumulator:
{
  "start": 999
}

Last key-value pair:
{
  "bottle": 107
}

Sum of values with fold and 0 initial value:
{
  "sum": 1333
}

Sum of values with fold and 1000000 initial value:
{
  "sum": 1001333
}
</pre>

sort

The sort function takes a map or array as its first argument and an optional second argument. Unlike the other higher-order functions, the second argument can be omitted when the natural ordering is desired: by element for arrays, or by key for maps.

As a second option, a string of character flags can be supplied as the second argument, such as "r" for reverse or "c" for case-folded lexical sort.

As a third option, a function can be supplied as the second argument.

For arrays, that function should take two arguments a and b, returning a negative, zero, or positive number as a<b, a==b, or a>b, respectively. For maps, the function should take four arguments ak, av, bk, and bv, again returning negative, zero, or positive, using a's and b's keys and values. The <=> operator used in the examples returns exactly such a value.

Array examples:

Map examples:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "Ascending by key:";
    print sort(my_map);
    print sort(my_map, func(ak,av,bk,bv) { return ak <=> bk });

    print;
    print "Descending by key:";
    print sort(my_map, "r");
    print sort(my_map, func(ak,av,bk,bv) { return bk <=> ak });

    print;
    print "Ascending by value:";
    print sort(my_map, func(ak,av,bk,bv) { return av <=> bv });

    print;
    print "Descending by value:";
    print sort(my_map, func(ak,av,bk,bv) { return bv <=> av });
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

Ascending by key:
{
  "apple": 199,
  "bottle": 107,
  "cubit": 823,
  "dale": 13,
  "ember": 191
}
{
  "apple": 199,
  "bottle": 107,
  "cubit": 823,
  "dale": 13,
  "ember": 191
}

Descending by key:
{
  "ember": 191,
  "dale": 13,
  "cubit": 823,
  "bottle": 107,
  "apple": 199
}
{
  "ember": 191,
  "dale": 13,
  "cubit": 823,
  "bottle": 107,
  "apple": 199
}

Ascending by value:
{
  "dale": 13,
  "bottle": 107,
  "ember": 191,
  "apple": 199,
  "cubit": 823
}

Descending by value:
{
  "cubit": 823,
  "apple": 199,
  "ember": 191,
  "bottle": 107,
  "dale": 13
}
</pre>

Please see the sorting page for more examples.

any and every

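The any and every functions take a map or array as their first argument and a function as their second argument. As with select, the function should take one argument (the element) for arrays, or two (the key and value) for maps, and should return a boolean. any returns true if the function returns true for at least one element; every returns true only if it returns true for all of them. This is a way to do a logical OR or AND, respectively, of several boolean expressions without the explicit || or && and without a for-loop: a keystroke-saving convenience.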

<pre class="pre-highlight-in-pair">
mlr --c2p cat example.csv
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430
</pre>
<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv filter 'any({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape  flag  k  index quantity rate
red    square true  2  15    79.2778  0.0130
red    circle true  3  16    13.8103  2.9010
red    square false 4  48    77.5542  7.4670
red    square false 6  64    77.1991  9.5310
purple square false 10 91    72.3735  8.2430
</pre>
<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv filter 'every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
</pre>
<pre class="pre-non-highlight-in-pair">
color shape  flag  k index quantity rate
red   square true  2 15    79.2778  0.0130
red   square false 4 48    77.5542  7.4670
red   square false 6 64    77.1991  9.5310
</pre>
<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv put '$is_red_square = every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k  index quantity rate   is_red_square
yellow triangle true  1  11    43.6498  9.8870 false
red    square   true  2  15    79.2778  0.0130 true
red    circle   true  3  16    13.8103  2.9010 false
red    square   false 4  48    77.5542  7.4670 true
purple triangle false 5  51    81.2290  8.5910 false
red    square   false 6  64    77.1991  9.5310 true
purple triangle false 7  65    80.1405  5.8240 false
yellow circle   true  8  73    63.9785  4.2370 false
yellow circle   true  9  87    63.5058  8.3350 false
purple square   false 10 91    72.3735  8.2430 false
</pre>
<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv filter 'any([16,51,61,64], func(e) {return $index == e})'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k index quantity rate
red    circle   true  3 16    13.8103  2.9010
purple triangle false 5 51    81.2290  8.5910
red    square   false 6 64    77.1991  9.5310
</pre>

This last example could also be done using a map:

<pre class="pre-highlight-in-pair">
mlr --c2p --from example.csv filter '
  begin {
    @indices = {16:true, 51:true, 61:true, 64:true};
  }
  @indices[$index] == true;
'
</pre>
<pre class="pre-non-highlight-in-pair">
color  shape    flag  k index quantity rate
red    circle   true  3 16    13.8103  2.9010
purple triangle false 5 51    81.2290  8.5910
red    square   false 6 64    77.1991  9.5310
</pre>

Combined examples

Using a paradigm from the page on operating on all records, we can retain a column from the input data as an array, then apply some higher-order functions to it:

Caveats

Remember return

Coming from other languages, it's easy to accidentally write

<pre class="pre-highlight-in-pair"> mlr -n put 'end { print select([1,2,3,4,5], func (e) { e >= 3 })}' </pre> <pre class="pre-non-highlight-in-pair"> mlr: select: function returned non-boolean "(absent)". </pre>

instead of

<pre class="pre-highlight-in-pair"> mlr -n put 'end { print select([1,2,3,4,5], func (e) { return e >= 3 })}' </pre> <pre class="pre-non-highlight-in-pair"> [3, 4, 5] </pre>

No IIFEs

As of September 2021, immediately invoked function expressions (IIFEs) are not part of the Miller DSL's grammar. For example, this doesn't work yet:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    x = 3;
    y = (func (e) { return e**7 })(x);
    print y;
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
mlr: cannot parse DSL expression.
mlr: parse error: unexpected lparen ("(")
</pre>

but this does:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    x = 3;
    f = func (e) { return e**7 };
    y = f(x);
    print y;
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
2187
</pre>

Built-in functions are currently unsupported as arguments

As of September 2021, built-in functions are handled separately from user-defined functions inside Miller, so they can't be passed directly as arguments to higher-order functions. The workaround is to wrap the built-in in a function literal.

For example, this doesn't work yet:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    notches = [0,1,2,3];
    radians = apply(notches, func (e) { return e * M_PI / 8 });
    cosines = apply(radians, cos);
    print cosines;
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
mlr: apply: second argument must be a function; got absent.
</pre>

but this does:

<pre class="pre-highlight-in-pair">
mlr -n put '
  end {
    notches = [0,1,2,3];
    radians = apply(notches, func (e) { return e * M_PI / 8 });
    # cosines = apply(radians, cos);
    cosines = apply(radians, func (e) { return cos(e) });
    print cosines;
  }
'
</pre>
<pre class="pre-non-highlight-in-pair">
[1, 0.9238795325112867, 0.7071067811865476, 0.38268343236508984]
</pre>