docs/src/reference-main-data-types.md
Miller's types are:

- Strings, such as "abcdefg", supporting concatenation, one-up indexing and slicing, and library functions. See the pages on strings and regular expressions.
- Floats and ints, such as 1.2 and 3: double-precision and 64-bit signed, respectively. See the section on arithmetic operators and math-related library functions as well as the Arithmetic page.
- Booleans: the literals true and false; results of ==, <, >, etc. See the section on boolean operators.
- Maps, such as {"a":1,"b":[2,3,4]}, supporting key-indexing, preservation of insertion order, library functions, etc. See the Maps page.
- Arrays, such as ["a", 2, true], supporting one-up indexing and slicing, library functions, etc. See the Arrays page.
- JSON null, such as null in JSON files; also used in gapped auto-extend of arrays. See the null-data page.
- Function literals, which can be passed to functions such as select, apply, reduce, and fold.

See also the list of [type-checking functions](reference-dsl-builtin-functions.md#type-checking-functions) for the Miller programming language.
See also Differences from other programming languages.
Miller's input and output are all text-oriented: all the file formats supported by Miller are human-readable text, such as CSV, TSV, and JSON; binary formats such as BSON and Parquet are not supported (as of mid-2021). In this sense, everything is a string in and out of Miller, whether in data files or in DSL expressions you key in.
In the DSL, 7 is an int and 8.9 is a float, as
one would expect. Likewise, on input from data files,
string values representable as numbers, e.g. 1.2 or 3, are treated as float
or int, respectively. If a record has x=1,y=2 then mlr put '$z=$x+$y'
will produce x=1,y=2,z=3.
Numbers retain their original string representation, so if x is 1.2 on one
record and 1.200 on another, they'll print out that way on output (unless of
course they've been modified during processing, e.g. mlr put '$x = $x + 10').
Generally strings, numbers, and booleans don't mix; use type-casting like
string($x) to convert. However, the dot (string-concatenation) operator has
been special-cased: mlr put '$z=$x.$y' does not give an error, because the
dot operator has been generalized to stringify non-strings.
Examples:
<pre class="pre-highlight-in-pair">
<b>mlr --csv cat data/type-infer.csv</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1.2,3,true
4,5.6,buongiorno
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --icsv --oxtab --from data/type-infer.csv put '</b>
<b>  $d = $a . $c;</b>
<b>  $e = 7;</b>
<b>  $f = 8.9;</b>
<b>  $g = $e + $f;</b>
<b>  $ta = typeof($a);</b>
<b>  $tb = typeof($b);</b>
<b>  $tc = typeof($c);</b>
<b>  $td = typeof($d);</b>
<b>  $te = typeof($e);</b>
<b>  $tf = typeof($f);</b>
<b>  $tg = typeof($g);</b>
<b>' then reorder -f a,ta,b,tb,c,tc,d,td,e,te,f,tf,g,tg</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a  1.2
ta float
b  3
tb int
c  true
tc string
d  1.2true
td string
e  7
te int
f  8.9
tf float
g  15.9
tg float

a  4
ta int
b  5.6
tb float
c  buongiorno
tc string
d  4buongiorno
td string
e  7
te int
f  8.9
tf float
g  15.9
tg float
</pre>

On input, string values representable as boolean (e.g. "true", "false")
are not automatically treated as boolean. This is because "true" and
"false" are ordinary words, and auto string-to-boolean on a column consisting
of words would result in some strings mixed with some booleans. Use the
boolean function to coerce: e.g. giving the record x=1,y=2,w=false to mlr filter '$z=($x<$y) || boolean($w)'.
The same is true for inf, +inf, -inf, infinity, +infinity,
-infinity, NaN, and all upper-cased/lower-cased/mixed-case variants of
those. These are valid IEEE floating-point numbers, but Miller treats these as
strings. You can explicitly force conversion: if x=infinity in a data file,
then typeof($x) is string but typeof(float($x)) is float.
If you have, say, a CSV file whose columns contain strings which are well-formatted JSON,
they will not be auto-converted, but you can use the json-parse verb or the
json_decode DSL function.
These have their respective operations to convert back to string: the
json-stringify verb and the json_encode DSL function.