docs/src/special-symbols-and-formatting.md
CSV handles this well and by design:
<pre class="pre-highlight-in-pair"> <b>cat commas.csv</b> </pre> <pre class="pre-non-highlight-in-pair"> Name,Role "Xiao, Lin",administrator "Khavari, Darius",tester </pre>Likewise JSON:
<pre class="pre-highlight-in-pair"> <b>mlr --icsv --ojson cat commas.csv</b> </pre> <pre class="pre-non-highlight-in-pair"> [ { "Name": "Xiao, Lin", "Role": "administrator" }, { "Name": "Khavari, Darius", "Role": "tester" } ] </pre>For Miller's XTAB there is no escaping for carriage returns, but commas work fine:
<pre class="pre-highlight-in-pair"> <b>mlr --icsv --oxtab cat commas.csv</b> </pre> <pre class="pre-non-highlight-in-pair"> Name Xiao, Lin Role administrator Name Khavari, Darius Role tester </pre>But for key-value-pairs and index-numbered formats, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
<pre class="pre-highlight-in-pair"> <b>mlr --icsv --odkvp cat commas.csv</b> </pre> <pre class="pre-non-highlight-in-pair"> Name=Xiao, Lin,Role=administrator Name=Khavari, Darius,Role=tester </pre>One solution is to use a different delimiter, such as a pipe character:
<pre class="pre-highlight-in-pair"> <b>mlr --icsv --odkvp --ofs pipe cat commas.csv</b> </pre> <pre class="pre-non-highlight-in-pair"> Name=Xiao, Lin|Role=administrator Name=Khavari, Darius|Role=tester </pre>Alternatively, use DKVPX format with --dkvpx, which supports CSV-style quoting so keys and values may contain commas, equals, newlines, and quotes natively. Example: echo '"a,b"="x,y",z=3' | mlr --dkvpx cat
To be extra-sure to avoid data/delimiter clashes, you can also use control characters as delimiters -- here, control-A:
<pre class="pre-highlight-in-pair"> <b>mlr --icsv --odkvp --ofs '\001' cat commas.csv | cat -v</b> </pre> <pre class="pre-non-highlight-in-pair"> Name=Xiao, Lin^ARole=administrator Name=Khavari, Darius^ARole=tester </pre>Simply surround the field names with curly braces:
<pre class="pre-highlight-in-pair"> <b>echo 'x.a=3,y:b=4,z/c=5' | mlr put '${product.all} = ${x.a} * ${y:b} * ${z/c}'</b> </pre> <pre class="pre-non-highlight-in-pair"> x.a=3,y:b=4,z/c=5,product.all=60 </pre>This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file:
<pre class="pre-non-highlight-non-pair"> $a = "It's OK, I said, then 'for now'." </pre> <pre class="pre-highlight-in-pair"> <b>echo a=bcd | mlr put -f data/single-quote-example.mlr</b> </pre> <pre class="pre-non-highlight-in-pair"> a=It's OK, I said, then 'for now'. </pre>So: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.
Without putting the update expression in a file, it's messier:
<pre class="pre-highlight-in-pair"> <b>echo a=bcd | mlr put '$a="It'\''s OK, I said, '\''for now'\''."'</b> </pre> <pre class="pre-non-highlight-in-pair"> a=It's OK, I said, 'for now'. </pre>The idea is that the outermost single-quotes are to protect the put expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle there, you need to actually put it outside the single-quoting for the shell. The pieces are the following, all concatenated together:
$a="It\'s OK, I said,\'for now\'.One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.
<pre class="pre-highlight-in-pair"> <b>cat data/question.dat</b> </pre> <pre class="pre-non-highlight-in-pair"> a=is it?,b=it is! </pre> <pre class="pre-highlight-in-pair"> <b>mlr --oxtab put '$c = gsub($a, "[?]"," ...")' data/question.dat</b> </pre> <pre class="pre-non-highlight-in-pair"> a is it? b it is! c is it ... </pre> <pre class="pre-highlight-in-pair"> <b>mlr --oxtab put '$c = ssub($a, "?"," ...")' data/question.dat</b> </pre> <pre class="pre-non-highlight-in-pair"> a is it? b it is! c is it ... </pre>The
ssub and
gssub
functions exist precisely for this reason: so you don't have to escape anything.
The ssub and gssub functions are also handy for dealing with non-UTF-8 strings such as Latin 1, since Go's
regexp library -- which Miller uses -- requires UTF-8 strings. For example:
More generally, though, we have the DSL functions
latin1_to_utf8 and
utf8_to_latin1
and the verbs
latin1-to-utf8 and
utf8-to-latin1. The former let you fix encodings on a field-by-field
level; the latter, for all records (with less keystroking). (Latin 1 is also known as
ISO/IEC 8859-1.)
In this example, all the inputs are convertible from Latin-1 to UTF-8, since Latin-1 already contains the German characters:
In this example, the English and German pangrams are convertible from UTF-8 to Latin-1, but the Russian one is not, since Latin-1 doesn't contain the Russian alphabet:
\1, \2, etc. to refer to the capturesint or floatSee also the page on regular expressions.
<pre class="pre-highlight-in-pair"> <b>echo "a=14°45'" | mlr put '$a =~"^([0-9]+)°([0-9]+)" {$degrees = float("\1") + float("\2") / 60}'</b> </pre> <pre class="pre-non-highlight-in-pair"> a=14°45',degrees=14.75 </pre>