pkg/dsl/README.md
Parsing a Miller DSL (domain-specific language) expression goes through three representations: the DSL source string, an abstract syntax tree (AST), and a concrete syntax tree (CST).
The job of the PGPG parser is to turn the DSL string into an AST.
The job of the CST builder is to turn the AST into a CST.
The job of the put and filter transformers is to execute the CST statements on each input record.
For example, the DSL expression is the part between the single quotes in
mlr put '$v = $i + $x * 4 + 100.7 * $y' myfile.dat
Use put -v to display the AST:
mlr -n put -v '$v = $i + $x * 4 + 100.7 * $y'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "v"
        * Operator "+" "+"
            * Operator "+" "+"
                * DirectFieldName "md_token_field_name" "i"
                * Operator "*" "*"
                    * DirectFieldName "md_token_field_name" "x"
                    * IntLiteral "md_token_int_literal" "4"
            * Operator "*" "*"
                * FloatLiteral "md_token_float_literal" "100.7"
                * DirectFieldName "md_token_field_name" "y"
Note the following about the AST:
Operators such as = + - * / **, function names, and so on remain as non-leaf nodes of the AST.
Operator-precedence examples:
$ mlr -n put -v '$x = 1 + 2 * 3'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "x"
        * Operator "+" "+"
            * IntLiteral "md_token_int_literal" "1"
            * Operator "*" "*"
                * IntLiteral "md_token_int_literal" "2"
                * IntLiteral "md_token_int_literal" "3"
$ mlr -n put -v '$x = 1 * 2 + 3'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "x"
        * Operator "+" "+"
            * Operator "*" "*"
                * IntLiteral "md_token_int_literal" "1"
                * IntLiteral "md_token_int_literal" "2"
            * IntLiteral "md_token_int_literal" "3"
$ mlr -n put -v '$x = 1 * (2 + 3)'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "x"
        * Operator "*" "*"
            * IntLiteral "md_token_int_literal" "1"
            * Operator "+" "+"
                * IntLiteral "md_token_int_literal" "2"
                * IntLiteral "md_token_int_literal" "3"
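To see how precedence and parentheses determine tree shape, here is a small illustrative sketch. It uses precedence climbing, which is a swapped-in technique chosen for brevity and is not how Miller's generated parser works. The names `Parse`, `tokenize`, and `prec` are hypothetical, only `+` and `*` are supported, and there is no error handling. Trees are written in prefix form, so `(+ 1 (* 2 3))` means a `+` node whose children are `1` and a `*` node.

```go
package main

import (
	"fmt"
	"strings"
)

// prec gives the binding strength of each operator in this sketch;
// a higher number binds more tightly.
var prec = map[string]int{"+": 1, "*": 2}

type parser struct {
	toks []string
	pos  int
}

// tokenize pads operators and parentheses with spaces, then splits on whitespace.
func tokenize(s string) []string {
	for _, p := range []string{"(", ")", "+", "*"} {
		s = strings.ReplaceAll(s, p, " "+p+" ")
	}
	return strings.Fields(s)
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// next returns the current token and advances.
func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// primary parses a literal or a parenthesized subexpression.
// Parentheses steer the parse but leave no node in the result.
func (p *parser) primary() string {
	t := p.next()
	if t == "(" {
		e := p.expr(1)
		p.next() // consume ")"
		return e
	}
	return t
}

// expr parses a left-associative chain of operators whose
// precedence is at least minPrec.
func (p *parser) expr(minPrec int) string {
	lhs := p.primary()
	for {
		op := p.peek()
		pr, ok := prec[op]
		if !ok || pr < minPrec {
			return lhs
		}
		p.next()
		rhs := p.expr(pr + 1)
		lhs = fmt.Sprintf("(%s %s %s)", op, lhs, rhs)
	}
}

// Parse returns the prefix-form tree for an arithmetic expression.
func Parse(s string) string {
	p := &parser{toks: tokenize(s)}
	return p.expr(1)
}

func main() {
	for _, s := range []string{"1 + 2 * 3", "1 * 2 + 3", "1 * (2 + 3)"} {
		fmt.Printf("%-12s => %s\n", s, Parse(s))
	}
}
```

The three inputs mirror the three examples above: the `*` node sits beneath `+` in the first two, and the parentheses in the third put `+` beneath `*` without appearing in the tree themselves.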
There's no -v display for the CST, but it's simply a reshaping of the AST
with pre-processed setup of function pointers to handle each type of statement
on a per-record basis.
The if/else and/or switch statements that decide what to do with each AST node run at CST-build time, so they don't need to be redone each time the syntax tree is executed, which happens once per data record.
The AST logic is in ./ast*.go. I didn't use a pkg/dsl/ast naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
The CST logic is in ./cst. Please see cst/README.md for more information.