rfcs/2021-08-29-8381-vrl-iteration-support.md
We add native, limited support for iteration to VRL in a way that fits the VRL design document, to allow operators to optimally remap their data.
Related issues:

- *_keys and *_values Remap functions #5785
- replace_keys Remap function #5377
- replace_values Remap function #5783
- redact_values Remap function #5784

Out of scope:

- Iteration control-flow statements (e.g. break or return)
- Unbounded iteration (e.g. loop)

VRL is used to remap events to their desired state. Remapping involves manipulating existing fields, or adding new ones.
One gap in the language right now is the ability to dynamically remap fields. That is, an event might have fields that can't be known at compile-time, which you still want to manipulate.
To do this, you have to be able to iterate over the data of your object or array, and remap them individually. This requires some form of iteration support in the language.
Operators gain access to a set of new functions that allows them to iterate over objects or arrays, and manipulate data within the collections.
To start, we’ll introduce 3 iteration functions:
- map_keys
- map_values
- for_each

These functions are sufficient to resolve all reported use-cases users have for iteration in VRL.
More functions can be added in the future (e.g. any, filter, reduce),
but having the generic for_each allows us to take our time adding specialized
functions when sufficient demand requires it.
How each function handles iteration depends on the implementation of the function, but in general, the functions take a closure, which gets resolved for each item within the iterated collection. For function-specific details, see the "Functions" chapter.
Function closures are tied to the function call, meaning you cannot pass closures around in variables. This prevents tail-call recursion, which in turn prevents unbounded iteration, so operators cannot write valid programs that become invalid (e.g. never run to completion) at runtime.
Similarly, there is no unbounded loop iterator, to avoid accidental infinite
loops in programs. Additionally, control-flow statements (e.g. break or
return) to manipulate the iteration are not supported at this time (see
"future improvements"). Iteration always runs to
completion.
Because VRL does not support defining custom functions, and because we do not support tail-call recursion, there is no way to use VRL’s syntax to do any direct or indirect recursion during iteration.
However, as the examples section shows, there is a clear need for multi-level recursion when mapping observability data.
To support this, each function implementation itself can allow for recursive behavior, either by default, or depending on function-call arguments.
For the starting set of iteration functions, all functions support recursion, by
providing an optional recursive: bool function parameter. See the description
of the individual functions for more details on this.
map_keys

Map each individual key of an object to a different key.
map_keys(value: object, recursive: bool) -> |string| { string }
The map_keys function allows you to iterate over an object, and change the
keys within that object.
It supports recursion by passing true for the recursive parameter. When
recursion is enabled, it will return the key of the to-be-recursed collection
first, and then any items within that collection. Note that arrays are recursed
as well, to allow recursing "through" arrays into objects within those arrays.
This allows for mapping all keys in an object, even if those keys are deeply
nested within objects within array(s) within the top-level object.
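
The recursion order described above can be illustrated with a minimal Rust sketch. The `Value` enum and the `map_keys` signature here are hypothetical stand-ins, not Vector's actual types, and the sketch assumes that a later key wins if two keys map to the same name.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a VRL value (hypothetical, not Vector's `Value` type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Boolean(bool),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Returns a copy of `value` in which every object key has been passed
/// through `f`. With `recursive` set, nested collections are visited as well.
fn map_keys(value: &Value, recursive: bool, f: &mut dyn FnMut(&str) -> String) -> Value {
    match value {
        Value::Object(object) => {
            let mut mapped = BTreeMap::new();
            for (key, item) in object {
                // The key of a nested collection is mapped first...
                let new_key = f(key.as_str());
                // ...and only then are the items inside that collection visited.
                let new_item = if recursive {
                    map_keys(item, true, &mut *f)
                } else {
                    item.clone()
                };
                mapped.insert(new_key, new_item);
            }
            Value::Object(mapped)
        }
        // Arrays have no keys, but are walked "through" to reach nested objects.
        Value::Array(items) if recursive => {
            let mut mapped = Vec::with_capacity(items.len());
            for item in items {
                mapped.push(map_keys(item, true, &mut *f));
            }
            Value::Array(mapped)
        }
        // Scalars, and arrays in non-recursive mode, are copied unchanged.
        other => other.clone(),
    }
}
```

Calling this with `recursive` set to `false` only renames the top-level keys, while `true` also renames keys of objects nested inside arrays. The input value is never modified; a new value is returned.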
map_values

Map each individual value of an object or array to a different value.
map_values(value: object|array, recursive: bool) -> |any| { any }
The function works similarly to map_keys, except that it maps the values
instead of keys, and thus can also be used to map values within arrays.
Recursion behaves similarly to map_keys as well.
for_each

Iterate over objects or arrays, without mutating any data.
for_each(value: object|array, recursive: bool) -> |string OR integer, any| { any }
This can be considered a "trap door" iteration function that allows you to tackle any use-case not solved by any of the existing (or future) specialized iteration functions.
The drawback of such a function is that it potentially requires more manual
"set-up" code to get the end-result (e.g. initializing empty collections to
populate during a for_each run).
As the name implies, this function does not mutate the given collection, and
instead always returns null. It can be used to mutate data external to the
closure, while iterating over the collection. In a sense, it’s the most
general-purpose iteration function that allows you to manually write mapping,
reducing, filtering or counting logic.
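
As a rough analogy in Rust, the sketch below shows the same trade-off: a generic `for_each` that returns nothing, with the set-up code (two empty collections) living outside the closure. The `for_each` and `unzip` helpers are hypothetical illustrations, not part of VRL or Vector.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `for_each` over an object: calls `f` for every
/// key/value pair, leaves the collection untouched, and returns nothing
/// (the analogue of VRL's `null`).
fn for_each<V>(object: &BTreeMap<String, V>, mut f: impl FnMut(&str, &V)) {
    for (key, value) in object {
        f(key.as_str(), value);
    }
}

/// "Unzips" an object into separate key and value lists by mutating
/// variables that live outside the closure.
fn unzip(object: &BTreeMap<String, i64>) -> (Vec<String>, Vec<i64>) {
    let mut keys = Vec::new();
    let mut values = Vec::new();
    for_each(object, |key, value| {
        keys.push(key.to_string());
        values.push(*value);
    });
    (keys, values)
}
```

A specialized function would hide the two `Vec::new()` lines; with the general-purpose variant, the caller writes them by hand.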
No map function exists

Some might note that there’s no map function, only specialized map_keys and
map_values.
The reason for this omission is that map becomes complicated when dealing
with recursion, and the closure signature differs when dealing with an object or
array, requiring us to know at compile-time the exact type of the iteration
target.
In the end, all current requested use-cases by operators could be solved by one
of the three proposed iteration functions, allowing us to skip the additional
work of figuring out how map would work exactly, until there’s an actual need
for such a function (if ever).
What follows is a list of reported use-cases, each with a valid program that uses iteration to solve it. There are multiple ways to solve individual use-cases; this list shows one available solution per use-case.
. = map_values(., recursive: true) -> |value| { if value == "" { null } else { value } }
converting a single metric into multiple metrics
. = { "id": "booster", "timestamp": 123456, "data": { "acceleration": 10, "velocity": 20 } }
data = del(.data)
metrics = []
for_each(data) -> |key, value| {
metric = set(., [key], value)
metrics = push(metrics, metric)
}
. = map_keys(., recursive: true) -> |key| { replace(key, ".", "_") }
delete a field from all objects in an array
. = {"answers":[{"class":"IN","ttl":"264"},{"class":"IN","ttl":"264"}],"other":"data"}
.answers = map_values(.answers) -> |value| { del(value.ttl); value }
check property on variable-sized array of objects
array = [{ "a": 2}, {"a": 3}]
any_two = false
for_each(array) -> |_index, value| { if value.a == 2 { any_two = true } }
NOTE This is a good use-case for future any and all iteration
functions:
any_two = any(array) -> |_index, value| { value.a == 2 }
none_two = all(array) -> |_index, value| { value.a != 2 }
call parse_timestamp on array of Cloudtrail records
. = [{ ... }, { ... }]
. = map_values(.) -> |value| {
value.timestamp = parse_timestamp(value.eventTime, "%Y-%m-%dT%H:%M:%SZ") ?? now()
value
}
"unzip" object into separate key/value arrays
keys = []
values = []
for_each(.) -> |key, value| {
keys = push(keys, key)
values = push(values, value)
}
add fields to objects in array
. = { "foo": "bar", "items": [{}, {}] }
.items = map_values(.items) -> |value| { value.foo = .foo; value }
"zip" an array of objects with fields key and value into one object
data = [{ "key": "name", "value": "value" }, { "key": "key", "value": "otherValue" }]
for_each(data) -> |_index, value| {
. = set(., [value.key], value.value)
}
. = map_keys(., recursive: true) -> |key| { trim_start(key, "_") }
. = map_keys(., recursive: true) -> |key| { "my_" + key }
parse message using list of Grok patterns until one matches
patterns = []
matched = false
for_each(patterns) -> |_index, pattern| {
if !matched && (parsed, err = parse_grok(.message, pattern); err == null) {
matched = true
. |= parsed
}
}
find match against list of regular expressions
matched = false
for_each(patterns) -> |_index, pattern| {
if !matched && match(.message, pattern) {
matched = true
}
}
NOTE this would be less verbose (and slightly more performant) using
a future any function:
matched = any(patterns) -> |_index, pattern| { match(.message, pattern) }
. = map_keys(., recursive: true) -> |key| { replace(key, "my_prefix_", "") }
run encode_json on all top-level object fields
. = map_values(.) -> |value| {
if is_object(value) {
encode_json(value)
} else {
value
}
}
map key/value pairs to object with "key" and "value" fields
. = { "labels": { "key1": "value1", "key2": "value2" } }
new_labels = []
for_each(.labels) -> |key, value| {
new_labels = push(new_labels, { "key": key, "value": value })
}
.labels = new_labels
**NOTE** this is similar to [Jq’s `to_entries`
function](https://stedolan.github.io/jq/manual/#to_entries,from_entries,with_entries),
and could be worth a custom `map_to_array` function in VRL, in which each
individual key/value pair is mapped to an element in the new array:
```coffee
. = map_to_array(.) -> |key, value| { { "key": key, "value": value } }
```
or even just a specialized to_entries, without any iteration closure:
. = to_entries(.)
run parse_json on multiple strings in array, and emit as multiple
events
. = { "message": "{\"name\": \"Chase\"}\n{\"name\": \"Sky\"}\n" }
strings = split(.message, "\n")
. = compact(map_values(strings) -> |value| { parse_json(value) ?? null })
convert object to specific string format
. = { "key1": "value1", "key2": "value2" }
strings = []
for_each(.) -> |key, value| { strings = push(strings, key + "=" + encode_json(value)) }
"{" + join(strings, ",") + "}"
NOTE this too would be (slightly) simpler with map_to_array:
. = { "key1": "value1", "key2": "value2" }
strings = map_to_array(.) -> |key, value| { key + "=" + encode_json(value) }
"{" + join(strings, ",") + "}"
re-introduce previous only_fields functionality using iteration
only_fields = ["some", "set", "of", "fields"]
for_each(.) -> |key, _| {
if !includes(only_fields, key) {
. = remove(., [key])
}
}
NOTE this would be easier (and more performant) with a filter iteration
function:
only_fields = ["some", "set", "of", "fields"]
. = filter(.) -> |key, _| { includes(only_fields, key) }
map complex dynamic object based on conditionals
.input = map_values(.input) -> |input| {
input.items = map_values(input.items) -> |item| {
item.userAttributes = map_values(item.userAttributes) -> |attribute| {
if attribute.key == "Name" {
del(attribute.__type)
key = del(attribute.key)
value = del(attribute.value)
attribute = set!(attribute, [key], value)
} else if attribute.key == "Address" {
attribute.values = map_values(attribute.values) -> |address| {
del(address.city)
address
}
}
attribute
}
item.userId = map_values(item.userId) -> |id| {
del(id.userGroupId)
id
}
item
}
input
}
merge array of objects into single object
result = {}
objects = [
{ "foo": "bar" },
{ "foo": "baz" },
{ "bar": true },
{ "baz": [{ "qux": null, "quux": [2,4,6] }] },
]
for_each(objects) -> |_, value| { result |= value }
To explain iteration, let’s look at a more in-depth scenario, including comments
to explain what is happening, using the map_values function.
We’ll start with the following data:
{
"tags": {
"foo": true,
"bar": false,
"baz": "no",
"qux": [true, false],
"quux": {
"one": true,
"two": false
}
},
"ips": [
"180.14.129.174",
"31.73.200.120",
"82.35.219.252",
"113.58.218.2",
"32.85.172.216"
]
}
# Once Vector’s "schema support" is enabled, this can be removed.
.tags = object(.tags) ?? {}
.ips = array(.ips) ?? []
# Recursively map all `.tags` values to their new values.
#
# A copy of the object is returned, with the value changes applied.
.tags = map_values(.tags, recursive: true) -> |value| {
# Recursively iterating values also maps over collection types (objects or
# arrays). In this case, we don’t want to mutate those.
if is_object(value) || is_array(value) {
value
} else {
# `value` can be a boolean, or any other value. We enforce it to be
# a boolean.
value = bool(value) ?? false
# Change the value to an object.
value = { "enabled": value }
# Mapping an object requires you to return any value at the end of the
# closure.
#
# This invariant will be checked at compile-time.
value
}
}
# Map all IP addresses in `.ips`.
order = 0
.ips = map_values(.ips) -> |ip| {
# Enforce `ip` to be a string.
ip = string(ip) ?? "unknown"
value = {
"address": ip,
"order": order,
"private": starts_with(ip, "180.14"),
}
# We can access and mutate outer-scope variables.
order = order + 1
# Mapping an array requires you to return a single value to which the
# item-under-iteration will be mapped to.
value
}
{
"tags": {
"foo": { "enabled": true },
"bar": { "enabled": false },
"baz": { "enabled": false },
"qux": [{ "enabled": true }, { "enabled": false }],
"quux": {
"one": { "enabled": true },
"two": { "enabled": false }
}
},
"ips": [
{ "address": "180.14.129.174", "order": 0, "private": true },
{ "address": "31.73.200.120", "order": 1, "private": false },
{ "address": "82.35.219.252", "order": 2, "private": false },
{ "address": "113.58.218.2", "order": 3, "private": false },
{ "address": "32.85.172.216", "order": 4, "private": false }
]
}
Each iteration function can define its own set of function parameters to accept, and the signature of the enumeration closure.
As an example, let’s take a look at the map_keys function signature.
map_keys(value: OBJECT, recursive: BOOLEAN) -> |<key variable>| { EXPRESSION } -> OBJECT
Let's break this down:
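- The function name is map_keys.
- It takes two arguments, value and recursive.
  - value has to be of type object, which is the object to be iterated over.
  - recursive has to be of type boolean, determining whether to iterate over
    nested objects and arrays. It defaults to false.
- A closure is expected as part of the function call, after the closing ).
  - It takes the form -> |...| { expression }.
  - It declares a single variable between |...|.
  - That variable holds the key, of type string.
  - The expression has to return a string value, representing the new key.
- The function returns a new object, with the mutated keys.

Here's a simplified example of how to use the function: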
{ "foo": true, "bar": false }
. = map_keys(.) -> |key| { upcase(key) }
{ "FOO": true, "BAR": false }
The object under iteration is not mutated. Instead, a copy of the value is iterated and mutated, and a new object or array is returned after iteration completes.
This proposal favors adding an iteration function over for-loop syntax. That is, the RFC proposes:
map_keys(.) -> |key| { upcase(key) }
over:
for (key, _value) in . {
key = upcase(key)
}
This choice is made both on technical merits, based on the VRL Design Document and for improved future capabilities. See the "for-loop" alternative section for more details on this.
For the chosen proposal to work, there are two separate concepts that need to be implemented:

- function-closure support
- lexical scoping

Let's discuss these one by one, before we arrive at the final part, implementing
the map_keys function that uses both concepts.
For iteration to land in the form proposed in this RFC, we need a way for operators to write what they want to happen to keys and/or values of objects and arrays.
We do this by allowing functions to expose the fact that they accept a closure as a stand-alone argument to their function call.
"stand-alone" means the closure comes after the function call itself, e.g. this:
map(.) -> |k, v| { [k, v] }
over this:
map(., |k, v| { [k, v] })
This choice is made to make it clear that closures in VRL can't be passed around through variables, but are instead syntactically attached to a function call.
That is, we don't want to allow this:
my_closure = |k, v| { [k, v] }
map(., my_closure)
There are several reasons for rejecting this functionality:
It allows for slow or infinite recursion, violating the "Safety and performance over ease of use" VRL design principle.
It can make reading (and writing) VRL programs more complex, and code can no longer be reasoned about by reading from top-to-bottom, violating the "design the feature for the intended target audience" design principle.
We cannot allow assigning closures to event fields, requiring us to make a distinction between assigning to a variable and an event field, one we haven't had to make before, and would like to avoid making.
In practice, we haven't seen any use-case from operators that couldn't be solved by the current RFC proposal, but would be solved by the above syntax.
Instead, the closure-syntax is tied to a function call, and can only be added to
functions that explicitly expose their ability to take a closure with x
arguments that returns y value.
The return type of a closure is checked at compile-time, including the
requirement in map_keys for a string return type.
The variable names used to access the provided closure values (e.g. |key, value|) are checked at compile-time to make sure you are actually using the
variables (to avoid potential variable name typos). This behaves the same as
any other "unused variable assignment" check happening at compile-time.
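
A minimal Rust sketch of such an unused-variable check is shown below. The function name and its inputs are hypothetical, and it assumes that names starting with `_` count as intentionally unused, mirroring the `_index` style used in this RFC's examples.

```rust
/// Returns the closure variables that are declared but never referenced in
/// the closure body. Names starting with `_` are treated as intentionally
/// unused (an assumption, not a documented VRL rule).
fn unused_closure_variables<'a>(declared: &[&'a str], referenced: &[&str]) -> Vec<&'a str> {
    declared
        .iter()
        .copied()
        .filter(|name| !name.starts_with('_'))
        .filter(|name| !referenced.iter().any(|used| used == name))
        .collect()
}
```

For a closure declared as `|key, value|` whose body only references `key`, this reports `value`, which the compiler could then turn into a diagnostic.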
Lexical scoping (variables being accessible within a given scope, instead of globally) is something we've discussed before.
Before, we decided that the complexity of adding lexical scoping wasn't worth the investment before our first release, and we also hoped that lexical scoping wouldn't be something that was ever needed in VRL.
With this feature, and in particular the function-closure syntax, lexical scoping comes to mind again.
The reason for that is the following example:
map(.) -> |key, value| {
key = upcase(key)
[key, value]
}
key
We reference key outside the closure, at the last line of the program. What
should the value of key be in this case?
Without lexical scoping, it would be set to the upper-case variant of the "last" key in the event.
With lexical scoping, it would return an "undefined variable" error at
compile-time, because the key variable inside the closure is
lexically-scoped to that block, and remains undefined outside of the block.
However, while the above syntax would be new and thus not a breaking change, for existing code, adding lexical scoping would be a breaking change:
{
foo = "baz"
}
foo
Previously, foo would return "baz" when the program runs, but with lexical
scoping, the compiler returns an "undefined variable" compilation error instead.
This is a breaking change, but because it results in a compilation error, there will not be any unexpected runtime behavior for this case.
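
The scoping behaviour described above can be sketched with a stack of variable maps. This is a hypothetical illustration, not the VRL compiler's implementation: the `Scopes` type, its method names, and the use of `i64` values are all assumptions. It also assumes that assigning to a name that exists in an outer scope updates that outer variable.

```rust
use std::collections::HashMap;

/// Hypothetical scope stack; names and types are illustrative only.
struct Scopes {
    stack: Vec<HashMap<String, i64>>,
}

impl Scopes {
    /// Starts with a single root scope.
    fn new() -> Self {
        Scopes { stack: vec![HashMap::new()] }
    }

    /// Called when a block (`{ ... }`) or closure body is entered.
    fn enter(&mut self) {
        self.stack.push(HashMap::new());
    }

    /// Called when the block ends; its variables are dropped.
    fn exit(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// Assigns to the nearest enclosing scope that already defines `name`,
    /// or defines it in the current scope otherwise.
    fn assign(&mut self, name: &str, value: i64) {
        for scope in self.stack.iter_mut().rev() {
            if let Some(slot) = scope.get_mut(name) {
                *slot = value;
                return;
            }
        }
        if let Some(current) = self.stack.last_mut() {
            current.insert(name.to_string(), value);
        }
    }

    /// Looks a variable up from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Option<i64> {
        self.stack.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}
```

With this model, a variable first assigned inside a block disappears on `exit`, which corresponds to the "undefined variable" error, while a variable defined before the block stays visible and mutable inside it.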
In terms of exact rules, the following applies to lexical scoping in VRL:

- A new scope is created by a block ({ ... }) expression.

The return type of a closure matters for the actual result of the function call. Without this requirement, mapping would work as follows:
map_keys(.) -> |key| {
key = upcase(key)
}
That is, key would be a "special variable" inside the closure, which modifies
the actual key of the record within the object.
This doesn't fit existing patterns in VRL. It looks as if there's a dangling
variable key at the end that remains unused, but because we special-cased this
situation, it would instead magically update the actual key in the object after
the closure runs to completion.
This can become more difficult to reason about if/when we introduce control-flow
statements such as break, as you could have set key before calling break,
which would then either still mutate the actual key, or not, depending on how we
implement break, but either way, the program itself becomes less readable, and
operators have to read the language documentation to understand the semantic
differences between how code behaves inside a function-closure and outside.
Instead, the map_keys function-closure is required to return a string-type
value, which the function machinery then uses to update the actual values of the
object record, e.g.:
map_keys(.) -> |key| {
key = upcase(key)
# The string return-value clearly defines the eventual key value. The `key`
# variable is no longer "unused".
key
}
Because the closure syntax will be tied to function calls, we don't need to add
a new top-level node type to the abstract syntax tree (AST). Instead, we need to
extend the existing FunctionCall type to support an optional closure:
pub struct FunctionCall {
pub ident: Node<Ident>,
pub abort_on_error: bool,
pub arguments: Vec<Node<FunctionArgument>>,
}
We'll modify the type to this:
pub struct FunctionCall {
pub ident: Node<Ident>,
pub abort_on_error: bool,
pub arguments: Vec<Node<FunctionArgument>>,
pub closure: Option<FunctionClosure>,
}
pub struct FunctionClosure {
pub variables: Vec<Ident>,
pub block: Block,
}
Next, we need to teach the parser to parse optional closures for function calls.
The existing LALRPOP grammar:
FunctionCall: FunctionCall = {
<ident: Sp<"function call">> <abort_on_error: "!"?> "("
NonterminalNewline*
<arguments: CommaMultiline<Sp<FunctionArgument>>?>
")" => { /* ... */ },
};
Is updated to support optional closures:
FunctionCall: FunctionCall = {
<ident: Sp<"function call">> <abort_on_error: "!"?> "("
NonterminalNewline*
<arguments: CommaMultiline<Sp<FunctionArgument>>?>
")" <closure: FunctionClosure?> => { /* ... */ },
};
#[inline]
FunctionClosure: FunctionClosure = {
"->" "|" <variables: CommaList<"identifier">?> "|" "{" NonterminalNewline*
<expressions: Exprs>
"}" => FunctionClosure { variables, block: Block(expressions) },
};
This will allow the parser to unambiguously parse optional function closures, and add them as nodes to the program AST.
Once the parser knows how to parse function closures, the compiler needs to interpret them.
To start, we need to update the FunctionCall expression:
pub struct FunctionCall {
expr: Box<dyn Expression>,
abort_on_error: bool,
maybe_fallible_arguments: bool,
// new addition
closure: Option<FunctionClosure>,
}
pub struct FunctionClosure {
variables: Vec<Ident>,
block: Block,
}
We also need to update compile_function_call (not expanded here), to translate
the AST to updated FunctionCall expression type.
The bulk of the work needs to happen in the Function trait:
pub type Compiled = Result<Box<dyn Expression>, Box<dyn DiagnosticError>>;
pub trait Function: Sync + fmt::Debug {
/// The identifier by which the function can be called.
fn identifier(&self) -> &'static str;
/// One or more examples demonstrating usage of the function in VRL source
/// code.
fn examples(&self) -> &'static [Example];
/// Compile a [`Function`] into a type that can be resolved to an
/// [`Expression`].
///
/// This function is called at compile-time for any `Function` used in the
/// program.
///
/// At runtime, the `Expression` returned by this function is executed and
/// resolved to its final [`Value`].
fn compile(&self, state: &super::State, arguments: ArgumentList) -> Compiled;
/// An optional list of parameters the function accepts.
///
/// This list is used at compile-time to check function arity, keyword names
/// and argument type definition.
fn parameters(&self) -> &'static [Parameter] {
&[]
}
}
First, we're going to have to extend the compile method to take an optional
FunctionClosure:
fn compile(&self, state: &super::State, arguments: ArgumentList, closure: Option<FunctionClosure>) -> Compiled;
This will require us to update all currently existing function implementations,
but this is a mechanical change: no existing function can deal with closures
right now, so each of them adds _closure: Option<FunctionClosure> to its method
implementation, to indicate to the reader and the Rust compiler that the closure
variable is unused.
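Next, we need a way for the function definition to answer a few questions for the compiler:

- Does the function accept a closure?
- If so, how many variables does the closure take, and of which kind?
- What type must the closure return?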
To resolve these questions, function definitions must implement a new method:
fn closure(&self) -> Option<closure::Definition> {
None
}
With closure::Definition defined as such:
mod closure {
/// The definition of a function-closure block a function expects to
/// receive.
struct Definition {
inputs: Vec<Input>,
}
/// One input variant for a function-closure.
///
/// A closure can support different variable input shapes, depending on the
/// type of a given parameter of the function.
///
/// For example, the `map` function takes either an `Object` or an `Array`
/// for the `value` parameter, and the closure it takes either accepts
/// `|key, value|`, where "key" is always a string, or `|index, value|` where
/// "index" is always a number, depending on the parameter input type.
struct Input {
/// The parameter name upon which this closure input variant depends on.
parameter: &'static str,
/// The value kind this closure input expects from the parameter.
kind: value::Kind,
/// The list of variables attached to this closure input type.
variables: Vec<Variable>,
/// The return type this input variant expects the closure to have.
output: Output,
}
/// One variable input for a closure.
///
/// For example, in `-> |foo, bar| { ... }`, `foo` and `bar` are each
/// a `Variable`.
struct Variable {
/// The value kind this variable will return when called.
kind: value::Kind,
}
enum Output {
Array {
/// The number, and kind of elements expected.
elements: Vec<value::Kind>,
},
Object {
/// The field names, and value kinds expected.
fields: HashMap<&'static str, value::Kind>,
},
Scalar {
/// The expected scalar kind.
kind: value::Kind,
},
Any,
}
}
As shown above, the default trait implementation for this new method returns
None, which means any function (the vast majority) that doesn't accept
a closure can forgo implementing this method, and continue to work as normal.
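
The effect of the default implementation can be sketched in isolation. The types below are simplified, hypothetical stand-ins (a `Definition` that only records the number of closure variables), and `check_closure` is an assumed helper showing how a compiler could use the definition to validate a function call.

```rust
/// Simplified, hypothetical stand-in for `closure::Definition`.
#[derive(Debug, PartialEq)]
struct Definition {
    /// Number of closure variables the function expects.
    variables: usize,
}

trait Function {
    fn identifier(&self) -> &'static str;

    /// Defaults to "this function takes no closure".
    fn closure(&self) -> Option<Definition> {
        None
    }
}

struct Upcase;
struct ForEach;

impl Function for Upcase {
    fn identifier(&self) -> &'static str {
        "upcase"
    }
    // No `closure` override needed.
}

impl Function for ForEach {
    fn identifier(&self) -> &'static str {
        "for_each"
    }

    fn closure(&self) -> Option<Definition> {
        Some(Definition { variables: 2 })
    }
}

/// Checks that the closure attached to a call (given as its variable count,
/// or `None` when absent) matches what the function definition expects.
fn check_closure(function: &dyn Function, attached: Option<usize>) -> Result<(), String> {
    let name = function.identifier();
    match (function.closure(), attached) {
        (None, None) => Ok(()),
        (None, Some(_)) => Err(format!("{} does not accept a closure", name)),
        (Some(_), None) => Err(format!("{} requires a closure", name)),
        (Some(def), Some(n)) if def.variables == n => Ok(()),
        (Some(def), Some(n)) => Err(format!(
            "{} expects {} closure variables, got {}",
            name, def.variables, n
        )),
    }
}
```

Existing functions such as `Upcase` compile unchanged, while a closure attached to them is rejected before the program ever runs.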
In the case of the for_each function, we'd implement it like so:
fn closure(&self) -> Option<closure::Definition> {
let field = closure::Variable { kind: kind::String };
let index = closure::Variable { kind: kind::Integer };
let value = closure::Variable { kind: kind::Any };
let object = closure::Input {
parameter: "value",
kind: kind::Object,
variables: vec![field, value],
output: closure::Output::Any,
};
let array = closure::Input {
parameter: "value",
kind: kind::Array,
variables: vec![index, value],
output: closure::Output::Any,
};
Some(closure::Definition {
inputs: vec![object, array],
})
}
With the above in place, for_each can now iterate over both objects and
arrays, and depending on which type is detected at compile-time, the closure
attached to the function call can make guarantees about which type the first
variable name will have.
For example:
. = { "foo": true }
for_each(.) -> |key, value| { ... }
. = ["foo", true]
for_each(.) -> |index, value| { ... }
In the first example, because the compiler knows for_each receives an object
as its first argument, it can guarantee that key will be a string, and value
of "any" type.
The second example is similar, except that it guarantees that the first variable is a number (the index of the value in the array).
Note that for the above to work, the compiler must know the exact type
provided to (in this case) the value function parameter. It can't be either
array or object, it has to be exactly one of the two. Operators can guarantee
this with a type function such as object or array, e.g. object(.tags) ?? {}.
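
The "exactly one type" requirement can be sketched as a lookup over the closure inputs. Everything here is a simplified assumption: `Kind` is a plain enum rather than Vector's `value::Kind`, and `select_input` is a hypothetical helper name.

```rust
/// Simplified, hypothetical value kinds (not Vector's `value::Kind`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    String,
    Integer,
    Any,
    Object,
    Array,
}

/// One closure input variant: the kind the `value` parameter must have, and
/// the kinds of the closure variables that follow from it.
#[derive(Debug, PartialEq)]
struct Input {
    kind: Kind,
    variables: Vec<Kind>,
}

/// The two input variants `for_each` would expose.
fn for_each_inputs() -> Vec<Input> {
    vec![
        Input { kind: Kind::Object, variables: vec![Kind::String, Kind::Any] },
        Input { kind: Kind::Array, variables: vec![Kind::Integer, Kind::Any] },
    ]
}

/// Picks the input variant for the kinds the argument may have at runtime.
/// Anything other than exactly one known kind is a compile-time error.
fn select_input<'a>(inputs: &'a [Input], possible: &[Kind]) -> Result<&'a Input, String> {
    match possible {
        [exact] => inputs
            .iter()
            .find(|input| input.kind == *exact)
            .ok_or_else(|| format!("no closure input accepts {:?}", exact)),
        _ => Err(format!("expected exactly one kind, got {:?}", possible)),
    }
}
```

An argument known to be an object selects the `|key, value|` shape, an array selects `|index, value|`, and an argument that may be either is rejected until the operator narrows its type.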
With all of this in place, the for_each function can compile its expression
given the closure details, and run the closure multiple times to completion,
doing something like this:
fn resolve(&self, ctx: &mut Context) -> Result<Value, Error> {
let run = |key, value| {
// TODO: handle variable scope stack
ctx.variables.insert(key, value);
let closure_value = self.closure.resolve(ctx)?;
ctx.variables.remove(key);
Ok(closure_value)
};
match self.value.resolve(ctx)? {
Value::Object(object) => {
for (key, value) in object.into_iter() {
// `for_each` discards the closure's return value.
run(key, value)?;
}
}
Value::Array(array) => {
for (index, value) in array.into_iter().enumerate() {
run(index, value)?;
}
}
_ => unreachable!("expected object or array"),
};
// `for_each` does not mutate the collection, and always returns null.
Ok(Value::Null)
}
This should get us most of the way towards adding function-closure support to
VRL, and using that support in the initial for_each function to do its work.
Iteration unlocks solutions to many remapping scenarios we currently don't support. Not implementing this RFC would hold VRL back, and prevent operators with more complex use-cases from using Vector with VRL to achieve their goals.
By adding iteration, we unlock the capability to resolve almost all use-cases in the future by introducing more iteration-based functions.
A different approach to iteration is to use a built-in for-loop syntax:
for (key, _value) in . {
key = upcase(key)
}
The biggest strength of this approach is the simplicity of the syntax, and the familiarity with many other languages that have for-loops.
It's relevant to mention that this solution also still needs lexical-scoping
implemented, to avoid "leaking" the values of the key and value variables
outside of the loop.
One problem with this approach is that recursive iteration (accessing nested
object fields) isn't possible, unless we add another special syntax (e.g.
recursive for (.., ..) in . {}). This adds more surface-level syntax and
removes some of its familiarity, making it a less attractive solution.
An additional problem is that the key and value variables become "special",
in that, even though it appears that they aren't used after assignment, the
for-loop expression would actually update the object key after each iteration
in the loop.
While this is technically the same problem we had to solve in the function-based
solution, applying that same solution to a for-loop again makes it look less
like for-loops in other languages, defeating one of the strengths of this
approach:
for (key, value) in . {
key = upcase(key)
(key, value)
}
A solution to the magic-variable problem would be to allow dynamic paths, and have operators directly assign to those paths:
for (key, value) in . {
.[upcase(key)] = value
}
This solves one problem, but introduces another: using .<path> always starts
at the root of the target. Given the following example:
{ "foo": { "bar": true } }
How would we use dynamic paths in a recursive for-loop?
recursive for (key, value) in . {
.[upcase(key)] = value
}
Because key is "foo" and then "bar", you would end up with:
{ "FOO": true, "BAR": true }
Which is not the expected outcome.
This could be solved by making . relative in the for-loop, but that's a major
shift from the current way VRL works, requires a new way of accessing the root
object if you can't use ., and goes against the rules as laid out in the
design document.
None.
map_keys, map_values and for_each functions

While likely desirable, this RFC intentionally avoids control-flow operations inside iterators.
They are likely to be one of the first enhancements to this feature, though:
. = map_values(.) -> |value| {
# Return default value pairs if the value is an object.
if is_object(value) {
return value
}
# ...
}
Once this RFC is implemented, additional iteration capability can be expanded by adding new functions to the standard library.
For example, filtering:
# Return a new array with "180.14.129.174" removed.
.ips = filter(.ips) -> |_index, ip| {
ip = string(ip) ?? "unknown"
!starts_with(ip, "180.14")
}
Or ensuring all elements adhere to a condition:
# Add new `all_public` boolean field.
.all_public = all(.ips) -> |_index, ip| {
ip = string(ip) ?? "unknown"
!starts_with(ip, "180.14")
}
Some additional suggestions include flatten, partition, fold, any,
find, max, min, and others.

Potential list of future functions:

- flatten
- partition
- fold
- any
- all
- find
- max
- min
- replace_keys
- to_entries
- from_entries
- map_to_array
- zip
- chain

Once [schema support][] is enabled, writing iterators can become less verbose.
For example, this example from the RFC:
.ips = array(.ips) ?? []
.ips = filter(.ips) -> |_index, ip| {
ip = string(ip) ?? "unknown"
!starts_with(ip, "180.14")
}
Can be written as follows, when applying the correct schema:
.ips = filter(.ips) -> |_, ip| { !starts_with(ip, "180.14") }
This works because a type schema could guarantee to the compiler that .ips is an array, with
only string items.
Once the [pipeline operations][] land, we can further expand the above example as follows:
.private_and_public_ips = filter(.ip) -> |_, ip| is_ip(ip) |> partition() -> |_, ip| starts_with(ip, "180.14")
Once [dynamic field assignment][] lands, you can dynamically move fields as well:
["foo", "bar", "baz"]
for_each(.) -> |index, value| { ."{{value}}" = index }
{
"foo": 0,
"bar": 1,
"baz": 2
}