{{< requirement >}} Before you begin, this guide assumes the following:
{{< /requirement >}}
Vector provides multiple transforms that you can use to modify your observability data as it passes through your Vector topology.
The transform that you will likely use most often is the remap
transform, which uses a single-purpose data transformation language called
Vector Remap Language (VRL for short) to define event
transformation logic. VRL has several features that should make it your first
choice for transforming data in Vector:
In cases where VRL doesn't fit your use case, Vector also offers a Lua runtime transform that offers a bit more flexibility than VRL but also comes with downsides (listed below) that you should always bear in mind.
Let's jump straight into an example of using VRL to modify some data. We'll create a simple topology consisting of three components:
* A `demo_logs` source produces random Syslog messages at a rate of 10 per second.
* A `remap` transform uses VRL to parse incoming Syslog lines into named fields (`severity`, `timestamp`, etc.).
* A `console` sink pipes the output of the topology to stdout, so that we can see the results on the command line.

This configuration defines that topology:
sources:
  logs:
    type: demo_logs
    format: syslog
    interval: 0.1

transforms:
  modify:
    type: remap
    inputs:
      - logs
    source: |
      # Parse Syslog input. The "!" means that the script should abort on error.
      . = parse_syslog!(.message)

sinks:
  out:
    type: console
    inputs:
      - modify
    encoding:
      codec: json
{{< info >}} Although we're using YAML for the configuration here, Vector also supports TOML and JSON.
{{< /info >}}
To start Vector using this topology:
vector --config /etc/vector/vector.yaml
You should see lines like this emitted via stdout (formatted for readability here):
{
  "appname": "authsvc",
  "facility": "daemon",
  "hostname": "acmecorp.biz",
  "message": "#hugops to everyone who has to deal with this",
  "msgid": "ID486",
  "procid": 5265,
  "severity": "notice",
  "timestamp": "2021-01-19T18:16:40.027Z"
}
So far, we've gotten Vector to parse the Syslog data but we're not yet
modifying that data. So let's update the source script of our remap
transform to make some ad hoc transformations:
transforms:
  modify:
    type: remap
    inputs:
      - logs
    source: |
      . = parse_syslog!(.message)

      # Convert the timestamp to a Unix timestamp, aborting on error
      .timestamp = to_unix_timestamp!(.timestamp)

      # Remove the "facility" and "procid" fields
      del(.facility)
      del(.procid)

      # Replace the "msgid" field with a unique ID
      .msgid = uuid_v4()

      # If the log message contains the phrase "Great Scott!", set the new field
      # "critical" to true, otherwise set it to false. When the phrase is found,
      # also log an info-level message.
      if (is_critical = contains(.message, "Great Scott!"); is_critical) {
        log("It contains 'Great Scott!'", level: "info")
      }

      .critical = is_critical
A few things to notice about this script:
* If we didn't handle the potential error from the `parse_syslog` function, for example,
  the VRL compiler would provide a very specific warning and Vector wouldn't
  start up.
* VRL supports language constructs such as `if` statements, comments, and
  logging.
* `.` acts as a sort of "container" for the event data. `.` by itself refers
  to the root event, while you can use paths like `.foo`,
  `.foo[0]`, `.foo.bar`, `.foo.bar[0]`, and so on to reference subfields, array
  indices, and more.

{{< info >}}
Note that VRL functions can behave differently depending on the execution context.
For example, the contains function above is infallible when run inside a Vector/Remap process if the compiler can detect that the type of .message is a string.
The same code might behave differently when run in the VRL Playground,
VRL CLI, or when schema.log_namespace is set to true.
{{< /info >}}
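
The scripts above use the `!` form of fallible functions, which aborts the script on error. As a minimal sketch of the alternative, assuming you would rather keep events that fail to parse, VRL also lets you assign the error to a variable and handle it yourself. The `.parse_error` field name is hypothetical and chosen only for this illustration:

```yaml
transforms:
  modify:
    type: remap
    inputs:
      - logs
    source: |
      # Capture the result and the error instead of aborting with "!"
      parsed, err = parse_syslog(.message)

      if err != null {
        # ".parse_error" is a hypothetical field name used for this sketch
        .parse_error = err
        log("Unable to parse Syslog line", level: "warn")
      } else {
        . = parsed
      }
```

With this variant, a line that isn't valid Syslog should pass through with its original `message` plus the error text, while valid lines are replaced by their parsed fields as before.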
If you stop and restart Vector, you should see log lines like this (again reformatted for readability):
{
  "appname": "authsvc",
  "hostname": "acmecorp.biz",
  "message": "Great Scott! We're never gonna reach 88 mph with the flux capacitor in its current state!",
  "msgid": "4e4437b6-13e8-43b3-b51e-c37bd46de490",
  "severity": "notice",
  "timestamp": 1611080200,
  "critical": true
}
And that's it! We've successfully created a Vector topology that transforms every event that passes through it. If you'd like to know more about VRL, we recommend checking out the following documentation:
If VRL doesn't cover your use case (and that should happen rarely), Vector also
offers a lua runtime transform that you can use instead of
VRL. It enables you to run Lua code that you can include directly in
your Vector configuration.

The lua transform provides maximal flexibility because it enables you to use
a full-fledged programming language right inside of Vector. But we recommend
using it only when truly necessary, for several reasons:
* The `lua` transform makes it all too easy to write scripts that are slow,
  error-prone, and hard to read.
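
For comparison, here is a rough sketch of what a `lua` transform might look like for the "Great Scott!" check from earlier. It assumes version 2 of the transform with a `process` hook; the component name `lua_modify` is hypothetical, and the script works on the raw `message` field rather than parsed Syslog fields:

```yaml
transforms:
  lua_modify:
    type: lua
    version: "2"
    inputs:
      - logs
    hooks:
      process: |-
        function (event, emit)
          local message = event.log.message
          -- The fourth argument (true) makes string.find do a plain-text
          -- search instead of treating the phrase as a Lua pattern
          event.log.critical = type(message) == "string"
            and string.find(message, "Great Scott!", 1, true) ~= nil
          emit(event)
        end
```

Even in this small case, the script has to check the field's type by hand and remember to call `emit`, which is the kind of bookkeeping VRL handles for you.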