Parsing Data

Telegraf can take in data in a variety of formats, but it requires configuration from the user in order to correctly parse, store, and send that data. Telegraf does not keep the raw data internally.

Telegraf uses an internal metric representation consisting of a metric name, tags, fields, and a timestamp, very similar to line protocol. This means that incoming data needs to be broken up into these four parts. While none of the corresponding options are required, they are available to the user and might be necessary to ensure the data is represented correctly.
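As a rough illustration of this representation, the following Python sketch models a metric and renders it in a line-protocol-like form. The `Metric` class and the `format_value` and `to_line_protocol` helpers are hypothetical names used only for this example and are not Telegraf APIs. The sketch skips escaping rules and sorts keys alphabetically for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical stand-in for a metric: name, tags, fields, timestamp."""
    name: str
    tags: dict = field(default_factory=dict)
    fields: dict = field(default_factory=dict)
    timestamp_ns: int = 0


def format_value(value):
    """Render a field value in line-protocol style (escaping is omitted)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return f'"{value}"'


def to_line_protocol(metric):
    """Join name, tags, fields, and timestamp into a single line."""
    tags = "".join(f",{k}={v}" for k, v in sorted(metric.tags.items()))
    fields = ",".join(
        f"{k}={format_value(v)}" for k, v in sorted(metric.fields.items())
    )
    return f"{metric.name}{tags} {fields} {metric.timestamp_ns}"


metric = Metric(
    name="file",
    tags={"node": "node1"},
    fields={"temp": 32.3, "humidity": 23, "alarm": False},
    timestamp_ns=1678121543000000000,
)
line = to_line_protocol(metric)
print(line)
```

The point of the sketch is only that every piece of parsed data has to land in one of the four slots before it can be written out.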

Parsers

The first step is to determine which parser to use. Look at the list of parsers and find one that works with the data at hand. This is generally straightforward, as most data types have only one parser that is actually applicable.

JSON parsers

There is an exception when it comes to JSON data. Instead of a single parser, there are three different parsers capable of reading JSON data:

json: This parser is great for flat JSON data. If the JSON is more complex, for example because it contains nested objects or arrays, do not use this parser and look at the other two options instead.
json_v2: The v2 parser was created out of the need to parse JSON objects. It can take on more advanced cases, at the cost of additional configuration.
xpath_json: The xpath parser is the most capable of the three options. While the xpath name may imply XML data, it can parse a variety of data types using XPath expressions.

Tags and fields

The next step is to look at the data and determine how it needs to be split up between tags and fields. Tags are generally strings or values that a user will want to search on, while fields are the raw data values, such as numeric types. Generally, data is considered to be a field unless it is explicitly specified as a tag.
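The split described above can be sketched in a few lines of Python. The `split_record` helper and its parameters are hypothetical and only mimic the idea of "everything is a field unless listed as a tag"; they do not correspond to a Telegraf function.

```python
def split_record(record, tag_keys, time_key=None):
    """Split a flat record into tags and fields (hypothetical helper)."""
    tags, fields = {}, {}
    for key, value in record.items():
        if key == time_key:
            continue  # the timestamp is handled separately
        if key in tag_keys:
            tags[key] = str(value)  # tag values are stored as strings
        else:
            fields[key] = value  # anything not named as a tag is a field
    return tags, fields


record = {
    "node": "node1",
    "temp": 32.3,
    "humidity": 23,
    "alarm": False,
    "time": "2023-03-06T16:52:23Z",
}
tags, fields = split_record(record, tag_keys={"node"}, time_key="time")
print(tags)
print(fields)
```

Here only `node` is named as a tag, so `temp`, `humidity`, and `alarm` all end up as fields by default.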

Timestamp

To parse a timestamp, the user needs to specify at the very least which field holds the timestamp and what format the timestamp is in. The format can either be one of the predefined Unix timestamp formats or a custom format based on Go reference time.

For Unix timestamps Telegraf understands the following settings:

Timestamp	Timestamp Format
`1709572232`	`unix`
`1709572232123`	`unix_ms`
`1709572232123456`	`unix_us`
`1709572232123456789`	`unix_ns`
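To make the difference between these four formats concrete, the Python sketch below converts each sample value from the table to nanoseconds. The `unix_to_ns` helper and the `NS_PER_UNIT` table are hypothetical names for this example; the sketch assumes integer input and does not reflect how Telegraf implements the conversion.

```python
from datetime import datetime, timezone

# Nanoseconds per unit for each Unix format name from the table above.
NS_PER_UNIT = {
    "unix": 1_000_000_000,
    "unix_ms": 1_000_000,
    "unix_us": 1_000,
    "unix_ns": 1,
}


def unix_to_ns(value, fmt):
    """Convert an integer Unix timestamp in the given format to nanoseconds."""
    return int(value) * NS_PER_UNIT[fmt]


samples = [
    ("1709572232", "unix"),
    ("1709572232123", "unix_ms"),
    ("1709572232123456", "unix_us"),
    ("1709572232123456789", "unix_ns"),
]
for raw, fmt in samples:
    print(fmt, unix_to_ns(raw, fmt))

# All four samples describe the same second, interpreted in UTC.
as_utc = datetime.fromtimestamp(1709572232, tz=timezone.utc).isoformat()
print(as_utc)
```

All four values refer to the same instant and differ only in how many sub-second digits they carry.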

There are some named formats available as well:

Timestamp	Named Format
`Mon Jan _2 15:04:05 2006`	`ANSIC`
`Mon Jan _2 15:04:05 MST 2006`	`UnixDate`
`Mon Jan 02 15:04:05 -0700 2006`	`RubyDate`
`02 Jan 06 15:04 MST`	`RFC822`
`02 Jan 06 15:04 -0700`	`RFC822Z`
`Monday, 02-Jan-06 15:04:05 MST`	`RFC850`
`Mon, 02 Jan 2006 15:04:05 MST`	`RFC1123`
`Mon, 02 Jan 2006 15:04:05 -0700`	`RFC1123Z`
`2006-01-02T15:04:05Z07:00`	`RFC3339`
`2006-01-02T15:04:05.999999999Z07:00`	`RFC3339Nano`
`Jan _2 15:04:05`	`Stamp`
`Jan _2 15:04:05.000`	`StampMilli`
`Jan _2 15:04:05.000000`	`StampMicro`
`Jan _2 15:04:05.000000000`	`StampNano`

If the timestamp does not conform to any of the above, the user can specify a custom timestamp format written in Go reference time notation. Here are a few example timestamps and their Go reference time equivalents:

Timestamp	Go reference time
`2024-03-04T17:10:32`	`2006-01-02T15:04:05`
`04 Mar 24 10:10 -0700`	`02 Jan 06 15:04 -0700`
`2024-03-04T10:10:32-07:00`	`2006-01-02T15:04:05Z07:00`
`2024-03-04 17:10:32.123+00`	`2006-01-02 15:04:05.999+00`
`2024-03-04T10:10:32.123456Z`	`2006-01-02T15:04:05.000000Z`
`2024-03-04T10:10:32.123456Z`	`2006-01-02T15:04:05.999999999Z`

Note that for fractional seconds, the user can use either 9s or 0s. Using 0s forces a fixed number of digits, while using 9s does not.

Please note that timezone abbreviations are ambiguous. For example, MST can stand for either Mountain Standard Time (UTC-07) or Malaysia Standard Time (UTC+08). As such, avoid abbreviated timezones if possible.

Unix timestamps use UTC; there is no concept of a timezone for a Unix timestamp.

Some devices report the timestamp as a number, similar to the Unix timestamp format, but in the local timezone rather than UTC. The formats below support these cases by computing the offset between local time and UTC:

Timestamp	Timestamp Format
`1709572232`	`timestamp_tz`
`1709572232123`	`timestamp_tz_ms`
`1709572232123456`	`timestamp_tz_us`
`1709572232123456789`	`timestamp_tz_ns`
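The offset computation behind these formats can be illustrated with a short Python sketch. It assumes a device whose clock runs 12 hours ahead of UTC and uses a fixed offset instead of a real timezone database lookup. The `local_epoch_to_utc` helper and the sample value are hypothetical and only show the arithmetic, not Telegraf's implementation.

```python
from datetime import timedelta

# Assumption: the device's local time is a fixed 12 hours ahead of UTC.
# A real timezone lookup would also have to account for daylight saving time.
LOCAL_UTC_OFFSET = timedelta(hours=12)


def local_epoch_to_utc(local_seconds, utc_offset):
    """Shift a local-clock epoch value back to a true UTC Unix timestamp."""
    return local_seconds - int(utc_offset.total_seconds())


local_value = 1568338208  # seconds as reported by the device, in local time
utc_seconds = local_epoch_to_utc(local_value, LOCAL_UTC_OFFSET)
print(utc_seconds)
print(local_value - utc_seconds)  # difference in seconds (12 hours)
```

The same subtraction applies to the millisecond, microsecond, and nanosecond variants, with the offset scaled to the matching unit.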

Examples

Below are a few basic examples to get users started.

CSV

Given the following data:

csv

node,temp,humidity,alarm,time
node1,32.3,23,false,2023-03-06T16:52:23Z
node2,22.6,44,false,2023-03-06T16:52:23Z
node3,17.9,56,true,2023-03-06T16:52:23Z

Here is the corresponding parser configuration and result:

toml

[[inputs.file]]
files = ["test.csv"]
data_format = "csv"

csv_header_row_count = 1
csv_column_names = ["node","temp","humidity","alarm","time"]
csv_tag_columns = ["node"]
csv_timestamp_column = "time"
csv_timestamp_format = "2006-01-02T15:04:05Z"

text

file,node=node1 temp=32.3,humidity=23i,alarm=false 1678121543000000000
file,node=node2 temp=22.6,humidity=44i,alarm=false 1678121543000000000
file,node=node3 temp=17.9,humidity=56i,alarm=true 1678121543000000000
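To see where the timestamp `1678121543000000000` in this output comes from, the following Python sketch parses the CSV timestamp string and converts it to Unix nanoseconds. Python uses `strptime` directives rather than Go reference time, so `%Y-%m-%dT%H:%M:%SZ` is used here as a stand-in for `2006-01-02T15:04:05Z`. The `parse_utc_timestamp` helper is a hypothetical name for this example.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(text, fmt="%Y-%m-%dT%H:%M:%SZ"):
    """Parse a UTC timestamp string and return Unix nanoseconds."""
    # The trailing "Z" is matched literally, so UTC is attached explicitly.
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1_000_000_000


ns = parse_utc_timestamp("2023-03-06T16:52:23Z")
print(ns)
```

Because all three rows share the same `time` value, all three metrics carry the same timestamp.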

CSV with Local Timestamp

Given the following data:

csv

node,temp,humidity,alarm,time
node1,32.3,23,false,1568338208
node2,22.6,44,false,1568338208

Here is the corresponding parser configuration and result:

toml

[[inputs.file]]
files = ["test.csv"]
data_format = "csv"

csv_header_row_count = 1
csv_column_names = ["node","temp","humidity","alarm","time"]
csv_tag_columns = ["node"]
csv_timestamp_column = "time"
csv_timestamp_format = "timestamp_tz"
csv_timezone = "Pacific/Fiji"

text

file,node=node1 temp=32.3,humidity=23i,alarm=false 1568295008000000000
file,node=node2 temp=22.6,humidity=44i,alarm=false 1568295008000000000

Note that the timestamp in the CSV is 12 hours later than the metric timestamp, because the Pacific/Fiji timezone is UTC+12:00.

JSON flat data

Given the following data:

json

{ "node": "node", "temp": 32.3, "humidity": 23, "alarm": false, "time": "1709572232123456789"}

Here is the corresponding parser configuration and result:

toml

[[inputs.file]]
files = ["test.json"]
precision = "1ns"
data_format = "json"

tag_keys = ["node"]
json_time_key = "time"
json_time_format = "unix_ns"

text

file,node=node temp=32.3,humidity=23 1709572232123456789

JSON Objects

Given the following data:

json

{
    "metrics": [
        { "node": "node1", "temp": 32.3, "humidity": 23, "alarm": "false", "time": "1678121543"},
        { "node": "node2", "temp": 22.6, "humidity": 44, "alarm": "false", "time": "1678121543"},
        { "node": "node3", "temp": 17.9, "humidity": 56, "alarm": "true", "time": "1678121543"}
    ]
}

Here is the corresponding parser configuration and result:

toml

[[inputs.file]]
files = ["test.json"]
data_format = "json_v2"

[[inputs.file.json_v2]]
[[inputs.file.json_v2.object]]
  path = "metrics"
  timestamp_key = "time"
  timestamp_format = "unix"
  [[inputs.file.json_v2.object.tag]]
    path = "#.node"
  [[inputs.file.json_v2.object.field]]
    path = "#.temp"
    type = "float"
  [[inputs.file.json_v2.object.field]]
    path = "#.humidity"
    type = "int"
  [[inputs.file.json_v2.object.field]]
    path = "#.alarm"
    type = "bool"

text

file,node=node1 temp=32.3,humidity=23i,alarm=false 1678121543000000000
file,node=node2 temp=22.6,humidity=44i,alarm=false 1678121543000000000
file,node=node3 temp=17.9,humidity=56i,alarm=true 1678121543000000000

JSON Line Protocol

Given the following data:

json

{
  "fields": {"temp": 32.3, "humidity": 23, "alarm": false},
  "name": "measurement",
  "tags": {"node": "node1"},
  "time": "2024-03-04T10:10:32.123456Z"
}

Here is the corresponding parser configuration and result:

toml

[[inputs.file]]
files = ["test.json"]
precision = "1us"
data_format = "xpath_json"

[[inputs.file.xpath]]
  metric_name = "/name"
  field_selection = "fields/*"
  tag_selection = "tags/*"
  timestamp = "/time"
  timestamp_format = "2006-01-02T15:04:05.999999999Z"

text

measurement,node=node1 alarm="false",humidity="23",temp="32.3" 1709547032123456000