docs/tutorials/lognamespacing.md
This walks through the steps required to add log namespacing to a given source.
Log namespacing is a new feature in Vector that keeps different fields of a log event under separate namespaces. This avoids conflicts where two different fields try to use the same name. Log namespacing does not apply to metric or trace events.
Add the following field to the Config struct:
```rust
/// The namespace to use for logs. This overrides the global setting.
#[configurable(metadata(docs::hidden))]
#[serde(default)]
pub log_namespace: Option<bool>,
```
Because log namespacing is currently an unreleased feature, we add the docs::hidden
attribute so that the option doesn't appear in the documentation.
The configuration is currently just a bool that switches the feature on or off. When we
retrieve the namespace to actually use, we merge this per-source value with the globally
configured one, which is passed in via the SourceContext parameter.
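```rust
impl SourceConfig for DnstapConfig {
    async fn build(&self, cx: SourceContext) -> Result<super::Source> {
        // Merge this source's `log_namespace` setting with the global one.
        let log_namespace = cx.log_namespace(self.log_namespace);
        // ... build the source, passing `log_namespace` to anything that creates events ...
    }

    // ... other SourceConfig methods ...
}
```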
The cx.log_namespace function gives us a LogNamespace enum that we can use to
set the fields in the appropriate section of the Event.
This log_namespace variable needs to be passed to any functions that will insert
data into the log event that is emitted by the source.
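The merge of the per-source bool with the global setting can be pictured with a small, self-contained model. This is a sketch, not Vector's implementation: LogNamespace here is a local two-variant enum, and the mapping of true to the Vector namespace and false to the Legacy one is our assumption. The only behaviour taken from the text is that the source-level setting, when present, overrides the global one.

```rust
/// Hypothetical, simplified stand-in for Vector's LogNamespace enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogNamespace {
    Vector,
    Legacy,
}

impl From<bool> for LogNamespace {
    /// Assumption: `true` turns the Vector namespace on, `false` keeps the legacy layout.
    fn from(enabled: bool) -> Self {
        if enabled {
            LogNamespace::Vector
        } else {
            LogNamespace::Legacy
        }
    }
}

impl LogNamespace {
    /// Sketch of the merge step: a per-source setting, when present,
    /// overrides the global one; otherwise the global value is kept.
    pub fn merge(self, source_override: Option<bool>) -> LogNamespace {
        source_override.map(LogNamespace::from).unwrap_or(self)
    }
}
```

Because the config field is an Option<bool> with #[serde(default)], an unset field arrives as None and the global value is used unchanged.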
The Vector namespace contains data pertinent to how the event was ingested into
Vector. Currently two fields need to be added to this namespace - ingest_timestamp
and source_type:
```rust
self.log_namespace.insert_vector_metadata(
    &mut log_event,
    path!(self.timestamp_key()),
    path!("ingest_timestamp"),
    chrono::Utc::now(),
);
self.log_namespace.insert_vector_metadata(
    &mut log_event,
    path!(self.source_type_key()),
    path!("source_type"),
    DnstapConfig::NAME,
);
```
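To make the two code paths concrete, here is a toy model of what insert_vector_metadata does with its legacy and metadata keys. It is a sketch under stated assumptions, not Vector's code: ToyLogEvent, its string-only values, and the idea that the Vector namespace keeps these fields in the event metadata under a "vector" key are illustrative choices.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogNamespace {
    Vector,
    Legacy,
}

/// Toy log event: `fields` is the event data, `metadata` holds namespaced
/// metadata (namespace -> key -> value). Values are plain strings for brevity.
#[derive(Debug, Default)]
pub struct ToyLogEvent {
    pub fields: BTreeMap<String, String>,
    pub metadata: BTreeMap<String, BTreeMap<String, String>>,
}

impl LogNamespace {
    /// Model of insert_vector_metadata: Legacy writes into the event itself
    /// at `legacy_key`; Vector writes into the (assumed) "vector" metadata namespace.
    pub fn insert_vector_metadata(
        &self,
        log: &mut ToyLogEvent,
        legacy_key: &str,
        metadata_key: &str,
        value: impl Into<String>,
    ) {
        match self {
            LogNamespace::Legacy => {
                log.fields.insert(legacy_key.to_string(), value.into());
            }
            LogNamespace::Vector => {
                log.metadata
                    .entry("vector".to_string())
                    .or_default()
                    .insert(metadata_key.to_string(), value.into());
            }
        }
    }
}
```

In Legacy mode the event data itself gains a source_type field, which is exactly where name conflicts used to arise; in Vector mode the event data is left untouched.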
If we look at the parameters to insert_vector_metadata:
```rust
pub fn insert_vector_metadata<'a>(
    &self,
    log: &mut LogEvent,
    legacy_key: impl ValuePath<'a>,
    metadata_key: impl ValuePath<'a>,
    value: impl Into<Value>,
)
```
- log: the log event that is being populated.
- legacy_key: the name of the field the value is inserted into when using the Legacy namespace. The value for this comes from a number of different places:
  - log_schema(), e.g. log_schema().source_type_key() or log_schema().timestamp_key().
  - The kafka source allows the user to specify the topic_key: the name of the field that will contain the Kafka topic the event was consumed from.
  - The dnstap source creates an event with an object where most of the field names are hard coded.
- metadata_key: the name of the field when it is inserted into the Vector namespace. This will be path!("ingest_timestamp") or path!("source_type"). These names can be hard coded because they go into the Vector namespace, where they cannot conflict with other field names.

  Field names in the Vector namespace are typically hard coded. With the kafka source, for example, it was possible to configure the name of the field the topic was inserted into; in the Vector namespace this field name is simply hard coded to topic. Allowing the user to configure the field name was only necessary to prevent name conflicts with other values from the event, which is no longer an issue now that these values are placed in a namespace separate from the event data.
- value: the actual value to be placed into the field. For the ingest timestamp this will be chrono::Utc::now(). For the source type it will be the NAME property of the Config struct; NAME is provided by the configurable_component macro, and you may need to add use vector_config::NamedComponent;. For batches of events, each event in the batch should use a single precalculated Utc::now() so that they all share the same timestamp.
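The advice about batches can be sketched as follows: take the time once, before the loop, and copy it into every event. To keep the example self-contained it uses std::time::SystemTime in place of chrono::Utc::now(), and ToyEvent and stamp_batch are hypothetical names.

```rust
use std::time::SystemTime;

/// Toy event carrying only its message and ingest timestamp.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyEvent {
    pub message: String,
    pub ingest_timestamp: SystemTime,
}

/// Stamp a whole batch with one precalculated "now" so every event
/// in the batch shares exactly the same ingest timestamp.
pub fn stamp_batch(messages: &[&str]) -> Vec<ToyEvent> {
    let now = SystemTime::now(); // computed once, outside the per-event loop
    messages
        .iter()
        .map(|m| ToyEvent {
            message: m.to_string(),
            ingest_timestamp: now,
        })
        .collect()
}
```

Calling the clock inside the closure instead would give each event a slightly different timestamp, even though they were all ingested together.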
A utility function has been provided that can be used in a lot of cases to insert both these fields into the Vector namespace:
```rust
log_namespace.insert_standard_vector_source_metadata(
    log,
    KafkaSourceConfig::NAME,
    Utc::now(),
);
```
Other fields that describe the event, but are not the actual data for the event, should go into the source metadata. For example, the Kafka topic an event was consumed from is source metadata.
To insert source metadata:
```rust
log_namespace.insert_source_metadata(
    SyslogConfig::NAME,
    log,
    Some(LegacyKey::Overwrite("source_id")),
    path!("source_id"),
    default_host.clone(),
);
```
Let's look at the parameters:
```rust
pub fn insert_source_metadata<'a>(
    &self,
    source_name: &'a str,
    log: &mut LogEvent,
    legacy_key: Option<LegacyKey<impl ValuePath<'a>>>,
    metadata_key: impl ValuePath<'a>,
    value: impl Into<Value>,
)
```
- source_name: the name of the source, e.g. KafkaSourceConfig::NAME.
- log: the log event to populate.
- legacy_key: the field name to populate for the Legacy namespace. Pass None if this field should not be inserted for Legacy. Because the field might conflict with another field that is already in the event, you must also specify what to do in the case of a conflict: LegacyKey::Overwrite overwrites the existing value with this value, while LegacyKey::InsertIfEmpty keeps the original value.
- metadata_key: the path to insert into the source metadata when using the Vector namespace. Because there is no chance of conflicting names here, this is typically just a hard coded value, e.g. path!("topic").
- value: the actual value to insert into the metadata.
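The conflict handling for legacy_key is easy to get backwards, so here is a small model of insert_source_metadata. Everything in it is a stand-in rather than Vector's API: ToyLogEvent, the string-only values, a LegacyKey enum holding a plain &str, and the assumption that in the Vector namespace each value is stored in metadata keyed by the source name (which is what the source_name parameter suggests).

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogNamespace {
    Vector,
    Legacy,
}

/// Toy version of the two conflict policies described above.
#[derive(Debug, Clone, Copy)]
pub enum LegacyKey<'a> {
    /// Replace any value already stored under this key.
    Overwrite(&'a str),
    /// Keep the existing value if the key is already present.
    InsertIfEmpty(&'a str),
}

#[derive(Debug, Default)]
pub struct ToyLogEvent {
    pub fields: BTreeMap<String, String>,
    /// Source name -> key -> value (assumed layout for the Vector namespace).
    pub metadata: BTreeMap<String, BTreeMap<String, String>>,
}

impl LogNamespace {
    pub fn insert_source_metadata(
        &self,
        source_name: &str,
        log: &mut ToyLogEvent,
        legacy_key: Option<LegacyKey<'_>>,
        metadata_key: &str,
        value: &str,
    ) {
        match self {
            LogNamespace::Legacy => match legacy_key {
                // No legacy key: the field is simply not written in Legacy mode.
                None => {}
                Some(LegacyKey::Overwrite(key)) => {
                    log.fields.insert(key.to_string(), value.to_string());
                }
                Some(LegacyKey::InsertIfEmpty(key)) => {
                    log.fields
                        .entry(key.to_string())
                        .or_insert_with(|| value.to_string());
                }
            },
            LogNamespace::Vector => {
                log.metadata
                    .entry(source_name.to_string())
                    .or_default()
                    .insert(metadata_key.to_string(), value.to_string());
            }
        }
    }
}
```

InsertIfEmpty protects a field that already came from the message, while Overwrite lets the source's value win; in the Vector namespace neither policy matters because the value never touches the event data.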
The main log event should contain only the actual log message that the event represents.
For the Vector namespace the data should be at the top level and not nested in any
subfields. Take an event that is a single String value: in the Legacy namespace it is
typically inserted into a field called message, whereas in the Vector namespace the
event is just the String value itself.
In this case code that creates an event typically looks similar to:
```rust
let mut log = match log_namespace {
    LogNamespace::Vector => LogEvent::from(message),
    LogNamespace::Legacy => {
        let mut log = LogEvent::default();
        // Add message
        log.insert(log_schema().message_key(), message);
        log
    }
};
```
Other fields should be inserted into the event like:
```rust
log_event.insert(event_path!("path"), value);
```
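The difference in shape produced by the match on log_namespace can be modelled without Vector's types. ToyValue and build_event are hypothetical names, and the literal "message" stands in for log_schema().message_key().

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogNamespace {
    Vector,
    Legacy,
}

/// Toy value type: either a plain string or an object of named fields.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyValue {
    Bytes(String),
    Object(BTreeMap<String, ToyValue>),
}

/// Build the event body for a single-string message.
/// Legacy: {"message": <msg>}; Vector: the event *is* the string.
pub fn build_event(log_namespace: LogNamespace, message: &str) -> ToyValue {
    match log_namespace {
        LogNamespace::Vector => ToyValue::Bytes(message.to_string()),
        LogNamespace::Legacy => {
            let mut fields = BTreeMap::new();
            // "message" stands in for log_schema().message_key().
            fields.insert("message".to_string(), ToyValue::Bytes(message.to_string()));
            ToyValue::Object(fields)
        }
    }
}
```

In the Vector namespace there is no wrapping object at all, so nothing else can collide with the message.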
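We need to talk about timestamps. A timestamp can represent a number of different things:

- The time the event was ingested into Vector. This is the ingest_timestamp field in the Vector namespace described above.
- The time recorded in the event itself, extracted from the incoming data. This describes the event rather than being part of its data, so it belongs in the source metadata.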
It is worth recognising that existing sources have not always been consistent about this. Some sources insert a timestamp extracted from the event but fall back to the ingest timestamp if none exists. Others insert the timestamp extracted from the event and insert no timestamp at all if none exists. Others always insert the ingest timestamp. To maintain backward compatibility, a few areas of the code base do some seemingly overcomplicated things with timestamps; bear this in mind when reading existing code.
All new sources should work like the above and should not permit users to configure custom field names for metadata.
All sources need to specify their schema - a definition of the shape of the event that it will create.
The schema definition is returned from the outputs function defined
by the SourceConfig trait.
```rust
fn outputs(&self, global_log_namespace: LogNamespace) -> Vec<Output> {
    let log_namespace = global_log_namespace.merge(self.log_namespace);
    // ... build the schema definition and return the outputs ...
}
```
Most sources have a decoder option that will specify the initial schema. One can retrieve the schema by calling:
```rust
let schema_definition = self
    .decoding
    .schema_definition(log_namespace)
```
We need to add the metadata that has been added to the Vector namespace:
```rust
    .with_standard_vector_source_metadata()
```
Next we need to add any source metadata that is created by the source.
```rust
    .with_source_metadata(
        NatsSourceConfig::NAME,
        legacy_subject_key_field,
        &owned_value_path!("subject"),
        Kind::bytes(),
        None,
    );
```
Let's look at the parameters:
```rust
pub fn with_source_metadata(
    self,
    source_name: &str,
    legacy_path: Option<LegacyKey<OwnedValuePath>>,
    vector_path: &OwnedValuePath,
    kind: Kind,
    meaning: Option<&str>,
) -> Self
```
- source_name: the name of the source, typically something like NatsSourceConfig::NAME.
- legacy_path: the path of the field when inserting into the Legacy namespace. This should be the same value as used when inserting the data with insert_source_metadata.
- vector_path: the path of the field when inserting into the Vector namespace. This should be the same value as used when inserting the data with insert_source_metadata.
- kind: the type the data will be. This is covered in detail below.
- meaning: some fields are given a meaning. In VRL it is possible to refer to a field by its meaning, regardless of the name it has been given. Vector itself relies on a number of such meanings; that set is not definitive and is likely to be updated over time. Most fields will not have a meaning, in which case just pass None.
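The main point of these parameters is that the schema must describe the same locations that insert_source_metadata writes to. A toy model makes that pairing explicit. It is not Vector's schema definition: ToySchema, ToyKind, the explicit namespace argument (the real method shown above takes none) and the "%source.field" label used for metadata locations are all assumptions made for the sketch.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogNamespace {
    Vector,
    Legacy,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyKind {
    Bytes,
    Timestamp,
}

#[derive(Debug, Default)]
pub struct ToySchema {
    /// Field location -> kind of the value stored there.
    pub fields: BTreeMap<String, ToyKind>,
    /// Meaning -> field location, so a field can be looked up by meaning.
    pub meanings: BTreeMap<String, String>,
}

impl ToySchema {
    /// Record where a source-metadata field lives for the given namespace,
    /// using the same legacy/vector paths as the insert call.
    pub fn with_source_metadata(
        mut self,
        namespace: LogNamespace,
        source_name: &str,
        legacy_path: Option<&str>,
        vector_path: &str,
        kind: ToyKind,
        meaning: Option<&str>,
    ) -> Self {
        let location = match (namespace, legacy_path) {
            // No legacy path: the field is never written in Legacy mode, so skip it.
            (LogNamespace::Legacy, None) => return self,
            (LogNamespace::Legacy, Some(path)) => path.to_string(),
            // Hypothetical "%source.field" label for a source-metadata location.
            (LogNamespace::Vector, _) => format!("%{}.{}", source_name, vector_path),
        };
        if let Some(meaning) = meaning {
            self.meanings.insert(meaning.to_string(), location.clone());
        }
        self.fields.insert(location, kind);
        self
    }
}
```

If the paths passed here drift from those passed to insert_source_metadata, the schema describes fields the event never contains, which is why the text asks for the same values in both places.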
The core principle behind schemas is defining the type, or kind, of data that will exist in this field. The following kinds are supported:

- Kind::bytes(): any string value.
- Kind::integer(): an integer value; in Vector this is a signed 64-bit integer.
- Kind::float(): a 64-bit float value.
- Kind::boolean(): a boolean value, either true or false.
- Kind::timestamp(): a timestamp in the UTC timezone.

Kind::array describes an array of values. It is possible to specify the type for every element within the array, e.g. this will be an array of strings:
```rust
Kind::array(Collection::empty().with_unknown(Kind::bytes()))
```
It is also possible to specify the type for specific indexes in the array, e.g. this array will have a string at index 0 and an integer at index 1:
```rust
Kind::array(
    Collection::empty()
        .with_known(0, Kind::bytes())
        .with_known(1, Kind::integer()),
)
```
These can also be combined. For example, an array of strings except at index 3, which will be a timestamp:
```rust
Kind::array(
    Collection::empty()
        .with_unknown(Kind::bytes())
        .with_known(3, Kind::timestamp()),
)
```
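The known/unknown split in these examples can be read as a lookup rule: a position with a known kind uses that kind, and every other position falls back to the unknown kind, if one was given. The toy model below illustrates that reading; ToyCollection, ToyKind and kind_at are hypothetical names, not Vector's Collection API. The same shape, keyed by field name instead of index, applies to objects.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a few scalar kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyKind {
    Bytes,
    Integer,
    Timestamp,
}

/// Toy collection type: specific keys with `known` kinds, plus an
/// optional `unknown` kind that covers every other key.
#[derive(Debug, Clone)]
pub struct ToyCollection<K: Ord> {
    known: BTreeMap<K, ToyKind>,
    unknown: Option<ToyKind>,
}

impl<K: Ord> ToyCollection<K> {
    pub fn empty() -> Self {
        ToyCollection { known: BTreeMap::new(), unknown: None }
    }

    pub fn with_unknown(mut self, kind: ToyKind) -> Self {
        self.unknown = Some(kind);
        self
    }

    pub fn with_known(mut self, key: K, kind: ToyKind) -> Self {
        self.known.insert(key, kind);
        self
    }

    /// A known entry wins; otherwise fall back to the unknown kind.
    /// `None` means the schema says nothing can be at this key.
    pub fn kind_at(&self, key: &K) -> Option<ToyKind> {
        self.known.get(key).copied().or(self.unknown)
    }
}
```

For example, ToyCollection::<usize>::empty().with_unknown(ToyKind::Bytes).with_known(3, ToyKind::Timestamp) mirrors the last array above: index 3 is a timestamp and every other index is a string.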
Kind::object describes an object: a map of keys to values. Similar to an array, an object can specify the type for all fields as well as the type for specific fields.
An object where all fields will be strings, but doesn't specify what those field names are:
```rust
Kind::object(Collection::empty().with_unknown(Kind::bytes()))
```
An object with two fields - reason containing a string and value
containing an integer:
```rust
Kind::object(
    Collection::empty()
        .with_known("reason", Kind::bytes())
        .with_known("value", Kind::integer()),
)
```
It is possible to represent a field that could be one of several types.
For example, a string or an integer:
```rust
Kind::bytes().or_integer()
```
Often a field may not exist at all; for that we have or_undefined().
For example, an object with a field called reason that may not exist,
but if it does, it will be a string:
```rust
Kind::object(
    Collection::empty()
        .with_known("reason", Kind::bytes().or_undefined()),
)
```
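One way to picture or_integer() and or_undefined() is to treat a kind as a set of possible types, so that combining kinds is a set union and "may not exist" is just one more member of the set. The bitset below is a hypothetical illustration of that idea, not how Vector's Kind is implemented; ToyKind, contains_bytes, contains_integer and maybe_missing are our names.

```rust
/// Toy kind modelled as a set of possible types, so unions are bitwise OR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToyKind(u8);

impl ToyKind {
    const BYTES: u8 = 0b001;
    const INTEGER: u8 = 0b010;
    const UNDEFINED: u8 = 0b100;

    pub fn bytes() -> Self {
        ToyKind(Self::BYTES)
    }

    pub fn integer() -> Self {
        ToyKind(Self::INTEGER)
    }

    /// Widen the kind so an integer is also allowed.
    pub fn or_integer(self) -> Self {
        ToyKind(self.0 | Self::INTEGER)
    }

    /// Widen the kind so the field may be absent altogether.
    pub fn or_undefined(self) -> Self {
        ToyKind(self.0 | Self::UNDEFINED)
    }

    pub fn contains_bytes(self) -> bool {
        (self.0 & Self::BYTES) != 0
    }

    pub fn contains_integer(self) -> bool {
        (self.0 & Self::INTEGER) != 0
    }

    /// True when the field is allowed to be missing entirely.
    pub fn maybe_missing(self) -> bool {
        (self.0 & Self::UNDEFINED) != 0
    }
}
```

Under this model, ToyKind::bytes().or_undefined() reads as "either a string or not present", which matches the reason field in the last example.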