rfcs/2021-10-12-9568-automatic-namespacing.md
This RFC covers the ability to implicitly namespace Vector configuration based on the configuration directory structure. This provides an easy mechanism for organizing large Vector configurations, a need that will become more pronounced as Vector introduces the upcoming Pipelines feature.
None
As Vector evolves and introduces configuration-heavy functionality, like the aggregator role and the upcoming Pipelines feature, the amount of configuration necessary to program Vector grows large. Organizing Vector configuration across multiple files is non-obvious and requires a heavy amount of boilerplate, making the configuration difficult to navigate and collaborate on.
To solve the above, we'd like to introduce implicit configuration namespacing based on Vector's configuration directory structure. This aligns the community behind an opinionated method for organizing Vector's configuration, making it easy for users to split up their configuration files and collaborate with others on their team.
When Vector is started with `--config-dir` (say, `--config-dir /etc/vector`), it will look in every subfolder for component configuration files, using each file's name (without its extension) as the component ID. For example,

    # /etc/vector/vector.toml
    [sinks.foo]
    type = "anything"

can become

    # /etc/vector/sinks/foo.toml
    type = "anything"
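
The mapping from a file's location to its namespace can be sketched in a few lines. This is only an illustration, not Vector's actual loader: the helper name `namespace_for` and the constant `COMPONENT_TYPES` are hypothetical, and the sketch assumes that only files exactly one level below the configuration directory are namespaced.

```rust
use std::path::Path;

/// Component folders named in this RFC.
const COMPONENT_TYPES: [&str; 5] =
    ["sources", "transforms", "sinks", "enrichment_tables", "tests"];

/// Hypothetical helper: derive `(component_type, component_id)` from a path
/// relative to the configuration directory, such as `sinks/foo.toml`.
fn namespace_for(relative: &Path) -> Option<(String, String)> {
    let mut parts = relative.components();
    // The first path segment is the folder; the second is the file.
    let folder = parts.next()?.as_os_str().to_str()?;
    let file = Path::new(parts.next()?.as_os_str());
    // Assumption: deeper nesting and unknown folders are not namespaced.
    if parts.next().is_some() || !COMPONENT_TYPES.contains(&folder) {
        return None;
    }
    // The component ID is the file name without its extension.
    let id = file.file_stem()?.to_str()?;
    Some((folder.to_string(), id.to_string()))
}
```

Under these assumptions, `namespace_for(Path::new("sinks/foo.toml"))` yields the pair `("sinks", "foo")`, which corresponds to the `[sinks.foo]` table in the single-file form, while a top-level file such as `vector.toml` yields `None` and would be loaded as a regular configuration file.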
- Vector will only accept files with the `yml`, `yaml`, `json`, or `toml` extensions, or throw an error.
- Files with the same component ID but different extensions (`sinks/foo.toml` and `sinks/foo.json`) will error.
- If Vector is started without `--config-dir` (`--config-dir /etc/vector` for example), Vector will keep its default behavior and only load the specified configuration file.
- If Vector encounters a hidden file or folder (a name starting with `.`, like `/etc/vector/.data` or `/etc/vector/.foo.toml`), the file/folder will be ignored.
- If Vector encounters a folder (like `/etc/vector/foo`) with a name that doesn't refer to a component type (like `sources`, `transforms`, `sinks`, `enrichment_tables`, `tests`), an error will be thrown.
- If a component file (like `/etc/vector/sinks/foo.toml`) doesn't have a proper sink configuration structure, Vector will error.

The loading could be implemented along these lines:

    fn load_builder_from_dir(path: &Path) -> Result<(ConfigBuilder, Vec<String>), Vec<String>> {
        let mut builder = ConfigBuilder::default();
        let mut errors = Vec::new();
        for child in path.children() {
            if child.is_dir() {
                match child.name() {
                    // same with other component types like transforms, sources, tests, enrichment_tables
                    "sinks" => load_sinks_from_dir(child, &mut builder, &mut errors),
                    other => tracing::debug!("ignoring folder {}", other),
                }
            } else {
                load_builder_from_file(child, &mut builder, &mut errors);
            }
        }
        // Return the builder only if every file loaded cleanly
        // (no warnings are collected in this sketch).
        if errors.is_empty() {
            Ok((builder, Vec::new()))
        } else {
            Err(errors)
        }
    }

    // same with transforms, sources, tests, enrichment_tables
    fn load_sinks_from_dir(path: &Path, builder: &mut ConfigBuilder, errors: &mut Vec<String>) {
        for child in path.children() {
            if child.is_file() {
                // the file name provides the component ID
                match load_sink_from_file(child) {
                    Ok(sink) => builder.add_sink(child.name(), sink.inputs, sink.inner),
                    Err(msg) => errors.push(msg),
                };
            }
        }
    }
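
The file-level rules (supported extensions, hidden files, and duplicate component IDs) can be checked independently of the filesystem. The following is a minimal sketch under stated assumptions, not Vector's implementation: `component_ids` and `EXTENSIONS` are hypothetical names, the input is the list of file names found in one component folder, and a file with an unsupported extension is treated as an error.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Extensions accepted by the proposal.
const EXTENSIONS: [&str; 4] = ["yml", "yaml", "json", "toml"];

/// Hypothetical helper: apply the file-level rules to the file names of one
/// component folder, returning the sorted component IDs or every error found.
fn component_ids(file_names: &[&str]) -> Result<Vec<String>, Vec<String>> {
    // Maps each component ID to the file name that first defined it.
    let mut seen: BTreeMap<String, String> = BTreeMap::new();
    let mut errors = Vec::new();
    for &name in file_names {
        // Hidden files are ignored.
        if name.starts_with('.') {
            continue;
        }
        let path = Path::new(name);
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
        if !EXTENSIONS.contains(&ext) {
            errors.push(format!("unsupported extension: {}", name));
            continue;
        }
        let id = match path.file_stem().and_then(|s| s.to_str()) {
            Some(id) => id.to_string(),
            None => continue,
        };
        // The same ID with a different extension is an error.
        if let Some(previous) = seen.insert(id.clone(), name.to_string()) {
            errors.push(format!("duplicate component ID {}: {} and {}", id, previous, name));
        }
    }
    if errors.is_empty() {
        Ok(seen.into_keys().collect())
    } else {
        Err(errors)
    }
}
```

Collecting every error before returning, rather than stopping at the first one, mirrors the `errors` vector used in the sketch above and lets a user fix all problems in one pass.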
Why is this change worth it?
`transforms` folder and not read the files in the `sinks` folder.

How does this position us for success in the future?
This feature is a nice-to-have for users with large configuration files, but it is not required for any upcoming development. The new Pipelines feature can still function with a single configuration file, but it exacerbates the pain described above.
Very little. This will have minimal impact on the Vector codebase. Configuration will be loaded and built in one step before it is validated. I don't foresee this introducing any meaningful maintenance burden on the team.
Incremental steps to execute this change. These will be converted to issues after the RFC is approved:
- Load `transforms` from subfolder
- Load `sinks` from subfolder
- Load `sources` from subfolder
- Load `enrichment_tables` from subfolder
- Load `tests` from subfolder

Note: This can be filled out during the review process.