rfc/20251105-refactor-tofu-cli.md
Due to multiple compounding legacy reasons, at the moment of writing this RFC, the whole startup procedure of OpenTofu is heavy with too many responsibilities. It also performs several initialisations just to drop them in case of early exit requests:
- `metadata functions` command

Next, we would like to briefly cover the main reasons for this RFC:
- The `Meta` structure grew over the years into a hard-to-maintain layer, containing a large amount of logic that is used by different commands in different ways. We want to alleviate some of that maintenance cost in the future.

This RFC proposes some approaches to rework this and tackles different challenges that we might encounter in doing so. It is not meant to be an exhaustive guide on how to approach such a change, but it is meant to provide the rules and draw the boundaries within which this refactor should stay.
Before kicking this off, there are several requirements that need to be noted and some things that we need to be mindful about:
- The `realMain` function performs some common initialisations, so the general UX of OpenTofu prompts the user right away about issues with their environment (like invalid CLI configuration), which helps pinpoint, from the get-go, issues that the user can tackle right away.
  Even though this has the advantage of providing "one central place" to validate the basic environment configuration, it might change so that commands that have nothing to do with providers, configuration or state no longer error on a wrong environment configuration. This change should be adopted mostly for commands that are purely informative or that act on other resources (like `fmt`, `--version`, `metadata functions`). The current behavior should be kept for any command that acts on critical parts of a user configuration, like providers, configuration, modules, state, etc.

This proposal wants to tackle 2 main things:
In the end, the idea is to have a more streamlined flow of operations:
- `main`

### `realMain` function

Looking at the `realMain` function, it can be seen that there is some logic that should reside inside the commands, instead of being executed before running the layer that decides which command is invoked.
What I propose in this particular case is to keep the critical logic in this function untouched and to extract the specific logic bits into their own abstractions, to be used later.
> [!NOTE]
> Even though we should keep the "critical logic" bits in `realMain`, it would be advisable to extract these into different methods to keep the content of the `realMain` function more concise.
To make things clear, let's categorise these bits:
The bits under "OpenTofu Specific functionalities" must be extracted into their own abstractions and moved to a different layer, where they will be executed only when actually required.
Some examples of commands that don't need to execute some (or any) of the bits in the aforementioned category are:
- `metadata functions`
- `fmt`
- `version`
- `workspace`
- `get`

One particular functionality that we need to experiment with, to better understand how we could handle it in isolation, is the processing of the `TF_CLI_ARGS` env var(s). For the moment, it can stay in `realMain`.
> [!NOTE]
> One idea that looks promising is to use the prefixed environment variables from `spf13/viper`, but this is out of scope for this RFC.
### `Meta` structure

The current `Meta` structure contains far too many configurations and responsibilities, being used as a container to carry common information and logic between `main` and other parts of the system.
Due to the high complexity of the implementation around flags, there is hardly a specific "recipe" for refactoring all of this, and it will have to be approached differently, case by case. This chapter mostly highlights the end goals that we strive for.
In the end, all the logic that today lives in the `Meta` structure should be extracted into, and used from, its own abstractions.
The way flags are configured and parsed today (generally) includes the following steps:
- `-backend` vs `-cloud`)

> [!NOTE]
> Before creating new structs to hold the functionality and the flags for a particular logic bit, we should first check whether the existing implementation in `command/arguments` could be used moving forward. If possible, we should build on that, since that package's responsibility is specifically the "validate" part of the points above.
Since many `Meta` arguments are used as containers for flag values, we want to extract logically grouped flags into specific functional structs and implement in those structs the steps listed above.
In many cases, extracting the flags will force the logic around them to be extracted too, in the same iteration, so that the associated components can be built based on the given arguments.
To be able to extract all functionality out of `Meta` without breaking anything, this needs to be done incrementally, starting from the components with no dependencies and working through the entire functionality.
At first look, there are already some bits that can be extracted into their own components:
- `-chdir` logic from `realMain` and be given as a dependency to the logic that relies on it

Then, there are some bits that depend mostly (if not solely) on flags and default configuration:
- `init`) that still use both `UI` and `View`, which will make this particular proposal a little bit convoluted in this area, but nothing impossible
- `UI` and `View` will be extracted into their own component, we can continue with the old refactor to unify the way json/human views are built

Therefore, starting with the least dependent bits will allow walking down the chain and extracting everything into separate components that can be instantiated inside the `Run` method of the commands.
> [!NOTE]
> About the `ui/view`: after doing some of the work proposed by this RFC, it became clearer that we can continue without migrating all commands from the old UI to the Views concept, but doing so will add more shim code to make things work properly.
>
> Therefore, as part of this RFC, we can carry on the migration of all the commands to the Views abstraction, which will make any subsequent work way easier.
During this work, all logically grouped flags should be moved to these structs, and the struct logic should either record them on a particular `FlagSet` or parse their values directly.
E.g.:
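```go
import "flag"

// Workdir groups the flags that control the working directory.
type Workdir struct {
	chdir string // a plain string, so that &w.chdir is the *string StringVar expects
}

// Parse the values for these flags directly from the given arguments...
func (w *Workdir) ParseFlags(args []string) error { ... }

// ...or record the flags on a FlagSet owned by the caller.
func (w *Workdir) RecordFlags(in *flag.FlagSet) {
	in.StringVar(&w.chdir, "chdir", "", "Switch to a different working directory before executing the given subcommand.")
}
```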
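For reference, a runnable variant of the `Workdir` idea could look like the sketch below. It assumes that `ParseFlags` is built on top of `RecordFlags` using a private `flag.FlagSet`; this is only one possible way to combine the two options, and the `Dir` accessor is a hypothetical addition used to read the parsed value.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Workdir groups the working-directory flags (component name taken from the
// example in this RFC; the method bodies are an assumption).
type Workdir struct {
	chdir string
}

// RecordFlags registers the component's flags on a FlagSet owned by the caller.
func (w *Workdir) RecordFlags(in *flag.FlagSet) {
	in.StringVar(&w.chdir, "chdir", "", "Switch to a different working directory before executing the given subcommand.")
}

// ParseFlags parses args with a private FlagSet that contains only this
// component's flags.
func (w *Workdir) ParseFlags(args []string) error {
	fs := flag.NewFlagSet("workdir", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // let the caller decide how errors are reported
	w.RecordFlags(fs)
	return fs.Parse(args)
}

// Dir returns the parsed value of -chdir (hypothetical accessor).
func (w *Workdir) Dir() string { return w.chdir }

func main() {
	w := &Workdir{}
	err := w.ParseFlags([]string{"-chdir=/tmp/project", "init"})
	fmt.Println(w.Dir(), err)
}
```

Keeping `RecordFlags` as the single place where flags are declared means that a command can either share one `FlagSet` between several components or let each component parse on its own.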
> [!NOTE]
> To make the implementation easier to review and iterate on, we can use the `Meta` struct as the container of each component instance. Later, those can be created directly in the commands' `Run()` method, once we manage to extract all the logic out of the `Meta` struct.
As this progresses, more and more components will expose their dependencies on other components, so those will have to be chained accordingly, leading in the end to multiple components, each with a specific responsibility and visible dependencies between one another.
> [!NOTE]
> It's really important to keep the components as single-purpose as possible, to allow easier composition later, when we will have to instantiate only the components required by each command.
>
> This is suggested because we might actually be able to improve the startup performance if the refactor allows initialising only the components needed by each command.
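A minimal sketch of such chaining, assuming constructor-based dependencies, is shown below. The `ConfigLoader` component, the `runFmt`/`runInit` helpers and the `main.tf` file name are hypothetical and used only for illustration; the point is that each dependency is visible in a constructor signature, and that each command builds only what it needs.

```go
package main

import (
	"fmt"
	"path"
)

// Workdir has no dependencies, so it can be extracted first.
type Workdir struct{ dir string }

func NewWorkdir(chdir string) *Workdir {
	if chdir == "" {
		chdir = "."
	}
	return &Workdir{dir: chdir}
}

// Join resolves a path relative to the working directory.
func (w *Workdir) Join(rel string) string { return path.Join(w.dir, rel) }

// ConfigLoader depends on Workdir; the constructor makes that dependency visible.
type ConfigLoader struct{ workdir *Workdir }

func NewConfigLoader(w *Workdir) *ConfigLoader { return &ConfigLoader{workdir: w} }

// RootModule returns the path of an assumed root module file.
func (c *ConfigLoader) RootModule() string { return c.workdir.Join("main.tf") }

// runFmt needs only the working directory.
func runFmt(chdir string) string {
	w := NewWorkdir(chdir)
	return "fmt: formatting " + w.Join(".")
}

// runInit needs the working directory and the configuration loader.
func runInit(chdir string) string {
	w := NewWorkdir(chdir)
	loader := NewConfigLoader(w)
	return "init: loading " + loader.RootModule()
}

func main() {
	fmt.Println(runFmt("/srv/app"))
	fmt.Println(runInit("/srv/app"))
}
```

During the transition, the same constructors could be called from `Meta` instead of from the commands, which keeps the dependency chain identical while the extraction is still in progress.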
The most complex and sensitive part of the `Meta` structure is the backend implementation.
The implementation visible in `meta_backend.go` and in `meta_backend_migrate.go` tackles a lot of edge cases and legacy concerns, and the reasons behind them are not that clear.
Therefore, this RFC only wants to extract these 2 files into their own components (or into a single component), to make it clearer what they depend on. Some clear dependencies are:
- `-state`, `-state-out`, `-backup`, `-lock`, `-lock-timeout`)
- `-reconfigure`, `-migrate-state`, `-force-copy`)
- `-input`)
Before even attempting backend implementation isolation, at least the list of dependencies above should be handled.
- `chdir` where CLI configuration needs to be loaded before executing `chdir` to reference the initial workdir. Do we have other bits similar to that that you are aware of?

We need to ensure that the current unit tests keep working and that the changes to them are minimal and as non-functional as possible.
As long as the requirements at the top of this RFC are respected, any other way to do this refactor can be considered a "potential alternative".