docs-v2/design_proposals/event-api-v2.md
After multiple days invested in this direction we still feel that it is not worthwhile to go down this direction: the complexity of adding go channels adds a lot of complexity and the design more error prone for subtle bugs. The current arguments for going towards this direction is the Event API redesign to centralize event management in the Runner. This in itself is a valuable idea but at this point the current design is simpler to reason about this new design with the go channels, plus if we want to extend the events sent by builders, e.g. progress bar events, we will have to send data through the go channels anyway that represents those events and convert them to events, or directly call the event package - at which point the complexity of "handling events from the builders" is going to be the same.
We are open to introduce a streaming/reactive design in Skaffold on the long run if we have a strong enough argument/featureset for it. For this redesign I don't see that it is worthwhile. I am closing this and the related PRs.
Thank you @nkubala and @tejal29 for all the hard work in exploring this avenue in detail in code with testing and thoughtful conversations!
Currently there are two major pain points with the event API:
The second issue here comes more from bad execution rather than bad design. Some fixes for this have already been proposed and/or merged (#1786 and #1801), so progress has already been made, but we can probably improve on this more.
The first issue is what I'll be focusing on in this issue. @dgageot has already merged one improvement to the design here, but this is only improving on an already flawed design. My goal for this redesign will be to reduce and potentially eliminate the overhead the event API has on maintainers and contributors to continue evolving the skaffold codebase.
Right now, events are handled directly through the main loop for each builder and deployer. The internal state is directly updated through the the individual builder/deployer, and anyone adding a new one is responsible for ensuring that these events are passed to the event handler correctly. Examples:
if err != nil {
event.BuildFailed(artifact.ImageName, err)
return nil, fmt.Errorf("building [%s]: %w", artifact.ImageName, err)
}
event.BuildComplete(artifact.ImageName)
and
event.DeployInProgress()
manifests, err := k.readManifests(ctx)
if err != nil {
event.DeployFailed(err)
return fmt.Errorf("reading manifests: %w", err)
}
if len(manifests) == 0 {
return nil
}
Even though the event handling functions have been reduced to one line, the fact that they still have to be manually called from the build or deploy code is a lot of overhead for contributors, especially people who are considering adding a new builder and deployer.
Every build or deploy that skaffold performs always comes through the runner, so it would follow that the runner could handle all of the eventing for builds and deploys, as long as it can retrieve the necessary information from the builder or deployer.
To address this, I propose that we change the Build() interface to return a list of build results, one for each artifact, with information about the original artifact skaffold attempted to build, and either the built artifact, or the error the builder encountered.
This will allow the runner to:
With this information, we could move all of the event handling code into a decorator that we wrap the builders and deployers in during the creation of the runner, similar to the way we handle timings currently. The semantics for each state (Not Started, In Progress, Complete, Failed) and the events that transition the states should all be the same between builders and deployers, and we can share the logic for handling each of these events in a separate location. This has the added bonus of not requiring any event-related code to be written by contributors if/when they add new builders/deployers, or make changes to existing ones.
Implementing this should be pretty straightforward:
The current method signature is
Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]Artifact, error)
This will be changed to return a list of build results for each artifact, along with the artifact information itself. The artifact information can be embedded in the build result:
type BuildResult struct {
Target *latest.Artifact
Result *build.Artifact
Error error
}
and the new method signature will be
Build(ctx context.Context, out io.Writer, tags tag.ImageTags, artifacts []*latest.Artifact) ([]BuildResult, error)
The runner can then take control of all the eventing code. Deploy results are already returned by the Deploy() interface implementations, so with the Build results we have all the information we need. Using Build() as an example (Deploy() will look similar):
BuildInProgress event is triggeredBuildFailed event for results with errors, and a BuildComplete event for those without.for _, a := range artifactsToBuild {
event.BuildInProgress(a)
}
bRes, err := r.Build(ctx, out, tags, artifactsToBuild)
for _, res := range bRes {
if res.Error != nil {
event.BuildFailed(res.Target, res.Error)
} else {
event.BuildComplete(res.Target)
}
}
The main drawback here is that for builds run in parallel, the statuses will not truly reflect reality until all builds are completed, but I don't believe users will see much of an effect from this.
The current integration tests around eventing and retrieving skaffold state should be sufficient to cover this new change. Skaffold runs are triggered, the state is retrieved at various times during the run, and the tests verify the correct results are retrieved.
Additional unit tests can be added as necessary to ensure the correct event handling methods are being called from the runner.