design/defunct/design-doc-reconciler-patterns.md
While Kubernetes' declarative API is probably one of the best features of Kubernetes as a platform, its API extensibility is arguably one of the most exciting. Using the Kubernetes Operator pattern, users can extend the Kubernetes API with types that represent resources internal or external to Kubernetes. controller-runtime (the de facto standard library underlying Kubernetes operator frameworks such as Kubebuilder and Operator SDK), together with those frameworks, does a great job of simplifying operator development by providing default project scaffolding and infrastructure library plumbing. Nevertheless, using a declarative API (Kubernetes) to perform imperative API operations (managed cloud resources) is far from straightforward.
Crossplane developers are the expected audience for this document, and as such readers are expected to have some familiarity with controller-runtime.
The primary goal of this document is to capture and communicate the lessons learned during the development of many Crossplane controllers and reconcile loops. This document aims to help Crossplane contributors write and extend controllers that:
This document is not blindly prescriptive, but where patterns are recommended they should be followed unless there is a clear and obvious reason not to. The recommendations made in this document should be seen as a "golden path": a beneficial collection of principles and patterns that the Crossplane community has found to facilitate the creation of successful controllers. While no single pattern is strictly applicable to all controllers, the Crossplane maintainers feel the project benefits from consistency in implementation where possible.
A Kubernetes object is defined with a status sub-resource that contains a Crossplane-common set of attributes as well as a provider-resource-specific one.
Most (if not all) managed cloud resources share a common requirement for unique resource identification (resource ID).
Crossplane itself is not opinionated about the specific value or format of the resource ID, as long as the value is a unique identifier.
The Kubernetes API assigns a unique UID metadata value to every Kubernetes object.
It has become common practice in Crossplane (with few exceptions) to incorporate the object UID value into the managed resource naming convention.
For example, in gke-d8b86f26-67b8-11e9-ac04-9cb6d08bde99, gke is a prefix, followed by the object UID.
There are, however, exceptions to this rule for some cloud provider and resource combinations. For example, a provider could restrict resource names by length or by an allowed regular expression.
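A minimal sketch of this naming convention follows. The `resourceName` helper and its `maxLen` parameter are hypothetical, not an existing Crossplane API. When a length limit applies, this sketch first drops the dashes from the UID and only then truncates, which weakens uniqueness; it is meant to show the trade-off rather than prescribe a scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName is a hypothetical helper (not an existing Crossplane API) that
// builds an external resource name from a prefix and the object UID, e.g.
// "gke-d8b86f26-67b8-11e9-ac04-9cb6d08bde99".
//
// If maxLen > 0 and the name is too long, it first drops the dashes from the
// UID and, if that is still not enough, truncates the result. Truncation
// weakens uniqueness, so it only illustrates the trade-off.
func resourceName(prefix, uid string, maxLen int) string {
	name := fmt.Sprintf("%s-%s", prefix, uid)
	if maxLen <= 0 || len(name) <= maxLen {
		return name
	}
	name = fmt.Sprintf("%s-%s", prefix, strings.ReplaceAll(uid, "-", ""))
	if len(name) > maxLen {
		name = name[:maxLen]
	}
	return name
}
```

For a provider with a 36-character limit, the example UID above would yield `gke-d8b86f2667b811e9ac049cb6d08bde99`.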
The status of the managed resource, as reported by the cloud provider API.
The enumeration of current and past object condition states as assigned by the controller. Each condition has the following attributes:
Supported object status condition types: None (no conditions), Pending, Ready, and Failed.
One can think of a condition value as the state the controller was in while reconciling a given object. Conditions are expected to be unique by condition type, i.e., there should be only one entry for a given condition type irrespective of the condition status. The Creating and Deleting conditions are clear examples, since any given object can be created and deleted only once. The case of the Ready, Updating, and Failed conditions is less clear, since the controller can enter and exit those conditions multiple times throughout the object's lifecycle. Conditions are expected to reflect the object's current and “some” past states, i.e., conditions can be viewed as a partial (most recent) history of the object's processing.
Since status conditions are primarily informational, controllers <span style="text-decoration:underline;">should not</span> use condition values to determine the reconciler's execution path. Instead, in most (if not all) cases, controllers <span style="text-decoration:underline;">should</span> reconcile the managed resource status and property values against the desired state in the spec.
A special condition type, typically indicating that the controller is seeing this object for the very first time. In the future, None could be replaced by a default condition value, set by a mutating webhook, that represents the initial state.
This condition is activated by the controller when it has processed a request to create the managed resource. The Creating condition is deactivated by the controller when the resource creation process has completed and resulted in the activation of either:
Note that the Failed condition can also be activated by intermittent failures. In those cases, the Creating condition is expected to remain active (see Failed Condition for more details).
Besides completion of the create request, this condition can be preempted and deactivated by a Delete call (i.e., the Deleting condition), since most cloud providers appear to allow resource deletion irrespective of the resource's current state.
This condition is activated by the controller when it has processed a request to delete the managed resource. Most commonly, this condition is very short-lived because the object is garbage collected by the Kubernetes API. However, in cases involving multiple finalizers, the object may remain in the Deleting state for quite some time (which could also indicate a problem).
The Deleting condition is terminal. In most cases, all subsequent updates to the object specification should be ignored once this condition is activated, i.e., a user cannot update an object that has already been deleted, even if it is still returned by the Kubernetes API. As with the previous conditions, in cases of intermittent failures the Deleting condition can be active at the same time as the Failed condition (see Failed Condition for more details).
This condition is activated by the controller when it needs to adjust the managed resource configuration or settings:
Similar to the Creating condition, the Updating condition can be preempted by a Delete call. Also like the Creating condition, an active Updating condition can coexist with an active Failed condition (see Failed Condition for more details).
This condition is activated when the managed resource status reflects that the resource is ready to be consumed. Note that multiple discrete managed resource state values could match the Ready condition. It is up to the resource type/controller implementation to decide which set of resource states matches this condition.
Upon reaching this condition, the controller is expected to generate or update the resource connection secret with the most current values: endpoints, credentials, etc. In addition, controllers are expected to check for spec updates, if update functionality is implemented (see the Updating Condition section). The controller may perform additional tasks for objects in the Ready condition, which can vary per type/controller.
For Kubernetes internal types, once the object reaches the Ready state, typically no further reconciliation cycles are required, and External Update or SyncPeriod are the only requeue triggers.
The situation is different for external resources, which can be changed outside the scope of the Kubernetes API (manually, programmatically, or by the resource's cloud API).
Thus, it is important to perform routine checks on the external resource.
The downside of relying on SyncPeriod to manage resource state is that it has to be set to an aggressively short duration to be effective. For example, the default value of 10h is hardly optimal for detecting and correcting changes in the managed resource state. A short period, in turn, creates unnecessary churn, since it requeues <span style="text-decoration:underline;">all objects managed by all controllers</span> under the controller manager.
As an alternative to relying on reconcile.Result{} plus SyncPeriod, we can use a requeue delay (configurable or hardcoded) for a specific type. Thus, upon successful reconciliation, instead of:
```go
return reconcile.Result{Requeue: false}, nil
```
We can use:
```go
return reconcile.Result{RequeueAfter: requeueDelayOnSuccess}, nil
```
where requeueDelayOnSuccess is specifically tailored to a given type/controller.
As with all the conditions above, an active _Ready_ condition can coexist with an active Failed condition (see Failed Condition for more details).
Also like the conditions above, an active Ready condition can be preempted by a Delete call.
This is the trickiest condition, since it can be activated in multiple places and for multiple reasons.
All errors encountered by the controller during the reconciliation loop can be grouped into two categories:
In addition to Handled errors, the Failed condition can be activated by the controller to reflect the current resource status. The managed resource could enter a failed state, as defined by the cloud provider API, for multiple reasons:
These are all possible status values for RDS DB Instance and RDS DB Cluster. As you can see, there are multiple statuses that correspond to the resource being in a Failed state. Moreover, some failed states may be recoverable while others are not.
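To illustrate how a controller might map provider states onto conditions, the sketch below classifies a handful of RDS DB instance status values. The selection of states and the terminal/recoverable split are assumptions made for illustration, not AWS guidance; the AWS RDS documentation remains the authoritative source for status meanings. `stateMapping` and `mapState` are hypothetical names.

```go
package main

// ConditionType is a simplified stand-in for the controller's condition types.
type ConditionType string

const (
	Creating ConditionType = "Creating"
	Deleting ConditionType = "Deleting"
	Updating ConditionType = "Updating"
	Ready    ConditionType = "Ready"
	Failed   ConditionType = "Failed"
)

// stateMapping records the condition a provider-reported state maps to and,
// for failed states, whether the failure is terminal (not recoverable).
type stateMapping struct {
	Condition ConditionType
	Terminal  bool
}

// rdsInstanceStates is an illustrative, incomplete mapping of a few RDS DB
// instance status values. The recoverable/terminal split is an assumption
// made for this sketch, not AWS guidance.
var rdsInstanceStates = map[string]stateMapping{
	"available":               {Ready, false},
	"creating":                {Creating, false},
	"deleting":                {Deleting, false},
	"modifying":               {Updating, false},
	"storage-full":            {Failed, false}, // assumed fixable by adding storage
	"incompatible-parameters": {Failed, false}, // assumed fixable by fixing parameters
	"restore-error":           {Failed, true},
	"failed":                  {Failed, true},
}

// mapState looks up a provider state. Unknown states return ok == false so
// the controller can surface them instead of guessing a condition.
func mapState(state string) (m stateMapping, ok bool) {
	m, ok = rdsInstanceStates[state]
	return m, ok
}
```

Keeping the mapping in data rather than in branching code makes it easy to review per resource type, which matches the earlier point that each controller decides which states correspond to Ready or Failed.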
Crossplane's implementation is based on the controller-runtime framework, in which controllers are expected to provide an implementation of the Reconciler interface:
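```go
// Newer controller-runtime releases (v0.7 and later) add a context parameter:
// Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error)
func (r *Reconciler) Reconcile(req reconcile.Request) (reconcile.Result, error) {
	...
}
```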
<sub>Fig 1. Reconcile function</sub>
The reconcile.Request carries the key (namespace and name) used to retrieve the Kubernetes object.
The Reconcile function returns a reconcile.Result and an error, which the controller uses to decide how to further process the object.
That is a trick question. The answer is “Yes, requeue”. The real question is “When to requeue?”, and the answer is determined by the combination of the result and error values, the state of the object, and the global controller manager sync period.
Per controller-runtime documentation:
SyncPeriod determines the minimum frequency at which watched resources are
reconciled. A lower period will correct entropy more quickly, but reduce
responsiveness to change if there are many watched resources. Change this
value only if you know what you are doing. Defaults to 10 hours if unset.
Object state can be viewed in a binary “Dirty/Clean” format, with:
Description: the controller is done reconciling this object and no additional steps are ever needed, i.e., the only ways for this object to be queued again are via SyncPeriod or an External Update.
Best fit:
Description: the controller encountered and handled the error during the object reconciliation.
Best fit: object processing that resulted in an active Failed status condition. The reason we explicitly set {Requeue: true} is to handle multiple (or perpetual) failures. For example, when the controller attempts to create the managed resource for the first time and gets a resource API error, it sets (activates) the Failed condition, updates the status, and returns “{Requeue: true}, nil”. In this specific case, since the object status has changed (i.e., a “dirty state”), “{Requeue: true}” is superfluous, because the status update alone causes the object to be requeued immediately. However, if the controller receives the same error on the second (and every following) reconciliation attempt, it sets the Failed condition that is already active, so the status update is a “noop” (i.e., a “clean state”) and does not trigger another reconcile. This is why we explicitly instruct the controller to “{Requeue: true}”, which requeues the object with exponential back-off.
Description: the controller successfully processed reconciliation of the object.
Best fit:
Description: this is the case when we return an “unhandled” error, at which point the value of the returned reconcile.Result is irrelevant; thus “reconcile.Result{}, err”, “reconcile.Result{Requeue: true}, err”, and “reconcile.Result{RequeueAfter: duration}, err” are all semantically equivalent.
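The four return-value cases above can be summarized in one helper. This is a sketch under stated assumptions: `Result` is a local stand-in for the two reconcile.Result fields discussed here, and `outcome`, `resultFor`, and `requeueDelayOnSuccess` are hypothetical names with an illustrative delay value.

```go
package main

import (
	"errors"
	"time"
)

// Result mirrors the two fields of controller-runtime's reconcile.Result
// discussed above; it is a local stand-in so the sketch has no dependencies.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// requeueDelayOnSuccess is a hypothetical per-type delay for re-checking the
// external resource after a successful reconcile.
const requeueDelayOnSuccess = 5 * time.Minute

// outcome is a hypothetical summary of how a reconcile pass ended.
type outcome int

const (
	done           outcome = iota // nothing left to do, ever
	handledError                  // error recorded in an active Failed condition
	synced                        // external resource matches the desired state
	unhandledError                // error handed back to controller-runtime
)

var errUnknownOutcome = errors.New("unknown reconcile outcome")

// resultFor returns the (Result, error) pair recommended for each outcome.
func resultFor(o outcome, err error) (Result, error) {
	switch o {
	case done:
		return Result{}, nil
	case handledError:
		return Result{Requeue: true}, nil // rate-limited retry with back-off
	case synced:
		return Result{RequeueAfter: requeueDelayOnSuccess}, nil
	case unhandledError:
		// The Result is ignored when err != nil, so an empty one is fine.
		return Result{}, err
	default:
		return Result{}, errUnknownOutcome
	}
}
```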
<sub>Fig 2. Reconcile Loop</sub>
One of the first steps of every reconcile function is to retrieve the Kubernetes object using the request key. If the object retrieval results in an error, this could be either:
Establish a resource client based on the cloud provider credentials. If the connection fails, update the object status sub-resource and exit the reconcile loop. If the connection succeeds, Connect should return a Resource Operation Handler, which supports the full set of resource- and object-related operations.
Check the DeletionTimestamp property to determine whether this object needs to be deleted. If so, call the Delete function, update the status, and return. For more details, see the Delete function in the SyncDeleter interface section.
Depending on the current state of the object, the controller performs the set of tasks needed to bring the managed resource to the desired state. For more details, see the Update function in the SyncDeleter interface section.
controller-runtime does a great job of abstracting the operator infrastructure (watchers and listers) and provides developers with a clean and simple Reconciler interface with a single function: Reconcile(Request) (Result, error). Reconciling a Kubernetes internal resource might be achievable in a single function (although I doubt that is possible while keeping cognitive complexity manageable), but this has proven practically impossible for external (managed) resources. The Crossplane reconciliation paradigm has evolved organically (and is still evolving). As a result, one can observe multiple organizational patterns:
This document focuses on, proposes, and expands on the latter; however, as mentioned earlier, the search for the “perfect reconciler” is not complete.
The Resource Operations interface is a union of all resource operations across multiple functional areas. In Fig 2, the “Connect” operation returns a Resource [Operations] Handler, which implements the Resource Operations interface.
A sample Resource Operations interface:
```go
type operations interface {
	// Resource object operations
	addFinalizer()
	removeFinalizer()
	isReclaimDelete() bool
	getSpecAttrs() v1alpha1.ResourceSpecAttrs
	setSpecAttrs(*group.ResourceAttrs)
	setStatusAttrs(*group.ResourceAttrs)
	setReady()
	failReconcile(ctx context.Context, reason, msg string) error

	// Controller-runtime operations
	updateObject(ctx context.Context) error
	updateStatus(ctx context.Context) error
	updateSecret(ctx context.Context) error

	// Managed Resource Client operations
	createResource(ctx context.Context, args string) error
	deleteResource(ctx context.Context) error
	updateResource(ctx context.Context, args string) error
	getAttributes(ctx context.Context) (*group.ResourceAttrs, error)
}
```
Provides definitions for operations against the given resource type (Kubernetes object). Note that while there is a common set of operations across all resources (“setReady”, “isReclaimDelete”, etc.), some operations may be resource-type specific (“getSpecAttrs”, etc.). Resource operations are implemented by the Resource Handler.
Provides definitions for the object runtime operations, which are typically limited to the three above.
Provides definitions for the managed resource handling operations, which can be considered the integration layer with the managed resource client libraries.
This interface integrates the Resource Operations creation process with the Reconciler, with the main purpose of enabling testing.
Example:
```go
type maker interface {
	newSyncDeleter(context.Context, *v1alpha1.Resource) (syncdeleter, error)
}
```
This interface provides high-level object handling. As per Fig 2, the Reconcile function performs two types of object operations: Sync or Delete. These operations are facilitated by the SyncDeleter interface.
Example:
```go
type syncdeleter interface {
	delete(context.Context) (reconcile.Result, error)
	sync(context.Context) (reconcile.Result, error)
}
```
Both the “sync” and “delete” functions are expected to handle the object status update and return a reconcile.Result and an error.
<sub>Fig 4. Sync</sub>
</td> </tr> </table>This interface provides resource Sync functionality, i.e., the controller should either create or update the given managed resource.
```go
type createupdater interface {
	create(context.Context) (reconcile.Result, error)
	update(context.Context, *storage.BucketAttrs) (reconcile.Result, error)
}
```
The Create function starts with password retrieval (see the <em>Password</em> section below for more details).
<p> If password retrieval fails, the Failed condition is activated and reconcile.Result{Requeue: true} is returned to the caller. <p> Note that, depending on the resource, additional preparation steps could be required: <ul> <li>Multiple credentials</li> <li>Other additional properties</li> </ul> <h6>Resource Creation</h6> <p> When all required preparation steps have completed successfully, the controller issues a Create Resource call using the cloud provider Resource Client. Note that most commonly this is a non-blocking call. <p> If the Create Resource request results in an error, the resource creation did not start. Most commonly, such errors are due to invalid parameter/value combinations. <p> If the Create Resource call succeeds, the client typically returns a tracking operation object reference, which can be used to further query the cloud provider API about the status of the operation. <p> <strong>Note</strong>: it is up to the controller and/or client implementation whether to use the tracking operation object. Since the managed resource state should reflect the resource status, it could be a sufficient mechanism for tracking managed resource readiness or failure. However, if the operation fails, depending on the cloud provider, the failure details may only be available via the tracking operation object's status. <h6>Spec Update</h6> <p> When applicable, upon successful creation, the controller can retrieve the managed resource properties in order to save them back to the object spec. This captures any default property values that were set by the managed resource API. </td> <td><sub>Fig 5. Create</sub>
</td> </tr> <tr> <td> <h6>Password</h6>Most managed resources require credential properties upon creation (typically an administrator password).
<p> The managed resource password is generated by the Crossplane controller and persisted into a Kubernetes Secret object (the Secret object reference should be reflected in the resource object's status sub-resource). <p> First, the controller checks whether the connection secret already exists and contains a password value. If so, the controller returns that password value. Otherwise, the controller generates a new password value and saves it into the connection secret before returning it to the caller. <p> Note that while password generation errors are treated as Handled Errors, in reality they are non-recoverable system errors. <p> A Secret upsert error is a Handled Error. <p> All errors are returned to the caller. </td> <td><sub>Fig 6. Credentials</sub>
</td> </tr> </table>The controller captures the managed resource status and updates the object status sub-resource (typically the “state” field).
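The password retrieval step from the Password section (Fig 6) can be sketched as a get-or-create against the connection secret. The `secretStore` interface, `memStore`, `passwordKey`, and `errSecretNotFound` are hypothetical stand-ins for reading and upserting a Kubernetes Secret, and the 24-byte random password encoded as URL-safe base64 is an illustrative choice rather than a Crossplane convention.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
)

// passwordKey is a hypothetical key in the connection secret's data.
const passwordKey = "password"

// errSecretNotFound stands in for a Kubernetes "not found" error.
var errSecretNotFound = errors.New("secret not found")

// secretStore is a hypothetical abstraction over reading and upserting the
// connection Secret through the Kubernetes API.
type secretStore interface {
	get() (map[string][]byte, error)
	upsert(data map[string][]byte) error
}

// getOrCreatePassword returns the password already stored in the connection
// secret, or generates a new one and saves it before returning it.
func getOrCreatePassword(s secretStore) (string, error) {
	data, err := s.get()
	if err != nil && !errors.Is(err, errSecretNotFound) {
		return "", err
	}
	if pw := data[passwordKey]; len(pw) > 0 {
		return string(pw), nil // reuse the existing password
	}
	buf := make([]byte, 24)
	if _, err := rand.Read(buf); err != nil {
		return "", err // generation failure: in practice a system error
	}
	pw := base64.RawURLEncoding.EncodeToString(buf)
	if data == nil {
		data = map[string][]byte{}
	}
	data[passwordKey] = []byte(pw)
	if err := s.upsert(data); err != nil {
		return "", err // Handled Error: the caller activates Failed
	}
	return pw, nil
}

// memStore is an in-memory secretStore for illustration and testing.
type memStore struct{ data map[string][]byte }

func (m *memStore) get() (map[string][]byte, error) {
	if m.data == nil {
		return nil, errSecretNotFound
	}
	return m.data, nil
}

func (m *memStore) upsert(data map[string][]byte) error {
	m.data = data
	return nil
}
```

Persisting the password before the create call means a retried reconcile reuses the same credentials instead of generating a new password for a resource that may already exist.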
<h6>Update Check</h6> <p> The controller compares the managed resource attributes with the object spec to detect whether an update operation is needed. If so, the controller updates the resource, then retrieves the managed resource properties and updates the object spec (similar to the create operation). If no update is needed, the controller should return: <p> <strong>reconcile.Result{RequeueAfter: successDuration}</strong> <p> where successDuration is the requeue delay upon successful reconciliation. </td> <td><sub>Fig 7. Update</sub>
</td> </tr> </table>The closest examples of controllers following this design paradigm are: