docs/proposals/post-deployment-hooks.md
A proposal for a deployment code execution API integrated with the deployment lifecycle.
Deployment hooks are needed to provide users with a way to execute arbitrary commands necessary to complete a deployment.
Goals of this design:
There are two fundamental approaches to solving each deployment hook use case: existing upstream support for container lifecycle hooks, and the externalized deployment hooks outlined in this proposal. The following describes the two approaches, and each use case is evaluated in terms of these approaches.
Kubernetes provides container lifecycle hooks for containers within a Pod. Currently, post-start and pre-stop hooks are supported. For deployments, post-start is the most relevant. Because these hooks are implemented by the Kubelet, the post-start hook provides some unique guarantees:
Because deployments are represented as replication controllers, lifecycle hooks defined for containers are executed for every container in every replica (pod) of the deployment. This behavior has complexity implications when applied to deployment use cases:
An alternative to the upstream-provided lifecycle hooks is to have a notion of a hook which is a property of an OpenShift deployment. OpenShift deployment hooks can provide a different set of guarantees:
Hooks can be defined to execute before or after the deployment strategy scales up the deployment. When implementing a hook which runs after a deployment has been scaled up, there are special considerations to make:
New revisions of a Rails application often contain schema or other database updates which must accompany the new code deployment. Users should be able to specify a hook which performs a Rails migration as part of the application code deployment.
Database migrations are complex and introduce downtime concerns. Here are some examples of zero-downtime Rails migration workflows.
Deployments including database migrations must make special considerations:
The workflows which are effective at ensuring zero-downtime migrations are typically multi-phased. A user orchestrating a zero-downtime migration deployment likely needs to verify each deployment step discretely, with the option to abort and roll back after each phase.
Consider this simple example of a phased deployment which adds a new column:
Container lifecycle hooks introduce problems with Rails migrations:
Deployment hooks satisfy this use case by providing a means to execute the hook only once per logical deployment. The hook is expressed as a run-once pod which provides the migration with its own resource allocation decoupled from the application.
Consider an application whose deployment should result in a cloud API call being invoked to notify it of the newly deployed code.
Container lifecycle hooks aren't ideal for this use case because they will be fired once per pod in the deployment during scale-up rather than following the logical deployment as a whole. Consider an example deployment flow using container lifecycle hooks:
A post-deployment hook would satisfy the use case by ensuring the API call is invoked after the deployment has been rolled out. For example, the flow of this deployment would be:
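Deployment hooks are implemented as run-once pods which can be executed at one or both of the following points during the deployment lifecycle:
- Before the deployment strategy executes (pre-deployment), so that the hook finishes before the new deployment is scaled up.
- After the deployment strategy executes (post-deployment).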
Hooks designated as mandatory should impact the outcome of the deployment.
There are a few possible ways to handle a failed mandatory pre-deployment hook:
This proposal prescribes the use of option 1 as being the simplest starting point for the hook API.
Failed mandatory post-deployment hooks are more challenging:
Due to the complexities of automated rollback, this proposal limits the scope of failure handling for post-deployment hooks: post-deployment hooks cannot be considered mandatory at this time. This limitation may be lifted in the future by a separate proposal.
When a deployment hook fails:
More reporting capabilities could be addressed in a future proposal.
The DeploymentStrategy gains a new Lifecycle field:
type DeploymentStrategy struct {
	// Type is the name of a deployment strategy.
	Type DeploymentStrategyType `json:"type,omitempty"`
	// CustomParams are the input to the Custom deployment strategy.
	CustomParams *CustomDeploymentStrategyParams `json:"customParams,omitempty"`
	// Lifecycle provides optional hooks into the deployment process.
	Lifecycle *Lifecycle `json:"lifecycle,omitempty"`
}
// Lifecycle describes actions the system should take in response to
// deployment lifecycle events. The deployment process blocks while
// executing lifecycle handlers. A HandlerFailurePolicy determines what
// action is taken in response to a failed handler.
type Lifecycle struct {
	// Pre is called immediately before the deployment strategy executes.
	Pre *Handler `json:"pre,omitempty"`
	// Post is called immediately after the deployment strategy executes.
	// NOTE: AbortHandlerFailurePolicy is not supported for Post.
	Post *Handler `json:"post,omitempty"`
}
Each lifecycle hook is implemented with a Handler:
// Handler defines a specific deployment lifecycle action.
type Handler struct {
	// ExecNewPod specifies the action to take.
	ExecNewPod *ExecNewPodAction `json:"execNewPod,omitempty"`
	// FailurePolicy specifies what action to take if the handler fails.
	FailurePolicy HandlerFailurePolicy `json:"failurePolicy"`
}
The first handler implementation is pod-based:
// ExecNewPodAction runs a command in a new pod based on the specified
// container which is assumed to be part of the deployment template.
type ExecNewPodAction struct {
	// Command is the action command and its arguments.
	Command []string `json:"command"`
	// Env is a set of environment variables to supply to the action's container.
	Env []EnvVar `json:"env,omitempty"`
	// ContainerName is the name of a container in the deployment pod
	// template whose container image will be used for the action's container.
	ContainerName string `json:"containerName"`
}
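To make the pod-based handler concrete, here is a minimal Go sketch of how a deployer might turn an ExecNewPodAction into a run-once pod spec. It is not the proposal's implementation. It uses simplified, hypothetical stand-ins for the Kubernetes Container and PodSpec types and a hypothetical hook container name, `lifecycle`. It also assumes that the hook inherits the template container's environment, with Env entries overriding variables of the same name. The proposal itself only states that the image comes from the named container.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the Kubernetes API types.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name    string
	Image   string
	Command []string
	Env     []EnvVar
}

type PodSpec struct {
	Containers    []Container
	RestartPolicy string
}

type ExecNewPodAction struct {
	Command       []string
	Env           []EnvVar
	ContainerName string
}

// buildHookPodSpec derives a run-once pod spec from the deployment's pod
// template. The hook reuses the named container's image; Command replaces the
// container command, and (by assumption) Env overrides same-named template vars.
func buildHookPodSpec(action ExecNewPodAction, template PodSpec) (PodSpec, error) {
	var base *Container
	for i := range template.Containers {
		if template.Containers[i].Name == action.ContainerName {
			base = &template.Containers[i]
			break
		}
	}
	if base == nil {
		return PodSpec{}, fmt.Errorf("container %q not found in pod template", action.ContainerName)
	}
	if len(action.Command) == 0 {
		return PodSpec{}, errors.New("command must not be empty")
	}

	// Merge environment: template values first, then action overrides.
	env := []EnvVar{}
	index := map[string]int{}
	for _, e := range base.Env {
		index[e.Name] = len(env)
		env = append(env, e)
	}
	for _, e := range action.Env {
		if i, ok := index[e.Name]; ok {
			env[i] = e
		} else {
			index[e.Name] = len(env)
			env = append(env, e)
		}
	}

	hook := Container{
		Name:    "lifecycle", // hypothetical name for the hook container
		Image:   base.Image,
		Command: append([]string(nil), action.Command...),
		Env:     env,
	}
	// RestartPolicy "Never" makes the pod run once; its terminal phase
	// (Succeeded or Failed) is the hook's outcome.
	return PodSpec{Containers: []Container{hook}, RestartPolicy: "Never"}, nil
}

func main() {
	template := PodSpec{Containers: []Container{{Name: "rails", Image: "example/rails"}}}
	action := ExecNewPodAction{Command: []string{"rake", "db:migrate"}, ContainerName: "rails"}
	spec, err := buildHookPodSpec(action, template)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.Containers[0].Image, spec.Containers[0].Command, spec.RestartPolicy)
}
```

Using `RestartPolicy: "Never"` rather than `Always` is what makes the pod "run-once": the kubelet does not restart the container, so the pod's final phase can be read as the hook result.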
Handler failure management is policy driven:
// HandlerFailurePolicy describes the action to take if a handler fails.
type HandlerFailurePolicy string

const (
	// RetryHandlerFailurePolicy means retry the handler until it succeeds.
	RetryHandlerFailurePolicy HandlerFailurePolicy = "Retry"
	// AbortHandlerFailurePolicy means abort the deployment (if possible).
	AbortHandlerFailurePolicy HandlerFailurePolicy = "Abort"
	// ContinueHandlerFailurePolicy means continue the deployment.
	ContinueHandlerFailurePolicy HandlerFailurePolicy = "Continue"
)
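The three policies can be read as a small control loop around each hook. The sketch below is one possible interpretation, not the deployer's actual code. `hookFunc` is a hypothetical stand-in for creating a hook pod and waiting for it to terminate. `maxAttempts` caps Retry only so that the example terminates, whereas the proposal retries until the handler succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

type HandlerFailurePolicy string

const (
	RetryHandlerFailurePolicy    HandlerFailurePolicy = "Retry"
	AbortHandlerFailurePolicy    HandlerFailurePolicy = "Abort"
	ContinueHandlerFailurePolicy HandlerFailurePolicy = "Continue"
)

// hookFunc stands in for running a hook pod to completion (hypothetical);
// a nil error means the pod succeeded.
type hookFunc func() error

var errMigrationFailed = errors.New("rake db:migrate failed")

// runWithPolicy executes a hook and applies its failure policy. maxAttempts
// bounds Retry only so this sketch terminates.
func runWithPolicy(run hookFunc, policy HandlerFailurePolicy, maxAttempts int) error {
	for attempt := 1; ; attempt++ {
		err := run()
		if err == nil {
			return nil
		}
		switch policy {
		case RetryHandlerFailurePolicy:
			if attempt >= maxAttempts {
				return fmt.Errorf("hook still failing after %d attempts: %w", attempt, err)
			}
			// A real implementation would back off before retrying.
		case AbortHandlerFailurePolicy:
			return fmt.Errorf("hook failed, aborting deployment: %w", err)
		case ContinueHandlerFailurePolicy:
			fmt.Println("hook failed, continuing deployment:", err)
			return nil
		default:
			return fmt.Errorf("unknown failure policy %q", policy)
		}
	}
}

// deploy blocks on each step in order: pre hook, strategy, post hook.
// Abort is assumed to have been rejected for the post hook by validation.
func deploy(pre, strategy, post hookFunc, prePolicy, postPolicy HandlerFailurePolicy) error {
	if pre != nil {
		if err := runWithPolicy(pre, prePolicy, 5); err != nil {
			return err // the strategy never runs
		}
	}
	if err := strategy(); err != nil {
		return err
	}
	if post != nil {
		return runWithPolicy(post, postPolicy, 5)
	}
	return nil
}

func main() {
	failures := 2
	migrate := func() error {
		if failures > 0 {
			failures--
			return errMigrationFailed
		}
		return nil
	}
	scaleUp := func() error { fmt.Println("strategy: scaling up"); return nil }
	err := deploy(migrate, scaleUp, nil, RetryHandlerFailurePolicy, ContinueHandlerFailurePolicy)
	fmt.Println("deployment error:", err)
}
```

Because `deploy` blocks on each step, a pre hook that fails under Abort returns before the strategy scales anything up. Under Retry, the strategy waits until the hook eventually succeeds.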
ExecNewPodAction pods will be associated with deployments using new annotations:
const (
	// PreExecNewPodActionPodAnnotation is the name of a pre-deployment
	// ExecNewPodAction pod.
	PreExecNewPodActionPodAnnotation = "openshift.io/deployment.lifecycle.pre.execnewpod.pod"
	// PreExecNewPodActionPodPhaseAnnotation is the phase of a pre-deployment
	// ExecNewPodAction pod and is used to track its status and outcome.
	PreExecNewPodActionPodPhaseAnnotation = "openshift.io/deployment.lifecycle.pre.execnewpod.phase"
	// PostExecNewPodActionPodAnnotation is the name of a post-deployment
	// ExecNewPodAction pod.
	PostExecNewPodActionPodAnnotation = "openshift.io/deployment.lifecycle.post.execnewpod.pod"
	// PostExecNewPodActionPodPhaseAnnotation is the phase of a post-deployment
	// ExecNewPodAction pod and is used to track its status and outcome.
	PostExecNewPodActionPodPhaseAnnotation = "openshift.io/deployment.lifecycle.post.execnewpod.phase"
)
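Here is a minimal sketch of how these annotations could be maintained and consulted. It assumes the deployment's annotations are available as a plain map, and that the recorded phase uses the Kubernetes pod phases, with `Succeeded` and `Failed` being terminal. The pod name `rails-2-hook-pre` and the helpers `recordHookPod`, `hookTerminal`, and `canAdvance` are hypothetical. `canAdvance` encodes the rule that a deployment does not leave New (or Running) until its pre (or post) hook has reached a terminal phase.

```go
package main

import "fmt"

// Annotation keys as defined in the proposal.
const (
	PreExecNewPodActionPodAnnotation       = "openshift.io/deployment.lifecycle.pre.execnewpod.pod"
	PreExecNewPodActionPodPhaseAnnotation  = "openshift.io/deployment.lifecycle.pre.execnewpod.phase"
	PostExecNewPodActionPodAnnotation      = "openshift.io/deployment.lifecycle.post.execnewpod.pod"
	PostExecNewPodActionPodPhaseAnnotation = "openshift.io/deployment.lifecycle.post.execnewpod.phase"
)

// recordHookPod stores a hook pod's name and latest phase on the
// deployment's (replication controller's) annotations.
func recordHookPod(annotations map[string]string, pre bool, podName, phase string) {
	if pre {
		annotations[PreExecNewPodActionPodAnnotation] = podName
		annotations[PreExecNewPodActionPodPhaseAnnotation] = phase
		return
	}
	annotations[PostExecNewPodActionPodAnnotation] = podName
	annotations[PostExecNewPodActionPodPhaseAnnotation] = phase
}

// hookTerminal reports whether the recorded hook pod has finished.
func hookTerminal(annotations map[string]string, phaseKey string) bool {
	switch annotations[phaseKey] {
	case "Succeeded", "Failed":
		return true
	}
	return false
}

// canAdvance reports whether a deployment may leave its current status
// (to any next status, including Failed). hookDefined says whether the
// hook for that status exists at all.
func canAdvance(status string, hookDefined bool, annotations map[string]string) bool {
	if !hookDefined {
		return true
	}
	switch status {
	case "New":
		return hookTerminal(annotations, PreExecNewPodActionPodPhaseAnnotation)
	case "Running":
		return hookTerminal(annotations, PostExecNewPodActionPodPhaseAnnotation)
	}
	return true
}

func main() {
	annotations := map[string]string{}
	recordHookPod(annotations, true, "rails-2-hook-pre", "Running")
	fmt.Println("may leave New:", canAdvance("New", true, annotations))
	recordHookPod(annotations, true, "rails-2-hook-pre", "Succeeded")
	fmt.Println("may leave New:", canAdvance("New", true, annotations))
}
```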
Initially, valid values for Lifecycle.Post.FailurePolicy will be Retry and Continue. This may change in the future if deployments can be safely rolled back automatically.
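Below is a validation sketch for these constraints. It uses simplified copies of the proposed types and the hypothetical helper names `validateHandler` and `validateLifecycle`. Pre hooks accept all three policies, and post hooks accept only Retry and Continue. The sketch also rejects a handler with no ExecNewPod action, an empty command, or an empty container name. These are reasonable checks, but the proposal does not spell them out. `notify-cloud-api` is a placeholder command.

```go
package main

import "fmt"

type HandlerFailurePolicy string

const (
	RetryHandlerFailurePolicy    HandlerFailurePolicy = "Retry"
	AbortHandlerFailurePolicy    HandlerFailurePolicy = "Abort"
	ContinueHandlerFailurePolicy HandlerFailurePolicy = "Continue"
)

type ExecNewPodAction struct {
	Command       []string
	ContainerName string
}

type Handler struct {
	ExecNewPod    *ExecNewPodAction
	FailurePolicy HandlerFailurePolicy
}

type Lifecycle struct {
	Pre  *Handler
	Post *Handler
}

// validateHandler checks one hook; a nil handler (no hook) is valid.
func validateHandler(field string, h *Handler, allowed []HandlerFailurePolicy) []error {
	if h == nil {
		return nil
	}
	var errs []error
	if h.ExecNewPod == nil {
		errs = append(errs, fmt.Errorf("%s.execNewPod: required", field))
	} else {
		if len(h.ExecNewPod.Command) == 0 {
			errs = append(errs, fmt.Errorf("%s.execNewPod.command: required", field))
		}
		if h.ExecNewPod.ContainerName == "" {
			errs = append(errs, fmt.Errorf("%s.execNewPod.containerName: required", field))
		}
	}
	valid := false
	for _, p := range allowed {
		if h.FailurePolicy == p {
			valid = true
			break
		}
	}
	if !valid {
		errs = append(errs, fmt.Errorf("%s.failurePolicy: %q is not one of %v", field, h.FailurePolicy, allowed))
	}
	return errs
}

// validateLifecycle rejects Abort for Post: post-deployment hooks cannot be
// mandatory until deployments can be safely rolled back automatically.
func validateLifecycle(lc *Lifecycle) []error {
	if lc == nil {
		return nil
	}
	errs := validateHandler("lifecycle.pre", lc.Pre, []HandlerFailurePolicy{
		RetryHandlerFailurePolicy, AbortHandlerFailurePolicy, ContinueHandlerFailurePolicy,
	})
	return append(errs, validateHandler("lifecycle.post", lc.Post, []HandlerFailurePolicy{
		RetryHandlerFailurePolicy, ContinueHandlerFailurePolicy,
	})...)
}

func main() {
	lc := &Lifecycle{Post: &Handler{
		ExecNewPod:    &ExecNewPodAction{Command: []string{"notify-cloud-api"}, ContainerName: "rails"},
		FailurePolicy: AbortHandlerFailurePolicy,
	}}
	for _, err := range validateLifecycle(lc) {
		fmt.Println(err)
	}
}
```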
TODO: ExecNewPodAction.ContainerName
The status of a deployment hook is distinct from the status of the deployment itself. The deployment status may be updated in response to a change in hook status.
- The pre hook executes while the deployment has a New status, and the hook will have a terminal status before the deployment transitions past New.
- The post hook executes while the deployment has a Running status, and the hook will have a terminal status before the deployment transitions past Running.
Here's an example deployment which demonstrates how to apply deployment hooks to a Rails application which uses migrations.
The application image example/rails is built with a Dockerfile based on the rails image from Docker Hub:
FROM rails:onbuild
A database is exposed to the application using a service:
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "mysql"
  },
  "spec": {
    "ports": [
      {
        "protocol": "TCP",
        "port": 5434,
        "targetPort": 3306,
        "nodePort": 0
      }
    ],
    "selector": {
      "name": "mysql"
    },
    "clusterIP": "",
    "type": "ClusterIP",
    "sessionAffinity": "None"
  }
}
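The hook pod can reach this database the same way the application does. For each service that exists when a pod is created, Kubernetes injects `{NAME}_SERVICE_HOST` and `{NAME}_SERVICE_PORT` variables. The name is upper-cased, with dashes turned into underscores. A hook pod started during the deployment therefore sees `MYSQL_SERVICE_HOST` and `MYSQL_SERVICE_PORT`, and the port is the service port (5434), not the target port (3306). The following is a minimal Go sketch of resolving that address, using the hypothetical helper names `serviceEnvPrefix` and `serviceAddress`. The Rails application's database configuration would presumably read the same variables.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// serviceEnvPrefix converts a service name into the prefix Kubernetes uses for
// its injected variables: upper-cased, with dashes replaced by underscores.
func serviceEnvPrefix(service string) string {
	return strings.ToUpper(strings.ReplaceAll(service, "-", "_"))
}

// serviceAddress returns host:port for a service from the injected
// {NAME}_SERVICE_HOST and {NAME}_SERVICE_PORT variables. getenv is a
// parameter so the lookup can be tested without touching the process env.
func serviceAddress(service string, getenv func(string) string) (string, error) {
	prefix := serviceEnvPrefix(service)
	host := getenv(prefix + "_SERVICE_HOST")
	port := getenv(prefix + "_SERVICE_PORT")
	if host == "" || port == "" {
		// Variables are only injected for services that existed when the pod was created.
		return "", fmt.Errorf("service %q variables not set", service)
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	addr, err := serviceAddress("mysql", os.Getenv)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("connecting to", addr)
}
```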
A deployment configuration describes the template for application deployments:
{
  "kind": "DeploymentConfig",
  "apiVersion": "v1",
  "metadata": {
    "name": "rails"
  },
  "spec": {
    "strategy": {
      "type": "Recreate",
      "resources": {},
      "lifecycle": {
        "pre": {
          "failurePolicy": "Retry",
          "execNewPod": {
            "containerName": "rails",
            "command": ["rake", "db:migrate"]
          }
        }
      }
    },
    "triggers": [
      {
        "type": "ConfigChange"
      }
    ],
    "replicas": 1,
    "selector": {
      "name": "rails"
    },
    "template": {
      "metadata": {
        "labels": {
          "name": "rails"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "rails",
            "image": "example/rails",
            "ports": [
              {
                "containerPort": 8080,
                "protocol": "TCP"
              }
            ],
            "resources": {},
            "terminationMessagePath": "/dev/termination-log",
            "imagePullPolicy": "IfNotPresent",
            "capabilities": {},
            "securityContext": {
              "capabilities": {},
              "privileged": false
            }
          }
        ],
        "restartPolicy": "Always",
        "dnsPolicy": "ClusterFirst"
      }
    }
  }
}
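The timeline for this example relies on a pre hook that runs `rake db:migrate` in a pod based on the `rails` container, with a `Retry` failure policy. As a sketch, here is that hook built from the proposed Go types and marshaled with `encoding/json`, showing the field names the struct tags produce for the `strategy` stanza. The types are trimmed: `CustomParams` is omitted, `Type` is a plain string, and `EnvVar` is a simplified stand-in. `railsStrategy` and `strategyJSON` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type HandlerFailurePolicy string

// Simplified stand-in for the Kubernetes EnvVar type.
type EnvVar struct {
	Name  string `json:"name"`
	Value string `json:"value,omitempty"`
}

type ExecNewPodAction struct {
	Command       []string `json:"command"`
	Env           []EnvVar `json:"env,omitempty"`
	ContainerName string   `json:"containerName"`
}

type Handler struct {
	ExecNewPod    *ExecNewPodAction    `json:"execNewPod,omitempty"`
	FailurePolicy HandlerFailurePolicy `json:"failurePolicy"`
}

type Lifecycle struct {
	Pre  *Handler `json:"pre,omitempty"`
	Post *Handler `json:"post,omitempty"`
}

type DeploymentStrategy struct {
	Type      string     `json:"type,omitempty"`
	Lifecycle *Lifecycle `json:"lifecycle,omitempty"`
}

// railsStrategy returns a Recreate strategy whose pre hook runs the migration
// in a pod based on the "rails" container, retrying until it succeeds.
func railsStrategy() DeploymentStrategy {
	return DeploymentStrategy{
		Type: "Recreate",
		Lifecycle: &Lifecycle{
			Pre: &Handler{
				ExecNewPod: &ExecNewPodAction{
					Command:       []string{"rake", "db:migrate"},
					ContainerName: "rails",
				},
				FailurePolicy: "Retry",
			},
		},
	}
}

// strategyJSON renders a strategy as compact JSON.
func strategyJSON(s DeploymentStrategy) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := strategyJSON(railsStrategy())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Marshaling `railsStrategy()` yields `{"type":"Recreate","lifecycle":{"pre":{"execNewPod":{"command":["rake","db:migrate"],"containerName":"rails"},"failurePolicy":"Retry"}}}`. The `env` and `post` fields are dropped because of their `omitempty` tags.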
Let's consider a hypothetical timeline of events for this deployment, assuming that the initial version of the application is already deployed as rails-1.
1. A new version of the example/rails image triggers a deployment of the rails deployment configuration.
2. rails-2 is created with 0 replicas; the deployment is not yet live.
3. The pre hook command rake db:migrate is executed in a container using the example/rails image, as specified by the rails container.
4. The rake command connects to the database using the environment variables provided for the mysql service.
5. When rake db:migrate finishes successfully, the Recreate strategy executes, causing the rails-2 deployment to become live and rails-1 to be disabled.
6. Because failurePolicy is set to Retry, if the rake command fails it is retried, and the deployment does not proceed until the command succeeds.
7. Because there is no post hook, the deployment is now complete.