docs/boot-sequence.md
This is an overview of how a Kubernetes cluster comes up, when using kOps.
The kOps tool itself takes the (minimal) spec of a cluster that the user specifies, and computes a complete configuration, setting defaults where values are not specified, and deriving appropriate dependencies. The "complete" specification includes the set of all flags that will be passed to all components. All decisions about how to install the cluster are made at this stage, and thus every decision can in theory be changed if the user specifies a value in the spec.
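As an illustration of a "minimal" spec, something like the following (the cluster name, version, and CIDRs here are hypothetical values, and the exact fields depend on your kOps version) is enough for kOps to compute the complete configuration:

```yaml
# A minimal (hypothetical) cluster spec; kOps derives everything else --
# component flags, defaults, dependencies -- when it builds the complete spec.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: mycluster.example.com
spec:
  cloudProvider: aws
  kubernetesVersion: 1.28.0
  networkCIDR: 10.0.0.0/16
  subnets:
  - name: us-east-1a
    type: Public
    zone: us-east-1a
```

Any field the user does specify overrides the derived default, which is what makes every decision changeable in theory.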
This complete specification is set in the LaunchTemplate for the AutoScaling Group (on AWS), or the Managed Instance Group (on GCE).
On both AWS & GCE, everything (nodes & masters) runs in an ASG/MIG; this means that failures (or the user) can terminate machines and the system will self-heal.
nodeup is the component that installs packages and sets up the OS, sufficiently for Kubelet. The core requirements are:

* A container runtime (Docker) must be installed.
* Kubelet must be installed and running.
In addition, nodeup installs:

* Protokube, which is a kOps-specific component.
kubelet starts pods as controlled by the files in /etc/kubernetes/manifests. These files are created by nodeup and protokube (ideally all by protokube, but currently split between the two).
These pods are declared using the standard k8s manifests, just as if they were stored in the API. But these are used to break the circular dependency for the bring-up of our core components, such as etcd & kube-apiserver.
On masters:

* kube-apiserver
* kube-controller-manager (which runs miscellaneous controllers)
* kube-scheduler (which assigns pods to nodes)
* etcd (actually created by protokube)
* dns-controller
On nodes:

* kube-proxy (which configures iptables so that the k8s network model works)
It is possible to add custom static pods by using fileAssets in the cluster spec. This might be useful for any custom bootstrapping that doesn't fit into additionalUserData or hooks.
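A sketch of what that could look like (the pod name, image, and file name here are made up; check the cluster spec reference for your kOps version for the exact fileAssets schema):

```yaml
spec:
  fileAssets:
  - name: my-static-pod            # hypothetical asset name
    path: /etc/kubernetes/manifests/my-static-pod.yaml
    roles:
    - Master
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: my-static-pod
        namespace: kube-system
      spec:
        hostNetwork: true          # static pods run before pod networking is up
        containers:
        - name: main
          image: example.com/bootstrap:latest   # hypothetical image
```

Because kubelet watches /etc/kubernetes/manifests directly, a pod dropped there this way starts without any dependency on the API server.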
Kubelet starts up and then starts (and restarts) all the containers declared in /etc/kubernetes/manifests.
It also tries to contact the API server (which the master kubelet will itself eventually start) and register the node. Once a node is registered, kube-controller-manager allocates it a PodCIDR: a slice of the cluster's pod-network IP range. kube-controller-manager updates the node, setting the PodCIDR field. Once kubelet sees this allocation, it sets up the local bridge with this CIDR, which allows docker to start. Before this happens, only pods that have hostNetwork will work - so all the "core" containers run with hostNetwork=true.
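The PodCIDR allocation is visible on the Node object itself; a trimmed, illustrative example (node name and CIDR are hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ec2.internal   # hypothetical node name
spec:
  # Set by kube-controller-manager after registration; kubelet
  # configures the local bridge from this value, unblocking docker.
  podCIDR: 100.96.1.0/24
```

Until spec.podCIDR is populated, the node can only run hostNetwork pods.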
APIServer also listens on the HTTPS port (443) on all interfaces. This is a secured endpoint, and requires valid authentication/authorization to use it. This is the endpoint that node kubelets will reach, and also that end-users will reach.
kOps uses DNS to allow nodes and end-users to discover the api-server. The apiserver pod manifest (in
/etc/kubernetes/manifests) includes annotations that will cause the dns-controller to create the
records. It creates api.internal.mycluster.com for use inside the cluster (using InternalIP addresses),
and it creates api.mycluster.com for use outside the cluster (using ExternalIP addresses).
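The annotations in question look roughly like this (a sketch: these are the annotation keys dns-controller watches, but confirm them against the dns-controller version you run):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    # dns-controller creates an external record pointing at the node's ExternalIP
    dns.alpha.kubernetes.io/external: api.mycluster.com
    # ...and an internal record pointing at the node's InternalIP
    dns.alpha.kubernetes.io/internal: api.internal.mycluster.com
```

Because the records are driven by pod annotations, DNS follows the api-server wherever it is scheduled, with no manual record management.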
etcd is where we have put all of our synchronization logic, so it is more complicated than most other pieces, and we must be really careful when bringing it up.
kOps follows CoreOS's recommended procedure for bringing up etcd on clouds:

* Each etcd cluster member gets its own persistent volume (e.g. an EBS volume on AWS), so the data survives machine termination.
* Each member also has a well-known DNS name.
* A master mounts one of the volumes, runs the etcd member for that volume, and the member's DNS name is pointed at that master.
Because the data is persistent and the cluster membership is a static set of DNS names, we don't need to manage etcd directly. We just try to make sure that some master always has each volume mounted, with etcd running and DNS set correctly. That is the job of protokube.
Protokube:

* discovers the persistent volumes that hold etcd data
* mounts them (formatting first if needed)
* writes the etcd manifest into /etc/kubernetes/manifests, so kubelet starts etcd
* keeps the DNS records for the etcd members pointed at the right masters
Most of this has focused on things that happen on the master, but the node bringup is similar, just simpler:
So kubelet will start up, as will kube-proxy. Kubelet will try to reach the api-server on the internal DNS name, and once the master is up it will succeed. Then:

* kubelet registers the node with the api-server
* kube-controller-manager sees the new node and allocates it a PodCIDR
* kubelet sees the PodCIDR allocation and configures the local container network
* the node becomes ready, and the scheduler can start placing pods on it