author    fabriziopandini <fabrizio.pandini@gmail.com>  2018-03-14 21:40:59 +0100
committer fabriziopandini <fabrizio.pandini@gmail.com>  2018-03-14 21:40:59 +0100
commit    a467a79f07fc74ca631423905c77b9684eab2553 (patch)
tree      31f6287b8bb4af682e78d7161b264b93b92b5762 /keps
parent    a616ab2966ce4caaf5e9ff3f71117e5be5d9d5b4 (diff)

    kubeadm-join--master
Diffstat (limited to 'keps')
 keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md | 448 ++++++++++
 1 file changed, 448 insertions(+), 0 deletions(-)
diff --git a/keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md b/keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md
new file mode 100644
index 00000000..10554064
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md
@@ -0,0 +1,448 @@
+# kubeadm join --master workflow
+
+## Metadata
+
+```yaml
+---
+kep-number: draft-20180130
+title: kubeadm join --master workflow
+status: accepted
+authors:
+ - "@fabriziopandini"
+owning-sig: sig-cluster-lifecycle
+reviewers:
+ - "@errordeveloper"
+ - "@jamiehannaford"
+approvers:
+ - "@luxas"
+ - "@timothysc"
+ - "@roberthbailey"
+editor:
+ - "@fabriziopandini"
+creation-date: 2018-01-28
+last-updated: 2018-01-28
+see-also:
+ - KEP 0004
+
+```
+
+## Table of Contents
+
+ * [kubeadm join --master workflow](#kubeadm-join---master-workflow)
+ * [Metadata](#metadata)
+ * [Table of Contents](#table-of-contents)
+ * [Summary](#summary)
+ * [Motivation](#motivation)
+ * [Goals](#goals)
+ * [Non-goals](#non-goals)
+ * [Challenges and Open Questions](#challenges-and-open-questions)
+ * [Proposal](#proposal)
+ * [User Stories](#user-stories)
+ * [Add a new master node](#add-a-new-master-node)
+ * [Implementation Details](#implementation-details)
+ * [advertise-address = IP/DNS of the external load balancer](#advertise-address--ipdns-of-the-external-load-balancer)
+ * [kubeadm init --feature-gates=HighAvailability=true](#kubeadm-init---feature-gateshighavailabilitytrue)
+ * [kubeadm join --master workflow](#kubeadm-join---master-workflow-1)
+ * [Strategies for deploying control plane components](#strategies-for-deploying-control-plane-components)
+ * [Strategies for distributing cluster certificates](#strategies-for-distributing-cluster-certificates)
+ * [Graduation Criteria](#graduation-criteria)
+ * [Implementation History](#implementation-history)
+ * [Drawbacks](#drawbacks)
+ * [Alternatives](#alternatives)
+
+## Summary
+
+We are extending kubeadm's distinctive `init` and `join` workflow by introducing the
+capability to add more than one master node to an existing cluster by means of the
+new `kubeadm join --master` option.
+
+As a consequence, kubeadm will provide a best-practice, “fast path” for creating a
+minimum viable, conformant Kubernetes cluster with one or more master nodes and
+zero or more worker nodes.
+
+## Motivation
+
+Support for high availability is one of the most requested features for kubeadm.
+
+Although it is already possible today to create an HA cluster using kubeadm in
+combination with scripts and/or automation tools (e.g.
+[this guide](https://kubernetes.io/docs/setup/independent/high-availability/)), this KEP was
+designed with the objective of introducing a simple and reliable upstream solution for
+achieving the same goal.
+
+### Goals
+
+* "Divide and conquer"
+
+  This proposal - at least in its initial release - does not address all the possible
+  user stories for creating a highly available Kubernetes cluster, but instead
+  focuses on:
+
+ * Defining a generic and extensible flow for bootstrapping an HA cluster, the
+ `kubeadm join --master` workflow.
+  * Providing a solution *only* for one, well-defined user story. See
+    [User Stories](#user-stories) and [Non-goals](#non-goals).
+
+* Provide support for a dynamic bootstrap flow
+
+  At the time a user is running `kubeadm init`, the user/operator might not know what
+  the cluster setup will eventually look like. For instance, the user may start with
+  only one master + n worker nodes, and then add further master nodes with `kubeadm join --master`
+  or add more worker nodes with `kubeadm join` (in any order).
+
+* Enable higher-level tools integration
+
+  We expect higher-level and more tailored tooling to be built on top of kubeadm,
+  and ideally, using kubeadm as the basis of all deployments will make it easier
+  to create conformant clusters.
+
+ Accordingly, the `kubeadm join --master` workflow should provide support for
+ the following operational practices used by higher level tools:
+
+ * Parallel node creation
+
+    Higher-level tools could create nodes in parallel (both masters and workers)
+    to reduce the overall cluster startup time.
+    `kubeadm join --master` should natively support this practice without requiring
+    the implementation of any synchronization mechanics by higher-level tools.
+
+ * Replace reconciliation strategies
+
+    Especially in the case of cloud deployments, higher-level automation tools could
+    decide for any reason to replace existing nodes with new ones (instead of applying
+    changes in-place to existing nodes).
+    `kubeadm join --master` will support this practice by making it easier to replace
+    existing master nodes with new ones.
+
+### Non-goals
+
+* By design, kubeadm cares only about bootstrapping, not about provisioning machines.
+ Likewise, installing various nice-to-have addons, like the Kubernetes Dashboard,
+ monitoring solutions, and cloud-specific addons, is not in scope.
+
+* This proposal doesn't include a solution for etcd cluster management\*.
+
+* Nothing in this proposal should prevent users from running master node components
+  and etcd on the same machines; however, users should be aware that this will
+  introduce limitations for strategies like parallel node creation and in-place
+  vs. replace reconciliation.
+
+* Nothing in this proposal should prevent kubeadm from implementing, in the future, a
+  solution for provisioning an etcd cluster based on static pods/pods.
+
+* This proposal doesn't include a solution for API server load balancing.
+
+* Nothing in this proposal should prevent users from choosing their preferred
+ solution for API server load balancing.
+
+* Nothing in this proposal should prevent practices that exist today.
+
+* Nothing in this proposal should prevent users from pre-provisioning TLS assets
+  before running `kubeadm init` or `kubeadm join --master`.
+
+\* At the time of writing, the CoreOS-recommended approach for etcd is to run
+the etcd cluster outside Kubernetes (see discussion in [kubeadm office hours](https://goo.gl/fjyeqo)).
+
+### Challenges and Open Questions
+
+* Keep the UX simple.
+
+ * _What are the acceptable trade-offs between the need to have a clean and simple
+ UX and the complexity of the following challenges and open questions?_
+
+* Create a cluster without knowing its final layout
+
+  Supporting a dynamic workflow implies that some information about the cluster is
+  not available at init time, e.g. the number of master nodes, the IP addresses of
+  the master nodes, etc.
+
+  * _How can a Kubernetes cluster be configured so that it easily adapts to future changes
+    of its own layout, e.g. adding or removing a master node?_
+
+  * _What are the "pivotal" cluster settings that must be defined before initializing
+    the cluster?_
+
+  * _What are the mandatory conditions to be verified when executing `kubeadm init`
+    to allow/not allow the execution of `kubeadm join --master` in the future?_
+
+* Kubeadm's limited scope of action
+
+  * The kubeadm binary can execute actions _only_ on the machine where it is running;
+    e.g. it is not possible to execute actions on other nodes or to copy files across
+    nodes.
+ * During the join workflow, kubeadm can access the cluster _only_ using identities
+ with limited grants, `system:unauthenticated` or `system:node-bootstrapper`.
+
+* Dependencies graduation
+
+  The solution for `kubeadm join --master` will rely on a set of dependencies/other
+  features which are still in the process of graduating to GA, e.g. dynamic
+  kubelet configuration, self-hosting, and component config.
+
+  * _When should `kubeadm join --master` rely entirely on dependencies/features
+    still under graduation vs. provide compatibility with older, less convenient but
+    more consolidated approaches?_
+
+  * _Should we support `kubeadm join --master` for clusters with a control plane
+    deployed as static pods? What about clusters with a self-hosted
+    control plane?_
+  * _Should we support `kubeadm join --master` only for clusters storing
+    cluster certificates on the file system? What about clusters
+    storing certificates in secrets?_
+
+* Upgradability
+
+  * _How should a highly available cluster be set up in order to simplify the execution
+    of cluster version upgrades, either manually or with the support of `kubeadm upgrade`?_
+
+## Proposal
+
+### User Stories
+
+#### Add a new master node
+
+As a Kubernetes administrator, I want to run `kubeadm join --master` to add
+a new master node* to an existing Kubernetes cluster**, so that the cluster becomes
+more resilient to failures of the existing master nodes (high availability).
+
+\* A new "master node" is a new Kubernetes node with the
+`node-role.kubernetes.io/master=""` label and the
+`node-role.kubernetes.io/master:NoSchedule` taint; a new instance of the control plane
+components will be deployed on the new master node.
+
+> NB. In this first release of the proposal creating a new master node doesn't
+trigger the creation of a new etcd member on the same machine.
+
+\*\* In this first release of the proposal, `kubeadm join --master` can be
+executed _only_ on Kubernetes clusters compliant with the following conditions:
+
+* The cluster was initialized with `kubeadm init`.
+* The cluster was initialized with `--feature-gates=HighAvailability=true`.
+* The cluster uses an external etcd.
+* An external load balancer was provisioned and the IP/DNS of the external
+  load balancer is used as the advertise-address for the kube-apiserver.
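+
+As a purely illustrative sketch of the intended sequence (not part of the
+proposal), assuming a load balancer reachable at `LB_ADDRESS`, an existing
+external etcd cluster described in `kubeadm-config.yaml` (a sketch of such a file
+is given under [kubeadm init --feature-gates=HighAvailability=true](#kubeadm-init---feature-gateshighavailabilitytrue)),
+and the flag spellings proposed by this KEP; exact flags and defaults may differ
+in the actual implementation:
+
+```bash
+# First master: initialize the cluster behind the external load balancer.
+# The HighAvailability feature gate and the --master join flag below are the ones
+# proposed by this KEP; the external etcd endpoints come from the config file.
+kubeadm init --config=kubeadm-config.yaml --feature-gates=HighAvailability=true
+
+# Any additional master: after the cluster certificates have been copied to this
+# node (see "Strategies for distributing cluster certificates"), join as a master.
+kubeadm join --master \
+  --token "${BOOTSTRAP_TOKEN}" \
+  --discovery-token-ca-cert-hash "sha256:${CA_CERT_HASH}" \
+  "${LB_ADDRESS}:6443"
+```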
+
+### Implementation Details
+
+#### advertise-address = IP/DNS of the external load balancer
+
+There are many ways to configure a highly available cluster.
+
+After prototyping and various discussions in
+[kubeadm office hours](https://youtu.be/HcvVi8O_ZGY), it was agreed to implement
+the approach that sets the `--advertise-address` equal to the IP/DNS of the
+external load balancer, without assigning dedicated `--advertise-address` IPs
+to each master node.
+
+By excluding the IPs of the master nodes, kubeadm can create a single API server
+serving certificate and share this certificate across all master nodes;
+no changes will be required to this certificate when adding/removing master nodes.
+
+Such properties make this approach best suited for the initial set up of
+the desired `kubeadm join --master` dynamic workflow.
+
+> Please note that in this scenario the `kubernetes` service will always resolve
+to the IP/DNS of the external load balancer, instead of resolving to the list
+of master IPs, but this was considered an acceptable trade-off at this stage.
+
+> Support for different HA configurations is expected to be added in future releases
+of this KEP.
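+
+As a hedged sanity check of this property (assuming the default kubeadm
+certificate location on a master node), the Subject Alternative Names of the API
+server serving certificate can be inspected to verify that they reference the
+load balancer rather than individual master IPs:
+
+```bash
+# Print the SANs of the API server serving certificate; with this approach they
+# should include the load balancer IP/DNS (plus the usual in-cluster service
+# names) and should not need to change when master nodes are added or removed.
+openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
+  | grep -A1 "Subject Alternative Name"
+```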
+
+#### kubeadm init --feature-gates=HighAvailability=true
+
+When executing `kubeadm join --master`, due to current kubeadm limitations, only
+limited information about the cluster and about the other master nodes is available.
+
+As a consequence, this proposal delegates to the initial `kubeadm init` - when
+executed with `--feature-gates=HighAvailability=true` - all the checks for
+compliance of the cluster with the supported user story:
+
+* The cluster uses an external etcd.
+* An external load balancer is provisioned and the IP/DNS of the external load balancer is used as advertise-address.
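+
+A hedged sketch of what a compliant `kubeadm init` configuration could look like;
+the field names below follow the kubeadm `MasterConfiguration` (v1alpha1) format
+of this era and are illustrative only, as are the addresses:
+
+```bash
+# kubeadm-config.yaml: external etcd + external load balancer as advertise-address.
+cat >kubeadm-config.yaml <<EOF
+apiVersion: kubeadm.k8s.io/v1alpha1
+kind: MasterConfiguration
+api:
+  advertiseAddress: "10.0.0.100"        # IP of the external load balancer
+etcd:
+  endpoints:                            # external etcd cluster
+  - "https://10.0.0.11:2379"
+  - "https://10.0.0.12:2379"
+  - "https://10.0.0.13:2379"
+EOF
+
+# With the feature gate enabled, kubeadm init would validate the two conditions
+# above before allowing future `kubeadm join --master` invocations.
+kubeadm init --config=kubeadm-config.yaml --feature-gates=HighAvailability=true
+```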
+
+#### kubeadm join --master workflow
+
+The `kubeadm join --master` target workflow is an extension of the
+existing `kubeadm join` flow:
+
+1. Discover cluster info [No changes to this step]
+
+   Access the `cluster-info` configMap in the `kube-public` namespace (or read
+   the same information provided in a file).
+
+   > This step waits for a first instance of the kube-apiserver to become ready;
+     this wait cycle acts as an embedded mechanism for handling the sequence of
+     `kubeadm init` and `kubeadm join` in case of parallel node creation.
+
+2. In case of `join --master` [New step]
+
+   1. Using the bootstrap token as identity, read the `kubeadm-config` configMap
+      in the `kube-system` namespace.
+
+      > This requires granting the `system:node-bootstrapper` group access to the
+        above configMap (or providing the same information in a file, as in step 1.);
+        a sketch of such a grant is shown after this workflow.
+
+ 2. Check if the cluster is ready for joining a new master node:
+
+ a. Check if the cluster was created with `--feature-gates=HighAvailability=true`.
+
+      > We assume that all the necessary conditions were already checked
+        during `kubeadm init`:
+ > * The cluster uses an external etcd.
+ > * An external load balancer is provisioned and the IP/DNS of the external
+ load balancer is used as advertise-address.
+
+      b. In case of cluster certificates stored on the file system, check if the
+         expected certificates exist.
+
+ > see "Strategies for distributing cluster certificates" paragraph for
+ additional info about this step.
+
+ 3. Prepare the node for joining as a master node:
+
+ a. In case of control plane deployed as static pods, create kubeconfig files
+ and static pod manifests for control plane components.
+
+ > see "Strategies for deploying control plane components" paragraph
+ for additional info about this step.
+
+ 4. Create the admin.conf kubeconfig file
+
+3. Execute the TLS bootstrap process, including [No changes to this step]:
+
+   1. Start kubelet using the bootstrap token as identity
+   2. Request a certificate for the node - with the node identity - and retrieve
+      it after it is automatically approved
+   3. Restart kubelet with the node identity
+   4. If applicable, apply the kubelet dynamic configuration
+
+4. In case of `join --master` [New step]
+
+   1. Apply the master taint and label to the node (see the sketch after this workflow).
+
+      > This action is executed using the admin.conf identity created above;
+      >
+      > This action triggers the deployment of the master components in case of a
+        self-hosted control plane
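+
+A hedged sketch of the two cluster-side effects introduced by the new steps above,
+expressed with kubectl for illustration only (kubeadm would perform the equivalent
+API calls itself; resource names and the exact bootstrap group are not final):
+
+```bash
+# Step 2.1: grant the bootstrap group read access to the kubeadm-config configMap
+# (illustrative Role/RoleBinding names).
+kubectl -n kube-system create role kubeadm-config-reader \
+  --verb=get --resource=configmaps --resource-name=kubeadm-config
+kubectl -n kube-system create rolebinding kubeadm-config-reader \
+  --role=kubeadm-config-reader --group=system:node-bootstrapper
+
+# Step 4.1: what "apply master taint and label" amounts to; kubeadm executes this
+# using the admin.conf identity created in step 2.4.
+kubectl label node "${NODE_NAME}" node-role.kubernetes.io/master=
+kubectl taint node "${NODE_NAME}" node-role.kubernetes.io/master=:NoSchedule
+```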
+
+#### Strategies for deploying control plane components
+
+As of today kubeadm supports two solutions for deploying control plane components:
+
+1. Control plane deployed as static pods (current kubeadm default)
+2. Self-hosted control plane in case of `--feature-gates=SelfHosting=true`
+
+"Self-hosted control plane" is a solution that we expect - *in the long term* -
+will become mainstream, because it simplifies both deployment and upgrade of control
+plane components, since Kubernetes itself takes care of deploying the
+corresponding pods on nodes.
+
+Unfortunately, at the time of writing it is unknown when this feature will graduate
+to beta/GA or when this feature will become the new kubeadm default; as a consequence,
+this proposal assumes that it is still required to provide a solution for both case 1
+and case 2.
+
+The proposed solution for case 1. "Control plane deployed as static pods", assumes
+that the `kubeadm join --master` flow will take care of creating required kubeconfig
+files and required static pod manifests.
+
+Case 2. "Self-hosted control plane", as described above, does not require any
+additional steps to be implemented in the `kubeadm join --master` flow.
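+
+For case 1, a hedged sketch of the artifacts that the `kubeadm join --master` flow
+would be expected to create on the joining node, mirroring what `kubeadm init`
+produces today (paths follow current kubeadm defaults and are illustrative):
+
+```bash
+# Kubeconfig files for the control plane components on the joining master.
+ls /etc/kubernetes/
+#   admin.conf  controller-manager.conf  kubelet.conf  scheduler.conf
+
+# Static pod manifests picked up by the kubelet on the joining master.
+ls /etc/kubernetes/manifests/
+#   kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
+```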
+
+#### Strategies for distributing cluster certificates
+
+As of today kubeadm supports two solutions for storing cluster certificates:
+
+1. Cluster certificates stored on file system in case of:
+ * Control plane deployed as static pods (current kubeadm default)
+ * Self-hosted control plane in case of `--feature-gates=SelfHosting=true`
+2. Cluster certificates stored in secrets in case of:
+ * Self-hosted control plane + secrets in certs in case of
+ `--feature-gates=SelfHosting=true,StoreCertsInSecrets=true`
+
+"Storing cluster certificates in secrets" is a solution that we expect - *in the
+long term* - will become mainstream, because it simplifies certificate distribution
+and also certificate rotation, since Kubernetes itself takes care of distributing
+certs to nodes.
+
+Unfortunately, at the time of writing it is unknown when this feature will graduate
+to beta/GA or when this feature will become the new kubeadm default; as a
+consequence, this proposal assumes it is required to provide a solution for both
+case 1 and case 2.
+
+The proposed solution for case 1. "Cluster certificates stored on file system",
+requires the user/the higher-level tools to execute an additional action _before_
+invoking `kubeadm join --master` (NB: kubeadm is limited to executing actions *only*
+on the machine where it is running, so it is not possible to automatically copy
+certificates from remote locations).
+
+More specifically, in case of a cluster with "cluster certificates stored on the
+file system", before invoking `kubeadm join --master` the user/higher-level tools
+should copy the control plane certificates from an existing node, e.g. the node
+where `kubeadm init` was run, to the joining node.
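+
+A hedged example of such a copy step, assuming the certificates live in the default
+kubeadm location `/etc/kubernetes/pki`, that the operator has SSH access to both
+nodes, and that the exact set of files to copy will be defined during implementation
+(hostnames are placeholders; for external etcd, the etcd client certificates
+referenced in the kubeadm config must also be made available on the new node):
+
+```bash
+# Run from an existing master (e.g. the node where `kubeadm init` was executed):
+# copy the cluster CA, the front-proxy CA and the service account key pair to the
+# node that is about to run `kubeadm join --master`.
+scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
+    /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
+    /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
+    root@new-master:/etc/kubernetes/pki/
+```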
+
+Then, the `kubeadm join --master` flow will take care of checking certificate
+existence and conformance.
+
+Case 2. "Cluster certificates stored in secrets", as described above, does not
+require any additional steps to be implemented in the `kubeadm join --master`
+flow.
+
+## Graduation Criteria
+
+* To create a periodic E2E test that bootstraps an HA cluster with kubeadm
+  and exercises the dynamic bootstrap workflow
+* To ensure upgradability of HA clusters (possibly with another E2E test)
+* To document the kubeadm support for HA on kubernetes.io
+
+## Implementation History
+
+* original HA proposals [#1](https://goo.gl/QNtj5T) and [#2](https://goo.gl/C8V8PV)
+* merged [Kubeadm HA design doc](https://goo.gl/QpD5h8)
+* HA prototype [demo](https://goo.gl/2WLUUc) and [notes](https://goo.gl/NmTahy)
+* [PR #58261](https://github.com/kubernetes/kubernetes/pull/58261)
+
+## Drawbacks
+
+This proposal provides support for a single, well-defined HA scenario.
+While this is considered a sustainable approach to the complexity of HA in Kubernetes,
+the limited scope of this proposal could be negatively perceived by end users.
+
+## Alternatives
+
+1) Execute `kubeadm init` on many nodes
+
+The approach based on executing `kubeadm init` on each master was considered as well,
+but was not chosen because it has several drawbacks:
+
+* There is no real control over the parameters passed to `kubeadm init` executed on secondary masters,
+  and this can lead to unpredictable, inconsistent configurations.
+* The init sequence for a secondary master won't go through the TLS bootstrap process,
+  and this can be perceived as a security concern.
+* The init sequence executes many steps which are unnecessary on a secondary master;
+  today those steps are mostly idempotent, so basically no harm is done by executing
+  them two or three times. Nevertheless, maintaining this contract in the future could be complex.
+
+2) Allow HA configurations with `--advertise-address` equal to the master IP address
+(and the IP/DNS of the external load balancer added as an additional entry in apiServerCertSANs).
+
+After some testing, this option was considered too complex/not
+adequate for the initial setup of the desired `kubeadm join --master` dynamic workflow;
+this can be better explained by looking at two implementations based on this option:
+
+* [kubernetes the hard way](https://github.com/kelseyhightower/kubernetes-the-hard-way)
+  uses the IP addresses of all master nodes when creating the API server
+  serving certificate before bootstrapping the cluster, but this approach
+  can't be used when considering the desired dynamic workflow.
+
+* [Creating HA cluster with kubeadm](https://kubernetes.io/docs/setup/independent/high-availability/)
+  uses a different API server serving certificate for each master, and this
+  could increase the complexity of the first implementation because:
+  * the `kubeadm join --master` flow has to generate different certificates for
+    each master node.
+  * the self-hosted control plane should be adapted to mount different certificates
+    for each master.
+  * bootstrap checkpointing should be designed to checkpoint a different
+    set of certificates for each master.
+  * upgrades should be adapted to consider master-specific settings