| author | k8s-ci-robot <k8s-ci-robot@users.noreply.github.com> | 2018-07-13 08:29:26 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2018-07-13 08:29:26 -0700 |
| commit | ae474acdd72d1f06ddb5b6d57a52ea0c969b7322 (patch) | |
| tree | 6a494cf79bc8b97c14cb0b0896d2324f4aee1f85 | |
| parent | 74d88f25ffc72fce719c5369734624b5c74fafc9 (diff) | |
| parent | e7520a4028447785caee6ca66eaf349b6335203d (diff) | |
Merge pull request #2331 from fabriziopandini/kubeadm-join-master2
Update KEP kubeadm join --master
| -rw-r--r-- | keps/NEXT_KEP_NUMBER | 2 |
| -rw-r--r-- | keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md | 435 |
| -rw-r--r-- | keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md | 448 |
3 files changed, 436 insertions, 449 deletions
diff --git a/keps/NEXT_KEP_NUMBER b/keps/NEXT_KEP_NUMBER
index 8351c193..60d3b2f4 100644
--- a/keps/NEXT_KEP_NUMBER
+++ b/keps/NEXT_KEP_NUMBER
@@ -1 +1 @@
-14
+15
diff --git a/keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md b/keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md
new file mode 100644
index 00000000..be553f4e
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md
@@ -0,0 +1,435 @@

# kubeadm join --master workflow

## Metadata

```yaml
---
kep-number: 15
title: kubeadm join --master workflow
status: accepted
authors:
  - "@fabriziopandini"
owning-sig: sig-cluster-lifecycle
reviewers:
  - "@chuckha"
  - "@detiber"
  - "@luxas"
approvers:
  - "@luxas"
  - "@timothysc"
editor:
  - "@fabriziopandini"
creation-date: 2018-01-28
last-updated: 2018-06-29
see-also:
  - KEP 0004
```

## Table of Contents

<!-- TOC -->

- [kubeadm join --master workflow](#kubeadm-join---master-workflow)
  - [Metadata](#metadata)
  - [Table of Contents](#table-of-contents)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-goals](#non-goals)
    - [Challenges and Open Questions](#challenges-and-open-questions)
  - [Proposal](#proposal)
    - [User Stories](#user-stories)
      - [Create a cluster with more than one master nodes (static workflow)](#create-a-cluster-with-more-than-one-master-nodes-static-workflow)
      - [Add a new master node (dynamic workflow)](#add-a-new-master-node-dynamic-workflow)
    - [Implementation Details](#implementation-details)
      - [Initialize the Kubernetes cluster](#initialize-the-kubernetes-cluster)
      - [Preparing for execution of kubeadm join --master](#preparing-for-execution-of-kubeadm-join---master)
      - [The kubeadm join --master workflow](#the-kubeadm-join---master-workflow)
      - [dynamic workflow (advertise-address == `controlplaneAddress`)](#dynamic-workflow-advertise-address--controlplaneaddress)
      - [Static workflow (advertise-address != `controlplaneAddress`)](#static-workflow-advertise-address--controlplaneaddress)
      - [Strategies for deploying control plane components](#strategies-for-deploying-control-plane-components)
      - [Strategies for distributing cluster certificates](#strategies-for-distributing-cluster-certificates)
      - [`kubeadm upgrade` for HA clusters](#kubeadm-upgrade-for-ha-clusters)
  - [Graduation Criteria](#graduation-criteria)
  - [Implementation History](#implementation-history)
  - [Drawbacks](#drawbacks)
  - [Alternatives](#alternatives)

<!-- /TOC -->

## Summary

We are extending kubeadm's distinctive `init` and `join` workflow, introducing the
capability to add more than one master node to an existing cluster by means of the
new `kubeadm join --master` option (in the alpha release the flag will be named
`--experimental-master`).

As a consequence, kubeadm will provide a best-practice, "fast path" for creating a
minimum viable, conformant Kubernetes cluster with one or more master nodes and
zero or more worker nodes. As detailed in the following paragraphs, please note that
this proposal does not solve every possible use case, nor does it automate the full
end-to-end flow.

## Motivation

Support for high availability is one of the most requested features for kubeadm.

Even though it is already possible today to create an HA cluster using kubeadm in
combination with scripts and/or automation tools (e.g.
[this guide](https://kubernetes.io/docs/setup/independent/high-availability/)), this KEP
aims to introduce a simple and reliable upstream solution for achieving the same goal.

Such a solution will provide a consistent and repeatable base for implementing
additional capabilities, e.g. `kubeadm upgrade` for HA clusters.

### Goals

- "Divide and conquer"

  This proposal - at least in its initial release - does not address all the possible
  user stories for creating a highly available Kubernetes cluster, but instead
  focuses on:

  - Defining a generic and extensible flow for bootstrapping a cluster with multiple
    masters, the `kubeadm join --master` workflow.
  - Providing a solution *only* for well-defined user stories; see
    [User Stories](#user-stories) and [Non-goals](#non-goals).

- Enable higher-level tools integration

  We expect higher-level tools to leverage kubeadm for creating HA clusters;
  accordingly, the `kubeadm join --master` workflow should provide support for
  the following operational practices used by higher-level tools:

  - Parallel node creation

    Higher-level tools could create nodes in parallel (both masters and workers)
    to reduce the overall cluster startup time. `kubeadm join --master` should
    support this practice natively, without requiring higher-level tools to
    implement any synchronization mechanics.

- Provide support both for dynamic and static bootstrap flows

  At the time a user is running `kubeadm init`, they might not know what
  the cluster setup will look like eventually. For instance, the user may start with
  only one master + n nodes, and then add further master nodes with
  `kubeadm join --master` or add more worker nodes with `kubeadm join` (in any order).
  This kind of workflow, where the user doesn't know in advance the final layout of
  the control plane instances, is referred to in this document as the
  "dynamic bootstrap workflow".

  Nevertheless, kubeadm should also support a more "static bootstrap flow", where the
  user knows in advance the target layout of the control plane instances (the number,
  the names, and the IPs of the master nodes).

- Support different etcd deployment scenarios; more specifically, run master node
  components and the etcd cluster on the same machines (stacked control plane nodes)
  or run the etcd cluster on dedicated machines.

### Non-goals

- Graduating an existing node to master. Nodes must be created as masters or as
  workers, and are then supposed to stick to the assigned role for their entire
  life cycle.

- This proposal doesn't include a solution for etcd cluster management (but nothing
  in this proposal should prevent addressing this in the future).

- This proposal doesn't include a solution for API server load balancing (and nothing
  in this proposal should prevent users from choosing their preferred solution for
  API server load balancing).

- This proposal doesn't address the ongoing discussion about kubeadm self-hosting;
  in light of the divide-and-conquer goal stated above, there is no plan to support
  self-hosted clusters, neither in the initial proposal nor in the foreseeable future
  (but nothing in this proposal should explicitly prevent reconsidering this in the
  future).

- This proposal doesn't provide an automated solution for transferring the CA key and
  other required certs from one master to the others. More specifically, this proposal
  doesn't address the ongoing discussion about storage of kubeadm TLS assets in
  secrets, and there is no plan to support clusters with TLS stored in secrets (but
  nothing in this proposal should explicitly prevent reconsidering this in the future).

- Nothing in this proposal should prevent practices that exist today.

### Challenges and Open Questions

- Keep the UX simple.

  - _What are the acceptable trade-offs between the need to have a clean and simple
    UX and the variety/complexity of possible Kubernetes HA deployments?_

- Create a cluster without knowing its final layout

  Supporting a dynamic workflow implies that some information about the cluster is
  not available at init time, e.g. the number of master nodes, the IPs of the
  master nodes, etc.

  - _How to configure a Kubernetes cluster so that it can easily adapt to future
    changes of its own control plane layout, e.g. adding or removing a master node?_

  - _What are the "pivotal" cluster settings that must be defined before initializing
    the cluster?_

  - _How to combine support for both static and dynamic bootstrap workflows into a
    single UX?_

- Kubeadm's limited scope of action

  - The kubeadm binary can execute actions _only_ on the machine where it is running;
    e.g. it is not possible to execute actions on other nodes or to copy files across
    nodes.
  - During the join workflow, kubeadm can access the cluster _only_ using identities
    with limited grants, namely `system:unauthenticated` or `system:node-bootstrapper`.

- Upgradability

  - _How to set up a highly available cluster so as to simplify the execution
    of cluster version upgrades, both manually and with the support of
    `kubeadm upgrade`?_

## Proposal

### User Stories

#### Create a cluster with more than one master nodes (static workflow)

As a Kubernetes administrator, I want to create a Kubernetes cluster with more than one
master node*, of which I know in advance the names and the IPs.

\* A new "master node" is a new Kubernetes node with the
`node-role.kubernetes.io/master=""` label and the
`node-role.kubernetes.io/master:NoSchedule` taint; a new instance of the control plane
components will be deployed on the new master node.
As described in goals/non-goals, in this first release of the proposal
creating a new master node doesn't trigger the creation of a new etcd member on the
same machine.

#### Add a new master node (dynamic workflow)

As a Kubernetes administrator, (_at any time_) I want to add a new master node* to an
existing Kubernetes cluster.

### Implementation Details

#### Initialize the Kubernetes cluster

As of today, a Kubernetes cluster should be initialized by running `kubeadm init` on a
first master, afterwards referred to as the bootstrap master.

In order to support the `kubeadm join --master` workflow, a new Kubernetes cluster is
expected to satisfy the following conditions:

- The cluster must have a stable `controlplaneAddress` endpoint (i.e. the IP/DNS of
  the external load balancer).
- The cluster must use an external etcd.

All of the above conditions/settings can be set by passing a configuration file to
`kubeadm init`.
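For illustration, a minimal `kubeadm init` configuration satisfying both conditions
might look like the sketch below; the field names follow the kubeadm `v1alpha2`
configuration API that was current around the time of this KEP, and the load balancer
and etcd endpoints are placeholders:

```yaml
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
api:
  # Stable controlplaneAddress endpoint: the IP/DNS of the external load balancer.
  controlPlaneEndpoint: "lb.example.com:6443"
etcd:
  external:
    # The cluster must use an external etcd.
    endpoints:
      - "https://etcd0.example.com:2379"
      - "https://etcd1.example.com:2379"
      - "https://etcd2.example.com:2379"
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/etcd/client.crt
    keyFile: /etc/kubernetes/pki/etcd/client.key
```

The bootstrap master would then be initialized with `kubeadm init --config <file>`.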
#### Preparing for execution of kubeadm join --master

Before invoking `kubeadm join --master`, the user/higher-level tools
should copy control plane certificates from an existing master node, e.g. the
bootstrap master.

> NB. kubeadm is limited to executing actions *only*
> on the machine where it is running, so it is not possible to copy
> certificates from remote locations automatically.

Please note that, strictly speaking, only the CA and front-proxy CA certificates and
the service account key pair are required to be equal among all masters. Accordingly:

- `kubeadm join --master` will check for the mandatory certificates and fail fast if
  they are missing;
- given that the required certificates exist, if some/all of the other certificates
  are provided by the user as well, `kubeadm join --master` will use them without
  further checks;
- if any other certificates are missing, `kubeadm join --master` will create them.

> See the "Strategies for distributing cluster certificates" paragraph for
> additional info about this step.
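As an illustration, assuming the default kubeadm certificate directory and SSH access
to the bootstrap master (whose hostname here is a placeholder), the copy step could be
sketched as:

```bash
# Run on the joining node before `kubeadm join --master`.
# Only the CA, front-proxy CA, and service account key pair must be shared.
BOOTSTRAP_MASTER=master-0.example.com   # placeholder

sudo mkdir -p /etc/kubernetes/pki
for f in ca.crt ca.key sa.key sa.pub front-proxy-ca.crt front-proxy-ca.key; do
  # Copy each file locally first, then move it into place with root privileges.
  scp "root@${BOOTSTRAP_MASTER}:/etc/kubernetes/pki/${f}" "/tmp/${f}"
  sudo mv "/tmp/${f}" "/etc/kubernetes/pki/${f}"
done
```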
#### The kubeadm join --master workflow

The `kubeadm join --master` workflow will be implemented as an extension of the
existing `kubeadm join` flow.

`kubeadm join --master` will accept an additional parameter, namely the API server
advertise address of the joining node; as detailed in the following paragraphs, the
value assigned to this parameter depends on the user's choice between a dynamic
bootstrap workflow and a static bootstrap workflow.

The updated join workflow will be the following:

1. Discover cluster info [No changes to this step]

   > NB. This step waits for a first instance of the kube-apiserver to become ready
   > (the bootstrap master), and thus it acts as an embedded mechanism for handling
   > the sequence of `kubeadm init` and `kubeadm join` actions in case of parallel
   > node creation.

2. Execute the kubelet TLS bootstrap process [No changes to this step].

3. In case of `join --master` [New step]:

   1. Using the bootstrap token as identity, read the `kubeadm-config` configMap
      in the `kube-system` namespace.

      > This requires granting access to the above configMap to the
      > `system:bootstrappers` group.

   2. Check if the cluster is ready for joining a new master node:

      a. Check if the cluster has a stable `controlplaneAddress`.
      b. Check if the cluster uses an external etcd.
      c. Check if the mandatory certificates exist on the file system.

   3. Prepare the node for joining as a master node:

      a. Create missing certificates (if any).

         > Please note that by creating missing certificates kubeadm can adapt
         > seamlessly to a dynamic workflow or to a static workflow (and to the API
         > server advertise address of the joining node); see the following
         > paragraphs for additional info.

      b. In case of a control plane deployed as static pods, create the related
         kubeconfig files and static pod manifests.

         > See the "Strategies for deploying control plane components" paragraph
         > for additional info about this step.

   4. Create the admin.conf kubeconfig file.

      > This operation creates an additional administrative credential that enables
      > management of the cluster from the joining node and allows a simple and clean
      > UX for the final steps of this workflow (similar to what happens for
      > `kubeadm init`). However, it is important to note that this credential should
      > be treated securely to avoid compromising the cluster.

   5. Apply the master taint and label to the node.

   6. Update the `kubeadm-config` configMap with the information about the new
      master node.
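Putting the steps above together, joining a new master could look like the following
sketch; the token, CA certificate hash, and addresses are placeholders, the
advertise-address flag name is an assumption, and per this KEP the alpha release
spells the master flag `--experimental-master`:

```bash
# Run on the joining node; the positional argument is the cluster's stable
# controlplaneAddress endpoint used for discovery (all values are placeholders).
kubeadm join lb.example.com:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash-of-the-cluster-ca-cert> \
  --experimental-master \
  --apiserver-advertise-address lb.example.com  # dynamic workflow: equal to controlplaneAddress
```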
#### dynamic workflow (advertise-address == `controlplaneAddress`)

There are many ways to configure a highly available cluster.

Among them, the approach best suited for a dynamic bootstrap workflow requires the
user to set the `--apiserver-advertise-address` of each master, including the
bootstrap master itself, equal to the `controlplaneAddress` endpoint provided during
`kubeadm init` (the IP/DNS of the external load balancer).

By using the same advertise address for all the masters, `kubeadm init` can create
a single API server serving certificate that can be shared across many master nodes;
no changes will be required to this certificate when adding/removing master nodes.

Please note that:

- if the user is not planning to distribute the API server serving certificate among
  masters, kubeadm will generate a new API server serving certificate "almost equal"
  to the certificate created on the bootstrap master (it differs only in the domain
  name of the joining master).

#### Static workflow (advertise-address != `controlplaneAddress`)

In case of a static bootstrap workflow, the final layout of the control plane - the
number, the names, and the IPs of the master nodes - is known in advance.

Given such information, the user can choose a different approach, where each master
has a specific API server advertise address, different from the
`controlplaneAddress`.

Please note that:

- if the user is not planning to distribute the API server certificate among masters,
  kubeadm will generate a new API server serving certificate with the required SANs;
- if the user is planning to distribute the API server certificate among masters, the
  operator is required to provide during `kubeadm init` the list of masters/the list
  of IP addresses for all the masters as alternative names for the API server
  certificate, thus allowing the proper functioning of all the API server instances
  that will join.
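For the static workflow with a shared serving certificate, the extra names could be
declared at `kubeadm init` time roughly as follows; this is a sketch using the
`v1alpha2` `apiServerCertSANs` field, and all hosts and IPs are placeholders:

```yaml
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
api:
  controlPlaneEndpoint: "lb.example.com:6443"
# Extra SANs so that the shared API server serving certificate is valid
# for every master that is planned to join.
apiServerCertSANs:
  - "master-0.example.com"
  - "master-1.example.com"
  - "master-2.example.com"
  - "10.10.0.10"
  - "10.10.0.11"
  - "10.10.0.12"
```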
#### Strategies for deploying control plane components

As of today kubeadm supports two solutions for deploying control plane components:

1. Control plane deployed as static pods (current kubeadm default)
2. Self-hosted control plane (currently alpha)

The proposed solution for case 1, "Control plane deployed as static pods", assumes
that the `kubeadm join --master` flow will take care of creating the required
kubeconfig files and the required static pod manifests.

As stated above, support for a self-hosted control plane is a non-goal for this
proposal.

#### Strategies for distributing cluster certificates

As of today kubeadm supports two solutions for storing cluster certificates:

1. Cluster certificates stored on the file system (current kubeadm default)
2. Cluster certificates stored in secrets (currently alpha)

The proposed solution for case 1, "Cluster certificates stored on the file system",
requires the user/higher-level tools to execute an additional action _before_
invoking `kubeadm join --master`.

More specifically, in case of a cluster with cluster certificates stored on the file
system, before invoking `kubeadm join --master` the user/higher-level tools
should copy the control plane certificates from an existing master node, e.g. the
bootstrap master.

> NB. kubeadm is limited to executing actions *only*
> on the machine where it is running, so it is not possible to copy
> certificates from remote locations automatically.

Then, the `kubeadm join --master` flow will take care of checking certificate
existence and conformance.

As stated above, support for cluster certificates stored in secrets is a non-goal
for this proposal.

#### `kubeadm upgrade` for HA clusters

Nothing in this proposal prevents the implementation of `kubeadm upgrade` for HA
clusters.

Further detail will be provided in a subsequent release of this KEP, when all the
details of the `v1beta1` release of the kubeadm API are available (including a proper
modeling of a multi-master cluster).

## Graduation Criteria

- To create a periodic E2E test that bootstraps an HA cluster with kubeadm
  and exercises the static bootstrap workflow
- To create a periodic E2E test that bootstraps an HA cluster with kubeadm
  and exercises the dynamic bootstrap workflow
- To ensure upgradability of HA clusters (possibly with another E2E test)
- To document the kubeadm support for HA on kubernetes.io

## Implementation History

- original HA proposals [#1](https://goo.gl/QNtj5T) and [#2](https://goo.gl/C8V8PV)
- merged [Kubeadm HA design doc](https://goo.gl/QpD5h8)
- HA prototype [demo](https://goo.gl/2WLUUc) and [notes](https://goo.gl/NmTahy)
- [PR #58261](https://github.com/kubernetes/kubernetes/pull/58261) with the showcase
  implementation of the first release of this KEP

## Drawbacks

The `kubeadm join --master` workflow requires that some conditions are satisfied at
`kubeadm init` time, namely the use of a `controlplaneAddress` and of an external
etcd.

Strictly speaking, this means that the `kubeadm join --master` workflow defined in
this proposal supports a dynamic workflow _only_ in some cases.

## Alternatives

1) Execute `kubeadm init` on many nodes

The approach based on the execution of `kubeadm init` on each master was considered
as well, but not chosen because it seems to have several drawbacks:

- There is no real control over the parameters passed to `kubeadm init` executed on
  secondary masters, and this might lead to unpredictable, inconsistent
  configurations.
- The init sequence for a secondary master won't go through the TLS bootstrap
  process, and this might be perceived as a security concern.
- The init sequence executes a lot of steps which are unnecessary on a secondary
  master; as of now those steps are mostly idempotent, so basically no harm is done
  by executing them two or three times. Nevertheless, maintaining this contract in
  the future could be complex.

Additionally, by having a separate `kubeadm join --master` workflow instead of a
single `kubeadm init` workflow, we can provide better support for:

- Steps that should be done in a slightly different way on a secondary master with
  respect to the bootstrap master (e.g. updating the kubeadm-config map by adding
  info about the new master instead of creating a new configMap from scratch).
- Checking that the cluster/the kubeadm-config is properly configured for multiple
  masters.
- Blocking users trying to create multiple masters with configurations we don't want
  to support as a SIG (e.g. HA with a self-hosted control plane).
\ No newline at end of file
diff --git a/keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md b/keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md
deleted file mode 100644
index 15fd846a..00000000
--- a/keps/sig-cluster-lifecycle/draft-20180130-kubeadm-join-master.md
+++ /dev/null
@@ -1,448 +0,0 @@

# kubeadm join --master workflow

## Metadata

```yaml
---
kep-number: draft-20180130
title: kubeadm join --master workflow
status: accepted
authors:
  - "@fabriziopandini"
owning-sig: sig-cluster-lifecycle
reviewers:
  - "@errordeveloper"
  - "@jamiehannaford"
approvers:
  - "@luxas"
  - "@timothysc"
  - "@roberthbailey"
editor:
  - "@fabriziopandini"
creation-date: 2018-01-28
last-updated: 2018-01-28
see-also:
  - KEP 0004
```

## Table of Contents

* [kubeadm join --master workflow](#kubeadm-join---master-workflow)
  * [Metadata](#metadata)
  * [Table of Contents](#table-of-contents)
  * [Summary](#summary)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non-goals](#non-goals)
    * [Challenges and Open Questions](#challenges-and-open-questions)
  * [Proposal](#proposal)
    * [User Stories](#user-stories)
      * [Add a new master node](#add-a-new-master-node)
    * [Implementation Details](#implementation-details)
      * [advertise-address = IP/DNS of the external load balancer](#advertise-address--ipdns-of-the-external-load-balancer)
      * [kubeadm init --feature-gates=HighAvailability=true](#kubeadm-init---feature-gateshighavailabilitytrue)
      * [kubeadm join --master workflow](#kubeadm-join---master-workflow-1)
      * [Strategies for deploying control plane components](#strategies-for-deploying-control-plane-components)
      * [Strategies for distributing cluster certificates](#strategies-for-distributing-cluster-certificates)
  * [Graduation Criteria](#graduation-criteria)
  * [Implementation History](#implementation-history)
  * [Drawbacks](#drawbacks)
  * [Alternatives](#alternatives)

## Summary

We are extending kubeadm's distinctive `init` and `join` workflow, introducing the
capability to add more than one master node to an existing cluster by means of the
new `kubeadm join --master` option.

As a consequence, kubeadm will provide a best-practice, "fast path" for creating a
minimum viable, conformant Kubernetes cluster with one or more master nodes and
zero or more worker nodes.

## Motivation

Support for high availability is one of the most requested features for kubeadm.

Even though it is already possible today to create an HA cluster using kubeadm in
combination with scripts and/or automation tools (e.g.
[this guide](https://kubernetes.io/docs/setup/independent/high-availability/)), this
KEP aims to introduce a simple and reliable upstream solution for achieving the same
goal.

### Goals

* "Divide and conquer"

  This proposal - at least in its initial release - does not address all the possible
  user stories for creating a highly available Kubernetes cluster, but instead
  focuses on:

  * Defining a generic and extensible flow for bootstrapping an HA cluster, the
    `kubeadm join --master` workflow.
  * Providing a solution *only* for one well-defined user story; see
    [User Stories](#user-stories) and [Non-goals](#non-goals).

* Provide support for a dynamic bootstrap flow

  At the time a user is running `kubeadm init`, they might not know what
  the cluster setup will look like eventually.
  For instance, the user may start with
  only one master + n nodes, and then add further master nodes with
  `kubeadm join --master` or add more worker nodes with `kubeadm join` (in any order).

* Enable higher-level tools integration

  We expect higher-level and more tailored tooling to be built on top of kubeadm,
  and ideally, using kubeadm as the basis of all deployments will make it easier
  to create conformant clusters.

  Accordingly, the `kubeadm join --master` workflow should provide support for
  the following operational practices used by higher-level tools:

  * Parallel node creation

    Higher-level tools could create nodes in parallel (both masters and workers)
    to reduce the overall cluster startup time. `kubeadm join --master` should
    support this practice natively, without requiring higher-level tools to
    implement any synchronization mechanics.

  * Replace reconciliation strategies

    Especially in case of cloud deployments, higher-level automation tools could
    decide for any reason to replace existing nodes with new ones (instead of
    applying changes in-place to existing nodes). `kubeadm join --master` will
    support this practice by making it easier to replace existing master nodes
    with new ones.

### Non-goals

* By design, kubeadm cares only about bootstrapping, not about provisioning machines.
  Likewise, installing various nice-to-have addons, like the Kubernetes Dashboard,
  monitoring solutions, and cloud-specific addons, is not in scope.

* This proposal doesn't include a solution for etcd cluster management\*.

* Nothing in this proposal should prevent users from running master node components
  and etcd on the same machines; however, users should be aware that this will
  introduce limitations for strategies like parallel node creation and in-place
  vs. replace reconciliation.

* Nothing in this proposal should prevent kubeadm from implementing in the future a
  solution for provisioning an etcd cluster based on static pods/pods.

* This proposal doesn't include a solution for API server load balancing.

* Nothing in this proposal should prevent users from choosing their preferred
  solution for API server load balancing.

* Nothing in this proposal should prevent practices that exist today.

* Nothing in this proposal should prevent users from pre-provisioning TLS assets
  before running `kubeadm init` or `kubeadm join --master`.

\* At the time of writing, the CoreOS recommended approach for etcd is to run
the etcd cluster outside Kubernetes (see discussion in
[kubeadm office hours](https://goo.gl/fjyeqo)).

### Challenges and Open Questions

* Keep the UX simple.

  * _What are the acceptable trade-offs between the need to have a clean and simple
    UX and the complexity of the following challenges and open questions?_

* Create a cluster without knowing its final layout

  Supporting a dynamic workflow implies that some information about the cluster is
  not available at init time, e.g. the number of master nodes, the IPs of the
  master nodes, etc.

  * _How to configure a Kubernetes cluster so that it can easily adapt to future
    changes of its own layout, e.g.
    adding or removing a master node?_

  * _What are the "pivotal" cluster settings that must be defined before initializing
    the cluster?_

  * _What are the mandatory conditions to be verified when executing `kubeadm init`
    to allow/not allow the execution of `kubeadm join --master` in the future?_

* Kubeadm's limited scope of action

  * The kubeadm binary can execute actions _only_ on the machine where it is running;
    e.g. it is not possible to execute actions on other nodes or to copy files across
    nodes.
  * During the join workflow, kubeadm can access the cluster _only_ using identities
    with limited grants, namely `system:unauthenticated` or
    `system:node-bootstrapper`.

* Dependencies graduation

  The solution for `kubeadm join --master` will rely on a set of dependencies/other
  features which are still in the process of graduating to GA, e.g. dynamic
  kubelet configuration, self-hosting, component config.

  * _When should `kubeadm join --master` rely entirely on dependencies/features
    still under graduation vs. provide compatibility with older/less convenient but
    more consolidated approaches?_

  * _Should we support `kubeadm join --master` for clusters with a control plane
    deployed as static pods? What about clusters with a self-hosted
    control plane?_
  * _Should we support `kubeadm join --master` only for clusters storing
    cluster certificates on the file system? What about clusters
    storing certificates in secrets?_

* Upgradability

  * _How to set up a highly available cluster so as to simplify the execution
    of cluster version upgrades, both manually and with the support of
    `kubeadm upgrade`?_

## Proposal

### User Stories

#### Add a new master node

As a Kubernetes administrator, I want to run `kubeadm join --master` to add
a new master node* to an existing Kubernetes cluster**, so that the cluster becomes
more resilient to failures of the existing master nodes (high availability).

\* A new "master node" is a new Kubernetes node with the
`node-role.kubernetes.io/master=""` label and the
`node-role.kubernetes.io/master:NoSchedule` taint; a new instance of the control
plane components will be deployed on the new master node.

> NB. In this first release of the proposal creating a new master node doesn't
trigger the creation of a new etcd member on the same machine.

\*\* In this first release of the proposal, `kubeadm join --master` can be
executed _only_ on Kubernetes clusters compliant with the following conditions:

* The cluster was initialized with `kubeadm init`.
* The cluster was initialized with `--feature-gates=HighAvailability=true`.
* The cluster uses an external etcd.
* An external load balancer was provisioned, and the IP/DNS of the external
  load balancer is used as the advertise-address for the kube-apiserver.

### Implementation Details

#### advertise-address = IP/DNS of the external load balancer

There are many ways to configure a highly available cluster.

After prototyping and various discussions in
[kubeadm office hours](https://youtu.be/HcvVi8O_ZGY), it was agreed to implement
the approach that sets the `--advertise-address` equal to the IP/DNS of the
external load balancer, without assigning dedicated `--advertise-address` IPs
for each master node.

By excluding the IPs of the master nodes, kubeadm can create a single API server
serving certificate and share this certificate across many master nodes;
no changes will be required to this certificate when adding/removing master nodes.
Such properties make this approach best suited for the initial setup of
the desired `kubeadm join --master` dynamic workflow.

> Please note that in this scenario the kubernetes service will always resolve
to the IP/DNS of the external load balancer, instead of resolving to the list
of master IPs, but this fact was considered an acceptable trade-off at this stage.

> It is expected that support for different HA configurations will be added in
future releases of this KEP.

#### kubeadm init --feature-gates=HighAvailability=true

When executing `kubeadm join --master`, due to current kubeadm limitations, only
little information about the cluster/about other master nodes is available.

As a consequence, this proposal delegates to the initial `kubeadm init` - when
executed with `--feature-gates=HighAvailability=true` - all the checks about
the compliance of the cluster with the supported user story:

* The cluster uses an external etcd.
* An external load balancer is provisioned, and the IP/DNS of the external load
  balancer is used as the advertise-address.

#### kubeadm join --master workflow

The `kubeadm join --master` target workflow is an extension of the
existing `kubeadm join` flow:

1. Discover cluster info [No changes to this step]

   Access the `cluster-info` configMap in the `kube-public` namespace (or read
   the same information provided in a file).

   > This step waits for a first instance of the kube-apiserver to become ready;
   such a wait cycle acts as an embedded mechanism for handling the sequence of
   `kubeadm init` and `kubeadm join` in case of parallel node creation.

2. In case of `join --master` [New step]

   1. Using the bootstrap token as identity, read the `kubeadm-config` configMap
      in the `kube-system` namespace.

      > This requires granting access to the above configMap to the
      `system:bootstrappers` group (or providing the same information
      in a file, as in 1.).

   2. Check if the cluster is ready for joining a new master node:

      a. Check if the cluster was created with
      `--feature-gates=HighAvailability=true`.

      > We assume that all the necessary conditions were already checked
      during `kubeadm init`:
      > * The cluster uses an external etcd.
      > * An external load balancer is provisioned and the IP/DNS of the external
      load balancer is used as advertise-address.

      b. In case of cluster certificates stored on the file system, check if the
      expected certificates exist.

      > See the "Strategies for distributing cluster certificates" paragraph for
      additional info about this step.

   3. Prepare the node for joining as a master node:

      a. In case of a control plane deployed as static pods, create kubeconfig
      files and static pod manifests for the control plane components.

      > See the "Strategies for deploying control plane components" paragraph
      for additional info about this step.

   4. Create the admin.conf kubeconfig file.

3. Execute the TLS bootstrap process, including [No changes to this step]:

   1. Start kubelet using the bootstrap token as identity
   2. Request a certificate for the node - with the node identity - and retrieve
      it after it is automatically approved
   3. Restart kubelet with the node identity
   4. Eventually, apply the kubelet dynamic configuration

4. In case of `join --master` [New step]

   1. Apply the master taint and label to the node.
      > This action is executed using the admin.conf identity created above.
      >
      > This action triggers the deployment of the master components in case of a
      self-hosted control plane.

#### Strategies for deploying control plane components

As of today kubeadm supports two solutions for deploying control plane components:

1. Control plane deployed as static pods (current kubeadm default)
2. Self-hosted control plane in case of `--feature-gates=SelfHosting=true`

"Self-hosted control plane" is a solution that we expect - *in the long term* -
will become mainstream, because it simplifies both deployment and upgrade of control
plane components, due to the fact that Kubernetes itself will take care of deploying
the corresponding pods on nodes.

Unfortunately, at the time of writing it is unknown when this feature will graduate
to beta/GA or when this feature will become the new kubeadm default; as a
consequence, this proposal assumes it is still required to provide a solution both
for case 1 and case 2.

The proposed solution for case 1, "Control plane deployed as static pods", assumes
that the `kubeadm join --master` flow will take care of creating the required
kubeconfig files and the required static pod manifests.

Case 2, "Self-hosted control plane", as described above, does not require any
additional steps to be implemented in the `kubeadm join --master` flow.

#### Strategies for distributing cluster certificates

As of today kubeadm supports two solutions for storing cluster certificates:

1. Cluster certificates stored on the file system in case of:
   * Control plane deployed as static pods (current kubeadm default)
   * Self-hosted control plane in case of `--feature-gates=SelfHosting=true`
2. Cluster certificates stored in secrets in case of:
   * Self-hosted control plane + certs in secrets in case of
     `--feature-gates=SelfHosting=true,StoreCertsInSecrets=true`

"Storing cluster certificates in secrets" is a solution that we expect - *in the
long term* - will become mainstream, because it simplifies certificate distribution
and also certificate rotation, due to the fact that Kubernetes itself will take
care of distributing certs on nodes.

Unfortunately, at the time of writing it is unknown when this feature will graduate
to beta/GA or when this feature will become the new kubeadm default; as a
consequence, this proposal assumes it is required to provide a solution both
for case 1 and case 2.

The proposed solution for case 1, "Cluster certificates stored on the file system",
requires the user/higher-level tools to execute an additional action _before_
invoking `kubeadm join --master` (NB. kubeadm is limited to executing actions *only*
on the machine where it is running, so it is not possible to copy
certificates from remote locations automatically).

More specifically, in case of a cluster with "cluster certificates stored on the
file system", before invoking `kubeadm join --master`, the user/higher-level tools
should copy the control plane certificates from an existing node, e.g. the node
where `kubeadm init` was run, to the joining node.

Then, the `kubeadm join --master` flow will take care of checking certificate
existence and conformance.

Case 2, "Cluster certificates stored in secrets", as described above, does not
require any additional steps to be implemented in the `kubeadm join --master`
flow.

## Graduation Criteria

* To create a periodic E2E test that bootstraps an HA cluster with kubeadm
  and exercises the dynamic bootstrap workflow
* To ensure upgradability of HA clusters (possibly with another E2E test)
* To document the kubeadm support for HA on kubernetes.io

## Implementation History

* original HA proposals [#1](https://goo.gl/QNtj5T) and [#2](https://goo.gl/C8V8PV)
* merged [Kubeadm HA design doc](https://goo.gl/QpD5h8)
* HA prototype [demo](https://goo.gl/2WLUUc) and [notes](https://goo.gl/NmTahy)
* [PR #58261](https://github.com/kubernetes/kubernetes/pull/58261)

## Drawbacks

This proposal provides support for a single, well-defined HA scenario.
While this is considered a sustainable approach to the complexity of HA in
Kubernetes, the limited scope of this proposal could be negatively perceived by
final users.

## Alternatives

1) Execute `kubeadm init` on many nodes

The approach based on the execution of `kubeadm init` on each master was considered
as well, but not chosen because it seems to have several drawbacks:

* There is no real control over the parameters passed to `kubeadm init` executed on
  secondary masters, and this can lead to unpredictable, inconsistent configurations.
* The init sequence for a secondary master won't go through the TLS bootstrap
  process, and this can be perceived as a security concern.
* The init sequence executes a lot of steps which are unnecessary on a secondary
  master; as of now those steps are mostly idempotent, so basically no harm is done
  by executing them two or three times. Nevertheless, maintaining this contract in
  the future could be complex.

2) Allow HA configurations with `--advertise-address` equal to the master IP address
(adding the IP/DNS of the external load balancer as an additional apiServerCertSANs
entry).

After some testing, this option was considered too complex/not adequate for the
initial setup of the desired `kubeadm join --master` dynamic workflow; this can be
better explained by looking at two implementations based on this option:

* [kubernetes the hard way](https://github.com/kelseyhightower/kubernetes-the-hard-way)
  uses the IP addresses of all master nodes for creating a new API server
  serving certificate before bootstrapping the cluster, but this approach
  can't be used if considering the desired dynamic workflow.

* [Creating HA clusters with kubeadm](https://kubernetes.io/docs/setup/independent/high-availability/)
  uses a different API server serving certificate for each master, and this
  could increase the complexity of the first implementation because:
  * the `kubeadm join --master` flow has to generate different certificates for
    each master node;
  * the self-hosted control plane should be adapted to mount different certificates
    for each master;
  * bootstrap checkpointing should be designed to checkpoint a different
    set of certificates for each master;
  * upgrades should be adapted to consider master-specific settings.