| field | value | date |
|---|---|---|
| author | Quinton Hoole <quinton.hoole@huawei.com> | 2017-11-16 15:08:35 -0800 |
| committer | Quinton Hoole <quinton.hoole@huawei.com> | 2017-11-16 15:08:35 -0800 |
| commit | 3d1ceff2386e798c46117e01ccace5acb387a99d (patch) | |
| tree | 55ae3182ecef961f107ea8efedc799835a4fa1c0 | |
| parent | ea84a92a0bc0e539744c2e45ecba0a1b0f664756 (diff) | |

Add automatic line wrapping (used Emacs markdown-mode).

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | contributors/devel/architectural-roadmap.md | 920 |

1 file changed, 698 insertions, 222 deletions

diff --git a/contributors/devel/architectural-roadmap.md b/contributors/devel/architectural-roadmap.md
index 01c2cd2e..5d50a2b9 100644
--- a/contributors/devel/architectural-roadmap.md
+++ b/contributors/devel/architectural-roadmap.md
@@ -19,101 +19,230 @@
Let’s start as ordinary doc text and export to markdown later.

## Background

Kubernetes is a platform for deploying and managing containers. For more information about the mission, scope, and design of Kubernetes, see [What Is Kubernetes](http://kubernetes.io/docs/whatisk8s/) and the [architectural overview](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture.md). The latter also describes the current breakdown of the system into components/processes.

Contributors to Kubernetes need to know what functionality they can rely upon when adding new features to different parts of the system.

Additionally, one of the problems facing platforms like Kubernetes is defining what is “in” and what is “out”. While Kubernetes must offer some base functionality on which users can rely when running their containerized applications or building their extensions, Kubernetes cannot and should not try to solve every problem that users have. Adding to the difficulty is that, unlike some other types of infrastructure services, such as databases, load balancers, or messaging systems, there are few obvious, natural boundaries for the “built-in” functionality. Consequently, reasonable minds can disagree on exactly where the boundaries lie or what principles guide the decisions.

This document, which was inspired by [similar efforts from the community](https://docs.google.com/document/d/1J6yCsPtggsSx_yfqNenb3xxBK22k43c5XZkVQmS38Mk/edit), aims to clarify the intentions of Kubernetes’s lead architects.
It is currently somewhat aspirational, and is intended to be a blueprint for ongoing and future development. NIY marks items not yet implemented.

[Presentation version](https://docs.google.com/presentation/d/1oPZ4rznkBe86O4rPwD2CWgqgMuaSXguIBHIE7Y0TKVc/edit#slide=id.p)

## System Layers

Just as Linux has a kernel, core system libraries, and optional userland tools, Kubernetes also has "layers" of functionality and tools. An understanding of these layers is important for developers of Kubernetes functionality to determine which cross-concept dependencies should be allowed and which should not.

Kubernetes APIs, concepts, and functionality can be sorted into the following layers.

### The Nucleus: API and Execution

Essential API and execution machinery.

These APIs and functions, implemented by the upstream Kubernetes codebase, comprise the bare minimum set of features and concepts needed to build up the higher-order layers of the system. These pieces are thoroughly specified and documented, and every containerized application will use them. Developers can safely assume they are present.

They should eventually become stable and "boring". However, Linux has continuously evolved over its 25-year lifetime, and major changes, including NPTL (which required rebuilding all applications) and features that enable containers (which are changing how all applications are run), have been added over the past 10 years. It will take some time for Kubernetes to stabilize, as well.

#### The API and cluster control plane

Kubernetes clusters provide a collection of similar REST APIs, exposed by the Kubernetes [API server](https://kubernetes.io/docs/admin/kube-apiserver/), supporting primarily CRUD operations on (mostly) persistent resources. These APIs serve as the hub of its control plane.

REST APIs that follow Kubernetes API conventions (path conventions, standard metadata, …) automatically benefit from shared API services (authorization, authentication, audit logging), and generic client code can interact with them (CLI and UI discoverability, generic status reporting in UIs and CLIs, generic waiting conventions for orchestration tools, watching, label selection).

The lowest layer of the system also needs to support the extension mechanisms necessary to add the functionality provided by the higher layers. Additionally, this layer must be suitable for use both in single-purpose clusters and in highly tenanted clusters. The nucleus should provide sufficient flexibility that higher-level APIs could introduce new scopes (sets of resources) without compromising the security model of the cluster.

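To make those conventions concrete, here is a minimal, hypothetical sketch of the standard object shape they prescribe; the name and labels below are invented for illustration:

```yaml
# A minimal sketch of the standard object shape prescribed by
# Kubernetes API conventions; the name and labels are hypothetical.
apiVersion: v1        # API group/version of the resource
kind: Namespace       # the resource kind
metadata:             # standard metadata shared by all resources
  name: example-team
  labels:
    team: example     # labels enable generic label selection
# spec holds desired state (schema varies per kind), and status is
# populated by the system, e.g.:
# status:
#   phase: Active
```

Because every resource shares this shape, generic clients can list, watch, and label-select any kind without special-casing it.
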
Kubernetes cannot function without this basic API machinery and semantics, including:

* [Authentication](https://kubernetes.io/docs/admin/authentication/): The authentication scheme is a critical function that must be agreed upon by both the server and clients. The API server supports basic auth (username/password) (NOTE: We will likely deprecate basic auth eventually.), X.509 client certificates, OpenID Connect tokens, and bearer tokens, any of which (but not all) may be disabled. Clients should support all forms supported by [kubeconfig](https://kubernetes.io/docs/user-guide/kubeconfig-file/). Third-party authentication systems may implement the TokenReview API and configure the authentication webhook to call it, though choosing a non-standard authentication mechanism may limit the number of usable clients.

    * The TokenReview API (same schema as the hook) enables external authentication checks, such as by Kubelet (see the sketch after this list)

    * Pod identity is provided by "[service accounts](https://kubernetes.io/docs/user-guide/service-accounts/)"

        * The ServiceAccount API, including default ServiceAccount secret creation via a controller and injection via an admission controller.

* [Authorization](https://kubernetes.io/docs/admin/authorization/): Third-party authorization systems may implement the SubjectAccessReview API and configure the authorization webhook to call it.

    * The SubjectAccessReview (same schema as the hook), LocalSubjectAccessReview, and SelfSubjectAccessReview APIs enable external permission checks, such as by Kubelet and other controllers (see the sketch after this list)

* REST semantics, watch, durability and consistency guarantees, API versioning, defaulting, and validation

* NIY: API deficiencies that need to be addressed:

    * [Confusing defaulting behavior](https://github.com/kubernetes/kubernetes/issues/34292)

    * [Lack of guarantees](https://github.com/kubernetes/kubernetes/issues/30698)

    * [Orchestration support](https://github.com/kubernetes/kubernetes/issues/34363)

    * [Support for event-driven automation](https://github.com/kubernetes/kubernetes/issues/3692)

    * [Clean teardown](https://github.com/kubernetes/kubernetes/issues/4630)

* NIY: Built-in admission-control semantics, [synchronous admission-control hooks, and asynchronous resource initialization](https://github.com/kubernetes/community/pull/132) -- it needs to be possible for distribution vendors, system integrators, and cluster administrators to impose additional policies and automation

* NIY: API registration and discovery, including API aggregation, to register additional APIs, to find out which APIs are supported, and to get the details of supported operations, payloads, and result schemas

* NIY: ThirdPartyResource and ThirdPartyResourceData APIs (or their successors), to support third-party storage and extension APIs

* NIY: An extensible and HA-compatible replacement for the /componentstatuses API to determine whether the cluster is fully turned up and operating correctly: ExternalServiceProvider (component registration)

* The Endpoints API (and future evolutions thereof), which is needed for component registration, self-publication of API server endpoints, and HA rendezvous, as well as application-layer target discovery

* The Namespace API, which is the means of scoping user resources, and namespace lifecycle (e.g., bulk deletion)

* The Event API, which is the means of reporting significant occurrences, such as status changes and errors, and Event garbage collection

* NIY: Cascading-deletion garbage collector, finalization, and orphaning

* NIY: We need a built-in [add-on manager](https://github.com/kubernetes/kubernetes/issues/23233) (not unlike [static pod manifests](https://kubernetes.io/docs/admin/static-pods/), but at the cluster level) so that we can automatically add self-hosted components and dynamic configuration to the cluster, and so we can factor out functionality from existing components in running clusters.
At its core would be a pull-based declarative reconciler, as provided by the [current add-on manager](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/addon-manager) and as described in the [whitebox app management doc](https://docs.google.com/document/d/1S3l2F40LCwFKg6WG0srR6056IiZJBwDmDvzHWRffTWk/edit#heading=h.gh6cf96u8mlr). This would be easier once we have [apply support in the API](https://github.com/kubernetes/kubernetes/issues/17333).

    * Add-ons should be cluster services that are managed as part of the cluster and that provide the same degree of [multi-tenancy](https://docs.google.com/document/d/148Lbe1w1xmUjMx7cIMWTVQmSjJ8qA77HIGCrdB-ugoc/edit) as that provided by the cluster.

    * They may, but are not required to, run in the kube-system namespace; the namespace needs to be chosen such that it won't conflict with users' namespaces.

* The API server acts as the gateway to the cluster. By definition, the API server must be accessible by clients from outside the cluster, whereas the nodes, and certainly pods, may not be. Clients authenticate the API server and also use it as a bastion and proxy/tunnel to nodes and pods (and services), using the /proxy and /portforward APIs.

* TBD: The [CertificateSigningRequest](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-tls-bootstrap.md) API, to enable credential generation, in particular to mint Kubelet credentials

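As a concrete sketch of the two review contracts referenced in the list above: the token, user, and resource values below are hypothetical, and these API groups were still beta-versioned at the time of this document.

```yaml
# Hypothetical TokenReview: a caller (e.g., the Kubelet) asks the API
# server, or a configured webhook, whether a bearer token is valid.
apiVersion: authentication.k8s.io/v1beta1
kind: TokenReview
spec:
  token: "<opaque-bearer-token>"   # the credential to verify
# The responder fills in status, e.g.:
# status:
#   authenticated: true
#   user:
#     username: jane@example.com
#     groups: ["dev"]
---
# Hypothetical SubjectAccessReview: asks whether a given user may
# perform a verb on a resource.
apiVersion: authorization.k8s.io/v1beta1
kind: SubjectAccessReview
spec:
  user: jane@example.com
  resourceAttributes:
    namespace: example-team
    verb: get
    group: ""          # "" is the core API group
    resource: pods
# The responder fills in status, e.g.:
# status:
#   allowed: true
```

The same schemas serve both directions: components POST these objects to the API server, and the server forwards the same shape to third-party webhooks.
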
Ideally, this nuclear API server would only support the minimum required APIs, and additional functionality would be added via [aggregation](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/aggregated-api-servers.md), hooks, initializers, and other extension mechanisms.

Note that the centralized asynchronous controllers, such as garbage collection, are currently run by a separate process, called the [Controller Manager](https://kubernetes.io/docs/admin/kube-controller-manager/).

The /healthz and /metrics endpoints may be used by cluster management mechanisms, but they are not considered part of the supported API surface and should not generally be used by clients to detect cluster presence. The /version endpoint should be used instead.

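As a sketch of how aggregation, mentioned above, could hang an entire extension API group off the nucleus: the group, version, service name, and namespace below are hypothetical, and the apiregistration group was still beta-versioned at the time.

```yaml
# Hypothetical registration of an aggregated API server with the
# nucleus; requests for custom.example.com are proxied to the
# backing Service.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1alpha1.custom.example.com   # <version>.<group>
spec:
  group: custom.example.com
  version: v1alpha1
  groupPriorityMinimum: 1000          # ordering relative to other groups
  versionPriority: 15                 # ordering within the group
  service:                            # where the aggregated server runs
    name: custom-api
    namespace: kube-system
  insecureSkipTLSVerify: true         # a real deployment would set caBundle
```
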
The API server depends on the following external components:

* Persistent state store (etcd, or equivalent; perhaps multiple instances)

The API server may depend on:

@@ -127,21 +256,38 @@ The API server may depend on:

#### Execution

The most important and most prominent controller in Kubernetes is the [Kubelet](https://kubernetes.io/docs/admin/kubelet/), which is the primary implementer of the Pod and Node APIs that drive the container execution layer. Without these APIs, Kubernetes would just be a CRUD-oriented REST application framework backed by a key-value store (and perhaps the API machinery will eventually be spun out as an independent project).

Kubernetes executes isolated application containers as its default, native mode of execution. Kubernetes provides [Pods](https://kubernetes.io/docs/user-guide/pods/) that can host multiple containers and storage volumes as its fundamental execution primitive.

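A minimal, hypothetical Pod illustrates that primitive; the image, names, and resource values are invented:

```yaml
# A minimal, hypothetical Pod: one container plus a shared scratch
# volume; all containers in the Pod share the Pod's routable IP.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  serviceAccountName: default     # ties the Pod to a security scope
  containers:
  - name: app
    image: example.com/app:1.0    # hypothetical image
    resources:
      requests:                   # feeds feasibility-based admission
        cpu: 100m
        memory: 64Mi
    volumeMounts:
    - name: scratch
      mountPath: /var/scratch
  volumes:
  - name: scratch
    emptyDir: {}                  # one of the built-in volume sources
```
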
The Kubelet API surface and semantics include:

* The Pod API, the Kubernetes execution primitive, including:

    * Pod feasibility-based admission control based on policies in the Pod API (resource requests, node selector, node/pod affinity and anti-affinity, taints and tolerations). API admission control may reject pods or add additional scheduling constraints to them, but the Kubelet is the final arbiter of which pods can and cannot run on a given node, not the schedulers or DaemonSets.

    * Container and volume semantics and lifecycle

    * Pod IP address allocation (a routable IP address per pod is required)

    * A mechanism (i.e., ServiceAccount) to tie a Pod to a specific security scope

    * Volume sources:

@@ -155,37 +301,51 @@ The Kubelet API surface and semantics include:

        * downwardAPI

        * NIY: [Container and image volumes](http://issues.k8s.io/831) (and deprecate gitRepo)

        * NIY: Claims against local storage, so that complex templating or separate configs are not needed for dev vs. prod application manifests

        * flexVolume (which should replace built-in cloud-provider-specific volumes)

    * Subresources: binding, status, exec, logs, attach, portforward, proxy

* NIY: [Checkpointing of API resources](https://github.com/kubernetes/kubernetes/issues/489) for availability and bootstrapping

* Container image and log lifecycles

* The Secret API, and mechanisms to enable third-party secret management

* The ConfigMap API, for [component configuration](https://groups.google.com/forum/#!searchin/kubernetes-dev/component$20config%7Csort:relevance/kubernetes-dev/wtXaoHOiSfg/QFW5Ca9YBgAJ) as well as Pod references

* The Node API, hosts for Pods

    * May only be visible to cluster administrators in some configurations

* Node and pod networks and their controllers (route controller)

* Node inventory, health, and reachability (node controller)

    * Cloud-provider-specific node inventory functions should be split into a provider-specific controller.

* Terminated-pod garbage collection

* Volume controller

    * Cloud-provider-specific attach/detach logic should be split into a provider-specific controller. We need a way to extract provider-specific volume sources from the API.

* The PersistentVolume API

@@ -193,9 +353,23 @@ The Kubelet API surface and semantics include:

* The PersistentVolumeClaim API

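To illustrate the claim side of the storage split just listed, here is a hedged PersistentVolumeClaim sketch; the size and class name are hypothetical:

```yaml
# A hypothetical PersistentVolumeClaim: the application-level request
# that the control plane binds to a matching PersistentVolume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi               # hypothetical capacity request
  storageClassName: standard      # hypothetical class; enables dynamic provisioning
```
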
Again, centralized asynchronous functions, such as terminated-pod garbage collection, are performed by the Controller Manager.

The Controller Manager and Kubelet currently call out to a "cloud provider" interface to query information from the infrastructure layer and/or to manage infrastructure resources. However, [we’re working to extract those touchpoints](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider-refactoring.md) ([issue](https://github.com/kubernetes/kubernetes/issues/2770)) into external components. The intended model is that unsatisfiable application/container/OS-level requests (e.g., Pods, PersistentVolumeClaims) serve as a signal to external “dynamic provisioning” systems, which would make infrastructure available to satisfy those requests and represent them in Kubernetes using infrastructure resources (e.g., Nodes, PersistentVolumes), so that Kubernetes could bind the requests and infrastructure resources together.

The Kubelet depends on the following external components:

@@ -209,7 +383,8 @@ The Kubelet depends on the following external components:

And may depend on:

* NIY: Cloud-provider node plug-in, to provide node identity, topology, etc.

* NIY: Third-party secret management system (e.g., Vault)

@@ -223,45 +398,88 @@ Accepted layering violations:

### The Application Layer: Deployment and Routing

The application management and composition layer, providing self-healing, scaling, application lifecycle management, service discovery, load balancing, and routing -- also known as orchestration and the service fabric.

These APIs and functions are REQUIRED for any distribution of Kubernetes. Kubernetes should provide default implementations for these APIs, but replacements of the implementations of any or all of these functions are permitted, provided the conformance tests pass. Without these, most containerized applications will not run, and few, if any, published examples will work. The vast majority of containerized applications will use one or more of these.

Kubernetes’s API provides IaaS-like container-centric primitives and also lifecycle controllers to support orchestration (self-healing, scaling, updates, termination) of all major categories of workloads.
These application management, composition, discovery, and routing APIs and functions include:

* A default scheduler, which implements the scheduling policies in the Pod API: resource requests, nodeSelector, node and pod affinity/anti-affinity, taints and tolerations. The scheduler runs as a separate process, on or outside the cluster.

* NIY: A [rescheduler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/rescheduling.md), to reactively and proactively delete scheduled pods so that they can be replaced and rescheduled to other nodes.

* Continuously running applications: These application types should all support rollouts (and rollbacks) via declarative updates, cascading deletion, and orphaning/adoption. Other than DaemonSet, all should support horizontal scaling. (A Deployment/Service sketch follows this list.)

    * The Deployment API, which orchestrates updates of stateless applications, including subresources (status, scale, rollback)

    * The ReplicaSet API, for simple fungible/stateless applications, especially specific versions of Deployment pod templates, including subresources (status, scale)

    * The DaemonSet API, for cluster services, including subresources (status)

    * The StatefulSet API, for stateful applications, including subresources (status, scale)

    * The PodTemplate API, used by DaemonSet and StatefulSet to record change history

* Terminating batch applications: These should include support for automatic culling of terminated jobs (NIY).

    * The Job API ([GC discussion](https://github.com/kubernetes/kubernetes/issues/30243))

    * The CronJob API

* Discovery, load balancing, and routing

    * The Service API, including allocation of cluster IPs, repair of service allocation maps, load balancing via kube-proxy or equivalent, and automatic Endpoints generation, maintenance, and deletion for services. NIY: LoadBalancer service support is OPTIONAL, but conformance tests must pass if it is supported. If/when they are added, support for the [LoadBalancer and LoadBalancerClaim APIs](https://github.com/kubernetes/community/pull/275) should be present if and only if the distribution supports LoadBalancer services.

    * The Ingress API, including [internal L7](https://docs.google.com/document/d/1ILXnyU5D5TbVRwmoPnC__YMO9T5lmiA_UiU8HtgSLYk/edit?ts=585421fc) (NIY)

    * Service DNS. DNS, using the [official Kubernetes schema](https://github.com/kubernetes/dns/blob/master/docs/specification.md), is required.

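As a hedged sketch of how the pieces in this layer compose, here is a hypothetical Deployment fronted by a Service; the names, labels, and image are invented, and the Deployment API group has moved between versions across releases:

```yaml
# A hypothetical stateless application: a Deployment manages
# ReplicaSets of Pods, and a Service routes to them by label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web
spec:
  replicas: 3                  # horizontal scaling
  selector:
    matchLabels:
      app: example-web
  template:                    # the Pod template the ReplicaSet stamps out
    metadata:
      labels:
        app: example-web
    spec:
      containers:
      - name: web
        image: example.com/web:1.0   # hypothetical image
        ports:
        - containerPort: 8080
---
# The Service allocates a cluster IP and load-balances to the Pods
# selected by the label; it is discoverable via Service DNS.
apiVersion: v1
kind: Service
metadata:
  name: example-web
spec:
  selector:
    app: example-web
  ports:
  - port: 80
    targetPort: 8080
```

Updating the Deployment's pod template triggers a declarative rollout, while the Service keeps routing only to ready Pods throughout.
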
The application layer may depend on:

* Identity provider (to-cluster identities and/or to-application identities)

* NIY: Cloud-provider controller implementation

@@ -273,19 +491,29 @@ The application layer may depend on:

* Replacement for kube-proxy

* Replacement and/or [auxiliary](https://github.com/kubernetes/kubernetes/issues/31571) workload controllers, especially for extended rollout strategies

### The Governance Layer: Automation and Policy Enforcement

Policy enforcement and higher-level automation.

These APIs and functions should be optional for running applications, and should be achievable via other solutions.

Each supported API/function should be applicable to a large fraction of enterprise operations, security, and/or governance scenarios.

It needs to be possible to configure and discover default policies for the cluster (perhaps similar to [OpenShift’s new project template mechanism](https://docs.openshift.org/latest/admin_guide/managing_projects.html#template-for-new-projects), but supporting multiple policy templates, such as for system namespaces vs. user ones), to support at least the following use cases:

* Is this a: (source: [multi-tenancy working doc](https://docs.google.com/document/d/1IoINuGz8eR8Awk4o7ePKuYv9wjXZN5nQM_RFlVIhA4c/edit?usp=sharing))

    * Single tenant / single user cluster

@@ -317,27 +545,37 @@ Automation APIs and functions:

* NIY: The vertical pod autoscaling API(s)

* [Cluster autoscaling and/or node provisioning](https://github.com/kubernetes/contrib/tree/master/cluster-autoscaler)

* The PodDisruptionBudget API

* Dynamic volume provisioning, for at least one volume source type

    * The StorageClass API, implemented at least for the default volume type

* Dynamic load-balancer provisioning

* NIY: The [PodPreset](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-preset.md) API

* NIY: The [service broker/catalog](https://github.com/kubernetes-incubator/service-catalog) APIs

* NIY: The [Template](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/templates.md) and TemplateInstance APIs

Policy APIs and functions:

* [Authorization](https://kubernetes.io/docs/admin/authorization/): The ABAC and RBAC authorization policy schemes.

    * RBAC, if used, is configured using a number of APIs: Role, RoleBinding, ClusterRole, ClusterRoleBinding (a minimal sketch appears below)

* The LimitRange API

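As a concrete illustration of the RBAC scheme just listed; the namespace, names, and subject are hypothetical:

```yaml
# A hypothetical Role granting read access to pods in one namespace,
# bound to a user by a RoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: example-team
  name: pod-reader
rules:
- apiGroups: [""]              # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: example-team
  name: read-pods
subjects:
- kind: User
  name: jane@example.com       # hypothetical subject
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

ClusterRole and ClusterRoleBinding follow the same pattern without the namespace scoping.
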
@@ -353,7 +591,8 @@ The management layer may depend on:

* Network policy enforcement mechanism

* Replacement and/or additional horizontal and vertical pod autoscalers

* [Cluster autoscaler and/or node provisioner](https://github.com/kubernetes/contrib/tree/master/cluster-autoscaler)

@@ -367,11 +606,23 @@

### The Interface Layer: Libraries and Tools

These mechanisms are suggested for distributions, and are also available for download and installation independently by users. They include commonly used libraries, tools, systems, and UIs developed by official Kubernetes projects, though other tools may be used to accomplish the same tasks. They may be used by published examples.

Commonly used libraries, tools, systems, and UIs developed under some Kubernetes-owned GitHub org.

* Kubectl -- We see kubectl as one of many client tools, rather than as a privileged one. Our aim is to make kubectl thinner, by [moving commonly used non-trivial functionality into the API](https://github.com/kubernetes/kubernetes/issues/12143). This is necessary in order to facilitate correct operation across Kubernetes releases, to facilitate API extensibility, to preserve the API-centric Kubernetes ecosystem model, and to simplify other clients, especially non-Go clients.

* Client libraries (e.g., client-go, client-python)

@@ -391,79 +642,212 @@ These components may depend on:

These things are not really "part of" Kubernetes at all.

There are a number of areas where we have already defined [clear-cut boundaries](https://kubernetes.io/docs/whatisk8s#kubernetes-is-not) for Kubernetes.

While Kubernetes must offer functionality commonly needed to deploy and manage containerized applications, as a general rule, we preserve user choice in areas complementing Kubernetes’s general-purpose orchestration functionality, especially areas that have their own competitive landscapes composed of numerous solutions satisfying diverse needs and preferences. Kubernetes may provide plug-in APIs for such solutions, or may expose general-purpose APIs that could be implemented by multiple backends, or expose APIs that such solutions can target. Sometimes, the functionality can compose cleanly with Kubernetes without explicit interfaces.

Additionally, to be considered part of Kubernetes, a component would need to follow Kubernetes design conventions. For instance, systems whose primary interfaces are domain-specific languages (e.g., [Puppet](https://docs.puppet.com/puppet/4.9/lang_summary.html), [Open Policy Agent](http://www.openpolicyagent.org/)) aren’t compatible with the Kubernetes API-centric approach; they are perfectly fine to use with Kubernetes, but wouldn’t be considered part of Kubernetes. Similarly, solutions designed to support multiple platforms likely wouldn’t follow Kubernetes API conventions, and therefore wouldn’t be considered part of Kubernetes.

* Inside container images: Kubernetes is not opinionated about the contents of container images -- if it lives inside the container image, it lives outside Kubernetes. This includes, for example, language-specific application frameworks.

* On top of Kubernetes

    * Continuous integration and deployment: Kubernetes is unopinionated in the source-to-image space. It does not deploy source code and does not build your application. Continuous integration (CI) and continuous deployment workflows are areas where different users and projects have their own requirements and preferences, so we aim to facilitate layering CI/CD workflows on Kubernetes but don't dictate how they should work.

    * Application middleware: Kubernetes does not provide application middleware, such as message queues and SQL databases, as built-in infrastructure. It may, however, provide general-purpose mechanisms, such as service-broker integration, to make it easier to provision, discover, and access such components. Ideally, such components would just run on Kubernetes.

    * Logging and monitoring: Kubernetes does not provide logging aggregation, comprehensive application monitoring, or telemetry analysis and alerting systems, though such mechanisms are essential to production clusters.

    * Data-processing platforms: Spark and Hadoop are well-known examples, but there are [many such systems](https://hadoopecosystemtable.github.io/).

    * [Application-specific operators](https://coreos.com/blog/introducing-operators.html): Kubernetes supports workload management for common categories of applications, but not for specific applications.

    * Platform as a Service: Kubernetes [provides a foundation](http://blog.kubernetes.io/2017/02/caas-the-foundation-for-next-gen-paas.html) for a multitude of focused, opinionated PaaSes, including DIY ones.

    * Functions as a Service: Similar to PaaS, but FaaS additionally encroaches on containers and language-specific application frameworks.

    * [Workflow orchestration](https://github.com/kubernetes/kubernetes/pull/24781#issuecomment-215914822): "Workflow" is a very broad, diverse area, with solutions typically tailored to specific use cases (data-flow graphs, data-driven processing, deployment pipelines, event-driven automation, business-process execution, iPaaS) and specific input and event sources, and it often requires arbitrary code to evaluate conditions, actions, and/or failure handling.

    * [Configuration DSLs](https://github.com/kubernetes/kubernetes/pull/1007/files): Domain-specific languages do not facilitate layering higher-level APIs and tools; they usually have limited expressibility, testability, familiarity, and documentation; they promote complex configuration generation; they tend to compromise interoperability and composability; they complicate dependency management; and their use often subverts abstraction and encapsulation.

    * [Kompose](https://github.com/kubernetes-incubator/kompose): Kompose is a project-supported adaptor tool that facilitates migration to Kubernetes from Docker Compose and enables simple use cases, but it doesn’t follow Kubernetes conventions and is based on a manually maintained DSL.

    * [ChatOps](https://github.com/harbur/kubebot): Also adaptor tools, for the multitude of chat services.

* Underlying Kubernetes

    * Container runtime: Kubernetes does not provide its own container runtime, but provides an interface for plugging in the container runtime of your choice.

    * Image registry: Kubernetes pulls container images to the nodes.

    * Cluster state store: etcd

    * Network: As with the container runtime, we support an interface (CNI) that facilitates pluggability.

    * File storage: Local filesystems and network-attached storage.

    * Node management: Kubernetes neither provides nor adopts any comprehensive machine configuration, maintenance, management, or self-healing systems, which typically are handled differently in different public/private clouds, for different operating systems, for mutable vs. immutable infrastructure, for shops already using tools outside of their Kubernetes clusters, etc.

    * Cloud provider: IaaS provisioning and management.

    * Cluster creation and management: The community has developed numerous tools, such as minikube, kubeadm, bootkube, kube-aws, kops, kargo, kubernetes-anywhere, and so on. As can be seen from the diversity of tools, there is no one-size-fits-all solution for cluster deployment and management (e.g., upgrades). There's a spectrum of possible solutions, each with different tradeoffs. That said, common building blocks (e.g., [secure Kubelet registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-tls-bootstrap.md)) and approaches (in particular, [self-hosting](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/self-hosted-kubernetes.md#what-is-self-hosted)) would reduce the amount of custom orchestration needed in such tools.

We would like to see the ecosystem build and/or integrate solutions to fill these needs.

Eventually, most Kubernetes development should fall within the ecosystem.

## Managing the matrix

Options, configurable defaults, extensions, plug-ins, add-ons, provider-specific functionality, version skew, feature discovery, and dependency management.

Kubernetes is not just an open-source toolkit, but is typically consumed as a running, easy-to-run, or ready-to-run cluster or service. We would like most users and use cases to be able to use stock upstream releases. This means Kubernetes needs sufficient extensibility, without rebuilding, to handle such use cases.

While gaps in extensibility are the primary drivers of code forks, and gaps in upstream cluster lifecycle management solutions are currently the primary drivers of the proliferation of Kubernetes distributions, the existence of optional features (e.g., alpha APIs, provider-specific APIs), configurability, pluggability, and extensibility make the concept inevitable.

However, to make it possible for users to deploy and manage their applications, and for developers to build Kubernetes extensions on/for arbitrary Kubernetes clusters, they must be able to make assumptions about what a cluster or distribution of Kubernetes provides. Where functionality falls outside these base assumptions, there needs to be a way to discover what functionality is available and to express functionality requirements (dependencies) for usage.

Cluster components, including add-ons, should be registered via the [component registration API](https://github.com/kubernetes/kubernetes/issues/18610) and discovered via /componentstatuses.

-
-Enabled built-in APIs, aggregated APIs, and registered third-party resources should be discoverable via the discovery and OpenAPI (swagger.json) endpoints. As mentioned above, cloud-provider support for LoadBalancer-type services should be determined by whether the LoadBalancer API is present.
-
-Extensions and their options should be registered via FooClass resources, similar to [StorageClass](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/storage/v1beta1/types.go#L31), but with parameter descriptions, types (e.g., integer vs string), constraints (e.g., range or regexp) for validation, and default values, with a reference to fooClassName from the extended API. These APIs should also configure/expose the presence of related features, such as dynamic volume provisioning (indicated by a non-empty storageclass.provisioner field), as well as identifying the responsible [controller](https://github.com/kubernetes/kubernetes/issues/31571). We need to add such APIs for at least scheduler classes, ingress controller classes, flex volume classes, and compute resource classes (e.g., GPUs, other accelerators).
-
-Assuming we transitioned existing network-attached volume sources to flex volumes, this approach would cover volume sources. In the future, the API should provide only [general-purpose abstractions](https://docs.google.com/document/d/1QVxD---9tHXYj8c_RayLY9ClrFpqfuejN7p0vtv2kW0/edit#heading=h.mij1ubfelvar), even if, as with LoadBalancer services, the abstractions are not implemented in all environments (i.e., the API does not need to cater to the lowest common denominator).
-
-NIY: We also need to develop mechanisms for registering and discovering the following:
+Options, Configurable defaults, Extensions, Plug-ins, Add-ons,
+Provider-specific functionality, Version skew, Feature discovery, and
+Dependency management.
+
+Kubernetes is not just an open-source toolkit, but is typically
+consumed as a running, easy-to-run, or ready-to-run cluster or
+service. We would like most users and use cases to be able to use
+stock upstream releases. This means Kubernetes needs sufficient
+extensibility without rebuilding to handle such use cases.
+
+While gaps in extensibility are the primary drivers of code forks and
+gaps in upstream cluster lifecycle management solutions are currently
+the primary drivers of the proliferation of Kubernetes distributions,
+the existence of optional features (e.g., alpha APIs,
+provider-specific APIs), configurability, pluggability, and
+extensibility make the concept of distributions inevitable.
+
+However, to make it possible for users to deploy and manage their
+applications and for developers to build Kubernetes extensions on/for
+arbitrary Kubernetes clusters, they must be able to make assumptions
+about what a cluster or distribution of Kubernetes provides. Where
+functionality falls outside these base assumptions, there needs to be
+a way to discover what functionality is available and to express
+functionality requirements (dependencies) for usage.
+
+Cluster components, including add-ons, should be registered via the
+[component registration
+API](https://github.com/kubernetes/kubernetes/issues/18610) and
+discovered via /componentstatuses.
+
+Enabled built-in APIs, aggregated APIs, and registered third-party
+resources should be discoverable via the discovery and OpenAPI
+(swagger.json) endpoints. As mentioned above, cloud-provider support
+for LoadBalancer-type services should be determined by whether the
+LoadBalancer API is present.
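
To make the LoadBalancer example concrete, here is a minimal sketch of
such a Service. The manifest is accepted by any conformant API server,
but whether an external load balancer is actually provisioned depends
on the environment, which is exactly why this support needs to be
discoverable. The name, selector, and ports are illustrative.

```yaml
# Minimal LoadBalancer-type Service. The API is uniformly present in
# conformant clusters, but fulfillment is provider-dependent, so
# clients should discover support rather than assume it.
apiVersion: v1
kind: Service
metadata:
  name: example-frontend
spec:
  type: LoadBalancer
  selector:
    app: example-frontend
  ports:
  - port: 80
    targetPort: 8080
```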
+
+Extensions and their options should be registered via FooClass
+resources, similar to
+[StorageClass](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/storage/v1beta1/types.go#L31),
+but with parameter descriptions, types (e.g., integer vs string),
+constraints (e.g., range or regexp) for validation, and default
+values, with a reference to fooClassName from the extended API. These
+APIs should also configure/expose the presence of related features,
+such as dynamic volume provisioning (indicated by a non-empty
+storageclass.provisioner field), as well as identifying the
+responsible
+[controller](https://github.com/kubernetes/kubernetes/issues/31571). We
+need to add such APIs for at least scheduler classes, ingress
+controller classes, flex volume classes, and compute resource classes
+(e.g., GPUs, other accelerators).
+
+Assuming we transitioned existing network-attached volume sources to
+flex volumes, this approach would cover volume sources. In the future,
+the API should provide only [general-purpose
+abstractions](https://docs.google.com/document/d/1QVxD---9tHXYj8c_RayLY9ClrFpqfuejN7p0vtv2kW0/edit#heading=h.mij1ubfelvar),
+even if, as with LoadBalancer services, the abstractions are not
+implemented in all environments (i.e., the API does not need to cater
+to the lowest common denominator).
+
+NIY: We also need to develop mechanisms for registering and
+discovering the following:

* Admission-control plugins and hooks (including for built-in APIs)

@@ -473,43 +857,93 @@ NIY: We also need to develop mechanisms for registering and discovering the foll

* Initializers and finalizers

-* [Scheduler extensions](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduler_extender.md)
+* [Scheduler
+  extensions](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduler_extender.md)

-* Node labels and [cluster topology](https://github.com/kubernetes/kubernetes/issues/41442) (topology classes?)
+* Node labels and [cluster
+  topology](https://github.com/kubernetes/kubernetes/issues/41442)
+  (topology classes?)
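
Returning to the FooClass pattern described before this list:
StorageClass is the existing model for it. A minimal sketch, assuming
the GCE PD provisioner, is shown below; a generalized FooClass would
layer typed, validated parameter schemas on top of the opaque
parameters map.

```yaml
# The existing StorageClass pattern that FooClass APIs would
# generalize: a named class whose provisioner field identifies the
# responsible controller and, by being non-empty, signals that
# dynamic provisioning is available. Parameters are opaque and
# provisioner-specific (these assume the GCE PD provisioner).
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
```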
-NIY: Activation/deactivation of both individual APIs and finer-grain features could be addressed by the following mechanisms:
+NIY: Activation/deactivation of both individual APIs and finer-grain
+features could be addressed by the following mechanisms:

-* [The configuration for all components is being converted from command-line flags to versioned configuration.](https://github.com/kubernetes/kubernetes/issues/12245)
+* [The configuration for all components is being converted from
+  command-line flags to versioned
+  configuration.](https://github.com/kubernetes/kubernetes/issues/12245)

-* [We intend to store most of that configuration data in ](https://github.com/kubernetes/kubernetes/issues/1627)[ConfigMap](https://github.com/kubernetes/kubernetes/issues/1627)[s, to facilitate dynamic reconfiguration, progressive rollouts, and introspectability.](https://github.com/kubernetes/kubernetes/issues/1627)
+* [We intend to store most of that configuration data in ConfigMaps,
+  to facilitate dynamic reconfiguration, progressive rollouts, and
+  introspectability.](https://github.com/kubernetes/kubernetes/issues/1627)

-* [Configuration common to all/multiple components should be factored out into its own configuration object(s).](https://github.com/kubernetes/kubernetes/issues/19831) This should include the [feature-gate mechanism](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtimeconfig.md).
+* [Configuration common to all/multiple components should be factored
+  out into its own configuration
+  object(s).](https://github.com/kubernetes/kubernetes/issues/19831)
+  This should include the [feature-gate
+  mechanism](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/runtimeconfig.md).

-* An API should be added for semantically meaningful settings, such as the default length of time to wait before deleting pods on unresponsive nodes.
+* An API should be added for semantically meaningful settings, such as
+  the default length of time to wait before deleting pods on
+  unresponsive nodes.

-NIY: The problem of [version-skewed operation](https://github.com/kubernetes/kubernetes/issues/4855), for features dependent on upgrades of multiple components (including replicas of the same component in HA clusters), should be addressed by:
+NIY: The problem of [version-skewed
+operation](https://github.com/kubernetes/kubernetes/issues/4855), for
+features dependent on upgrades of multiple components (including
+replicas of the same component in HA clusters), should be addressed
+by:

1. Creating flag gates for all new such features,

-2. Always disabling the features by default in the first minor release in which they appear,
+2. Always disabling the features by default in the first minor release
+   in which they appear,

3. Providing configuration patches to enable the features, and

4. Enabling them by default in the next minor release.

-NIY: We additionally need a mechanism to [warn about out of date nodes](https://github.com/kubernetes/kubernetes/issues/23874), and/or potentially prevent master upgrades (other than to patch releases) until/unless the nodes have been upgraded.
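
A sketch of how these mechanisms might compose is below. Since the
checkpointed configuration format is not settled, the data key and
gate name here are hypothetical, not a real schema.

```yaml
# Hypothetical: versioned component configuration stored in a
# ConfigMap, carrying a feature gate that is off by default in the
# feature's first minor release, per the skew policy above. The key
# and gate names are illustrative only.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-controller-manager
  namespace: kube-system
data:
  config.yaml: |
    featureGates:
      ExampleSkewedFeature: false  # enable via a configuration patch
```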
-
-NIY: [Field-level versioning](https://github.com/kubernetes/kubernetes/issues/34508) would facilitate solutions to bulk activation of new and/or alpha API fields, prevention of clobbering of new fields by poorly written out-of-date clients, and evolution of non-alpha APIs without a proliferation of full-fledged API definitions.
-
-The Kubernetes API server silently ignores unsupported resource fields and query parameters, but not unknown/unregistered APIs (note that unimplemented/inactive APIs should be disabled). This can facilitate the reuse of configuration across clusters of multiple releases, but more often leads to surprises. Kubectl supports optional validation using the Swagger/OpenAPI specification from the server. Such optional validation should be [provided by the server](https://github.com/kubernetes/kubernetes/issues/5889) (NIY). Additionally, shared resource manifests should specify the minimum required Kubernetes release, for user convenience, which could potentially be verified by kubectl and other clients.
-
-Additionally, unsatisfiable Pod scheduling constraints and PersistentVolumeClaim criteria silently go unmet, which can useful as demand signals to automatic provisioners, but also makes the system more error prone. It should be possible to configure rejection of unsatisfiable requests, using FooClass-style APIs, as described above ([NIY](https://github.com/kubernetes/kubernetes/issues/17324)).
-
-The Service Catalog mechanism (NIY) should make it possible to assert the existence of application-level services, such as S3-compatible cluster storage.
+NIY: We additionally need a mechanism to [warn about out-of-date
+nodes](https://github.com/kubernetes/kubernetes/issues/23874), and/or
+potentially prevent master upgrades (other than to patch releases)
+until/unless the nodes have been upgraded.
+
+NIY: [Field-level
+versioning](https://github.com/kubernetes/kubernetes/issues/34508)
+would facilitate solutions to bulk activation of new and/or alpha API
+fields, prevention of clobbering of new fields by poorly written
+out-of-date clients, and evolution of non-alpha APIs without a
+proliferation of full-fledged API definitions.
+
+The Kubernetes API server silently ignores unsupported resource fields
+and query parameters, but not unknown/unregistered APIs (note that
+unimplemented/inactive APIs should be disabled). This can facilitate
+the reuse of configuration across clusters of multiple releases, but
+more often leads to surprises. Kubectl supports optional validation
+using the Swagger/OpenAPI specification from the server. Such optional
+validation should be [provided by the
+server](https://github.com/kubernetes/kubernetes/issues/5889)
+(NIY). Additionally, shared resource manifests should specify the
+minimum required Kubernetes release, for user convenience, which could
+potentially be verified by kubectl and other clients.
+
+Additionally, unsatisfiable Pod scheduling constraints and
+PersistentVolumeClaim criteria silently go unmet, which can be useful
+as demand signals to automatic provisioners, but also makes the system
+more error-prone. It should be possible to configure rejection of
+unsatisfiable requests, using FooClass-style APIs, as described above
+([NIY](https://github.com/kubernetes/kubernetes/issues/17324)).
+
+The Service Catalog mechanism (NIY) should make it possible to assert
+the existence of application-level services, such as S3-compatible
+cluster storage.
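
As a concrete instance of the silently-unmet case above: the claim
below (reusing the illustrative `fast` class from the earlier
StorageClass sketch) simply stays Pending if no volume matches and no
provisioner acts on it; the configurable rejection described above
would surface that failure explicitly instead.

```yaml
# A PersistentVolumeClaim whose criteria can silently go unmet:
# absent a matching volume or an acting provisioner, it remains
# Pending indefinitely rather than being rejected. The class name is
# carried over from the illustrative StorageClass sketch.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  storageClassName: fast
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```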
## Layering of the system as it relates to security

-In order to properly secure a Kubernetes cluster and enable [safe extension](https://github.com/kubernetes/kubernetes/issues/17456), a few fundamental concepts need to be defined and agreed on by the components of the system. It’s best to think of Kubernetes as a series of rings from a security perspective, with each layer granting the successive layer capabilities to act.
+In order to properly secure a Kubernetes cluster and enable [safe
+extension](https://github.com/kubernetes/kubernetes/issues/17456), a
+few fundamental concepts need to be defined and agreed on by the
+components of the system. It’s best to think of Kubernetes as a series
+of rings from a security perspective, with each layer granting the
+successive layer capabilities to act.

1. One or more data storage systems (etcd) for the nuclear APIs

@@ -517,43 +951,76 @@ In order to properly secure a Kubernetes cluster and enable [safe extension](htt

3. APIs for highly trusted resources (system policies)

-4. Delegated trust APIs and controllers (users grant access to the API / controller to perform actions on their behalf) either at the cluster scope or smaller scopes
-
-5. Untrusted / scoped APIs and controllers and user workloads that run at various scopes
-
-When a lower layer depends on a higher layer, it collapses the security model and makes defending the system more complicated - an administrator may *choose* to do so to gain operational simplicity, but that must be a conscious choice. A simple example is etcd: any component that can write data to etcd is now root on the entire cluster, and any actor that can corrupt a highly trusted resource can almost certainly escalate. It is useful to divide the layers above into separate sets of machines for each layer of processes (etcd -> apiservers + controllers -> nuclear security extensions -> delegated extensions -> user workloads), even if some may be collapsed in practice.
-
-If the layers described above define concentric circles, then it should also be possible for overlapping or independent circles to exist - for instance, administrators may choose an alternative secret storage solution that cluster workloads have access to yet the platform does not implicitly have access to. The point of intersection for these circles tends to be the machines that run the workloads, and nodes must have no more privileges than those required for proper function.
-
-Finally, adding a new capability via extension at any layer should follow best practices for communicating the impact of that action.
-
-When a capability is added to the system via extension, what purpose does it have?
+4. Delegated trust APIs and controllers (users grant access to the API
+   / controller to perform actions on their behalf) either at the
+   cluster scope or smaller scopes
+
+5. Untrusted / scoped APIs and controllers and user workloads that run
+   at various scopes
+
+When a lower layer depends on a higher layer, it collapses the
+security model and makes defending the system more complicated - an
+administrator may *choose* to do so to gain operational simplicity,
+but that must be a conscious choice. A simple example is etcd: any
+component that can write data to etcd is now root on the entire
+cluster, and any actor that can corrupt a highly trusted resource can
+almost certainly escalate. It is useful to divide the layers above
+into separate sets of machines for each layer of processes (etcd ->
+apiservers + controllers -> nuclear security extensions -> delegated
+extensions -> user workloads), even if some may be collapsed in
+practice.
+
+If the layers described above define concentric circles, then it
+should also be possible for overlapping or independent circles to
+exist - for instance, administrators may choose an alternative secret
+storage solution that cluster workloads have access to yet the
+platform does not implicitly have access to. The point of intersection
+for these circles tends to be the machines that run the workloads, and
+nodes must have no more privileges than those required for proper
+function.
+
+Finally, adding a new capability via extension at any layer should
+follow best practices for communicating the impact of that action.
+
+When a capability is added to the system via extension, what purpose
+does it have?

* Make the system more secure

-* Enable a new "production quality" API for consumption by everyone in the cluster
+* Enable a new "production quality" API for consumption by everyone in
+  the cluster

* Automate a common task across a subset of the cluster

-* Run a hosted workload that offers apis to consumers (spark, a database, etcd)
+* Run a hosted workload that offers APIs to consumers (Spark, a
+  database, etcd)

* These fall into three major groups:

-    * Required for the cluster (and hence must run close to the core, and cause operational tradeoffs in the presence of failure)
+    * Required for the cluster (and hence must run close to the core,
+      and cause operational tradeoffs in the presence of failure)

    * Exposed to all cluster users (must be properly tenanted)

-    * Exposed to a subset of cluster users (runs more like traditional "app" workloads)
+    * Exposed to a subset of cluster users (runs more like traditional
+      "app" workloads)

-If an administrator can easily be tricked into installing a new cluster level security rule during extension, then the layering is compromised and the system is vulnerable.
+If an administrator can easily be tricked into installing a new
+cluster-level security rule during extension, then the layering is
+compromised and the system is vulnerable.

## Next Steps

-In addition to completing the technical mechanisms described and/or implied above, we need to apply the principles in this document to a set of more focused documents that answer specific practical questions. Here are some suggested documents and the questions they answer:
+In addition to completing the technical mechanisms described and/or
+implied above, we need to apply the principles in this document to a
+set of more focused documents that answer specific practical
+questions. Here are some suggested documents and the questions they
+answer:

* **Kubernetes API Conventions For Extension API Developers**

-    * Audience: someone planning to build a Kubernetes-like API extension (TPR or Aggregated API, etc…)
+    * Audience: someone planning to build a Kubernetes-like API
+      extension (TPR or Aggregated API, etc…)

    * Answers Questions:

@@ -567,33 +1034,42 @@ In addition to completing the technical mechanisms described and/or implied abov

* **Kubernetes API Conventions For CLI and UI Developers**

-    * Audience: someone working on kubectl, dashboard, or another CLI or UI.
+    * Audience: someone working on kubectl, dashboard, or another CLI
+      or UI.
    * Answers Questions:

-        * How can I show a user which objects subordinate and should normally be hidden
+        * How can I show a user which objects are subordinate and
+          should normally be hidden?

-* **Required and Optional Behaviors for Kubernetes Distributions/Services**
+* **Required and Optional Behaviors for Kubernetes
+  Distributions/Services**

-    * See also the [certification issue](https://github.com/kubernetes/community/issues/432)
+    * See also the [certification
+      issue](https://github.com/kubernetes/community/issues/432)

    * I just implemented a Hosted version of the Kubernetes API.

        * Which API groups, versions and Kinds do I have to expose?

-        * Can I provide my own logging, auth, auditing integrations and so on?
+        * Can I provide my own logging, auth, auditing integrations
+          and so on?

        * Can I call it Kubernetes if it just has pods but no services?

    * I'm packaging up a curated Kubernetes distro and selling it.

-        * Which API groups, versions and Kinds do I have to expose in order to call it Kubernetes.
+        * Which API groups, versions and Kinds do I have to expose in
+          order to call it Kubernetes?

* **Assumptions/conventions for application developers**

    * I want to write portable config across on-prem and several clouds.

-        * What API types and behaviors should I assume are always present and fully abstracted. When should I try to detect a feature (by asking the user what cloud provider they have, or in the future a feature discovery API).
+        * What API types and behaviors should I assume are always
+          present and fully abstracted? When should I try to detect a
+          feature (by asking the user what cloud provider they have,
+          or, in the future, via a feature discovery API)?

* **Kubernetes Security Models**
