| author | Hannes Hörl <hhorl@pivotal.io> | 2019-02-14 15:10:52 +0000 |
|---|---|---|
| committer | Maria Ntalla <mntalla@pivotal.io> | 2019-02-14 15:10:52 +0000 |
| commit | f87e0f084c2636a8ba8ed8674faa25b59087c533 (patch) | |
| tree | 17349ca8ede8d82b29b36ea8ed356a024a1d445d /contributors | |
| parent | b79961c65f4b19ebbfbd45b1c0d39ba4c71dd249 (diff) | |
| parent | 665611b75a772677978e6a544690d27a20d75565 (diff) | |
Merge remote-tracking branch 'upstream/master' into contrib-test-debug
Signed-off-by: Maria Ntalla <mntalla@pivotal.io>
Diffstat (limited to 'contributors')
154 files changed, 9741 insertions, 8378 deletions
diff --git a/contributors/design-proposals/OWNERS b/contributors/design-proposals/OWNERS index c6a712b8..7bda97c6 100644 --- a/contributors/design-proposals/OWNERS +++ b/contributors/design-proposals/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - brendandburns - dchen1107 diff --git a/contributors/design-proposals/api-machinery/OWNERS b/contributors/design-proposals/api-machinery/OWNERS index 0df76e64..ef142b0f 100644 --- a/contributors/design-proposals/api-machinery/OWNERS +++ b/contributors/design-proposals/api-machinery/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-api-machinery-leads approvers: diff --git a/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md b/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md index 5f035f9b..d2b894d2 100644 --- a/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md +++ b/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md @@ -44,7 +44,7 @@ that does not contain a discriminator. |---|---| | non-inlined non-discriminated union | Yes | | non-inlined discriminated union | Yes | -| inlined union with [patchMergeKey](/contributors/devel/api-conventions.md#strategic-merge-patch) only | Yes | +| inlined union with [patchMergeKey](/contributors/devel/sig-architecture/api-conventions.md#strategic-merge-patch) only | Yes | | other inlined union | No | For the inlined union with patchMergeKey, we move the tag to the parent struct's instead of diff --git a/contributors/design-proposals/api-machinery/aggregated-api-servers.md b/contributors/design-proposals/api-machinery/aggregated-api-servers.md index d436c6b9..3c3310b0 100644 --- a/contributors/design-proposals/api-machinery/aggregated-api-servers.md +++ b/contributors/design-proposals/api-machinery/aggregated-api-servers.md @@ -80,7 +80,7 @@ There are two configurations in which it makes sense to run `kube-aggregator`. `api.mycompany.com/v1/grobinators` from different apiservers. This restriction allows us to limit the scope of `kube-aggregator` to a manageable level. * Follow API conventions: APIs exposed by every API server should adhere to [kubernetes API - conventions](../../devel/api-conventions.md). + conventions](/contributors/devel/sig-architecture/api-conventions.md). * Support discovery API: Each API server should support the kubernetes discovery API (list the supported groupVersions at `/apis` and list the supported resources at `/apis/<groupVersion>/`) diff --git a/contributors/design-proposals/api-machinery/customresource-conversion-webhook.md b/contributors/design-proposals/api-machinery/customresource-conversion-webhook.md index 2b4aeb25..54991fd6 100644 --- a/contributors/design-proposals/api-machinery/customresource-conversion-webhook.md +++ b/contributors/design-proposals/api-machinery/customresource-conversion-webhook.md @@ -148,7 +148,7 @@ in *CRD v1* (apiextensions.k8s.io/v1), there will be only version list with no t #### Alternative approaches considered -First a defaulting approach is considered which per-version fields would be defaulted to top level fields. 
but that breaks backward incompatible change; Quoting from API [guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/api_changes.md#backward-compatibility-gotchas): +First a defaulting approach is considered which per-version fields would be defaulted to top level fields. but that breaks backward incompatible change; Quoting from API [guidelines](/contributors/devel/sig-architecture/api_changes.md#backward-compatibility-gotchas): > A single feature/property cannot be represented using multiple spec fields in the same API version simultaneously diff --git a/contributors/design-proposals/api-machinery/extending-api.md b/contributors/design-proposals/api-machinery/extending-api.md index f5e2de6a..9a0c9263 100644 --- a/contributors/design-proposals/api-machinery/extending-api.md +++ b/contributors/design-proposals/api-machinery/extending-api.md @@ -31,7 +31,7 @@ The `Version` object currently only specifies: ## Expectations about third party objects Every object that is added to a third-party Kubernetes object store is expected -to contain Kubernetes compatible [object metadata](../devel/api-conventions.md#metadata). +to contain Kubernetes compatible [object metadata](/contributors/devel/sig-architecture/api-conventions.md#metadata). This requirement enables the Kubernetes API server to provide the following features: * Filtering lists of objects via label queries. diff --git a/contributors/design-proposals/apps/OWNERS b/contributors/design-proposals/apps/OWNERS index 12723930..f36b2fcd 100644 --- a/contributors/design-proposals/apps/OWNERS +++ b/contributors/design-proposals/apps/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-apps-leads approvers: diff --git a/contributors/design-proposals/apps/controller_history.md b/contributors/design-proposals/apps/controller_history.md index 2e1213ad..d7140bea 100644 --- a/contributors/design-proposals/apps/controller_history.md +++ b/contributors/design-proposals/apps/controller_history.md @@ -343,7 +343,7 @@ provided that the associated version tracking information is updated as well. current target Object state. ### Kubernetes Upgrades -During the upgrade process form a version of Kubernetes that does not support +During the upgrade process from a version of Kubernetes that does not support controller history to a version that does, controllers that implement history based update mechanisms may find that they have specification type Objects with no history and with generated Objects. For instance, a StatefulSet may exist diff --git a/contributors/design-proposals/apps/deployment.md b/contributors/design-proposals/apps/deployment.md index 3f757144..30392c4a 100644 --- a/contributors/design-proposals/apps/deployment.md +++ b/contributors/design-proposals/apps/deployment.md @@ -132,7 +132,7 @@ The DeploymentController will process Deployments and crud ReplicaSets. For each creation or update for a Deployment, it will: 1. Find all RSs (ReplicaSets) whose label selector is a superset of DeploymentSpec.Selector. - - For now, we will do this in the client - list all RSs and then filter the + - For now, we will do this in the client - list all RSs and then filter out the ones we want. Eventually, we want to expose this in the API. 2. 
The new RS can have the same selector as the old RS and hence we add a unique selector to all these RSs (and the corresponding label to their pods) to ensure @@ -153,7 +153,7 @@ For each creation or update for a Deployment, it will: is same as hash of DeploymentSpec.PodTemplateSpec. If it exists already, then this is the RS that will be ramped up. If there is no such RS, then we create a new one using DeploymentSpec and then add a "pod-template-hash" label - to it. The size of the new RS depends on the used DeploymentStrategyType + to it. The size of the new RS depends on the used DeploymentStrategyType. 4. Scale up the new RS and scale down the olds ones as per the DeploymentStrategy. Raise events appropriately (both in case of failure or success). 5. Go back to step 1 unless the new RS has been ramped up to desired replicas diff --git a/contributors/design-proposals/apps/indexed-job.md b/contributors/design-proposals/apps/indexed-job.md index 98d65bdb..9b142b0f 100644 --- a/contributors/design-proposals/apps/indexed-job.md +++ b/contributors/design-proposals/apps/indexed-job.md @@ -144,7 +144,7 @@ Also, as a shortcut, for small worklists, it can be included in an annotation on the Job object, which is then exposed as a volume in the pod via the downward API. -### What Varies Between Pods of a Job +### What Varies Between Pods of an indexed-job Pods need to differ in some way to do something different. (They do not differ in the work-queue style of Job, but that style has ease-of-use issues). @@ -184,7 +184,7 @@ Experience in [similar systems](http://static.googleusercontent.com/media/resear has shown this model to be applicable to a very broad range of problems, despite this restriction. -Therefore we to allow pods in the same Job to differ **only** in the following +Therefore we want to allow pods in the same Job to differ **only** in the following aspects: - command line - environment variables diff --git a/contributors/design-proposals/apps/stateful-apps.md b/contributors/design-proposals/apps/stateful-apps.md index 286f57da..dd7fddbb 100644 --- a/contributors/design-proposals/apps/stateful-apps.md +++ b/contributors/design-proposals/apps/stateful-apps.md @@ -171,7 +171,7 @@ Future work: * Allow more sophisticated identity assignment - instead of `{name}-{0 - replicas-1}`, allow subsets and complex indexing. -### Controller behavior. +### Controller behavior When a StatefulSet is scaled up, the controller must create both pods and supporting resources for each new identity. The controller must create supporting resources for the pod before creating the diff --git a/contributors/design-proposals/architecture/OWNERS b/contributors/design-proposals/architecture/OWNERS index 87364abb..3baa861d 100644 --- a/contributors/design-proposals/architecture/OWNERS +++ b/contributors/design-proposals/architecture/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-architecture-leads - jbeda diff --git a/contributors/design-proposals/architecture/architecture.md b/contributors/design-proposals/architecture/architecture.md index e9374235..1fad518e 100644 --- a/contributors/design-proposals/architecture/architecture.md +++ b/contributors/design-proposals/architecture/architecture.md @@ -217,7 +217,7 @@ agent. Each node runs a container runtime, which is responsible for downloading images and running containers. Kubelet does not link in the base container runtime. 
Instead, we're defining a -[Container Runtime Interface](/contributors/devel/container-runtime-interface.md) to control the +[Container Runtime Interface](/contributors/devel/sig-node/container-runtime-interface.md) to control the underlying runtime and facilitate pluggability of that layer. This decoupling is needed in order to maintain clear component boundaries, facilitate testing, and facilitate pluggability. Runtimes supported today, either upstream or by forks, include at least docker (for Linux and Windows), diff --git a/contributors/design-proposals/architecture/declarative-application-management.md b/contributors/design-proposals/architecture/declarative-application-management.md index a5fbdf24..f1419200 100644 --- a/contributors/design-proposals/architecture/declarative-application-management.md +++ b/contributors/design-proposals/architecture/declarative-application-management.md @@ -6,7 +6,7 @@ Most users will deploy a combination of applications they build themselves, also In the case of the latter, users sometimes have the choice of using hosted SaaS products that are entirely managed by the service provider and are therefore opaque, also known as **_blackbox_** *services*. However, they often run open-source components themselves, and must configure, deploy, scale, secure, monitor, update, and otherwise manage the lifecycles of these **_whitebox_** *COTS applications*. -This document proposes a unified method of managing both bespoke and off-the-shelf applications declaratively using the same tools and application operator workflow, while leveraging developer-friendly CLIs and UIs, streamlining common tasks, and avoiding common pitfalls. The approach is based on observations of several dozen configuration projects and hundreds of configured applications within Google and in the Kubernetes ecosystem, as well as quantitative analysis of Borg configurations and work on the Kubernetes [system architecture](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture.md), [API](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md), and command-line tool ([kubectl](https://github.com/kubernetes/community/wiki/Roadmap:-kubectl)). +This document proposes a unified method of managing both bespoke and off-the-shelf applications declaratively using the same tools and application operator workflow, while leveraging developer-friendly CLIs and UIs, streamlining common tasks, and avoiding common pitfalls. The approach is based on observations of several dozen configuration projects and hundreds of configured applications within Google and in the Kubernetes ecosystem, as well as quantitative analysis of Borg configurations and work on the Kubernetes [system architecture](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture.md), [API](/contributors/devel/sig-architecture/api-conventions.md), and command-line tool ([kubectl](https://github.com/kubernetes/community/wiki/Roadmap:-kubectl)). The central idea is that a toolbox of composable configuration tools should manipulate configuration data in the form of declarative API resource specifications, which serve as a [declarative data model](https://docs.google.com/document/d/1RmHXdLhNbyOWPW_AtnnowaRfGejw-qlKQIuLKQWlwzs/edit#), not express configuration as code or some other representation that is restrictive, non-standard, and/or difficult to manipulate. 
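To make the "declarative data model" point concrete, here is a minimal Go sketch of a tool manipulating a resource specification as plain data rather than evaluating configuration code (the Deployment fragment and the field values are illustrative, not taken from the proposal):

```go
// A minimal sketch: resource specifications are plain data, so tools can
// decode them, set fields directly, and re-encode them without any
// configuration language in between.
package main

import (
	"encoding/json"
	"fmt"
)

const deployment = `{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {"name": "nginx"},
  "spec": {"replicas": 2}
}`

func main() {
	// Decode the resource specification into generic data.
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(deployment), &obj); err != nil {
		panic(err)
	}

	// A tool (an autoscaler, an overlay generator, ...) can set fields directly.
	obj["spec"].(map[string]interface{})["replicas"] = 5

	out, _ := json.MarshalIndent(obj, "", "  ")
	fmt.Println(string(out))
}
```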
@@ -252,7 +252,7 @@ Deployment of bespoke applications involves multiple steps: Step 1, building the image, is out of scope for Kubernetes. Step 3 is covered by kubectl apply. Some tools in the ecosystem, such as [Draft](https://github.com/Azure/draft), combine the 3 steps. -Kubectl contains ["generator" commands](https://github.com/kubernetes/community/blob/master/contributors/devel/kubectl-conventions.md#generators), such as [kubectl run](https://kubernetes.io/docs/user-guide/kubectl/v1.7/#run), expose, various create commands, to create commonly needed Kubernetes resource configurations. However, they also don’t help users understand current best practices and conventions, such as proper label and annotation usage. This is partly a matter of updating them and partly one of making the generated resources suitable for consumption by new users. Options supporting declarative output, such as dry run, local, export, etc., don’t currently produce clean, readable, reusable resource specifications ([example](https://blog.heptio.com/using-kubectl-to-jumpstart-a-yaml-file-heptioprotip-6f5b8a63a3ea))**.** We should clean them up. +Kubectl contains ["generator" commands](/contributors/devel/sig-cli/kubectl-conventions.md#generators), such as [kubectl run](https://kubernetes.io/docs/user-guide/kubectl/v1.7/#run), expose, various create commands, to create commonly needed Kubernetes resource configurations. However, they also don’t help users understand current best practices and conventions, such as proper label and annotation usage. This is partly a matter of updating them and partly one of making the generated resources suitable for consumption by new users. Options supporting declarative output, such as dry run, local, export, etc., don’t currently produce clean, readable, reusable resource specifications ([example](https://blog.heptio.com/using-kubectl-to-jumpstart-a-yaml-file-heptioprotip-6f5b8a63a3ea))**.** We should clean them up. Openshift provides a tool, [oc new-app](https://docs.openshift.com/enterprise/3.1/dev_guide/new_app.html), that can pull source-code templates, [detect](https://github.com/kubernetes/kubernetes/issues/14801)[ application types](https://github.com/kubernetes/kubernetes/issues/14801) and create Kubernetes resources for applications from source and from container images. [podex](https://github.com/kubernetes/contrib/tree/master/podex) was built to extract basic information from an image to facilitate creation of default Kubernetes resources, but hasn’t been kept up to date. Similar resource generation tools would be useful for getting started, and even just [validating that the image really exists](https://github.com/kubernetes/kubernetes/issues/12428) would reduce user error. diff --git a/contributors/design-proposals/architecture/principles.md b/contributors/design-proposals/architecture/principles.md index ae98d660..7bb548d2 100644 --- a/contributors/design-proposals/architecture/principles.md +++ b/contributors/design-proposals/architecture/principles.md @@ -4,7 +4,7 @@ Principles to follow when extending Kubernetes. ## API -See also the [API conventions](../../devel/api-conventions.md). +See also the [API conventions](/contributors/devel/sig-architecture/api-conventions.md). * All APIs should be declarative. * API objects should be complementary and composable, not opaque wrappers. 
diff --git a/contributors/design-proposals/architecture/resource-management.md b/contributors/design-proposals/architecture/resource-management.md index 5b6d66b8..6573f939 100644 --- a/contributors/design-proposals/architecture/resource-management.md +++ b/contributors/design-proposals/architecture/resource-management.md @@ -16,7 +16,7 @@ In Kubernetes, declarative abstractions are primary, rather than layered on top Kubernetes supports declarative control by recording user intent as the desired state in its API resources. This enables a single API schema for each resource to serve as a declarative data model, as both a source and a target for automated components (e.g., autoscalers), and even as an intermediate representation for resource transformations prior to instantiation. -The intent is carried out by asynchronous [controllers](https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md), which interact through the Kubernetes API. Controllers don’t access the state store, etcd, directly, and don’t communicate via private direct APIs. Kubernetes itself does expose some features similar to key-value stores such as etcd and [Zookeeper](https://zookeeper.apache.org/), however, in order to facilitate centralized [state and configuration management and distribution](https://sysgears.com/articles/managing-configuration-of-distributed-system-with-apache-zookeeper/) to decentralized components. +The intent is carried out by asynchronous [controllers](/contributors/devel/sig-api-machinery/controllers.md), which interact through the Kubernetes API. Controllers don’t access the state store, etcd, directly, and don’t communicate via private direct APIs. Kubernetes itself does expose some features similar to key-value stores such as etcd and [Zookeeper](https://zookeeper.apache.org/), however, in order to facilitate centralized [state and configuration management and distribution](https://sysgears.com/articles/managing-configuration-of-distributed-system-with-apache-zookeeper/) to decentralized components. Controllers continuously strive to make the observed state match the desired state, and report back their status to the apiserver asynchronously. All of the state, desired and observed, is made visible through the API to users and to other controllers. The API resources serve as coordination points, common intermediate representation, and shared state. @@ -60,7 +60,7 @@ Most resources also contain the [desired state ](https://kubernetes.io/docs/conc A few other subresources (e.g., `/scale`), with their own API types, similarly enable distinct authorization policies for controllers, and also polymorphism, since the same subresource type may be implemented for multiple parent resource types. Where distinct authorization policies are not required, polymorphism may be achieved simply by convention, using patch, akin to duck typing. -Supported data formats include YAML, JSON, and protocol buffers.
+Supported data formats include YAML, JSON, and protocol buffers. Example resource: @@ -89,7 +89,7 @@ API groups may be exposed as a unified API surface while being served by distinc Each API server supports a custom [discovery API](https://github.com/kubernetes/client-go/blob/master/discovery/discovery_client.go) to enable clients to discover available API groups, versions, and types, and also [OpenAPI](https://kubernetes.io/blog/2016/12/kubernetes-supports-openapi/), which can be used to extract documentation and validation information about the resource types. -See the [Kubernetes API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md ) for more details. +See the [Kubernetes API conventions](/contributors/devel/sig-architecture/api-conventions.md ) for more details. ## Resource semantics and lifecycle @@ -97,12 +97,12 @@ Each API resource undergoes [a common sequence of behaviors](https://kubernetes. 1. [Authentication](https://kubernetes.io/docs/admin/authentication/) 2. [Authorization](https://kubernetes.io/docs/admin/authorization/): [Built-in](https://kubernetes.io/docs/admin/authorization/rbac/) and/or [administrator-defined](https://kubernetes.io/docs/admin/authorization/webhook/) identity-based policies -3. [Defaulting](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#defaulting): API-version-specific default values are made explicit and persisted +3. [Defaulting](/contributors/devel/sig-architecture/api-conventions.md#defaulting): API-version-specific default values are made explicit and persisted 4. Conversion: The apiserver converts between the client-requested [API version](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#API-versioning) and the version it uses to store each resource type in etcd 5. [Admission control](https://kubernetes.io/docs/admin/admission-controllers/): [Built-in](https://kubernetes.io/docs/admin/admission-controllers/) and/or [administrator-defined](https://kubernetes.io/docs/admin/extensible-admission-controllers/) resource-type-specific policies -6. [Validation](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#validation): Resource field values are validated. Other than the presence of required fields, the API resource schema is not currently validated, but optional validation may be added in the future +6. [Validation](/contributors/devel/sig-architecture/api-conventions.md#validation): Resource field values are validated. Other than the presence of required fields, the API resource schema is not currently validated, but optional validation may be added in the future 7. Idempotence: Resources are accessed via immutable client-provided, declarative-friendly names -8. [Optimistic concurrency](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#concurrency-control-and-consistency): Writes may specify a precondition that the **resourceVersion** last reported for a resource has not changed +8. [Optimistic concurrency](/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency): Writes may specify a precondition that the **resourceVersion** last reported for a resource has not changed 9. 
[Audit logging](https://kubernetes.io/docs/tasks/debug-application-cluster/audit/): Records the sequence of changes to each resource by all actors Additional behaviors are supported upon deletion: @@ -125,4 +125,4 @@ And get: Kubernetes API resource specifications are designed for humans to directly author and read as declarative configuration data, as well as to enable composable configuration tools and automated systems to manipulate them programmatically. We chose this simple approach of using literal API resource specifications for configuration, rather than other representations, because it was natural, given that we designed the API to support CRUD on declarative primitives. The API schema must already well defined, documented, and supported. With this approach, there’s no other representation to keep up to date with new resources and versions, or to require users to learn. [Declarative configuration](https://goo.gl/T66ZcD) is only one client use case; there are also CLIs (e.g., kubectl), UIs, deployment pipelines, etc. The user will need to interact with the system in terms of the API in these other scenarios, and knowledge of the API transfers to other clients and tools. Additionally, configuration, macro/substitution, and templating languages are generally more difficult to manipulate programmatically than pure data, and involve complexity/expressiveness tradeoffs that prevent one solution being ideal for all use cases. Such languages/tools could be layered over the native API schemas, if desired, but they should not assume exclusive control over all API fields, because doing so obstructs automation and creates undesirable coupling with the configuration ecosystem. -The Kubernetes Resource Model encourages separation of concerns by supporting multiple distinct configuration sources and preserving declarative intent while allowing automatically set attributes. Properties not explicitly declaratively managed by the user are free to be changed by other clients, enabling the desired state to be cooperatively determined by both users and systems. This is achieved by an operation, called [**Apply**](https://docs.google.com/document/d/1q1UGAIfmOkLSxKhVg7mKknplq3OTDWAIQGWMJandHzg/edit#heading=h.xgjl2srtytjt) ("make it so"), that performs a 3-way merge of the previous configuration, the new configuration, and the live state. A 2-way merge operation, called [strategic merge patch](https://github.com/kubernetes/community/blob/master/contributors/devel/strategic-merge-patch.md), enables patches to be expressed using the same schemas as the resources themselves. Such patches can be used to perform automated updates without custom mutation operations, common updates (e.g., container image updates), combinations of configurations of orthogonal concerns, and configuration customization, such as for overriding properties of variants. +The Kubernetes Resource Model encourages separation of concerns by supporting multiple distinct configuration sources and preserving declarative intent while allowing automatically set attributes. Properties not explicitly declaratively managed by the user are free to be changed by other clients, enabling the desired state to be cooperatively determined by both users and systems. This is achieved by an operation, called [**Apply**](https://docs.google.com/document/d/1q1UGAIfmOkLSxKhVg7mKknplq3OTDWAIQGWMJandHzg/edit#heading=h.xgjl2srtytjt) ("make it so"), that performs a 3-way merge of the previous configuration, the new configuration, and the live state. 
A 2-way merge operation, called [strategic merge patch](https:git.k8s.io/community/contributors/devel/sig-api-machinery/strategic-merge-patch.md), enables patches to be expressed using the same schemas as the resources themselves. Such patches can be used to perform automated updates without custom mutation operations, common updates (e.g., container image updates), combinations of configurations of orthogonal concerns, and configuration customization, such as for overriding properties of variants. diff --git a/contributors/design-proposals/auth/OWNERS b/contributors/design-proposals/auth/OWNERS index 3100c753..ef998d7e 100644 --- a/contributors/design-proposals/auth/OWNERS +++ b/contributors/design-proposals/auth/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-auth-leads approvers: diff --git a/contributors/design-proposals/auth/apparmor.md b/contributors/design-proposals/auth/apparmor.md index a88154bb..5130a52d 100644 --- a/contributors/design-proposals/auth/apparmor.md +++ b/contributors/design-proposals/auth/apparmor.md @@ -63,7 +63,7 @@ and is supported on several # Alpha Design This section describes the proposed design for -[alpha-level](../devel/api_changes.md#alpha-beta-and-stable-versions) support, although +[alpha-level](/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions) support, although additional features are described in [future work](#future-work). For AppArmor alpha support (targeted for Kubernetes 1.4) we will enable: @@ -268,7 +268,7 @@ already underway for Docker, called ## Container Runtime Interface Other container runtimes will likely add AppArmor support eventually, so the -[Container Runtime Interface](/contributors/devel/container-runtime-interface.md) (CRI) needs to be made compatible +[Container Runtime Interface](/contributors/devel/sig-node/container-runtime-interface.md) (CRI) needs to be made compatible with this design. The two important pieces are a way to report whether AppArmor is supported by the runtime, and a way to specify the profile to load (likely through the `LinuxContainerConfig`). diff --git a/contributors/design-proposals/auth/bound-service-account-tokens.md b/contributors/design-proposals/auth/bound-service-account-tokens.md index c9c6064d..961e17a2 100644 --- a/contributors/design-proposals/auth/bound-service-account-tokens.md +++ b/contributors/design-proposals/auth/bound-service-account-tokens.md @@ -143,7 +143,7 @@ field which the service account authenticator will validate. type TokenReviewSpec struct { // Token is the opaque bearer token. Token string - // Audiences is the identifier that the client identifies as. + // Audiences are the identifiers that the client identifies as. Audiences []string } ``` diff --git a/contributors/design-proposals/auth/encryption.md b/contributors/design-proposals/auth/encryption.md new file mode 100644 index 00000000..121e06b4 --- /dev/null +++ b/contributors/design-proposals/auth/encryption.md @@ -0,0 +1,443 @@ +# Encryption + +## Abstract + +The scope of this proposal is to ensure that resources can be encrypted at the +datastore layer with sufficient metadata support to enable integration with +multiple encryption providers and key rotation. Encryption will be optional for +any resource, but will be used by default for the Secret resource. Secrets are +already protected in transit via TLS. + +Full disk encryption of the volumes storing etcd data is already expected as +standard security hygiene. 
Adding the proposed encryption at the datastore +layer defends against malicious parties gaining access to: + +- etcd backups; or +- A running etcd instance without access to memory of the etcd process. + +Allowing sensitive data to be encrypted adheres to best practices as well as +other requirements such as HIPAA. + +## High level design + +Before a resource is written to etcd and after it is read, an encryption +provider will take the plaintext data and encrypt/decrypt it. These providers +will be able to be created and turned on depending on the users needs or +requirements and will adhere to an encryption interface. This interface will +provide the abstraction to allow various encryption mechanisms to be +implemented, as well as for the method of encryption to be rotated over time. + +For the first iteration, a default provider that handles encryption in-process +using a locally stored key on disk will be developed. + +## Kubernetes Storage Changes + +Kubernetes requires that an update that does not change the serialized form of +object not be persisted to etcd to prevent other components from seeing no-op +updates. + +This must be done within the Kubernetes storage interfaces - we will introduce a +new API to the Kube storage layer that transforms the serialized object into the +desired at-rest form and provides hints as to whether no-op updates should still +persist (when key rotation is in effect). + +```go +// ValueTransformer allows a string value to be transformed before being read from or written to the underlying store. The methods +// must be able to undo the transformation caused by the other. +type ValueTransformer interface { + // TransformFromStorage may transform the provided data from its underlying + // storage representation or return an error. Stale is true if the object + // on disk is stale (encrypted with an older key) and a write to etcd + // should be issued, even if the contents of the object have not changed. + TransformFromStorage([]byte) (data []byte, stale bool, err error) + // TransformToStorage may transform the provided data into the appropriate form in storage or return an error. + TransformToStorage([]byte) (data []byte, err error) +} +``` + +When the storage layer of Kubernetes is initialized for some resource, an +implementation of this interface that manages encryption will be passed down. +Other resources can use a no-op provider by default. + +## Encryption Provider + +An encryption provider implements the ValueTransformer interface. Out of the box +this proposal will implement encryption using a standard AES-GCM performing +AEAD, using the standard Go library for AES-GCM. + +Each encryption provider will have a unique string identifier to ensure +versioning of the ciphertext in etcd, and to allow future schemes to be added. + +During encryption, only a single provider is required. During decryption, +multiple providers or keys may be in use (when migrating from an older version +of a provider, or when rotating keys), and thus the ValueTransformer +implementation must be able to delegate to the appropriate provider. + +Note that the ValueTransformer is a general storage interface and not related to +encryption directly. The AES implementation linked below combines +ValueTransformer and encryption provider. + +### AES-GCM Encryption provider + +Implemented in [#41939](https://github.com/kubernetes/kubernetes/pull/41939). 
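Before the byte layouts and pseudo-code below, a compact sketch of how such a provider can satisfy the ValueTransformer interface above using Go's standard AES-GCM AEAD; key management, the provider/key-ID prefix, and rotation handling are omitted, so this is an illustration rather than the #41939 implementation:

```go
package encryption

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// aesGCMTransformer is a minimal sketch of a ValueTransformer backed by AES-GCM.
// It stores values as <nonce><ciphertext>; a real provider would also prepend
// the provider and key identifiers described in the layouts below.
type aesGCMTransformer struct {
	aead          cipher.AEAD
	authenticated []byte // e.g. the etcd key, used as additional authenticated data
}

func newAESGCMTransformer(key, etcdKey []byte) (*aesGCMTransformer, error) {
	block, err := aes.NewCipher(key) // 32-byte key for AES-256
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &aesGCMTransformer{aead: aead, authenticated: etcdKey}, nil
}

func (t *aesGCMTransformer) TransformToStorage(data []byte) ([]byte, error) {
	// A fresh nonce on every write, as required for AES-GCM.
	nonce := make([]byte, t.aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return t.aead.Seal(nonce, nonce, data, t.authenticated), nil
}

func (t *aesGCMTransformer) TransformFromStorage(data []byte) ([]byte, bool, error) {
	n := t.aead.NonceSize()
	if len(data) < n {
		return nil, false, errors.New("stored value too short")
	}
	plain, err := t.aead.Open(nil, data[:n], data[n:], t.authenticated)
	// stale is always false in this single-key sketch; with rotation, a value
	// encrypted under an old key would report stale=true to trigger a rewrite.
	return plain, false, err
}
```

A real provider would also prefix the provider and key identifiers so that decryption can select the right key during rotation, as the layouts below describe.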
+ +The simplest possible provider is an AES-GCM encrypter/decrypter using AEAD, +where we create a unique nonce on each new write to etcd, use that as the IV for +AES-GCM of the value (the JSON or protobuf data) along with a set of +authenticated data to create the ciphertext, and then on decryption use the +nonce and the authenticated data to decode. + +The provider will be assigned a versioned identifier to uniquely pair the +implementation with the data at rest, such as “k8s-aes-gcm-v1”. Any +implementation that attempts to decode data associated with this provider id +must follow a known structure and apply a specific algorithm. + +Various options for key generation and management are covered in the following +sections. The provider implements one of those schemes to retrieve a set of +keys. One is identified as the write key, all others are used to decrypt data +from previous keys. Keys must be rotated more often than every 2^32 writes. + +The provider will use the recommended Go defaults for all crypto settings +unless otherwise noted. We should use AES-256 keys (32 bytes). + +Process for transforming a value (object encoded as JSON or protobuf) to and +from stable storage will look like the following: + +Layout as written to etcd2 (json safe string only): +``` +NONCE := read(/dev/urandom) +PLAIN_TEXT := <VALUE> +AUTHENTICATED_DATA := ETCD_KEY +CIPHER_TEXT := aes_gcm_encrypt(KEY, IV:NONCE, PLAIN_TEXT, A:AUTHENTICATED_DATA) +BASE64_DATA := base64(<NONCE><CIPHER_TEXT>) +STORED_DATA := <PROVIDER>:<KEY_ID>:<BASE64_DATA> +``` + +Layout as written to etcd3 (bytes): +``` +NONCE := read(/dev/urandom) +PLAIN_TEXT := <VALUE> +AUTHENTICATED_DATA := ETCD_KEY +CIPHER_TEXT := aes_gcm_encrypt(KEY, IV:NONCE, PLAIN_TEXT, A:AUTHENTICATED_DATA) +STORED_DATA := <PROVIDER_ID>:<KEY_ID>:<NONCE><AUTHENTICATED_DATA><CIPHER_TEXT> +``` + +Pseudo-code for encrypt (golang): +```go +block := aes.NewCipher(primaryKeyString) +aead := cipher.NewGCM(c.block) +keyId := primaryKeyId + +// string prefix chosen over a struct to minimize complexity and for write +// serialization performance. +// for each write +nonce := make([]byte, block.BlockSize()) +io.ReadFull(crypto_rand.Reader, nonce) +authenticatedData := ETCD_KEY +cipherText := aead.Seal(nil, nonce, value, authenticatedData) +storedData := providerId + keyId + base64.Encode(nonce + authenticatedData + cipherText) +``` + +Pseudo-code for decrypt (golang): +```go +// for each read +providerId, keyId, base64Encoded := // slice provider and key from value + +// ensure this provider is the one handling providerId +aead := // lookup an aead instance for keyId or error +bytes := base64.Decode(base64Encoded) +nonce, authenticatedData, cipherText := // slice from bytes +out, err := aead.Open(nil, nonce, cipherText, authenticatedData) +``` + +### Alternative Considered: SecretBox + +Using [secretbox](https://godoc.org/golang.org/x/crypto/nacl/secretbox) would +also be a good choice for crypto. We decided to go with AES-GCM for the first +implmentation since: + +- No new library required. +- We'd need to manage AEAD ourselves. +- The cache attack is not much of a concern on x86 with AES-NI, but is more so + on ARM + +There's no problem with adding this as an alternative later. + +## Configuration + +We will add the following options to the API server. 
At API server startup the +user will specify: + +```yaml +--encryption-provider-config=/path/to/config +--encryption-provider=default +--encrypt-resource=v1/Secrets +``` + +The encryption provider will check it has the keys it needs and if not, generate +them as described in the following section. + +## Key Generation, Distribution and Rotation + +To start with we want to support a simple user-driven key generation, +distribution and rotation scheme. Automatic rotation may be achievable in the +future. + +To enable key rotation a common pattern is to have keys used for resource +encryption encrypted by another set of keys (Key Encryption Keys aka KEK). The +keys used for encrypting kubernetes resources (Data Encryption Keys, aka DEK) +are generated by the apiserver and stored encrypted with one of the KEKs. + +In future versions, storing a KEK off-host and off-loading encryption/decryption +of the DEK to AWS KMS, Google Cloud KMS, Hashicorp Vault etc. should be +possible. The decrypted DEK would be cached locally after boot. + +Using a remote encrypt/decrypt API offered by an external store will be limited +to encrypt/decrypt of keys, not the actual resources for performance reasons. + +Incremental deliverable options are presented below. + +### Option 1: Simple list of keys on disk + +In this solution there is no KEK/DEK scheme, just single keys in a list on disk. +They will live in a file specified by the --encryption-provider-config, which +can be an empty file when encryption is turned on. + +If the key file is empty or the user calls PUT on a /rotate API endpoint keys +are generated as follows: + +1. A new encryption key is created. +1. The key is added to a file on the API master with metadata including an ID + and an expiry time. Subsequent calls to rotate will prepend new keys to the + file such that the first key is always the key to use for encryption. +1. The list of keys being used by the master is updated in memory so that the + new key is in the list of read keys. +1. The list of keys being used by the master is updated in memory so that the + new key is the current write key. +1. All secrets are re-encrypted with the new key. + +Pros: + + - Simplicity. + - The generate/write/read interfaces can be pluggable for later replacement + with external secret management systems. + - A single master shouldn't require API Server downtime for rotation. + - No unseal step on startup since the file is already present. + - Attacker with access to /rotate is a DoS at worst, it doesn't return any + keys. + +Cons: + + - Coordination of keys between a deployment with multiple masters will require + updating the KeyEncryptionKeyDatabase file on disk and forcing a re-read. + - Users will be responsible for backing up the keyfile from the API server + disk. + +### Option 2: User supplied encryption key + +In this solution there is no KEK/DEK scheme, just single keys managed by the +user. To enable encryption a user specifies the "user-supplied-key" encryption +provider at api startup. Nothing is actually encrypted until the user calls PUT +on a /rotate API endpoint: + +1. A new encryption key is created. +1. The key is provided back to the caller for persistent storage. Within the + cluster, it only lives in memory on the master. +1. The list of keys being used by the master is updated in memory so that the + new key is in the list of read keys. +1. The list of keys being used by the master is updated in memory so that the + new key is is the current write key. +1. 
All secrets are re-encrypted with the new key. + +On master restart the api server will wait until the user supplies the list of +keys needed to decrypt all secrets in the database. In most cases this will be a +single key unless the re-encryption step was incomplete. + +Pros: + + - Simplicity. + - A single master shouldn't require API Server downtime. + - User is explicitly in control of managing and backing up the encryption keys. + +Cons: + + - Coordination of keys between a deployment with multiple masters is not + possible. This would have to be added as a subsequent feature using a + consensus protocol. + - API master needs to refuse to start and wait on a decrypt key from the user. + - /rotate API needs to be strongly protected: if an attacker can cause a + rotation and get the new key, it might as well not be encrypted at all. + +### Option 3: Encrypted DEKs in etcd, KEKs on disk + +In order to take an API driven approach for key rotation, new API objects (not +exposed over REST) will be defined: + +* Key Encryption Key (KEK) - key used to unlock the Data Encryption Key. Stored + on API server nodes. +* Data Encryption Key (DEK) - long-lived secret encrypted with a KEK. Stored in + etcd encrypted. Unencrypted in-memory in API servers. +* KEK Slot - to support KEK rotation there will be an ordered list of KEKs + stored in the KEK DB. The current active KEK slot number, is stored in etcd + for consistency. +* KEK DB - a file with N KEKs in a JSON list. KEK[0], by definition, is null. + +```go +type DataEncryptionKey struct { + ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + Value string // Encrypted +} +``` + +```go +type KeyEncryptionKeySlot struct { + ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + Slot int +} +``` + +```go +type KeyEncryptionKeyDatabase struct { + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + Keys []string +} +``` + +To enable encryption a user must first create a KEK DB file and tell the API +server to use it with `--encryption-provider-config=/path/to/config`. The +file will be a simple YAML file that lists all of the keys: + +```yaml +kind: KeyEncryptionKeyDatabase +version: v1 +keys: + - foo + - bar + - baz +``` + +The user will also need to specify the encryption provider and the resources to +encrypt as follows: +```yaml +--encryption-provider-config=/path/to/key-encryption-key/db +--encryption-provider=default +--encrypt-resource=v1/Secrets +--encrypt-resource=v1/ConfigMap +``` + +Then a user calls PUT on a /rotate API endpoint the first time: + +1. A new encryption key (unencrypted DEK) is created. +1. Encrypt DEK with KEK[1] +1. The list of DEKs being used by the master is updated in memory so that the + new key is in the list of read keys. +1. The list of DEKs being used by the master is updated in etcd so that the + new key is in the list of read keys available to all masters. +1. Confirm that all masters have the new DEK for reading. Key point here is that + all readers have the new key before anyone writes with it. +1. The list of DEKs being used by the master is updated in memory so that the + new key is is the current write key. +1. The list of DEKs being used by the master is updated in etcd so that the new + key is the current write key and is available to all masters. It doesn't + matter if there's some masters using the new key and some using the old key, + since we know all masters can read with the new key. 
Eventually all masters + will be writing with the new key. +1. All secrets are re-encrypted with the new key. + +After N rotation calls: + +1. A new encryption key (unencrypted DEK) is created. +1. Encrypt DEK with KEK[N+1] + +Each rotation generates a new KEK and DEK. Two DEKs will be in-use temporarily +during rotation, but only one at steady-state. + +Pros: + + - Most closely matches the pattern that will be used for integrating with + external encryption systems. Hashicorp Vault, Amazon KMS, Google KMS and HSM + would eventually serve the purpose of KEK storage rather than local disk. + +Cons: + + - End state is still KEKs on disk on the master. This is equivalent to the much + simpler list of keys on disk in terms of key management and security. + Complexity is much higher. + - Coordination of keys between a deployment with multiple masters will require + manually generating and providing a key in the key file then calling rotate + to have the config re-read. Same as keys on disk. + +### Option 4: Protocol for KEK agreement between masters + +TODO: write a proposal for coordinating KEK agreement among multiple masters and +having the KEK be either user supplied or backed by external store. + + +## External providers + +It should be easy for the user to substitute a default encryption provider for +one of the following: + +* A local HSM implementation that retrieves the keys from the secure enclave + prior to reusing the AES-GCM implementation (initialization of keys only) +* Exchanging a local temporary token for the actual decryption tokens from a + networked secret vault +* Decrypting the AES-256 keys from disk using asymmetric encryption combined + with a user input password +* Sending the data over the network to a key management system for encryption + and decryption (Google KMS, Amazon KMS, Hashicorp Vault w/ Transit backend) + +### Backwards Compatibility + +Once a user encrypts any resource in etcd, they are locked to that Kubernetes +version and higher unless they choose to manually decrypt that resource in etcd. +This will be discouraged. It will be highly recommended that users discern if +their Kubernetes cluster is on a stable version before enabling encryption. + +### Performance + +Introducing even a relatively well tuned AES-GCM implementation is likely to +have performance implications for Kubernetes. Fortunately, existing +optimizations occur above the storage layer and so the highest penalty will be +incurred on writes when secrets are created or updated. In multi-tenant Kube +clusters secrets tend to have the highest load factor (there are 20-40 resources +types per namespace, but most resources only have 1 instance where secrets might +have 3-9 instances across 10k namespaces). Writes are uncommon, creates usually +happen only when a namespace is created, and reads are somewhat common. 
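A rough way to quantify that write-path cost is a micro-benchmark of the AES-GCM transform against a secret-sized payload; the 4 KiB payload size and the etcd key used as authenticated data are assumptions:

```go
package encryption

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"testing"
)

// BenchmarkSealSecret (in a _test.go file) estimates the per-write cost of the
// AES-GCM transform for one secret-sized value, i.e. the "highest penalty on
// writes" mentioned above. Payload size is an assumption, not a measurement.
func BenchmarkSealSecret(b *testing.B) {
	key := make([]byte, 32) // AES-256
	rand.Read(key)
	block, _ := aes.NewCipher(key)
	aead, _ := cipher.NewGCM(block)

	value := make([]byte, 4096) // stand-in for a serialized Secret
	rand.Read(value)
	nonce := make([]byte, aead.NonceSize())
	aad := []byte("/registry/secrets/default/example")

	b.SetBytes(int64(len(value)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		rand.Read(nonce) // fresh nonce per write, as in the provider
		aead.Seal(nil, nonce, value, aad)
	}
}
```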
+ +### Actionable Items / Milestones + +* [p0] Add ValueTransformer to storage (Done in [#41939](https://github.com/kubernetes/kubernetes/pull/41939)) +* [p0] Create a default implementation of AES-GCM interface (Done in [#41939](https://github.com/kubernetes/kubernetes/pull/41939)) +* [p0] Add encryption flags on kube-apiserver and key rotation API +* [p1] Add kubectl command to call /rotate endpoint +* [p1] Audit of default implementation for safety and security +* [p2] E2E and performance testing +* [p2] Documentation and users guide +* [p2] Read cache layer if encrypting/decrypting Secrets adds too much load on kube-apiserver + + +## Alternative Considered: Encrypting the entire etcd database + +It should be easy for the user to substitute a default encryption provider for +one of the following: + +Rather than encrypting individual resources inside the etcd database, another +approach is to encrypt the entire database. + +Pros: + + - Removes the complexity of deciding which types of things should be encrypted + in the database. + - Protects any other sensitive information that might be exposed if etcd + backups are made public accidentally or one of the other desribed attacks + occurs. + +Cons: + + - Unknown, but likely significant performance impact. If it isn't fast enough + you don't get to fall back on only encrypting the really important stuff. + As a counter argument: Docker [implemented their encryption at this + layer](https://docs.docker.com/engine/swarm/swarm_manager_locking/) and have + been happy with the performance. + diff --git a/contributors/design-proposals/autoscaling/OWNERS b/contributors/design-proposals/autoscaling/OWNERS index 17089492..9a70bb4c 100644 --- a/contributors/design-proposals/autoscaling/OWNERS +++ b/contributors/design-proposals/autoscaling/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-autoscaling-leads approvers: diff --git a/contributors/design-proposals/aws/OWNERS b/contributors/design-proposals/aws/OWNERS index 83317bbe..cc03b55d 100644 --- a/contributors/design-proposals/aws/OWNERS +++ b/contributors/design-proposals/aws/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-aws-leads approvers: diff --git a/contributors/design-proposals/cli/OWNERS b/contributors/design-proposals/cli/OWNERS index 248d3e7c..96fdea25 100644 --- a/contributors/design-proposals/cli/OWNERS +++ b/contributors/design-proposals/cli/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-cli-leads approvers: diff --git a/contributors/design-proposals/cli/multi-fields-merge-key.md b/contributors/design-proposals/cli/multi-fields-merge-key.md index 9db3d549..857deb25 100644 --- a/contributors/design-proposals/cli/multi-fields-merge-key.md +++ b/contributors/design-proposals/cli/multi-fields-merge-key.md @@ -6,7 +6,7 @@ Support multi-fields merge key in Strategic Merge Patch. ## Background -Strategic Merge Patch is covered in this [doc](/contributors/devel/strategic-merge-patch.md). +Strategic Merge Patch is covered in this [doc](/contributors/devel/sig-api-machinery/strategic-merge-patch.md). In Strategic Merge Patch, we use Merge Key to identify the entries in the list of non-primitive types. It must always be present and unique to perform the merge on the list of non-primitive types, and will be preserved. 
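As a toy illustration of what "identify entries in the list by merge key" means in practice (a container list keyed on name; this is only a sketch of the behavior, not the strategic-merge-patch implementation):

```go
package merge

// Container stands in for a list element merged by the merge key "name",
// as in a pod's containers list; illustrative only.
type Container struct {
	Name  string
	Image string
}

// mergeByKey overlays patch entries onto original entries that share the same
// merge key, and appends patch entries whose key is not present yet.
func mergeByKey(original, patch []Container) []Container {
	merged := append([]Container(nil), original...)
	byName := map[string]int{}
	for i, c := range merged {
		byName[c.Name] = i
	}
	for _, p := range patch {
		if i, ok := byName[p.Name]; ok {
			if p.Image != "" {
				merged[i].Image = p.Image // overlay fields onto the matched entry
			}
			continue
		}
		merged = append(merged, p) // a new merge key adds a new entry
	}
	return merged
}
```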
diff --git a/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md b/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md index 7f6c67d7..1d3c2484 100644 --- a/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md +++ b/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md @@ -4,7 +4,7 @@ Author: @mengqiy ## Motivation -Background of the Strategic Merge Patch is covered [here](../devel/strategic-merge-patch.md). +Background of the Strategic Merge Patch is covered [here](/contributors/devel/sig-api-machinery/strategic-merge-patch.md). The Kubernetes API may apply semantic meaning to the ordering of items within a list, however the strategic merge patch does not keep the ordering of elements. diff --git a/contributors/design-proposals/cluster-lifecycle/OWNERS b/contributors/design-proposals/cluster-lifecycle/OWNERS index d69f24ee..71322d9e 100644 --- a/contributors/design-proposals/cluster-lifecycle/OWNERS +++ b/contributors/design-proposals/cluster-lifecycle/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-cluster-lifecycle-leads approvers: diff --git a/contributors/design-proposals/cluster-lifecycle/clustering/OWNERS b/contributors/design-proposals/cluster-lifecycle/clustering/OWNERS index b3d71823..741be590 100644 --- a/contributors/design-proposals/cluster-lifecycle/clustering/OWNERS +++ b/contributors/design-proposals/cluster-lifecycle/clustering/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - michelleN approvers: diff --git a/contributors/design-proposals/cluster-lifecycle/runtimeconfig.md b/contributors/design-proposals/cluster-lifecycle/runtimeconfig.md index c247eff8..c1f30f5c 100644 --- a/contributors/design-proposals/cluster-lifecycle/runtimeconfig.md +++ b/contributors/design-proposals/cluster-lifecycle/runtimeconfig.md @@ -47,7 +47,7 @@ feature's owner(s). The following are suggested conventions: - Features that touch multiple components should reserve the same key in each component to toggle on/off. - Alpha features should be disabled by default. Beta features may - be enabled by default. Refer to docs/devel/api_changes.md#alpha-beta-and-stable-versions + be enabled by default. Refer to [this file](/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions) for more detailed guidance on alpha vs. beta. 
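A small sketch of that convention, using made-up feature keys, where alpha features default to off, beta features default to on, and a per-component flag string overrides the defaults:

```go
package runtimeconfig

import (
	"fmt"
	"strconv"
	"strings"
)

// defaults encodes the convention above: alpha off, beta on by default.
// The feature names are made up for illustration.
var defaults = map[string]bool{
	"SomeAlphaFeature": false,
	"SomeBetaFeature":  true,
}

// resolve applies an override string such as
// "SomeAlphaFeature=true,SomeBetaFeature=false" on top of the defaults.
// This is a sketch of the convention, not the flag parsing of any component.
func resolve(overrides string) (map[string]bool, error) {
	gates := map[string]bool{}
	for k, v := range defaults {
		gates[k] = v
	}
	if overrides == "" {
		return gates, nil
	}
	for _, kv := range strings.Split(overrides, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("expected key=value, got %q", kv)
		}
		enabled, err := strconv.ParseBool(parts[1])
		if err != nil {
			return nil, err
		}
		gates[parts[0]] = enabled
	}
	return gates, nil
}
```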
## Upgrade support diff --git a/contributors/design-proposals/gcp/OWNERS b/contributors/design-proposals/gcp/OWNERS index cd2232f4..4ff966b4 100644 --- a/contributors/design-proposals/gcp/OWNERS +++ b/contributors/design-proposals/gcp/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-gcp-leads approvers: diff --git a/contributors/design-proposals/instrumentation/OWNERS b/contributors/design-proposals/instrumentation/OWNERS index 8e29eafa..3e1efb0c 100644 --- a/contributors/design-proposals/instrumentation/OWNERS +++ b/contributors/design-proposals/instrumentation/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-instrumentation-leads approvers: diff --git a/contributors/design-proposals/instrumentation/core-metrics-pipeline.md b/contributors/design-proposals/instrumentation/core-metrics-pipeline.md index 1c9d9f70..1ca5dbd9 100644 --- a/contributors/design-proposals/instrumentation/core-metrics-pipeline.md +++ b/contributors/design-proposals/instrumentation/core-metrics-pipeline.md @@ -29,7 +29,7 @@ This document proposes a design for the set of metrics included in an eventual C "Kubelet": The daemon that runs on every kubernetes node and controls pod and container lifecycle, among many other things. ["cAdvisor":](https://github.com/google/cadvisor) An open source container monitoring solution which only monitors containers, and has no concept of kubernetes constructs like pods or volumes. ["Summary API":](https://git.k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1/types.go) A kubelet API which currently exposes node metrics for use by both system components and monitoring systems. -["CRI":](/contributors/devel/container-runtime-interface.md) The Container Runtime Interface designed to provide an abstraction over runtimes (docker, rkt, etc). +["CRI":](/contributors/devel/sig-node/container-runtime-interface.md) The Container Runtime Interface designed to provide an abstraction over runtimes (docker, rkt, etc). "Core Metrics": A set of metrics described in the [Monitoring Architecture](/contributors/design-proposals/instrumentation/monitoring_architecture.md) whose purpose is to provide metrics for first-class resource isolation and utilization features, including [resource feasibility checking](https://github.com/eBay/Kubernetes/blob/master/docs/design/resources.md#the-resource-model) and node resource management. "Resource": A consumable element of a node (e.g. memory, disk space, CPU time, etc). "First-class Resource": A resource critical for scheduling, whose requests and limits can be (or soon will be) set via the Pod/Container Spec. diff --git a/contributors/design-proposals/multicluster/OWNERS b/contributors/design-proposals/multicluster/OWNERS index fca0e564..bedef962 100644 --- a/contributors/design-proposals/multicluster/OWNERS +++ b/contributors/design-proposals/multicluster/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-multicluster-leads approvers: diff --git a/contributors/design-proposals/multicluster/cluster-registry/api-design.md b/contributors/design-proposals/multicluster/cluster-registry/api-design.md index 2133f499..3c2b748c 100644 --- a/contributors/design-proposals/multicluster/cluster-registry/api-design.md +++ b/contributors/design-proposals/multicluster/cluster-registry/api-design.md @@ -84,7 +84,7 @@ Optional API operations: support WATCH for this API. Implementations can choose to support or not support this operation. 
An implementation that does not support the operation should return HTTP error 405, StatusMethodNotAllowed, per the - [relevant Kubernetes API conventions](/contributors/devel/api-conventions.md#error-codes). + [relevant Kubernetes API conventions](/contributors/devel/sig-architecture/api-conventions.md#error-codes). We also intend to support a use case where the server returns a file that can be stored for later use. We expect this to be doable with the standard API @@ -107,7 +107,7 @@ objects that contain a value for the `ClusterName` field. The `Cluster` object's of namespace scoped. The `Cluster` object will have `Spec` and `Status` fields, following the -[Kubernetes API conventions](/contributors/devel/api-conventions.md#spec-and-status). +[Kubernetes API conventions](/contributors/devel/sig-architecture/api-conventions.md#spec-and-status). There was argument in favor of a `State` field instead of `Spec` and `Status` fields, since the `Cluster` in the registry does not necessarily hold a user's intent about the cluster being represented, but instead may hold descriptive diff --git a/contributors/design-proposals/network/OWNERS b/contributors/design-proposals/network/OWNERS index 1939ca5c..42bb9ad2 100644 --- a/contributors/design-proposals/network/OWNERS +++ b/contributors/design-proposals/network/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-network-leads approvers: diff --git a/contributors/design-proposals/network/external-lb-source-ip-preservation.md b/contributors/design-proposals/network/external-lb-source-ip-preservation.md index 50140a0e..f6a7d680 100644 --- a/contributors/design-proposals/network/external-lb-source-ip-preservation.md +++ b/contributors/design-proposals/network/external-lb-source-ip-preservation.md @@ -50,7 +50,7 @@ lot of applications and customer use-cases. # Alpha Design This section describes the proposed design for -[alpha-level](../devel/api_changes.md#alpha-beta-and-stable-versions) support, although +[alpha-level](/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions) support, although additional features are described in [future work](#future-work). ## Overview diff --git a/contributors/design-proposals/node/OWNERS b/contributors/design-proposals/node/OWNERS index ab6d8dd5..810bc689 100644 --- a/contributors/design-proposals/node/OWNERS +++ b/contributors/design-proposals/node/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-node-leads approvers: diff --git a/contributors/design-proposals/node/cri-dockershim-checkpoint.md b/contributors/design-proposals/node/cri-dockershim-checkpoint.md index 85db4c89..9f3a10b5 100644 --- a/contributors/design-proposals/node/cri-dockershim-checkpoint.md +++ b/contributors/design-proposals/node/cri-dockershim-checkpoint.md @@ -4,7 +4,7 @@ [#34672](https://github.com/kubernetes/kubernetes/issues/34672) ## Background -[Container Runtime Interface (CRI)](../devel/container-runtime-interface.md) +[Container Runtime Interface (CRI)](/contributors/devel/sig-node/container-runtime-interface.md) is an ongoing project to allow container runtimes to integrate with kubernetes via a newly-defined API. 
[Dockershim](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/dockershim) diff --git a/contributors/design-proposals/node/secret-configmap-downwardapi-file-mode.md b/contributors/design-proposals/node/secret-configmap-downwardapi-file-mode.md index cdfe1e1c..1d5bd7b7 100644 --- a/contributors/design-proposals/node/secret-configmap-downwardapi-file-mode.md +++ b/contributors/design-proposals/node/secret-configmap-downwardapi-file-mode.md @@ -49,7 +49,7 @@ This was asked on the mailing list here[2] and here[3], too. Several alternatives have been considered: * Add a mode to the API definition when using secrets: this is backward - compatible as described in (docs/devel/api_changes.md) IIUC and seems like the + compatible as described [here](/contributors/devel/sig-architecture/api_changes.md) IIUC and seems like the way to go. Also @thockin said in the ML that he would consider such an approach. But it might be worth to consider if we want to do the same for configmaps or owners, but there is no need to do it now either. diff --git a/contributors/design-proposals/release/OWNERS b/contributors/design-proposals/release/OWNERS index 9d8e7403..c414be94 100644 --- a/contributors/design-proposals/release/OWNERS +++ b/contributors/design-proposals/release/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-release-leads approvers: diff --git a/contributors/design-proposals/resource-management/OWNERS b/contributors/design-proposals/resource-management/OWNERS index 60221854..d717eba7 100644 --- a/contributors/design-proposals/resource-management/OWNERS +++ b/contributors/design-proposals/resource-management/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - wg-resource-management-leads approvers: diff --git a/contributors/design-proposals/scalability/OWNERS b/contributors/design-proposals/scalability/OWNERS index 2b68b875..6b57aa45 100644 --- a/contributors/design-proposals/scalability/OWNERS +++ b/contributors/design-proposals/scalability/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-scalability-leads approvers: diff --git a/contributors/design-proposals/scheduling/OWNERS b/contributors/design-proposals/scheduling/OWNERS index b3248766..f6155ab6 100644 --- a/contributors/design-proposals/scheduling/OWNERS +++ b/contributors/design-proposals/scheduling/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-scheduling-leads approvers: diff --git a/contributors/design-proposals/scheduling/images/OWNERS b/contributors/design-proposals/scheduling/images/OWNERS index fe173c27..14c05899 100644 --- a/contributors/design-proposals/scheduling/images/OWNERS +++ b/contributors/design-proposals/scheduling/images/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - bsalamat - michelleN diff --git a/contributors/design-proposals/scheduling/scheduler_extender.md b/contributors/design-proposals/scheduling/scheduler_extender.md index de7a6259..bc65f9ba 100644 --- a/contributors/design-proposals/scheduling/scheduler_extender.md +++ b/contributors/design-proposals/scheduling/scheduler_extender.md @@ -2,7 +2,7 @@ There are three ways to add new scheduling rules (predicates and priority functions) to Kubernetes: (1) by adding these rules to the scheduler and -recompiling, [described here](/contributors/devel/scheduler.md), +recompiling, [described here](/contributors/devel/sig-scheduling/scheduler.md), 
(2) implementing your own scheduler process that runs instead of, or alongside of, the standard Kubernetes scheduler, (3) implementing a "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when diff --git a/contributors/design-proposals/service-catalog/OWNERS b/contributors/design-proposals/service-catalog/OWNERS index 5c6b18ed..a4884d4d 100644 --- a/contributors/design-proposals/service-catalog/OWNERS +++ b/contributors/design-proposals/service-catalog/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-service-catalog-leads approvers: diff --git a/contributors/design-proposals/storage/OWNERS b/contributors/design-proposals/storage/OWNERS index fb58418f..6dd5158f 100644 --- a/contributors/design-proposals/storage/OWNERS +++ b/contributors/design-proposals/storage/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-storage-leads approvers: diff --git a/contributors/design-proposals/storage/container-storage-interface-pod-information.md b/contributors/design-proposals/storage/container-storage-interface-pod-information.md new file mode 100644 index 00000000..872f9d45 --- /dev/null +++ b/contributors/design-proposals/storage/container-storage-interface-pod-information.md @@ -0,0 +1,48 @@ +# Pod in CSI NodePublish request +Author: @jsafrane + +## Goal +* Pass Pod information (pod name/namespace/UID + service account) to CSI drivers in `NodePublish` request as CSI volume attributes. + +## Motivation +We'd like to move away from exec based Flex to gRPC based CSI volumes. In Flex, kubelet always passes `pod.namespace`, `pod.name`, `pod.uid` and `pod.spec.serviceAccountName` ("pod information") in every `mount` call. In Kubernetes community we've seen some Flex drivers that use pod or service account information to authorize or audit usage of a volume or generate content of the volume tailored to the pod (e.g. https://github.com/Azure/kubernetes-keyvault-flexvol). + +CSI is agnostic to container orchestrators (such as Kubernetes, Mesos or CloudFoundry) and as such does not understand concept of pods and service accounts. [Enhancement of CSI protocol](https://github.com/container-storage-interface/spec/pull/252) to pass "workload" (~pod) information from Kubernetes to CSI driver has met some resistance. + +## High-level design +We decided to pass the pod information as `NodePublishVolumeRequest.volume_attributes`. + +* Kubernetes passes pod information only to CSI drivers that explicitly require that information in their [`CSIDriver` instance](https://github.com/kubernetes/community/pull/2523). These drivers are tightly coupled to Kubernetes and may not work or may require reconfiguration on other cloud orchestrators. It is expected (but not limited to) that these drivers will provide ephemeral volumes similar to Secrets or ConfigMap, extending Kubernetes secret or configuration sources. +* Kubernetes will not pass pod information to CSI drivers that don't know or don't care about pods and service accounts. It is expected (but not limited to) that these drivers will provide real persistent storage. Such CSI driver would reject a CSI call with pod information as invalid. This is current behavior of Kubernetes and it will be the default behavior. + +## Detailed design + +### API changes +No API changes. + +### CSI enhancement +We don't need to change CSI protocol in any way. 
It allows kubelet to pass `pod.name`, `pod.uid` and `pod.spec.serviceAccountName` in the [`NodePublish` call as `volume_attributes`](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodepublishvolume). `NodePublish` is roughly equivalent to the Flex `mount` call.

The only thing we need to do is to **define** the names of the `volume_attributes` keys that CSI drivers can expect:
 * `csi.storage.k8s.io/pod.name`: name of the pod that wants the volume.
 * `csi.storage.k8s.io/pod.namespace`: namespace of the pod that wants the volume.
 * `csi.storage.k8s.io/pod.uid`: uid of the pod that wants the volume.
 * `csi.storage.k8s.io/serviceAccount.name`: name of the service account under which the pod operates. The namespace of the service account is the same as `pod.namespace`.

Note that these attribute names are very similar to the [parameters we pass to the flex volume plugin](https://github.com/kubernetes/kubernetes/blob/10688257e63e4d778c499ba30cddbc8c6219abe9/pkg/volume/flexvolume/driver-call.go#L55).

### Kubelet
Kubelet needs to create an informer to cache `CSIDriver` instances. It passes the informer to the CSI volume plugin as a new argument of [`ProbeVolumePlugins`](https://github.com/kubernetes/kubernetes/blob/43f805b7bdda7a5b491d34611f85c249a63d7f97/pkg/volume/csi/csi_plugin.go#L58).

### CSI volume plugin
In `SetUpAt()`, the CSI volume plugin checks the `CSIDriver` informer to see whether a `CSIDriver` instance exists for the particular CSI driver that handles the volume. If the instance exists and has `PodInfoRequiredOnMount` set, the volume plugin adds the `csi.storage.k8s.io/*` attributes to the `volume_attributes` of the CSI volume. It blindly overwrites any existing values there.

Kubelet and the volume plugin must tolerate the case where the CRD for `CSIDriver` has not been created (yet). In that case kubelet and the CSI volume plugin fall back to the original behavior, i.e. they do not pass any pod information to CSI. We expect that CSI drivers will return a reasonable error code instead of mounting a wrong volume.

TODO(jsafrane): check what a (shared?) informer does when it is created for a non-existing CRD. Will it start working automatically when the CRD is created? Or shall we retry creation of the informer every X seconds until the CRD is created? Alternatively, we may GET a fresh `CSIDriver` from the API server in `SetUpAt()`, without any informer.

## Implementation

* Alpha in 1.12 (behind `CSIPodInfo` feature gate)
* Beta in 1.13 (behind `CSIPodInfo` feature gate)
* GA 1.14?
diff --git a/contributors/design-proposals/storage/container-storage-interface.md b/contributors/design-proposals/storage/container-storage-interface.md index 9a1b3d5e..e368b4ac 100644 --- a/contributors/design-proposals/storage/container-storage-interface.md +++ b/contributors/design-proposals/storage/container-storage-interface.md @@ -29,7 +29,7 @@ Kubernetes volume plugins are currently “in-tree” meaning they are linked, c 4. Volume plugins get full privileges of kubernetes components (kubelet and kube-controller-manager). 5. Plugin developers are forced to make plugin source code available, and can not choose to release just a binary. -The existing [Flex Volume](/contributors/devel/flexvolume.md) plugin attempted to address this by exposing an exec based API for mount/unmount/attach/detach. Although it enables third party storage vendors to write drivers out-of-tree, it requires access to the root filesystem of node and master machines in order to deploy the third party driver files.
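Returning to the pod-information proposal above, a minimal sketch of how a CSI volume plugin could merge those attributes into `volume_attributes` (the helper below is illustrative; it is not the actual kubelet implementation):

```go
package main

import "fmt"

// addPodInfo copies pod details into a CSI volume_attributes map using the
// csi.storage.k8s.io/* keys defined above. It deliberately overwrites any
// existing values, mirroring the behavior described for SetUpAt().
func addPodInfo(attrs map[string]string, podName, podNamespace, podUID, serviceAccount string) {
	attrs["csi.storage.k8s.io/pod.name"] = podName
	attrs["csi.storage.k8s.io/pod.namespace"] = podNamespace
	attrs["csi.storage.k8s.io/pod.uid"] = podUID
	attrs["csi.storage.k8s.io/serviceAccount.name"] = serviceAccount
}

func main() {
	// Attributes that came from the volume source itself.
	attrs := map[string]string{"share": "data"}
	addPodInfo(attrs, "web-0", "default", "1234-abcd", "default")
	fmt.Println(attrs)
}
```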
+The existing [Flex Volume] plugin attempted to address this by exposing an exec based API for mount/unmount/attach/detach. Although it enables third party storage vendors to write drivers out-of-tree, it requires access to the root filesystem of node and master machines in order to deploy the third party driver files. Additionally, it doesn’t address another pain of in-tree volumes plugins: dependencies. Volume plugins tend to have many external requirements: dependencies on mount and filesystem tools, for example. These dependencies are assumed to be available on the underlying host OS, which often is not the case, and installing them requires direct machine access. There are efforts underway, for example https://github.com/kubernetes/community/pull/589, that are hoping to address this for in-tree volume plugins. But, enabling volume plugins to be completely containerized will make dependency management much easier. @@ -56,7 +56,7 @@ The objective of this document is to document all the requirements for enabling * Recommend deployment process for Kubernetes compatible, third-party CSI Volume drivers on a Kubernetes cluster. ## Non-Goals -* Replace [Flex Volume plugin](/contributors/devel/flexvolume.md) +* Replace [Flex Volume plugin] * The Flex volume plugin exists as an exec based mechanism to create “out-of-tree” volume plugins. * Because Flex drivers exist and depend on the Flex interface, it will continue to be supported with a stable API. * The CSI Volume plugin will co-exist with Flex volume plugin. @@ -243,7 +243,7 @@ type VolumeAttachment struct { metav1.TypeMeta `json:",inline"` // Standard object metadata. - // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata + // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata // +optional metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` @@ -777,3 +777,7 @@ Instead of creating a new `VolumeAttachment` object, another option we considere * List of nodes the volume was successfully attached to. We dismissed this approach because having attach/detach triggered by the creation/deletion of an object is much easier to manage (for both external-attacher and Kubernetes) and more robust (fewer corner cases to worry about). + + +[Flex Volume]: /contributors/devel/sig-storage/flexvolume.md +[Flex Volume plugin]: /contributors/devel/sig-storage/flexvolume.md
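For orientation, a trimmed-down sketch of what a `VolumeAttachment`-style object could look like, using locally defined placeholder types; the field names are assumptions for illustration and not the exact schema of the proposal:

```go
package sketch

// VolumeAttachmentSpec captures the intent to attach a volume: which driver
// must act, which node to attach to, and which volume is meant. These field
// names are illustrative assumptions.
type VolumeAttachmentSpec struct {
	Attacher             string  // name of the CSI driver that must handle the attach
	NodeName             string  // node the volume should be attached to
	PersistentVolumeName *string // the volume being attached
}

// VolumeAttachmentStatus is written by the external attacher once the
// operation completes.
type VolumeAttachmentStatus struct {
	Attached bool
}

// VolumeAttachment is created to request an attach and deleted to request a
// detach, which is why the create/delete-driven design was preferred over
// mutating a per-volume list of attached nodes in place.
type VolumeAttachment struct {
	Name   string
	Spec   VolumeAttachmentSpec
	Status VolumeAttachmentStatus
}
```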
\ No newline at end of file diff --git a/contributors/design-proposals/storage/csi-snapshot.md b/contributors/design-proposals/storage/csi-snapshot.md index beb46d58..db9abf4f 100644 --- a/contributors/design-proposals/storage/csi-snapshot.md +++ b/contributors/design-proposals/storage/csi-snapshot.md @@ -59,7 +59,7 @@ The API design of VolumeSnapshot and VolumeSnapshotContent is modeled after Pers type VolumeSnapshot struct { metav1.TypeMeta `json:",inline"` // Standard object's metadata. - // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata + // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata // +optional metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` @@ -144,7 +144,7 @@ Note that if an error occurs before the snapshot is cut, `Error` will be set and type VolumeSnapshotContent struct { metav1.TypeMeta `json:",inline"` // Standard object's metadata. - // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata + // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata // +optional metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` @@ -234,7 +234,7 @@ A new VolumeSnapshotClass API object will be added instead of reusing the existi type VolumeSnapshotClass struct { metav1.TypeMeta `json:",inline"` // Standard object's metadata. - // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata + // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata // +optional metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` @@ -292,7 +292,7 @@ As the figure below shows, the CSI snapshot controller architecture consists of * External snapshotter uses ControllerGetCapabilities to find out if CSI driver supports CREATE_DELETE_SNAPSHOT calls. It degrades to trivial mode if not. -* External snapshotter is responsible for creating/deleting snapshots and binding snapshot and SnapshotContent objects. It follows [controller](https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md) pattern and uses informers to watch for `VolumeSnapshot` and `VolumeSnapshotContent` create/update/delete events. It filters out `VolumeSnapshot` instances with `Snapshotter==<CSI driver name>` and processes these events in workqueues with exponential backoff. +* External snapshotter is responsible for creating/deleting snapshots and binding snapshot and SnapshotContent objects. It follows [controller](/contributors/devel/sig-api-machinery/controllers.md) pattern and uses informers to watch for `VolumeSnapshot` and `VolumeSnapshotContent` create/update/delete events. It filters out `VolumeSnapshot` instances with `Snapshotter==<CSI driver name>` and processes these events in workqueues with exponential backoff. * For dynamically created snapshot, it should have a VolumeSnapshotClass associated with it. User can explicitly specify a VolumeSnapshotClass in the VolumeSnapshot API object. If user does not specify a VolumeSnapshotClass, a default VolumeSnapshotClass created by the admin will be used. This is similar to how a default StorageClass created by the admin will be used for the provisioning of a PersistentVolumeClaim. 
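As a small illustration of the default-class fallback described above, a sketch using locally defined placeholder types (the `IsDefault` marker is an assumption standing in for however the admin designates the default class):

```go
package sketch

import "errors"

// VolumeSnapshotClass is a trimmed-down illustration of the class object
// described above.
type VolumeSnapshotClass struct {
	Name        string
	Snapshotter string // CSI driver that handles snapshots of this class
	IsDefault   bool   // placeholder for the admin-designated default marker
}

// resolveClass mimics the behavior described above: use the class named in
// the VolumeSnapshot if one was given, otherwise fall back to the
// admin-provided default, analogous to default StorageClass selection for PVCs.
func resolveClass(requested string, classes []VolumeSnapshotClass) (VolumeSnapshotClass, error) {
	for _, c := range classes {
		if requested != "" && c.Name == requested {
			return c, nil
		}
		if requested == "" && c.IsDefault {
			return c, nil
		}
	}
	return VolumeSnapshotClass{}, errors.New("no matching or default VolumeSnapshotClass found")
}
```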
diff --git a/contributors/design-proposals/storage/flexvolume-deployment.md b/contributors/design-proposals/storage/flexvolume-deployment.md index 0b40748b..19b7ea63 100644 --- a/contributors/design-proposals/storage/flexvolume-deployment.md +++ b/contributors/design-proposals/storage/flexvolume-deployment.md @@ -10,7 +10,7 @@ Beginning in version 1.8, the Kubernetes Storage SIG is putting a stop to accept [CSI](https://github.com/container-storage-interface/spec/blob/master/spec.md) provides a single interface that storage vendors can implement in order for their storage solutions to work across many different container orchestrators, and volume plugins are out-of-tree by design. This is a large effort, the full implementation of CSI is several quarters away, and there is a need for an immediate solution for storage vendors to continue adding volume plugins. -[Flexvolume](/contributors/devel/flexvolume.md) is an in-tree plugin that has the ability to run any storage solution by executing volume commands against a user-provided driver on the Kubernetes host, and this currently exists today. However, the process of setting up Flexvolume is very manual, pushing it out of consideration for many users. Problems include having to copy the driver to a specific location in each node, manually restarting kubelet, and user's limited access to machines. +[Flexvolume] is an in-tree plugin that has the ability to run any storage solution by executing volume commands against a user-provided driver on the Kubernetes host, and this currently exists today. However, the process of setting up Flexvolume is very manual, pushing it out of consideration for many users. Problems include having to copy the driver to a specific location in each node, manually restarting kubelet, and user's limited access to machines. An automated deployment technique is discussed in [Recommended Driver Deployment Method](#recommended-driver-deployment-method). The crucial change required to enable this method is allowing kubelet and controller manager to dynamically discover plugin changes. @@ -164,3 +164,5 @@ Cons: Does not guarantee every node has a pod running. Pod anti-affinity can be * How does this system work with containerized kubelet? * Are there any SELinux implications? + +[Flexvolume]: /contributors/devel/sig-storage/flexvolume.md
\ No newline at end of file diff --git a/contributors/design-proposals/testing/OWNERS b/contributors/design-proposals/testing/OWNERS index 48c9f03c..541bac08 100644 --- a/contributors/design-proposals/testing/OWNERS +++ b/contributors/design-proposals/testing/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - sig-testing-leads approvers: diff --git a/contributors/devel/OWNERS b/contributors/devel/OWNERS index c4d35842..4b7cccf3 100644 --- a/contributors/devel/OWNERS +++ b/contributors/devel/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - calebamiles - cblecker @@ -5,6 +7,7 @@ reviewers: - idvoretskyi - Phillels - spiffxp + - guineveresaenger approvers: - calebamiles - cblecker diff --git a/contributors/devel/README.md b/contributors/devel/README.md index 626adaad..31c0bcac 100644 --- a/contributors/devel/README.md +++ b/contributors/devel/README.md @@ -15,7 +15,7 @@ Guide](http://kubernetes.io/docs/admin/). * **Pull Request Process** ([/contributors/guide/pull-requests.md](/contributors/guide/pull-requests.md)): When and why pull requests are closed. -* **Getting Recent Builds** ([getting-builds.md](getting-builds.md)): How to get recent builds including the latest builds that pass CI. +* **Getting Recent Builds** ([getting-builds.md](sig-release/getting-builds.md)): How to get recent builds including the latest builds that pass CI. * **Automated Tools** ([automation.md](automation.md)): Descriptions of the automation that is running on our github repository. @@ -24,20 +24,20 @@ Guide](http://kubernetes.io/docs/admin/). * **Development Guide** ([development.md](development.md)): Setting up your development environment. -* **Testing** ([testing.md](testing.md)): How to run unit, integration, and end-to-end tests in your development sandbox. +* **Testing** ([testing.md](sig-testing/testing.md)): How to run unit, integration, and end-to-end tests in your development sandbox. -* **Conformance Testing** ([conformance-tests.md](conformance-tests.md)) +* **Conformance Testing** ([conformance-tests.md](sig-architecture/conformance-tests.md)) What is conformance testing and how to create/manage them. -* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. +* **Hunting flaky tests** ([flaky-tests.md](sig-testing/flaky-tests.md)): We have a goal of 99.9% flake free tests. Here's how to run your tests many times. -* **Logging Conventions** ([logging.md](logging.md)): Glog levels. +* **Logging Conventions** ([logging.md](sig-instrumentation/logging.md)): Glog levels. -* **Profiling Kubernetes** ([profiling.md](profiling.md)): How to plug in go pprof profiler to Kubernetes. +* **Profiling Kubernetes** ([profiling.md](sig-scalability/profiling.md)): How to plug in go pprof profiler to Kubernetes. * **Instrumenting Kubernetes with a new metric** - ([instrumentation.md](instrumentation.md)): How to add a new metrics to the + ([instrumentation.md](sig-instrumentation/instrumentation.md)): How to add a new metrics to the Kubernetes code base. 
* **Coding Conventions** ([coding-conventions.md](../guide/coding-conventions.md)): diff --git a/contributors/devel/api-conventions.md b/contributors/devel/api-conventions.md index 2e0bd7ad..91eb7417 100644 --- a/contributors/devel/api-conventions.md +++ b/contributors/devel/api-conventions.md @@ -1,1367 +1,3 @@ -API Conventions -=============== - -Updated: 3/7/2017 - -*This document is oriented at users who want a deeper understanding of the -Kubernetes API structure, and developers wanting to extend the Kubernetes API. -An introduction to using resources with kubectl can be found in [the object management overview](https://kubernetes.io/docs/tutorials/object-management-kubectl/object-management/).* - -**Table of Contents** - - - - [Types (Kinds)](#types-kinds) - - [Resources](#resources) - - [Objects](#objects) - - [Metadata](#metadata) - - [Spec and Status](#spec-and-status) - - [Typical status properties](#typical-status-properties) - - [References to related objects](#references-to-related-objects) - - [Lists of named subobjects preferred over maps](#lists-of-named-subobjects-preferred-over-maps) - - [Primitive types](#primitive-types) - - [Constants](#constants) - - [Unions](#unions) - - [Lists and Simple kinds](#lists-and-simple-kinds) - - [Differing Representations](#differing-representations) - - [Verbs on Resources](#verbs-on-resources) - - [PATCH operations](#patch-operations) - - [Strategic Merge Patch](#strategic-merge-patch) - - [Idempotency](#idempotency) - - [Optional vs. Required](#optional-vs-required) - - [Defaulting](#defaulting) - - [Late Initialization](#late-initialization) - - [Concurrency Control and Consistency](#concurrency-control-and-consistency) - - [Serialization Format](#serialization-format) - - [Units](#units) - - [Selecting Fields](#selecting-fields) - - [Object references](#object-references) - - [HTTP Status codes](#http-status-codes) - - [Success codes](#success-codes) - - [Error codes](#error-codes) - - [Response Status Kind](#response-status-kind) - - [Events](#events) - - [Naming conventions](#naming-conventions) - - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) - - [WebSockets and SPDY](#websockets-and-spdy) - - [Validation](#validation) - - -The conventions of the [Kubernetes API](https://kubernetes.io/docs/api/) (and related APIs in the -ecosystem) are intended to ease client development and ensure that configuration -mechanisms can be implemented that work across a diverse set of use cases -consistently. - -The general style of the Kubernetes API is RESTful - clients create, update, -delete, or retrieve a description of an object via the standard HTTP verbs -(POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return -JSON. Kubernetes also exposes additional endpoints for non-standard verbs and -allows alternative content types. All of the JSON accepted and returned by the -server has a schema, identified by the "kind" and "apiVersion" fields. Where -relevant HTTP header fields exist, they should mirror the content of JSON -fields, but the information should not be represented only in the HTTP header. - -The following terms are defined: - -* **Kind** the name of a particular object schema (e.g. the "Cat" and "Dog" -kinds would have different attributes and properties) -* **Resource** a representation of a system entity, sent or retrieved as JSON -via HTTP to the server. 
Resources are exposed via: - * Collections - a list of resources of the same type, which may be queryable - * Elements - an individual resource, addressable via a URL -* **API Group** a set of resources that are exposed together. Along -with the version is exposed in the "apiVersion" field as "GROUP/VERSION", e.g. -"policy.k8s.io/v1". - -Each resource typically accepts and returns data of a single kind. A kind may be -accepted or returned by multiple resources that reflect specific use cases. For -instance, the kind "Pod" is exposed as a "pods" resource that allows end users -to create, update, and delete pods, while a separate "pod status" resource (that -acts on "Pod" kind) allows automated processes to update a subset of the fields -in that resource. - -Resources are bound together in API groups - each group may have one or more -versions that evolve independent of other API groups, and each version within -the group has one or more resources. Group names are typically in domain name -form - the Kubernetes project reserves use of the empty group, all single -word names ("extensions", "apps"), and any group name ending in "*.k8s.io" for -its sole use. When choosing a group name, we recommend selecting a subdomain -your group or organization owns, such as "widget.mycompany.com". - -Resource collections should be all lowercase and plural, whereas kinds are -CamelCase and singular. Group names must be lower case and be valid DNS -subdomains. - - -## Types (Kinds) - -Kinds are grouped into three categories: - -1. **Objects** represent a persistent entity in the system. - - Creating an API object is a record of intent - once created, the system will -work to ensure that resource exists. All API objects have common metadata. - - An object may have multiple resources that clients can use to perform -specific actions that create, update, delete, or get. - - Examples: `Pod`, `ReplicationController`, `Service`, `Namespace`, `Node`. - -2. **Lists** are collections of **resources** of one (usually) or more -(occasionally) kinds. - - The name of a list kind must end with "List". Lists have a limited set of -common metadata. All lists use the required "items" field to contain the array -of objects they return. Any kind that has the "items" field must be a list kind. - - Most objects defined in the system should have an endpoint that returns the -full set of resources, as well as zero or more endpoints that return subsets of -the full list. Some objects may be singletons (the current user, the system -defaults) and may not have lists. - - In addition, all lists that return objects with labels should support label -filtering (see [the labels documentation](https://kubernetes.io/docs/user-guide/labels/)), and most -lists should support filtering by fields. - - Examples: `PodLists`, `ServiceLists`, `NodeLists`. - - TODO: Describe field filtering below or in a separate doc. - -3. **Simple** kinds are used for specific actions on objects and for -non-persistent entities. - - Given their limited scope, they have the same set of limited common metadata -as lists. - - For instance, the "Status" kind is returned when errors occur and is not -persisted in the system. - - Many simple resources are "subresources", which are rooted at API paths of -specific resources. When resources wish to expose alternative actions or views -that are closely coupled to a single resource, they should do so using new -sub-resources. 
Common subresources include: - - * `/binding`: Used to bind a resource representing a user request (e.g., Pod, -PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node, -PersistentVolume). - * `/status`: Used to write just the status portion of a resource. For -example, the `/pods` endpoint only allows updates to `metadata` and `spec`, -since those reflect end-user intent. An automated process should be able to -modify status for users to see by sending an updated Pod kind to the server to -the "/pods/<name>/status" endpoint - the alternate endpoint allows -different rules to be applied to the update, and access to be appropriately -restricted. - * `/scale`: Used to read and write the count of a resource in a manner that -is independent of the specific resource schema. - - Two additional subresources, `proxy` and `portforward`, provide access to -cluster resources as described in -[accessing the cluster](https://kubernetes.io/docs/user-guide/accessing-the-cluster/). - -The standard REST verbs (defined below) MUST return singular JSON objects. Some -API endpoints may deviate from the strict REST pattern and return resources that -are not singular JSON objects, such as streams of JSON objects or unstructured -text log data. - -A common set of "meta" API objects are used across all API groups and are -thus considered part of the server group named `meta.k8s.io`. These types may -evolve independent of the API group that uses them and API servers may allow -them to be addressed in their generic form. Examples are `ListOptions`, -`DeleteOptions`, `List`, `Status`, `WatchEvent`, and `Scale`. For historical -reasons these types are part of each existing API group. Generic tools like -quota, garbage collection, autoscalers, and generic clients like kubectl -leverage these types to define consistent behavior across different resource -types, like the interfaces in programming languages. - -The term "kind" is reserved for these "top-level" API types. The term "type" -should be used for distinguishing sub-categories within objects or subobjects. - -### Resources - -All JSON objects returned by an API MUST have the following fields: - -* kind: a string that identifies the schema this object should have -* apiVersion: a string that identifies the version of the schema the object -should have - -These fields are required for proper decoding of the object. They may be -populated by the server by default from the specified URL path, but the client -likely needs to know the values in order to construct the URL path. - -### Objects - -#### Metadata - -Every object kind MUST have the following metadata in a nested object field -called "metadata": - -* namespace: a namespace is a DNS compatible label that objects are subdivided -into. The default namespace is 'default'. See -[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more. -* name: a string that uniquely identifies this object within the current -namespace (see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)). -This value is used in the path when retrieving an individual object. 
-* uid: a unique in time and space value (typically an RFC 4122 generated -identifier, see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)) -used to distinguish between objects with the same name that have been deleted -and recreated - -Every object SHOULD have the following metadata in a nested object field called -"metadata": - -* resourceVersion: a string that identifies the internal version of this object -that can be used by clients to determine when objects have changed. This value -MUST be treated as opaque by clients and passed unmodified back to the server. -Clients should not assume that the resource version has meaning across -namespaces, different kinds of resources, or different servers. (See -[concurrency control](#concurrency-control-and-consistency), below, for more -details.) -* generation: a sequence number representing a specific generation of the -desired state. Set by the system and monotonically increasing, per-resource. May -be compared, such as for RAW and WAW consistency. -* creationTimestamp: a string representing an RFC 3339 date of the date and time -an object was created -* deletionTimestamp: a string representing an RFC 3339 date of the date and time -after which this resource will be deleted. This field is set by the server when -a graceful deletion is requested by the user, and is not directly settable by a -client. The resource will be deleted (no longer visible from resource lists, and -not reachable by name) after the time in this field except when the object has -a finalizer set. In case the finalizer is set the deletion of the object is -postponed at least until the finalizer is removed. -Once the deletionTimestamp is set, this value may not be unset or be set further -into the future, although it may be shortened or the resource may be deleted -prior to this time. -* labels: a map of string keys and values that can be used to organize and -categorize objects (see [the labels docs](https://kubernetes.io/docs/user-guide/labels/)) -* annotations: a map of string keys and values that can be used by external -tooling to store and retrieve arbitrary metadata about this object (see -[the annotations docs](https://kubernetes.io/docs/user-guide/annotations/)) - -Labels are intended for organizational purposes by end users (select the pods -that match this label query). Annotations enable third-party automation and -tooling to decorate objects with additional metadata for their own use. - -#### Spec and Status - -By convention, the Kubernetes API makes a distinction between the specification -of the desired state of an object (a nested object field called "spec") and the -status of the object at the current time (a nested object field called -"status"). The specification is a complete description of the desired state, -including configuration settings provided by the user, -[default values](#defaulting) expanded by the system, and properties initialized -or otherwise changed after creation by other ecosystem components (e.g., -schedulers, auto-scalers), and is persisted in stable storage with the API -object. If the specification is deleted, the object will be purged from the -system. The status summarizes the current state of the object in the system, and -is usually persisted with the object by an automated processes but may be -generated on the fly. At some cost and perhaps some temporary degradation in -behavior, the status could be reconstructed by observation if it were lost. 
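To make the spec/status split concrete, a minimal sketch of a hypothetical resource following the convention (the `Widget` type and its fields are purely illustrative):

```go
package sketch

// WidgetSpec holds the user's desired state, following the convention that
// "spec" carries declarative intent, including values defaulted by the server.
type WidgetSpec struct {
	Replicas int32  `json:"replicas"`
	Size     string `json:"size,omitempty"` // optional; defaulted by the server
}

// WidgetStatus summarizes the most recently observed state; it is written by
// controllers through the /status subresource, never by end users.
type WidgetStatus struct {
	ReadyReplicas      int32 `json:"readyReplicas"`
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

// Widget pairs the two nested objects under the standard top-level fields
// (kind, apiVersion and metadata are omitted from this sketch for brevity).
type Widget struct {
	Spec   WidgetSpec   `json:"spec,omitempty"`
	Status WidgetStatus `json:"status,omitempty"`
}
```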
- -When a new version of an object is POSTed or PUT, the "spec" is updated and -available immediately. Over time the system will work to bring the "status" into -line with the "spec". The system will drive toward the most recent "spec" -regardless of previous versions of that stanza. In other words, if a value is -changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system -is not required to 'touch base' at 5 before changing the "status" to 3. In other -words, the system's behavior is *level-based* rather than *edge-based*. This -enables robust behavior in the presence of missed intermediate state changes. - -The Kubernetes API also serves as the foundation for the declarative -configuration schema for the system. In order to facilitate level-based -operation and expression of declarative configuration, fields in the -specification should have declarative rather than imperative names and -semantics -- they represent the desired state, not actions intended to yield the -desired state. - -The PUT and POST verbs on objects MUST ignore the "status" values, to avoid -accidentally overwriting the status in read-modify-write scenarios. A `/status` -subresource MUST be provided to enable system components to update statuses of -resources they manage. - -Otherwise, PUT expects the whole object to be specified. Therefore, if a field -is omitted it is assumed that the client wants to clear that field's value. The -PUT verb does not accept partial updates. Modification of just part of an object -may be achieved by GETting the resource, modifying part of the spec, labels, or -annotations, and then PUTting it back. See -[concurrency control](#concurrency-control-and-consistency), below, regarding -read-modify-write consistency when using this pattern. Some objects may expose -alternative resource representations that allow mutation of the status, or -performing custom actions on the object. - -All objects that represent a physical resource whose state may vary from the -user's desired intent SHOULD have a "spec" and a "status". Objects whose state -cannot vary from the user's desired intent MAY have only "spec", and MAY rename -"spec" to a more appropriate name. - -Objects that contain both spec and status should not contain additional -top-level fields other than the standard metadata fields. - -Some objects which are not persisted in the system - such as `SubjectAccessReview` -and other webhook style calls - may choose to add spec and status to encapsulate -a "call and response" pattern. The spec is the request (often a request for -information) and the status is the response. For these RPC like objects the only -operation may be POST, but having a consistent schema between submission and -response reduces the complexity of these clients. - - -##### Typical status properties - -**Conditions** represent the latest available observations of an object's -state. They are an extension mechanism intended to be used when the details of -an observation are not a priori known or would not apply to all instances of a -given Kind. For observations that are well known and apply to all instances, a -regular field is preferred. An example of a Condition that probably should -have been a regular field is Pod's "Ready" condition - it is managed by core -controllers, it is well understood, and it applies to all Pods. - -Objects may report multiple conditions, and new types of conditions may be -added in the future or by 3rd party controllers. 
Therefore, conditions are -represented using a list/slice, where all have similar structure. - -The `FooCondition` type for some resource type `Foo` may include a subset of the -following fields, but must contain at least `type` and `status` fields: - -```go - Type FooConditionType `json:"type" description:"type of Foo condition"` - Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` - - // +optional - Reason *string `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"` - // +optional - Message *string `json:"message,omitempty" description:"human-readable message indicating details about last transition"` - - // +optional - LastHeartbeatTime *unversioned.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` - // +optional - LastTransitionTime *unversioned.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"` -``` - -Additional fields may be added in the future. - -Do not use fields that you don't need - simpler is better. - -Use of the `Reason` field is encouraged. - -Use the `LastHeartbeatTime` with great caution - frequent changes to this field -can cause a large fan-out effect for some resources. - -Conditions should be added to explicitly convey properties that users and -components care about rather than requiring those properties to be inferred from -other observations. Once defined, the meaning of a Condition can not be -changed arbitrarily - it becomes part of the API, and has the same backwards- -and forwards-compatibility concerns of any other part of the API. - -Condition status values may be `True`, `False`, or `Unknown`. The absence of a -condition should be interpreted the same as `Unknown`. How controllers handle -`Unknown` depends on the Condition in question. - -Condition types should indicate state in the "abnormal-true" polarity. For -example, if the condition indicates when a policy is invalid, the "is valid" -case is probably the norm, so the condition should be called "Invalid". - -The thinking around conditions has evolved over time, so there are several -non-normative examples in wide use. - -In general, condition values may change back and forth, but some condition -transitions may be monotonic, depending on the resource and condition type. -However, conditions are observations and not, themselves, state machines, nor do -we define comprehensive state machines for objects, nor behaviors associated -with state transitions. The system is level-based rather than edge-triggered, -and should assume an Open World. - -An example of an oscillating condition type is `Ready` (despite it running -afoul of current guidance), which indicates the object was believed to be fully -operational at the time it was last probed. A possible monotonic condition -could be `Failed`. A `True` status for `Failed` would imply failure with no -retry. An object that was still active would generally not have a `Failed` -condition. - -Some resources in the v1 API contain fields called **`phase`**, and associated -`message`, `reason`, and other status fields. The pattern of using `phase` is -deprecated. Newer API types should use conditions instead. 
Phase was -essentially a state-machine enumeration field, that contradicted [system-design -principles](../design-proposals/architecture/principles.md#control-logic) and -hampered evolution, since [adding new enum values breaks backward -compatibility](api_changes.md). Rather than encouraging clients to infer -implicit properties from phases, we prefer to explicitly expose the individual -conditions that clients need to monitor. Conditions also have the benefit that -it is possible to create some conditions with uniform meaning across all -resource types, while still exposing others that are unique to specific -resource types. See [#7856](http://issues.k8s.io/7856) for more details and -discussion. - -In condition types, and everywhere else they appear in the API, **`Reason`** is -intended to be a one-word, CamelCase representation of the category of cause of -the current status, and **`Message`** is intended to be a human-readable phrase -or sentence, which may contain specific details of the individual occurrence. -`Reason` is intended to be used in concise output, such as one-line -`kubectl get` output, and in summarizing occurrences of causes, whereas -`Message` is intended to be presented to users in detailed status explanations, -such as `kubectl describe` output. - -Historical information status (e.g., last transition time, failure counts) is -only provided with reasonable effort, and is not guaranteed to not be lost. - -Status information that may be large (especially proportional in size to -collections of other resources, such as lists of references to other objects -- -see below) and/or rapidly changing, such as -[resource usage](../design-proposals/scheduling/resources.md#usage-data), should be put into separate -objects, with possibly a reference from the original object. This helps to -ensure that GETs and watch remain reasonably efficient for the majority of -clients, which may not need that data. - -Some resources report the `observedGeneration`, which is the `generation` most -recently observed by the component responsible for acting upon changes to the -desired state of the resource. This can be used, for instance, to ensure that -the reported status reflects the most recent desired status. - -#### References to related objects - -References to loosely coupled sets of objects, such as -[pods](https://kubernetes.io/docs/user-guide/pods/) overseen by a -[replication controller](https://kubernetes.io/docs/user-guide/replication-controller/), are usually -best referred to using a [label selector](https://kubernetes.io/docs/user-guide/labels/). In order to -ensure that GETs of individual objects remain bounded in time and space, these -sets may be queried via separate API queries, but will not be expanded in the -referring object's status. - -References to specific objects, especially specific resource versions and/or -specific fields of those objects, are specified using the `ObjectReference` type -(or other types representing strict subsets of it). Unlike partial URLs, the -ObjectReference type facilitates flexible defaulting of fields from the -referring object or other contextual information. - -References in the status of the referee to the referrer may be permitted, when -the references are one-to-one and do not need to be frequently updated, -particularly in an edge-based manner. - -#### Lists of named subobjects preferred over maps - -Discussed in [#2004](http://issue.k8s.io/2004) and elsewhere. There are no maps -of subobjects in any API objects. 
Instead, the convention is to use a list of -subobjects containing name fields. - -For example: - -```yaml -ports: - - name: www - containerPort: 80 -``` - -vs. - -```yaml -ports: - www: - containerPort: 80 -``` - -This rule maintains the invariant that all JSON/YAML keys are fields in API -objects. The only exceptions are pure maps in the API (currently, labels, -selectors, annotations, data), as opposed to sets of subobjects. - -#### Primitive types - -* Avoid floating-point values as much as possible, and never use them in spec. - Floating-point values cannot be reliably round-tripped (encoded and - re-decoded) without changing, and have varying precision and representations - across languages and architectures. -* All numbers (e.g., uint32, int64) are converted to float64 by Javascript and - some other languages, so any field which is expected to exceed that either in - magnitude or in precision (specifically integer values > 53 bits) should be - serialized and accepted as strings. -* Do not use unsigned integers, due to inconsistent support across languages and - libraries. Just validate that the integer is non-negative if that's the case. -* Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). -* Look at similar fields in the API (e.g., ports, durations) and follow the - conventions of existing fields. -* All public integer fields MUST use the Go `(u)int32` or Go `(u)int64` types, - not `(u)int` (which is ambiguous depending on target platform). Internal - types may use `(u)int`. -* Think twice about `bool` fields. Many ideas start as boolean but eventually - trend towards a small set of mutually exclusive options. Plan for future - expansions by describing the policy options explicitly as a string type - alias (e.g. `TerminationMessagePolicy`). - -#### Constants - -Some fields will have a list of allowed values (enumerations). These values will -be strings, and they will be in CamelCase, with an initial uppercase letter. -Examples: `ClusterFirst`, `Pending`, `ClientIP`. - -#### Unions - -Sometimes, at most one of a set of fields can be set. For example, the -[volumes] field of a PodSpec has 17 different volume type-specific fields, such -as `nfs` and `iscsi`. All fields in the set should be -[Optional](#optional-vs-required). - -Sometimes, when a new type is created, the api designer may anticipate that a -union will be needed in the future, even if only one field is allowed initially. -In this case, be sure to make the field [Optional](#optional-vs-required) -optional. In the validation, you may still return an error if the sole field is -unset. Do not set a default value for that field. - -### Lists and Simple kinds - -Every list or simple kind SHOULD have the following metadata in a nested object -field called "metadata": - -* resourceVersion: a string that identifies the common version of the objects -returned by in a list. This value MUST be treated as opaque by clients and -passed unmodified back to the server. A resource version is only valid within a -single namespace on a single kind of resource. - -Every simple kind returned by the server, and any simple kind sent to the server -that must support idempotency or optimistic concurrency should return this -value. Since simple resources are often used as input alternate actions that -modify objects, the resource version of the simple resource should correspond to -the resource version of the object. 
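Returning to the union guidance above, a sketch of how such a set of mutually exclusive, optional fields might be declared and validated (the member types are placeholders, not real Kubernetes volume sources):

```go
package sketch

import "errors"

// ExampleVolumeSource is an illustrative union in the style described above:
// every member is optional (a pointer with omitempty), and at most one may be set.
type ExampleVolumeSource struct {
	NFS   *NFSSource   `json:"nfs,omitempty"`
	ISCSI *ISCSISource `json:"iscsi,omitempty"`
}

type NFSSource struct{ Server, Path string }
type ISCSISource struct{ TargetPortal, IQN string }

// validate enforces the "at most one" rule during validation rather than via
// defaulting, as recommended above.
func (s ExampleVolumeSource) validate() error {
	set := 0
	if s.NFS != nil {
		set++
	}
	if s.ISCSI != nil {
		set++
	}
	if set > 1 {
		return errors.New("at most one volume source may be set")
	}
	return nil
}
```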
- - -## Differing Representations - -An API may represent a single entity in different ways for different clients, or -transform an object after certain transitions in the system occur. In these -cases, one request object may have two representations available as different -resources, or different kinds. - -An example is a Service, which represents the intent of the user to group a set -of pods with common behavior on common ports. When Kubernetes detects a pod -matches the service selector, the IP address and port of the pod are added to an -Endpoints resource for that Service. The Endpoints resource exists only if the -Service exists, but exposes only the IPs and ports of the selected pods. The -full service is represented by two distinct resources - under the original -Service resource the user created, as well as in the Endpoints resource. - -As another example, a "pod status" resource may accept a PUT with the "pod" -kind, with different rules about what fields may be changed. - -Future versions of Kubernetes may allow alternative encodings of objects beyond -JSON. - - -## Verbs on Resources - -API resources should use the traditional REST pattern: - -* GET /<resourceNamePlural> - Retrieve a list of type -<resourceName>, e.g. GET /pods returns a list of Pods. -* POST /<resourceNamePlural> - Create a new resource from the JSON object -provided by the client. -* GET /<resourceNamePlural>/<name> - Retrieves a single resource -with the given name, e.g. GET /pods/first returns a Pod named 'first'. Should be -constant time, and the resource should be bounded in size. -* DELETE /<resourceNamePlural>/<name> - Delete the single resource -with the given name. DeleteOptions may specify gracePeriodSeconds, the optional -duration in seconds before the object should be deleted. Individual kinds may -declare fields which provide a default grace period, and different kinds may -have differing kind-wide default grace periods. A user provided grace period -overrides a default grace period, including the zero grace period ("now"). -* PUT /<resourceNamePlural>/<name> - Update or create the resource -with the given name with the JSON object provided by the client. -* PATCH /<resourceNamePlural>/<name> - Selectively modify the -specified fields of the resource. See more information [below](#patch-operations). -* GET /<resourceNamePlural>&watch=true - Receive a stream of JSON -objects corresponding to changes made to any resource of the given kind over -time. - -### PATCH operations - -The API supports three different PATCH operations, determined by their -corresponding Content-Type header: - -* JSON Patch, `Content-Type: application/json-patch+json` - * As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is -a sequence of operations that are executed on the resource, e.g. `{"op": "add", -"path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use -JSON Patch, see the RFC. -* Merge Patch, `Content-Type: application/merge-patch+json` - * As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch -is essentially a partial representation of the resource. The submitted JSON is -"merged" with the current resource to create a new one, then the new one is -saved. For more details on how to use Merge Patch, see the RFC. -* Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json` - * Strategic Merge Patch is a custom implementation of Merge Patch. For a -detailed explanation of how it works and why it needed to be introduced, see -below. 
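For comparison, the same intent expressed in the first two patch formats (a sketch; the resource and values are arbitrary):

```go
package sketch

// The same change ("set spec.replicas to 3") expressed as the two RFC-defined
// patch formats accepted by the API server. Each must be sent with the
// matching Content-Type header shown in the comment.

// Content-Type: application/json-patch+json (RFC 6902)
var jsonPatch = []byte(`[{"op": "replace", "path": "/spec/replicas", "value": 3}]`)

// Content-Type: application/merge-patch+json (RFC 7386)
var mergePatch = []byte(`{"spec": {"replicas": 3}}`)
```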
- -#### Strategic Merge Patch - -Details of Strategic Merge Patch are covered [here](strategic-merge-patch.md). - -## Idempotency - -All compatible Kubernetes APIs MUST support "name idempotency" and respond with -an HTTP status code 409 when a request is made to POST an object that has the -same name as an existing object in the system. See -[the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/) for details. - -Names generated by the system may be requested using `metadata.generateName`. -GenerateName indicates that the name should be made unique by the server prior -to persisting it. A non-empty value for the field indicates the name will be -made unique (and the name returned to the client will be different than the name -passed). The value of this field will be combined with a unique suffix on the -server if the Name field has not been provided. The provided value must be valid -within the rules for Name, and may be truncated by the length of the suffix -required to make the value unique on the server. If this field is specified, and -Name is not present, the server will NOT return a 409 if the generated name -exists - instead, it will either return 201 Created or 504 with Reason -`ServerTimeout` indicating a unique name could not be found in the time -allotted, and the client should retry (optionally after the time indicated in -the Retry-After header). - -## Optional vs. Required - -Fields must be either optional or required. - -Optional fields have the following properties: - -- They have the `+optional` comment tag in Go. -- They are a pointer type in the Go definition (e.g. `bool *awesomeFlag`) or -have a built-in `nil` value (e.g. maps and slices). -- The API server should allow POSTing and PUTing a resource with this field -unset. - -In most cases, optional fields should also have the `omitempty` struct tag (the -`omitempty` option specifies that the field should be omitted from the json -encoding if the field has an empty value). However, If you want to have -different logic for an optional field which is not provided vs. provided with -empty values, do not use `omitempty` (e.g. https://github.com/kubernetes/kubernetes/issues/34641). - -Note that for backward compatibility, any field that has the `omitempty` struct -tag will considered to be optional but this may change in future and having -the `+optional` comment tag is highly recommended. - -Required fields have the opposite properties, namely: - -- They do not have an `+optional` comment tag. -- They do not have an `omitempty` struct tag. -- They are not a pointer type in the Go definition (e.g. `bool otherFlag`). -- The API server should not allow POSTing or PUTing a resource with this field -unset. - -Using the `+optional` or the `omitempty` tag causes OpenAPI documentation to -reflect that the field is optional. - -Using a pointer allows distinguishing unset from the zero value for that type. -There are some cases where, in principle, a pointer is not needed for an -optional field since the zero value is forbidden, and thus implies unset. There -are examples of this in the codebase. 
However: - -- it can be difficult for implementors to anticipate all cases where an empty -value might need to be distinguished from a zero value -- structs are not omitted from encoder output even where omitempty is specified, -which is messy; -- having a pointer consistently imply optional is clearer for users of the Go -language client, and any other clients that use corresponding types - -Therefore, we ask that pointers always be used with optional fields that do not -have a built-in `nil` value. - - -## Defaulting - -Default resource values are API version-specific, and they are applied during -the conversion from API-versioned declarative configuration to internal objects -representing the desired state (`Spec`) of the resource. Subsequent GETs of the -resource will include the default values explicitly. - -Incorporating the default values into the `Spec` ensures that `Spec` depicts the -full desired state so that it is easier for the system to determine how to -achieve the state, and for the user to know what to anticipate. - -API version-specific default values are set by the API server. - -## Late Initialization - -Late initialization is when resource fields are set by a system controller -after an object is created/updated. - -For example, the scheduler sets the `pod.spec.nodeName` field after the pod is -created. - -Late-initializers should only make the following types of modifications: - - Setting previously unset fields - - Adding keys to maps - - Adding values to arrays which have mergeable semantics -(`patchStrategy:"merge"` attribute in the type definition). - -These conventions: - 1. allow a user (with sufficient privilege) to override any system-default - behaviors by setting the fields that would otherwise have been defaulted. - 1. enables updates from users to be merged with changes made during late -initialization, using strategic merge patch, as opposed to clobbering the -change. - 1. allow the component which does the late-initialization to use strategic -merge patch, which facilitates composition and concurrency of such components. - -Although the apiserver Admission Control stage acts prior to object creation, -Admission Control plugins should follow the Late Initialization conventions -too, to allow their implementation to be later moved to a 'controller', or to -client libraries. - -## Concurrency Control and Consistency - -Kubernetes leverages the concept of *resource versions* to achieve optimistic -concurrency. All Kubernetes resources have a "resourceVersion" field as part of -their metadata. This resourceVersion is a string that identifies the internal -version of an object that can be used by clients to determine when objects have -changed. When a record is about to be updated, it's version is checked against a -pre-saved value, and if it doesn't match, the update fails with a StatusConflict -(HTTP status code 409). - -The resourceVersion is changed by the server every time an object is modified. -If resourceVersion is included with the PUT operation the system will verify -that there have not been other successful mutations to the resource during a -read/modify/write cycle, by verifying that the current value of resourceVersion -matches the specified value. - -The resourceVersion is currently backed by [etcd's -modifiedIndex](https://coreos.com/etcd/docs/latest/v2/api.html). -However, it's important to note that the application should *not* rely on the -implementation details of the versioning system maintained by Kubernetes. 
We may -change the implementation of resourceVersion in the future, such as to change it -to a timestamp or per-object counter. - -The only way for a client to know the expected value of resourceVersion is to -have received it from the server in response to a prior operation, typically a -GET. This value MUST be treated as opaque by clients and passed unmodified back -to the server. Clients should not assume that the resource version has meaning -across namespaces, different kinds of resources, or different servers. -Currently, the value of resourceVersion is set to match etcd's sequencer. You -could think of it as a logical clock the API server can use to order requests. -However, we expect the implementation of resourceVersion to change in the -future, such as in the case we shard the state by kind and/or namespace, or port -to another storage system. - -In the case of a conflict, the correct client action at this point is to GET the -resource again, apply the changes afresh, and try submitting again. This -mechanism can be used to prevent races like the following: - -``` -Client #1 Client #2 -GET Foo GET Foo -Set Foo.Bar = "one" Set Foo.Baz = "two" -PUT Foo PUT Foo -``` - -When these sequences occur in parallel, either the change to Foo.Bar or the -change to Foo.Baz can be lost. - -On the other hand, when specifying the resourceVersion, one of the PUTs will -fail, since whichever write succeeds changes the resourceVersion for Foo. - -resourceVersion may be used as a precondition for other operations (e.g., GET, -DELETE) in the future, such as for read-after-write consistency in the presence -of caching. - -"Watch" operations specify resourceVersion using a query parameter. It is used -to specify the point at which to begin watching the specified resources. This -may be used to ensure that no mutations are missed between a GET of a resource -(or list of resources) and a subsequent Watch, even if the current version of -the resource is more recent. This is currently the main reason that list -operations (GET on a collection) return resourceVersion. - - -## Serialization Format - -APIs may return alternative representations of any resource in response to an -Accept header or under alternative endpoints, but the default serialization for -input and output of API responses MUST be JSON. - -A protobuf encoding is also accepted for built-in resources. As proto is not -self-describing, there is an envelope wrapper which describes the type of -the contents. - -All dates should be serialized as RFC3339 strings. - -## Units - -Units must either be explicit in the field name (e.g., `timeoutSeconds`), or -must be specified as part of the value (e.g., `resource.Quantity`). Which -approach is preferred is TBD, though currently we use the `fooSeconds` -convention for durations. - -Duration fields must be represented as integer fields with units being -part of the field name (e.g. `leaseDurationSeconds`). We don't use Duration -in the API since that would require clients to implement go-compatible parsing. - -## Selecting Fields - -Some APIs may need to identify which field in a JSON object is invalid, or to -reference a value to extract from a separate resource. The current -recommendation is to use standard JavaScript syntax for accessing that field, -assuming the JSON object was transformed into a JavaScript object, without the -leading dot, such as `metadata.name`. 
- -Examples: - -* Find the field "current" in the object "state" in the second item in the array -"fields": `fields[1].state.current` - -## Object references - -Object references should either be called `fooName` if referring to an object of -kind `Foo` by just the name (within the current namespace, if a namespaced -resource), or should be called `fooRef`, and should contain a subset of the -fields of the `ObjectReference` type. - - -TODO: Plugins, extensions, nested kinds, headers - - -## HTTP Status codes - -The server will respond with HTTP status codes that match the HTTP spec. See the -section below for a breakdown of the types of status codes the server will send. - -The following HTTP status codes may be returned by the API. - -#### Success codes - -* `200 StatusOK` - * Indicates that the request completed successfully. -* `201 StatusCreated` - * Indicates that the request to create kind completed successfully. -* `204 StatusNoContent` - * Indicates that the request completed successfully, and the response contains -no body. - * Returned in response to HTTP OPTIONS requests. - -#### Error codes - -* `307 StatusTemporaryRedirect` - * Indicates that the address for the requested resource has changed. - * Suggested client recovery behavior: - * Follow the redirect. - - -* `400 StatusBadRequest` - * Indicates the requested is invalid. - * Suggested client recovery behavior: - * Do not retry. Fix the request. - - -* `401 StatusUnauthorized` - * Indicates that the server can be reached and understood the request, but -refuses to take any further action, because the client must provide -authorization. If the client has provided authorization, the server is -indicating the provided authorization is unsuitable or invalid. - * Suggested client recovery behavior: - * If the user has not supplied authorization information, prompt them for -the appropriate credentials. If the user has supplied authorization information, -inform them their credentials were rejected and optionally prompt them again. - - -* `403 StatusForbidden` - * Indicates that the server can be reached and understood the request, but -refuses to take any further action, because it is configured to deny access for -some reason to the requested resource by the client. - * Suggested client recovery behavior: - * Do not retry. Fix the request. - - -* `404 StatusNotFound` - * Indicates that the requested resource does not exist. - * Suggested client recovery behavior: - * Do not retry. Fix the request. - - -* `405 StatusMethodNotAllowed` - * Indicates that the action the client attempted to perform on the resource -was not supported by the code. - * Suggested client recovery behavior: - * Do not retry. Fix the request. - - -* `409 StatusConflict` - * Indicates that either the resource the client attempted to create already -exists or the requested update operation cannot be completed due to a conflict. - * Suggested client recovery behavior: - * * If creating a new resource: - * * Either change the identifier and try again, or GET and compare the -fields in the pre-existing object and issue a PUT/update to modify the existing -object. - * * If updating an existing resource: - * See `Conflict` from the `status` response section below on how to -retrieve more information about the nature of the conflict. - * GET and compare the fields in the pre-existing object, merge changes (if -still valid according to preconditions), and retry with the updated request -(including `ResourceVersion`). 
-
-
-* `410 StatusGone`
-  * Indicates that the item is no longer available at the server and no
-forwarding address is known.
-  * Suggested client recovery behavior:
-    * Do not retry. Fix the request.
-
-
-* `422 StatusUnprocessableEntity`
-  * Indicates that the requested create or update operation cannot be completed
-due to invalid data provided as part of the request.
-  * Suggested client recovery behavior:
-    * Do not retry. Fix the request.
-
-
-* `429 StatusTooManyRequests`
-  * Indicates that either the client rate limit has been exceeded or the
-server has received more requests than it can process.
-  * Suggested client recovery behavior:
-    * Read the `Retry-After` HTTP header from the response, and wait at least
-that long before retrying.
-
-
-* `500 StatusInternalServerError`
-  * Indicates that the server can be reached and understood the request, but
-either an unexpected internal error occurred and the outcome of the call is
-unknown, or the server cannot complete the action in a reasonable time (this may
-be due to temporary server load or a transient communication issue with another
-server).
-  * Suggested client recovery behavior:
-    * Retry with exponential backoff.
-
-
-* `503 StatusServiceUnavailable`
-  * Indicates that a required service is unavailable.
-  * Suggested client recovery behavior:
-    * Retry with exponential backoff.
-
-
-* `504 StatusServerTimeout`
-  * Indicates that the request could not be completed within the given time.
-Clients can get this response ONLY when they specified a timeout param in the
-request.
-  * Suggested client recovery behavior:
-    * Increase the value of the timeout param and retry with exponential
-backoff.
-
-## Response Status Kind
-
-Kubernetes will always return the `Status` kind from any API endpoint when an
-error occurs. Clients SHOULD handle these types of objects when appropriate.
-
-A `Status` kind will be returned by the API in two cases:
-  * When an operation is not successful (i.e. when the server would return a
-non-2xx HTTP status code).
-  * When an HTTP `DELETE` call is successful.
-
-The status object is encoded as JSON and provided as the body of the response.
-The status object contains fields for humans and machine consumers of the API to
-get more detailed information about the cause of the failure. The information in
-the status object supplements, but does not override, the HTTP status code's
-meaning. When fields in the status object have the same meaning as generally
-defined HTTP headers and that header is returned with the response, the header
-should be considered as having higher priority.
- -**Example:** - -```console -$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana - -> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1 -> User-Agent: curl/7.26.0 -> Host: 10.240.122.184 -> Accept: */* -> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc -> - -< HTTP/1.1 404 Not Found -< Content-Type: application/json -< Date: Wed, 20 May 2015 18:10:42 GMT -< Content-Length: 232 -< -{ - "kind": "Status", - "apiVersion": "v1", - "metadata": {}, - "status": "Failure", - "message": "pods \"grafana\" not found", - "reason": "NotFound", - "details": { - "name": "grafana", - "kind": "pods" - }, - "code": 404 -} -``` - -`status` field contains one of two possible values: -* `Success` -* `Failure` - -`message` may contain human-readable description of the error - -`reason` may contain a machine-readable, one-word, CamelCase description of why -this operation is in the `Failure` status. If this value is empty there is no -information available. The `reason` clarifies an HTTP status code but does not -override it. - -`details` may contain extended data associated with the reason. Each reason may -define its own extended details. This field is optional and the data returned is -not guaranteed to conform to any schema except that defined by the reason type. - -Possible values for the `reason` and `details` fields: -* `BadRequest` - * Indicates that the request itself was invalid, because the request doesn't -make any sense, for example deleting a read-only object. - * This is different than `status reason` `Invalid` above which indicates that -the API call could possibly succeed, but the data was invalid. - * API calls that return BadRequest can never succeed. - * Http status code: `400 StatusBadRequest` - - -* `Unauthorized` - * Indicates that the server can be reached and understood the request, but -refuses to take any further action without the client providing appropriate -authorization. If the client has provided authorization, this error indicates -the provided credentials are insufficient or invalid. - * Details (optional): - * `kind string` - * The kind attribute of the unauthorized resource (on some operations may -differ from the requested resource). - * `name string` - * The identifier of the unauthorized resource. - * HTTP status code: `401 StatusUnauthorized` - - -* `Forbidden` - * Indicates that the server can be reached and understood the request, but -refuses to take any further action, because it is configured to deny access for -some reason to the requested resource by the client. - * Details (optional): - * `kind string` - * The kind attribute of the forbidden resource (on some operations may -differ from the requested resource). - * `name string` - * The identifier of the forbidden resource. - * HTTP status code: `403 StatusForbidden` - - -* `NotFound` - * Indicates that one or more resources required for this operation could not -be found. - * Details (optional): - * `kind string` - * The kind attribute of the missing resource (on some operations may -differ from the requested resource). - * `name string` - * The identifier of the missing resource. - * HTTP status code: `404 StatusNotFound` - - -* `AlreadyExists` - * Indicates that the resource you are creating already exists. - * Details (optional): - * `kind string` - * The kind attribute of the conflicting resource. - * `name string` - * The identifier of the conflicting resource. 
- * HTTP status code: `409 StatusConflict` - -* `Conflict` - * Indicates that the requested update operation cannot be completed due to a -conflict. The client may need to alter the request. Each resource may define -custom details that indicate the nature of the conflict. - * HTTP status code: `409 StatusConflict` - - -* `Invalid` - * Indicates that the requested create or update operation cannot be completed -due to invalid data provided as part of the request. - * Details (optional): - * `kind string` - * the kind attribute of the invalid resource - * `name string` - * the identifier of the invalid resource - * `causes` - * One or more `StatusCause` entries indicating the data in the provided -resource that was invalid. The `reason`, `message`, and `field` attributes will -be set. - * HTTP status code: `422 StatusUnprocessableEntity` - - -* `Timeout` - * Indicates that the request could not be completed within the given time. -Clients may receive this response if the server has decided to rate limit the -client, or if the server is overloaded and cannot process the request at this -time. - * Http status code: `429 TooManyRequests` - * The server should set the `Retry-After` HTTP header and return -`retryAfterSeconds` in the details field of the object. A value of `0` is the -default. - - -* `ServerTimeout` - * Indicates that the server can be reached and understood the request, but -cannot complete the action in a reasonable time. This maybe due to temporary -server load or a transient communication issue with another server. - * Details (optional): - * `kind string` - * The kind attribute of the resource being acted on. - * `name string` - * The operation that is being attempted. - * The server should set the `Retry-After` HTTP header and return -`retryAfterSeconds` in the details field of the object. A value of `0` is the -default. - * Http status code: `504 StatusServerTimeout` - - -* `MethodNotAllowed` - * Indicates that the action the client attempted to perform on the resource -was not supported by the code. - * For instance, attempting to delete a resource that can only be created. - * API calls that return MethodNotAllowed can never succeed. - * Http status code: `405 StatusMethodNotAllowed` - - -* `InternalError` - * Indicates that an internal error occurred, it is unexpected and the outcome -of the call is unknown. - * Details (optional): - * `causes` - * The original error. - * Http status code: `500 StatusInternalServerError` `code` may contain the suggested HTTP return code for this status. - - -## Events - -Events are complementary to status information, since they can provide some -historical information about status and occurrences in addition to current or -previous status. Generate events for situations users or administrators should -be alerted about. - -Choose a unique, specific, short, CamelCase reason for each event category. For -example, `FreeDiskSpaceInvalid` is a good event reason because it is likely to -refer to just one situation, but `Started` is not a good reason because it -doesn't sufficiently indicate what started, even when combined with other event -fields. - -`Error creating foo` or `Error creating foo %s` would be appropriate for an -event message, with the latter being preferable, since it is more informational. - -Accumulate repeated events in the client, especially for frequent events, to -reduce data volume, load on the system, and noise exposed to users. - -## Naming conventions - -* Go field names must be CamelCase. 
JSON field names must be camelCase. Other -than capitalization of the initial letter, the two should almost always match. -No underscores nor dashes in either. -* Field and resource names should be declarative, not imperative (DoSomething, -SomethingDoer, DoneBy, DoneAt). -* Use `Node` where referring to -the node resource in the context of the cluster. Use `Host` where referring to -properties of the individual physical/virtual system, such as `hostname`, -`hostPath`, `hostNetwork`, etc. -* `FooController` is a deprecated kind naming convention. Name the kind after -the thing being controlled instead (e.g., `Job` rather than `JobController`). -* The name of a field that specifies the time at which `something` occurs should -be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). -* We use the `fooSeconds` convention for durations, as discussed in the [units -subsection](#units). - * `fooPeriodSeconds` is preferred for periodic intervals and other waiting -periods (e.g., over `fooIntervalSeconds`). - * `fooTimeoutSeconds` is preferred for inactivity/unresponsiveness deadlines. - * `fooDeadlineSeconds` is preferred for activity completion deadlines. -* Do not use abbreviations in the API, except where they are extremely commonly -used, such as "id", "args", or "stdin". -* Acronyms should similarly only be used when extremely commonly known. All -letters in the acronym should have the same case, using the appropriate case for -the situation. For example, at the beginning of a field name, the acronym should -be all lowercase, such as "httpGet". Where used as a constant, all letters -should be uppercase, such as "TCP" or "UDP". -* The name of a field referring to another resource of kind `Foo` by name should -be called `fooName`. The name of a field referring to another resource of kind -`Foo` by ObjectReference (or subset thereof) should be called `fooRef`. -* More generally, include the units and/or type in the field name if they could -be ambiguous and they are not specified by the value or value type. -* The name of a field expressing a boolean property called 'fooable' should be -called `Fooable`, not `IsFooable`. - -### Namespace Names -* The name of a namespace must be a -[DNS_LABEL](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/identifiers.md). -* The `kube-` prefix is reserved for Kubernetes system namespaces, e.g. `kube-system` and `kube-public`. -* See -[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more information. - -## Label, selector, and annotation conventions - -Labels are the domain of users. They are intended to facilitate organization and -management of API resources using attributes that are meaningful to users, as -opposed to meaningful to the system. Think of them as user-created mp3 or email -inbox labels, as opposed to the directory structure used by a program to store -its data. The former enables the user to apply an arbitrary ontology, whereas -the latter is implementation-centric and inflexible. Users will use labels to -select resources to operate on, display label values in CLI/UI columns, etc. -Users should always retain full power and flexibility over the label schemas -they apply to labels in their namespaces. - -However, we should support conveniences for common cases by default. For -example, what we now do in ReplicationController is automatically set the RC's -selector and labels to the labels in the pod template by default, if they are -not already set. 
That ensures that the selector will match the template, and -that the RC can be managed using the same labels as the pods it creates. Note -that once we generalize selectors, it won't necessarily be possible to -unambiguously generate labels that match an arbitrary selector. - -If the user wants to apply additional labels to the pods that it doesn't select -upon, such as to facilitate adoption of pods or in the expectation that some -label values will change, they can set the selector to a subset of the pod -labels. Similarly, the RC's labels could be initialized to a subset of the pod -template's labels, or could include additional/different labels. - -For disciplined users managing resources within their own namespaces, it's not -that hard to consistently apply schemas that ensure uniqueness. One just needs -to ensure that at least one value of some label key in common differs compared -to all other comparable resources. We could/should provide a verification tool -to check that. However, development of conventions similar to the examples in -[Labels](https://kubernetes.io/docs/user-guide/labels/) make uniqueness straightforward. Furthermore, -relatively narrowly used namespaces (e.g., per environment, per application) can -be used to reduce the set of resources that could potentially cause overlap. - -In cases where users could be running misc. examples with inconsistent schemas, -or where tooling or components need to programmatically generate new objects to -be selected, there needs to be a straightforward way to generate unique label -sets. A simple way to ensure uniqueness of the set is to ensure uniqueness of a -single label value, such as by using a resource name, uid, resource hash, or -generation number. - -Problems with uids and hashes, however, include that they have no semantic -meaning to the user, are not memorable nor readily recognizable, and are not -predictable. Lack of predictability obstructs use cases such as creation of a -replication controller from a pod, such as people want to do when exploring the -system, bootstrapping a self-hosted cluster, or deletion and re-creation of a -new RC that adopts the pods of the previous one, such as to rename it. -Generation numbers are more predictable and much clearer, assuming there is a -logical sequence. Fortunately, for deployments that's the case. For jobs, use of -creation timestamps is common internally. Users should always be able to turn -off auto-generation, in order to permit some of the scenarios described above. -Note that auto-generated labels will also become one more field that needs to be -stripped out when cloning a resource, within a namespace, in a new namespace, in -a new cluster, etc., and will need to be ignored around when updating a resource -via patch or read-modify-write sequence. - -Inclusion of a system prefix in a label key is fairly hostile to UX. A prefix is -only necessary in the case that the user cannot choose the label key, in order -to avoid collisions with user-defined labels. However, I firmly believe that the -user should always be allowed to select the label keys to use on their -resources, so it should always be possible to override default label keys. - -Therefore, resources supporting auto-generation of unique labels should have a -`uniqueLabelKey` field, so that the user could specify the key if they wanted -to, but if unspecified, it could be set by default, such as to the resource -type, like job, deployment, or replicationController. 
The value would need to be -at least spatially unique, and perhaps temporally unique in the case of job. - -Annotations have very different intended usage from labels. They are -primarily generated and consumed by tooling and system extensions, or are used -by end-users to engage non-standard behavior of components. For example, an -annotation might be used to indicate that an instance of a resource expects -additional handling by non-kubernetes controllers. Annotations may carry -arbitrary payloads, including JSON documents. Like labels, annotation keys can -be prefixed with a governing domain (e.g. `example.com/key-name`). Unprefixed -keys (e.g. `key-name`) are reserved for end-users. Third-party components must -use prefixed keys. Key prefixes under the "kubernetes.io" and "k8s.io" domains -are reserved for use by the kubernetes project and must not be used by -third-parties. - -In early versions of Kubernetes, some in-development features represented new -API fields as annotations, generally with the form `something.alpha.kubernetes.io/name` or -`something.beta.kubernetes.io/name` (depending on our confidence in it). This -pattern is deprecated. Some such annotations may still exist, but no new -annotations may be defined. New API fields are now developed as regular fields. - -Other advice regarding use of labels, annotations, taints, and other generic map keys by -Kubernetes components and tools: - - Key names should be all lowercase, with words separated by dashes instead of camelCase - - For instance, prefer `foo.kubernetes.io/foo-bar` over `foo.kubernetes.io/fooBar`, prefer - `desired-replicas` over `DesiredReplicas` - - Unprefixed keys are reserved for end-users. All other labels and annotations must be prefixed. - - Key prefixes under "kubernetes.io" and "k8s.io" are reserved for the Kubernetes - project. - - Such keys are effectively part of the kubernetes API and may be subject - to deprecation and compatibility policies. - - Key names, including prefixes, should be precise enough that a user could - plausibly understand where it came from and what it is for. - - Key prefixes should carry as much context as possible. - - For instance, prefer `subsystem.kubernetes.io/parameter` over `kubernetes.io/subsystem-parameter` - - Use annotations to store API extensions that the controller responsible for -the resource doesn't need to know about, experimental fields that aren't -intended to be generally used API fields, etc. Beware that annotations aren't -automatically handled by the API conversion machinery. - -## WebSockets and SPDY - -Some of the API operations exposed by Kubernetes involve transfer of binary -streams between the client and a container, including attach, exec, portforward, -and logging. The API therefore exposes certain operations over upgradeable HTTP -connections ([described in RFC 2817](https://tools.ietf.org/html/rfc2817)) via -the WebSocket and SPDY protocols. These actions are exposed as subresources with -their associated verbs (exec, log, attach, and portforward) and are requested -via a GET (to support JavaScript in a browser) and POST (semantically accurate). - -There are two primary protocols in use today: - -1. Streamed channels - - When dealing with multiple independent binary streams of data such as the -remote execution of a shell command (writing to STDIN, reading from STDOUT and -STDERR) or forwarding multiple ports the streams can be multiplexed onto a -single TCP connection. 
Kubernetes supports a SPDY based framing protocol that -leverages SPDY channels and a WebSocket framing protocol that multiplexes -multiple channels onto the same stream by prefixing each binary chunk with a -byte indicating its channel. The WebSocket protocol supports an optional -subprotocol that handles base64-encoded bytes from the client and returns -base64-encoded bytes from the server and character based channel prefixes ('0', -'1', '2') for ease of use from JavaScript in a browser. - -2. Streaming response - - The default log output for a channel of streaming data is an HTTP Chunked -Transfer-Encoding, which can return an arbitrary stream of binary data from the -server. Browser-based JavaScript is limited in its ability to access the raw -data from a chunked response, especially when very large amounts of logs are -returned, and in future API calls it may be desirable to transfer large files. -The streaming API endpoints support an optional WebSocket upgrade that provides -a unidirectional channel from the server to the client and chunks data as binary -WebSocket frames. An optional WebSocket subprotocol is exposed that base64 -encodes the stream before returning it to the client. - -Clients should use the SPDY protocols if their clients have native support, or -WebSockets as a fallback. Note that WebSockets is susceptible to Head-of-Line -blocking and so clients must read and process each message sequentially. In -the future, an HTTP/2 implementation will be exposed that deprecates SPDY. - - -## Validation - -API objects are validated upon receipt by the apiserver. Validation errors are -flagged and returned to the caller in a `Failure` status with `reason` set to -`Invalid`. In order to facilitate consistent error messages, we ask that -validation logic adheres to the following guidelines whenever possible (though -exceptional cases will exist). - -* Be as precise as possible. -* Telling users what they CAN do is more useful than telling them what they -CANNOT do. -* When asserting a requirement in the positive, use "must". Examples: "must be -greater than 0", "must match regex '[a-z]+'". Words like "should" imply that -the assertion is optional, and must be avoided. -* When asserting a formatting requirement in the negative, use "must not". -Example: "must not contain '..'". Words like "should not" imply that the -assertion is optional, and must be avoided. -* When asserting a behavioral requirement in the negative, use "may not". -Examples: "may not be specified when otherField is empty", "only `name` may be -specified". -* When referencing a literal string value, indicate the literal in -single-quotes. Example: "must not contain '..'". -* When referencing another field name, indicate the name in back-quotes. -Example: "must be greater than `request`". -* When specifying inequalities, use words rather than symbols. Examples: "must -be less than 256", "must be greater than or equal to 0". Do not use words -like "larger than", "bigger than", "more than", "higher than", etc. -* When specifying numeric ranges, use inclusive ranges when possible. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. 
diff --git a/contributors/devel/api_changes.md b/contributors/devel/api_changes.md index 1f46d298..fa654177 100644 --- a/contributors/devel/api_changes.md +++ b/contributors/devel/api_changes.md @@ -1,1007 +1,3 @@ -*This document is oriented at developers who want to change existing APIs. -A set of API conventions, which applies to new APIs and to changes, can be -found at [API Conventions](api-conventions.md). +This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md. -**Table of Contents** - -- [So you want to change the API?](#so-you-want-to-change-the-api) - - [Operational overview](#operational-overview) - - [On compatibility](#on-compatibility) - - [Backward compatibility gotchas](#backward-compatibility-gotchas) - - [Incompatible API changes](#incompatible-api-changes) - - [Changing versioned APIs](#changing-versioned-apis) - - [Edit types.go](#edit-typesgo) - - [Edit defaults.go](#edit-defaultsgo) - - [Edit conversion.go](#edit-conversiongo) - - [Changing the internal structures](#changing-the-internal-structures) - - [Edit types.go](#edit-typesgo-1) - - [Edit validation.go](#edit-validationgo) - - [Edit version conversions](#edit-version-conversions) - - [Generate protobuf objects](#generate-protobuf-objects) - - [Edit json (un)marshaling code](#edit-json-unmarshaling-code) - - [Making a new API Version](#making-a-new-api-version) - - [Making a new API Group](#making-a-new-api-group) - - [Update the fuzzer](#update-the-fuzzer) - - [Update the semantic comparisons](#update-the-semantic-comparisons) - - [Implement your change](#implement-your-change) - - [Write end-to-end tests](#write-end-to-end-tests) - - [Examples and docs](#examples-and-docs) - - [Alpha, Beta, and Stable Versions](#alpha-beta-and-stable-versions) - - [Adding Unstable Features to Stable Versions](#adding-unstable-features-to-stable-versions) - - -# So you want to change the API? - -Before attempting a change to the API, you should familiarize yourself with a -number of existing API types and with the [API conventions](api-conventions.md). -If creating a new API type/resource, we also recommend that you first send a PR -containing just a proposal for the new API types. - -The Kubernetes API has two major components - the internal structures and -the versioned APIs. The versioned APIs are intended to be stable, while the -internal structures are implemented to best reflect the needs of the Kubernetes -code itself. - -What this means for API changes is that you have to be somewhat thoughtful in -how you approach changes, and that you have to touch a number of pieces to make -a complete change. This document aims to guide you through the process, though -not all API changes will need all of these steps. - -## Operational overview - -It is important to have a high level understanding of the API system used in -Kubernetes in order to navigate the rest of this document. - -As mentioned above, the internal representation of an API object is decoupled -from any one API version. This provides a lot of freedom to evolve the code, -but it requires robust infrastructure to convert between representations. There -are multiple steps in processing an API operation - even something as simple as -a GET involves a great deal of machinery. - -The conversion process is logically a "star" with the internal form at the -center. Every versioned API can be converted to the internal form (and -vice-versa), but versioned APIs do not convert to other versioned APIs directly. 
-This sounds like a heavy process, but in reality we do not intend to keep more -than a small number of versions alive at once. While all of the Kubernetes code -operates on the internal structures, they are always converted to a versioned -form before being written to storage (disk or etcd) or being sent over a wire. -Clients should consume and operate on the versioned APIs exclusively. - -To demonstrate the general process, here is a (hypothetical) example: - - 1. A user POSTs a `Pod` object to `/api/v7beta1/...` - 2. The JSON is unmarshalled into a `v7beta1.Pod` structure - 3. Default values are applied to the `v7beta1.Pod` - 4. The `v7beta1.Pod` is converted to an `api.Pod` structure - 5. The `api.Pod` is validated, and any errors are returned to the user - 6. The `api.Pod` is converted to a `v6.Pod` (because v6 is the latest stable -version) - 7. The `v6.Pod` is marshalled into JSON and written to etcd - -Now that we have the `Pod` object stored, a user can GET that object in any -supported api version. For example: - - 1. A user GETs the `Pod` from `/api/v5/...` - 2. The JSON is read from etcd and unmarshalled into a `v6.Pod` structure - 3. Default values are applied to the `v6.Pod` - 4. The `v6.Pod` is converted to an `api.Pod` structure - 5. The `api.Pod` is converted to a `v5.Pod` structure - 6. The `v5.Pod` is marshalled into JSON and sent to the user - -The implication of this process is that API changes must be done carefully and -backward-compatibly. - -## On compatibility - -Before talking about how to make API changes, it is worthwhile to clarify what -we mean by API compatibility. Kubernetes considers forwards and backwards -compatibility of its APIs a top priority. Compatibility is *hard*, especially -handling issues around rollback-safety. This is something every API change -must consider. - -An API change is considered compatible if it: - - * adds new functionality that is not required for correct behavior (e.g., -does not add a new required field) - * does not change existing semantics, including: - * the semantic meaning of default values *and behavior* - * interpretation of existing API types, fields, and values - * which fields are required and which are not - * mutable fields do not become immutable - * valid values do not become invalid - * explicitly invalid values do not become valid - -Put another way: - -1. Any API call (e.g. a structure POSTed to a REST endpoint) that succeeded -before your change must succeed after your change. -2. Any API call that does not use your change must behave the same as it did -before your change. -3. Any API call that uses your change must not cause problems (e.g. crash or -degrade behavior) when issued against an API servers that do not include your -change. -4. It must be possible to round-trip your change (convert to different API -versions and back) with no loss of information. -5. Existing clients need not be aware of your change in order for them to -continue to function as they did previously, even when your change is in use. -6. It must be possible to rollback to a previous version of API server that -does not include your change and have no impact on API objects which do not use -your change. API objects that use your change will be impacted in case of a -rollback. - -If your change does not meet these criteria, it is not considered compatible, -and may break older clients, or result in newer clients causing undefined -behavior. 
Such changes are generally disallowed, though exceptions have been -made in extreme cases (e.g. security or obvious bugs). - -Let's consider some examples. - -In a hypothetical API (assume we're at version v6), the `Frobber` struct looks -something like this: - -```go -// API v6. -type Frobber struct { - Height int `json:"height"` - Param string `json:"param"` -} -``` - -You want to add a new `Width` field. It is generally allowed to add new fields -without changing the API version, so you can simply change it to: - -```go -// Still API v6. -type Frobber struct { - Height int `json:"height"` - Width int `json:"width"` - Param string `json:"param"` -} -``` - -The onus is on you to define a sane default value for `Width` such that rules -#1 and #2 above are true - API calls and stored objects that used to work must -continue to work. - -For your next change you want to allow multiple `Param` values. You can not -simply remove `Param string` and add `Params []string` (without creating a -whole new API version) - that fails rules #1, #2, #3, and #6. Nor can you -simply add `Params []string` and use it instead - that fails #2 and #6. - -You must instead define a new field and the relationship between that field and -the existing field(s). Start by adding the new plural field: - -```go -// Still API v6. -type Frobber struct { - Height int `json:"height"` - Width int `json:"width"` - Param string `json:"param"` // the first param - Params []string `json:"params"` // all of the params -} -``` - -This new field must be inclusive of the singular field. In order to satisfy -the compatibility rules you must handle all the cases of version skew, multiple -clients, and rollbacks. This can be handled by defaulting or admission control -logic linking the fields together with context from the API operation to get as -close as possible to the user's intentions. - -Upon any mutating API operation: - * If only the singular field is specified (e.g. an older client), API logic - must populate plural[0] from the singular value, and de-dup the plural - field. - * If only the plural field is specified (e.g. a newer client), API logic must - populate the singular value from plural[0]. - * If both the singular and plural fields are specified, API logic must - validate that the singular value matches plural[0]. - * Any other case is an error and must be rejected. - -For this purpose "is specified" means the following: - * On a create or patch operation: the field is present in the user-provided input - * On an update operation: the field is present and has changed from the - current value - -Older clients that only know the singular field will continue to succeed and -produce the same results as before the change. Newer clients can use your -change without impacting older clients. The API server can be rolled back and -only objects that use your change will be impacted. - -Part of the reason for versioning APIs and for using internal types that are -distinct from any one version is to handle growth like this. The internal -representation can be implemented as: - -```go -// Internal, soon to be v7beta1. -type Frobber struct { - Height int - Width int - Params []string -} -``` - -The code that converts to/from versioned APIs can decode this into the -compatible structure. Eventually, a new API version, e.g. v7beta1, -will be forked and it can drop the singular field entirely. - -We've seen how to satisfy rules #1, #2, and #3. Rule #4 means that you can not -extend one versioned API without also extending the others. 
For example, an -API call might POST an object in API v7beta1 format, which uses the cleaner -`Params` field, but the API server might store that object in trusty old v6 -form (since v7beta1 is "beta"). When the user reads the object back in the -v7beta1 API it would be unacceptable to have lost all but `Params[0]`. This -means that, even though it is ugly, a compatible change must be made to the v6 -API, as above. - -For some changes, this can be challenging to do correctly. It may require multiple -representations of the same information in the same API resource, which need to -be kept in sync should either be changed. - -For example, let's say you decide to rename a field within the same API -version. In this case, you add units to `height` and `width`. You implement -this by adding new fields: - -```go -type Frobber struct { - Height *int `json:"height"` - Width *int `json:"width"` - HeightInInches *int `json:"heightInInches"` - WidthInInches *int `json:"widthInInches"` -} -``` - -You convert all of the fields to pointers in order to distinguish between unset -and set to 0, and then set each corresponding field from the other in the -defaulting logic (e.g. `heightInInches` from `height`, and vice versa). That -works fine when the user creates a sends a hand-written configuration -- -clients can write either field and read either field. - -But what about creation or update from the output of a GET, or update via PATCH -(see [In-place updates](https://kubernetes.io/docs/user-guide/managing-deployments/#in-place-updates-of-resources))? -In these cases, the two fields will conflict, because only one field would be -updated in the case of an old client that was only aware of the old field -(e.g. `height`). - -Suppose the client creates: - -```json -{ - "height": 10, - "width": 5 -} -``` - -and GETs: - -```json -{ - "height": 10, - "heightInInches": 10, - "width": 5, - "widthInInches": 5 -} -``` - -then PUTs back: - -```json -{ - "height": 13, - "heightInInches": 10, - "width": 5, - "widthInInches": 5 -} -``` - -As per the compatibility rules, the update must not fail, because it would have -worked before the change. - -## Backward compatibility gotchas - -* A single feature/property cannot be represented using multiple spec fields - simultaneously within an API version. Only one representation can be - populated at a time, and the client needs to be able to specify which field - they expect to use (typically via API version), on both mutation and read. As - above, older clients must continue to function properly. - -* A new representation, even in a new API version, that is more expressive than an - old one breaks backward compatibility, since clients that only understood the - old representation would not be aware of the new representation nor its - semantics. Examples of proposals that have run into this challenge include - [generalized label selectors](http://issues.k8s.io/341) and [pod-level security context](http://prs.k8s.io/12823). - -* Enumerated values cause similar challenges. Adding a new value to an enumerated set - is *not* a compatible change. Clients which assume they know how to handle all possible - values of a given field will not be able to handle the new values. However, removing a - value from an enumerated set *can* be a compatible change, if handled properly (treat the - removed value as deprecated but allowed). 
For enumeration-like fields that expect to add - new values in the future, such as `reason` fields, please document that expectation clearly - in the API field descriptions. Clients should treat such sets of values as potentially - open-ended. - -* For [Unions](api-conventions.md#unions), sets of fields where at most one should - be set, it is acceptable to add a new option to the union if the [appropriate - conventions](api-conventions.md#objects) were followed in the original object. - Removing an option requires following the [deprecation process](https://kubernetes.io/docs/reference/deprecation-policy/). - -* Changing any validation rules always has the potential of breaking some client, since it changes the - assumptions about part of the API, similar to adding new enum values. Validation rules on spec fields can - neither be relaxed nor strengthened. Strengthening cannot be permitted because any requests that previously - worked must continue to work. Weakening validation has the potential to break other consumers and generators - of the API resource. Status fields whose writers are under our control (e.g., written by non-pluggable - controllers), may potentially tighten validation, since that would cause a subset of previously valid - values to be observable by clients. - -* Do not add a new API version of an existing resource and make it the preferred version in the same - release, and do not make it the storage version. The latter is necessary so that a rollback of the - apiserver doesn't render resources in etcd undecodable after rollback. - -* Any field with a default value in one API version must have a *non-nil* default - value in all API versions. This can be split into 2 cases: - * Adding a new API version with a default value for an existing non-defaulted - field: it is required to add a default value semantically equivalent to - being unset in all previous API versions, to preserve the semantic meaning - of the value being unset. - * Adding a new field with a default value: the default values must be - semantically equivalent in all currently supported API versions. - -## Incompatible API changes - -There are times when incompatible changes might be OK, but mostly we want -changes that meet the above definitions. If you think you need to break -compatibility, you should talk to the Kubernetes API reviewers first. - -Breaking compatibility of a beta or stable API version, such as v1, is -unacceptable. Compatibility for experimental or alpha APIs is not strictly -required, but breaking compatibility should not be done lightly, as it disrupts -all users of the feature. Alpha and beta API versions may be deprecated and -eventually removed wholesale, as described in the [deprecation policy](https://kubernetes.io/docs/reference/deprecation-policy/). - -If your change is going to be backward incompatible or might be a breaking -change for API consumers, please send an announcement to -`kubernetes-dev@googlegroups.com` before the change gets in. If you are unsure, -ask. Also make sure that the change gets documented in the release notes for the -next release by labeling the PR with the "release-note-action-required" github label. - -If you found that your change accidentally broke clients, it should be reverted. - -In short, the expected API evolution is as follows: - -* `newapigroup/v1alpha1` -> ... -> `newapigroup/v1alphaN` -> -* `newapigroup/v1beta1` -> ... -> `newapigroup/v1betaN` -> -* `newapigroup/v1` -> -* `newapigroup/v2alpha1` -> ... 
- -While in alpha we expect to move forward with it, but may break it. - -Once in beta we will preserve forward compatibility, but may introduce new -versions and delete old ones. - -v1 must be backward-compatible for an extended length of time. - -## Changing versioned APIs - -For most changes, you will probably find it easiest to change the versioned -APIs first. This forces you to think about how to make your change in a -compatible way. Rather than doing each step in every version, it's usually -easier to do each versioned API one at a time, or to do all of one version -before starting "all the rest". - -### Edit types.go - -The struct definitions for each API are in -`staging/src/k8s.io/api/<group>/<version>/types.go`. Edit those files to reflect -the change you want to make. Note that all types and non-inline fields in -versioned APIs must be preceded by descriptive comments - these are used to -generate documentation. Comments for types should not contain the type name; API -documentation is generated from these comments and end-users should not be -exposed to golang type names. - -For types that need the generated -[DeepCopyObject](https://github.com/kubernetes/kubernetes/commit/8dd0989b395b29b872e1f5e06934721863e4a210#diff-6318847735efb6fae447e7dbf198c8b2R3767) -methods, usually only required by the top-level types like `Pod`, add this line -to the comment -([example](https://github.com/kubernetes/kubernetes/commit/39d95b9b065fffebe5b6f233d978fe1723722085#diff-ab819c2e7a94a3521aecf6b477f9b2a7R30)): - -```golang - // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object -``` - -Optional fields should have the `,omitempty` json tag; fields are interpreted as -being required otherwise. - -### Edit defaults.go - -If your change includes new fields for which you will need default values, you -need to add cases to `pkg/apis/<group>/<version>/defaults.go`. - -**Note:** When adding default values to new fields, you *must* also add default -values in all API versions, instead of leaving new fields unset (e.g. `nil`) in -old API versions. This is required because defaulting happens whenever a -serialized version is read (see [#66135]). When possible, pick meaningful values -as sentinels for unset values. - -In the past the core v1 API -was special. Its `defaults.go` used to live at `pkg/api/v1/defaults.go`. -If you see code referencing that path, you can be sure its outdated. Now the core v1 api lives at -`pkg/apis/core/v1/defaults.go` which follows the above convention. - -Of course, since you have added code, you have to add a test: -`pkg/apis/<group>/<version>/defaults_test.go`. - -Do use pointers to scalars when you need to distinguish between an unset value -and an automatic zero value. For example, -`PodSpec.TerminationGracePeriodSeconds` is defined as `*int64` the go type -definition. A zero value means 0 seconds, and a nil value asks the system to -pick a default. - -Don't forget to run the tests! - -[#66135]: https://github.com/kubernetes/kubernetes/issues/66135 - -### Edit conversion.go - -Given that you have not yet changed the internal structs, this might feel -premature, and that's because it is. You don't yet have anything to convert to -or from. We will revisit this in the "internal" section. If you're doing this -all in a different order (i.e. you started with the internal structs), then you -should jump to that topic below. 
In the very rare case that you are making an -incompatible change you might or might not want to do this now, but you will -have to do more later. The files you want are -`pkg/apis/<group>/<version>/conversion.go` and -`pkg/apis/<group>/<version>/conversion_test.go`. - -Note that the conversion machinery doesn't generically handle conversion of -values, such as various kinds of field references and API constants. [The client -library](https://github.com/kubernetes/client-go/blob/v4.0.0-beta.0/rest/request.go#L352) -has custom conversion code for field references. You also need to add a call to -`AddFieldLabelConversionFunc` of your scheme with a mapping function that -understands supported translations, like this -[line](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/api/v1/conversion.go#L165). - -## Changing the internal structures - -Now it is time to change the internal structs so your versioned changes can be -used. - -### Edit types.go - -Similar to the versioned APIs, the definitions for the internal structs are in -`pkg/apis/<group>/types.go`. Edit those files to reflect the change you want to -make. Keep in mind that the internal structs must be able to express *all* of -the versioned APIs. - -Similar to the versioned APIs, you need to add the `+k8s:deepcopy-gen` tag to -types that need generated DeepCopyObject methods. - -## Edit validation.go - -Most changes made to the internal structs need some form of input validation. -Validation is currently done on internal objects in -`pkg/apis/<group>/validation/validation.go`. This validation is the one of the -first opportunities we have to make a great user experience - good error -messages and thorough validation help ensure that users are giving you what you -expect and, when they don't, that they know why and how to fix it. Think hard -about the contents of `string` fields, the bounds of `int` fields and the -optionality of fields. - -Of course, code needs tests - `pkg/apis/<group>/validation/validation_test.go`. - -## Edit version conversions - -At this point you have both the versioned API changes and the internal -structure changes done. If there are any notable differences - field names, -types, structural change in particular - you must add some logic to convert -versioned APIs to and from the internal representation. If you see errors from -the `serialization_test`, it may indicate the need for explicit conversions. - -Performance of conversions very heavily influence performance of apiserver. -Thus, we are auto-generating conversion functions that are much more efficient -than the generic ones (which are based on reflections and thus are highly -inefficient). - -The conversion code resides with each versioned API. There are two files: - - - `pkg/apis/<group>/<version>/conversion.go` containing manually written - conversion functions - - `pkg/apis/<group>/<version>/zz_generated.conversion.go` containing - auto-generated conversion functions - -Since auto-generated conversion functions are using manually written ones, -those manually written should be named with a defined convention, i.e. a -function converting type `X` in pkg `a` to type `Y` in pkg `b`, should be named: -`convert_a_X_To_b_Y`. - -Also note that you can (and for efficiency reasons should) use auto-generated -conversion functions when writing your conversion functions. - -Adding manually written conversion also requires you to add tests to -`pkg/apis/<group>/<version>/conversion_test.go`. 
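-
-As a rough sketch (reusing the hypothetical `Frobber` example from earlier in
-this document), a manually written conversion that folds the legacy singular
-field into the plural one could look like the following; the exact package
-names and the `conversion.Scope` parameter are assumptions and vary by release:
-
-```go
-// convert_v6_Frobber_To_api_Frobber converts the versioned Frobber to the
-// internal representation, following the naming convention described above.
-func convert_v6_Frobber_To_api_Frobber(in *v6.Frobber, out *api.Frobber, s conversion.Scope) error {
-  out.Height = in.Height
-  out.Width = in.Width
-  // The plural field is authoritative; fold in the legacy singular value so
-  // that writes from older clients are not lost.
-  out.Params = append([]string{}, in.Params...)
-  if in.Param != "" && (len(out.Params) == 0 || out.Params[0] != in.Param) {
-    out.Params = append([]string{in.Param}, out.Params...)
-  }
-  return nil
-}
-```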
Once all the necessary manually written conversions are added, you need to regenerate the auto-generated ones. To regenerate them run:

```sh
make clean && make generated_files
```

`make clean` is important, otherwise the generated files might be stale, because the build system uses a custom cache.

`make all` will invoke `make generated_files` as well.

`make generated_files` will also regenerate `zz_generated.deepcopy.go`, `zz_generated.defaults.go`, and `api/openapi-spec/swagger.json`.

If regeneration is somehow not possible due to compile errors, the easiest workaround is to remove the files causing errors and rerun the command.

## Generate Code

Apart from `defaulter-gen`, `deepcopy-gen`, `conversion-gen` and `openapi-gen`, there are a few other generators:

- `go-to-protobuf`
- `client-gen`
- `lister-gen`
- `informer-gen`
- `codecgen` (for fast json serialization with the ugorji codec)

Many of the generators are based on [`gengo`](https://github.com/kubernetes/gengo) and share common flags. The `--verify-only` flag will check the existing files on disk and fail if they are not what would have been generated.

The generators that create Go code have a `--go-header-file` flag which should be a file that contains the header that should be included. This header is the copyright that should be present at the top of the generated file and should be checked with the [`repo-infra/verify/verify-boilerplate.sh`](https://git.k8s.io/repo-infra/verify/verify-boilerplate.sh) script at a later stage of the build.

To invoke these generators, you can run `make update`, which runs a bunch of [scripts](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/hack/update-all.sh#L63-L78). Please continue reading the next few sections, because some generators have prerequisites, and because they explain how to invoke the generators individually if you find `make update` takes too long to run.

### Generate protobuf objects

For any core API object, we also need to generate the Protobuf IDL and marshallers. That generation is invoked with

```sh
hack/update-generated-protobuf.sh
```

The vast majority of objects will not need any consideration when converting to protobuf, but be aware that if you depend on a Golang type in the standard library there may be additional work required, although in practice we typically use our own equivalents for JSON serialization. The `pkg/api/serialization_test.go` will verify that your protobuf serialization preserves all fields - be sure to run it several times to ensure there are no incompletely calculated fields.

### Generate Clientset

`client-gen` is a tool to generate clientsets for top-level API objects.

`client-gen` requires the `// +genclient` annotation on each exported type in both the internal `pkg/apis/<group>/types.go` as well as each specifically versioned `staging/src/k8s.io/api/<group>/<version>/types.go`.

If the apiserver hosts your API under a different group name than the `<group>` in the filesystem (usually this is because the `<group>` in the filesystem omits the "k8s.io" suffix, e.g., admission vs. admission.k8s.io), you can instruct `client-gen` to use the correct group name by adding the `// +groupName=` annotation in the `doc.go`, both in the internal `pkg/apis/<group>/doc.go` as well as in each specifically versioned `staging/src/k8s.io/api/<group>/<version>/types.go`.
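For concreteness, a rough sketch of where these annotations go; the `frobbing` group and `Frobber` type below are invented for illustration and are not real Kubernetes APIs:

```go
// pkg/apis/frobbing/doc.go (hypothetical internal package)

// +groupName=frobbing.k8s.io

// Package frobbing holds the internal types for the hypothetical frobbing group.
package frobbing
```

```go
// staging/src/k8s.io/api/frobbing/v1alpha1/types.go (hypothetical versioned package)

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Frobber is a placeholder top-level object; the annotations go immediately
// above the exported type that should get a generated client.
type Frobber struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
}
```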
- -Once you added the annotations, generate the client with - -```sh -hack/update-codegen.sh -``` - -Note that you can use the optional `// +groupGoName=` to specify a CamelCase -custom Golang identifier to de-conflict e.g. `policy.authorization.k8s.io` and -`policy.k8s.io`. These two would both map to `Policy()` in clientsets. - -client-gen is flexible. See [this document](generating-clientset.md) if you need -client-gen for non-kubernetes API. - -### Generate Listers - -`lister-gen` is a tool to generate listers for a client. It reuses the -`//+genclient` and the `// +groupName=` annotations, so you do not need to -specify extra annotations. - -Your previous run of `hack/update-codegen.sh` has invoked `lister-gen`. - -### Generate Informers - -`informer-gen` generates the very useful Informers which watch API -resources for changes. It reuses the `//+genclient` and the -`//+groupName=` annotations, so you do not need to specify extra annotations. - -Your previous run of `hack/update-codegen.sh` has invoked `informer-gen`. - -### Edit json (un)marshaling code - -We are auto-generating code for marshaling and unmarshaling json representation -of api objects - this is to improve the overall system performance. - -The auto-generated code resides with each versioned API: - - - `staging/src/k8s.io/api/<group>/<version>/generated.proto` - - `staging/src/k8s.io/api/<group>/<version>/generated.pb.go` - -To regenerate them run: - -```sh -hack/update-generated-protobuf.sh -``` - -## Making a new API Version - -This section is under construction, as we make the tooling completely generic. - -If you are adding a new API version to an existing group, you can copy the -structure of the existing `pkg/apis/<group>/<existing-version>` and -`staging/src/k8s.io/api/<group>/<existing-version>` directories. - -Due to the fast changing nature of the project, the following content is probably out-dated: -* You can control if the version is enabled by default by update -[pkg/master/master.go](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/master/master.go#L381). -* You must add the new version to - [pkg/apis/group_name/install/install.go](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/apis/apps/install/install.go). -* You must add the new version to - [hack/lib/init.sh#KUBE_AVAILABLE_GROUP_VERSIONS](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/hack/lib/init.sh#L53). -* You must add the new version to - [hack/update-generated-protobuf-dockerized.sh](https://github.com/kubernetes/kubernetes/blob/v1.8.2/hack/update-generated-protobuf-dockerized.sh#L44) - to generate protobuf IDL and marshallers. -* You must add the new version to - [cmd/kube-apiserver/app#apiVersionPriorities](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/cmd/kube-apiserver/app/aggregator.go#L172) -* You must setup storage for the new version in - [pkg/registry/group_name/rest](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/registry/authentication/rest/storage_authentication.go) - -You need to regenerate the generated code as instructed in the sections above. - -## Making a new API Group - -You'll have to make a new directory under `pkg/apis/` and -`staging/src/k8s.io/api`; copy the directory structure of an existing API group, -e.g. 
`pkg/apis/authentication` and `staging/src/k8s.io/api/authentication`; replace "authentication" with your group name and replace versions with your versions; replace the API kinds in [versioned](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/staging/src/k8s.io/api/authentication/v1/register.go#L47) and [internal](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/apis/authentication/register.go#L47) register.go, and [install.go](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/apis/authentication/install/install.go#L43) with your kinds.

You'll have to add your API group/version to a few places in the code base, as noted in the [Making a new API Version](#making-a-new-api-version) section.

You need to regenerate the generated code as instructed in the sections above.

## Update the fuzzer

Part of our testing regimen for APIs is to "fuzz" (fill with random values) API objects and then convert them to and from the different API versions. This is a great way of exposing places where you lost information or made bad assumptions. If you have added any fields which need very careful formatting (the test does not run validation) or if you have made assumptions such as "this slice will always have at least 1 element", you may get an error or even a panic from the `serialization_test`. If so, look at the diff it produces (or the backtrace in case of a panic) and figure out what you forgot. Encode that into the fuzzer's custom fuzz functions. Hint: if you added defaults for a field, that field will need to have a custom fuzz function that ensures that the field is fuzzed to a non-empty value.

The fuzzer can be found in `pkg/api/testing/fuzzer.go`.

## Update the semantic comparisons

VERY VERY rarely is this needed, but when it hits, it hurts. In some rare cases we end up with objects (e.g. resource quantities) that have morally equivalent values with different bitwise representations (e.g. value 10 with a base-2 formatter is the same as value 10 with a base-10 formatter). The only way Go knows how to do deep-equality is through field-by-field bitwise comparisons. This is a problem for us.

The first thing you should do is try not to do that. If you really can't avoid this, I'd like to introduce you to our `apiequality.Semantic.DeepEqual` routine. It supports custom overrides for specific types - you can find that in `pkg/api/helper/helpers.go`.

There's one other time when you might have to touch this: unexported fields. You see, while Go's `reflect` package is allowed to touch unexported fields, us mere mortals are not - this includes `apiequality.Semantic.DeepEqual`. Fortunately, most of our API objects are "dumb structs" all the way down - all fields are exported (start with a capital letter) and there are no unexported fields. But sometimes you want to include an object in our API that does have unexported fields somewhere in it (for example, `time.Time` has unexported fields). If this hits you, you may have to touch the `apiequality.Semantic.DeepEqual` customization functions.

## Implement your change

Now you have the API all changed - go implement whatever it is that you're doing!

## Write end-to-end tests

Check out the [E2E docs](e2e-tests.md) for detailed information about how to write end-to-end tests for your feature.

## Examples and docs

At last, your change is done, all unit tests pass, e2e passes, you're done, right? Actually, no. You just changed the API.
If you are touching an existing facet of the API, you have to try *really* hard to make sure that *all* the examples and docs are updated. There's no easy way to do this, due in part to JSON and YAML silently dropping unknown fields. You're clever - you'll figure it out. Put `grep` or `ack` to good use.

If you added functionality, you should consider documenting it and/or writing an example to illustrate your change.

Make sure you update the swagger and OpenAPI spec by running:

```sh
hack/update-swagger-spec.sh
hack/update-openapi-spec.sh
```

The API spec changes should be in a commit separate from your other changes.

## Alpha, Beta, and Stable Versions

New feature development proceeds through a series of stages of increasing maturity:

- Development level
  - Object Versioning: no convention
  - Availability: not committed to main kubernetes repo, and thus not available in official releases
  - Audience: other developers closely collaborating on a feature or proof-of-concept
  - Upgradeability, Reliability, Completeness, and Support: no requirements or guarantees
- Alpha level
  - Object Versioning: API version name contains `alpha` (e.g. `v1alpha1`)
  - Availability: committed to main kubernetes repo; appears in an official release; feature is disabled by default, but may be enabled by flag
  - Audience: developers and expert users interested in giving early feedback on features
  - Completeness: some API operations, CLI commands, or UI support may not be implemented; the API need not have had an *API review* (an intensive and targeted review of the API, on top of a normal code review)
  - Upgradeability: the object schema and semantics may change in a later software release, without any provision for preserving objects in an existing cluster; removing the upgradability concern allows developers to make rapid progress; in particular, API versions can increment faster than the minor release cadence and the developer need not maintain multiple versions; developers should still increment the API version when object schema or semantics change in an [incompatible way](#on-compatibility)
  - Cluster Reliability: because the feature is relatively new, and may lack complete end-to-end tests, enabling the feature via a flag might expose bugs that destabilize the cluster (e.g. a bug in a control loop might rapidly create excessive numbers of objects, exhausting API storage).
  - Support: there is *no commitment* from the project to complete the feature; the feature may be dropped entirely in a later software release
  - Recommended Use Cases: only in short-lived testing clusters, due to the complexity of upgrades and the lack of long-term support.
- Beta level:
  - Object Versioning: API version name contains `beta` (e.g.
`v2beta3`)
  - Availability: in official Kubernetes releases, and enabled by default
  - Audience: users interested in providing feedback on features
  - Completeness: all API operations, CLI commands, and UI support should be implemented; end-to-end tests complete; the API has had a thorough API review and is thought to be complete, though use during beta may frequently turn up API issues not thought of during review
  - Upgradeability: the object schema and semantics may change in a later software release; when this happens, an upgrade path will be documented; in some cases, objects will be automatically converted to the new version; in other cases, a manual upgrade may be necessary; a manual upgrade may require downtime for anything relying on the new feature, and may require manual conversion of objects to the new version; when manual conversion is necessary, the project will provide documentation on the process
  - Cluster Reliability: since the feature has e2e tests, enabling the feature via a flag should not create new bugs in unrelated features; because the feature is new, it may have minor bugs
  - Support: the project commits to complete the feature, in some form, in a subsequent Stable version; typically this will happen within 3 months, but sometimes longer; releases should simultaneously support two consecutive versions (e.g. `v1beta1` and `v1beta2`; or `v1beta2` and `v1`) for at least one minor release cycle (typically 3 months) so that users have enough time to upgrade and migrate objects
  - Recommended Use Cases: in short-lived testing clusters; in production clusters as part of a short-lived evaluation of the feature in order to provide feedback
- Stable level:
  - Object Versioning: API version `vX` where `X` is an integer (e.g. `v1`)
  - Availability: in official Kubernetes releases, and enabled by default
  - Audience: all users
  - Completeness: must have conformance tests, approved by SIG Architecture, in the appropriate conformance profile (e.g., non-portable and/or optional features may not be in the default profile)
  - Upgradeability: only [strictly compatible](#on-compatibility) changes allowed in subsequent software releases
  - Cluster Reliability: high
  - Support: API version will continue to be present for many subsequent software releases
  - Recommended Use Cases: any

### Adding Unstable Features to Stable Versions

When adding a feature to an object which is already Stable, the new fields and new behaviors need to meet the Stable level requirements. If these cannot be met, then the new field cannot be added to the object.

For example, consider the following object:

```go
// API v6.
type Frobber struct {
	// height ...
	Height *int32 `json:"height"`
	// param ...
	Param string `json:"param"`
}
```

A developer is considering adding a new `Width` parameter, like this:

```go
// API v6.
type Frobber struct {
	// height ...
	Height *int32 `json:"height"`
	// param ...
	Param string `json:"param"`
	// width ...
	Width *int32 `json:"width,omitempty"`
}
```

However, the new feature is not stable enough to be used in a stable version (`v6`). Some reasons for this might include:

- the final representation is undecided (e.g. should it be called `Width` or `Breadth`?)
- the implementation is not stable enough for general use (e.g. the `Area()` routine sometimes overflows).

The developer cannot add the new field unconditionally until stability is met.
However, -sometimes stability cannot be met until some users try the new feature, and some -users are only able or willing to accept a released version of Kubernetes. In -that case, the developer has a few options, both of which require staging work -over several releases. - -#### Alpha field in existing API version - -Previously, annotations were used for experimental alpha features, but are no longer recommended for several reasons: - -* They expose the cluster to "time-bomb" data added as unstructured annotations against an earlier API server (https://issue.k8s.io/30819) -* They cannot be migrated to first-class fields in the same API version (see the issues with representing a single value in multiple places in [backward compatibility gotchas](#backward-compatibility-gotchas)) - -The preferred approach adds an alpha field to the existing object, and ensures it is disabled by default: - -1. Add a feature gate to the API server to control enablement of the new field (and associated function): - - In [staging/src/k8s.io/apiserver/pkg/features/kube_features.go](https://git.k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/features/kube_features.go): - - ```go - // owner: @you - // alpha: v1.11 - // - // Add multiple dimensions to frobbers. - Frobber2D utilfeature.Feature = "Frobber2D" - - var defaultKubernetesFeatureGates = map[utilfeature.Feature]utilfeature.FeatureSpec{ - ... - Frobber2D: {Default: false, PreRelease: utilfeature.Alpha}, - } - ``` - -2. Add the field to the API type: - - * ensure the field is [optional](api-conventions.md#optional-vs-required) - * add the `omitempty` struct tag - * add the `// +optional` comment tag - * ensure the field is entirely absent from API responses when empty (optional fields should be pointers, anyway) - * include details about the alpha-level in the field description - - ```go - // API v6. - type Frobber struct { - // height ... - Height int32 `json:"height"` - // param ... - Param string `json:"param"` - // width indicates how wide the object is. - // This field is alpha-level and is only honored by servers that enable the Frobber2D feature. - // +optional - Width *int32 `json:"width,omitempty"` - } - ``` - -3. Before persisting the object to storage, clear disabled alpha fields on create, -and on update if the existing object does not already have a value in the field. -This prevents new usage of the feature while it is disabled, while ensuring existing data is preserved. -The recommended place to do this is in the REST storage strategy's PrepareForCreate/PrepareForUpdate methods: - - ```go - func (frobberStrategy) PrepareForCreate(ctx genericapirequest.Context, obj runtime.Object) { - frobber := obj.(*api.Frobber) - - if !utilfeature.DefaultFeatureGate.Enabled(features.Frobber2D) { - frobber.Width = nil - } - } - - func (frobberStrategy) PrepareForUpdate(ctx genericapirequest.Context, obj, old runtime.Object) { - newFrobber := obj.(*api.Frobber) - oldFrobber := old.(*api.Frobber) - - if !utilfeature.DefaultFeatureGate.Enabled(features.Frobber2D) && oldFrobber.Width == nil { - newFrobber.Width = nil - } - } - ``` - -4. In validation, validate the field if present: - - ```go - func ValidateFrobber(f *api.Frobber, fldPath *field.Path) field.ErrorList { - ... - if f.Width != nil { - ... validation of width field ... - } - ... - } - ``` - -In future Kubernetes versions: - -* if the feature progresses to beta or stable status, the feature gate can be removed or be enabled by default. 
* if the schema of the alpha field must change in an incompatible way, a new field name must be used.
* if the feature is abandoned, or the field name is changed, the field should be removed from the Go struct, with a tombstone comment ensuring the field name and protobuf tag are not reused:

  ```go
  // API v6.
  type Frobber struct {
  	// height ...
  	Height int32 `json:"height" protobuf:"varint,1,opt,name=height"`
  	// param ...
  	Param string `json:"param" protobuf:"bytes,2,opt,name=param"`

  	// +k8s:deprecated=width,protobuf=3
  }
  ```

#### New alpha API version

Another option is to introduce a new type with a new `alpha` or `beta` version designator, like this:

```go
// API v7alpha1
type Frobber struct {
	// height ...
	Height *int32 `json:"height"`
	// param ...
	Param string `json:"param"`
	// width ...
	Width *int32 `json:"width,omitempty"`
}
```

The latter requires that all objects in the same API group as `Frobber` be replicated in the new version, `v7alpha1`. This also requires users to use a new client that uses the other version. Therefore, this is not a preferred option.

A related issue is how a cluster manager can roll back from a new version with a new feature that is already being used by users. See https://github.com/kubernetes/kubernetes/issues/4855.
+This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first.
diff --git a/contributors/devel/bazel.md b/contributors/devel/bazel.md
index 991a0ac2..82aa57d1 100644
--- a/contributors/devel/bazel.md
+++ b/contributors/devel/bazel.md
@@ -1,184 +1,3 @@
-# Build and test with Bazel
+This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/bazel.md.

Building and testing Kubernetes with Bazel is supported but not yet default.

Bazel is used to run all Kubernetes PRs on [Prow](https://prow.k8s.io), as remote caching enables significantly reduced build and test times.

Some repositories (such as kubernetes/test-infra) have switched to using Bazel exclusively for all build, test, and release workflows.

Go rules are managed by the [`gazelle`](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle) tool, with some additional rules managed by the [`kazel`](https://git.k8s.io/repo-infra/kazel) tool. These tools are called via the `hack/update-bazel.sh` script.

Instructions for installing Bazel can be found [here](https://www.bazel.io/versions/master/docs/install.html).

Several convenience `make` rules have been created for common operations:

* `make bazel-build`: builds all binaries in tree (`bazel build -- //... -//vendor/...`)
* `make bazel-test`: runs all unit tests (`bazel test --config=unit -- //... -//hack:verify-all -//build/... -//vendor/...`)
* `make bazel-test-integration`: runs all integration tests (`bazel test --config integration //test/integration/...`)
* `make bazel-release`: builds release tarballs, Docker images (for server components), and Debian images (`bazel build //build/release-tars`)

You can also interact with Bazel directly; for example, to run all `kubectl` unit tests, run

```console
$ bazel test //pkg/kubectl/...
```

## Planter

If you don't want to install Bazel, you can instead try using the unofficial [Planter](https://git.k8s.io/test-infra/planter) tool, which runs Bazel inside a Docker container.
- -For example, you can run -```console -$ ../test-infra/planter/planter.sh make bazel-test -$ ../test-infra/planter/planter.sh bazel build //cmd/kubectl -``` - -## Continuous Integration - -There are several bazel CI jobs: -* [ci-kubernetes-bazel-build](http://k8s-testgrid.appspot.com/google-unit#bazel-build): builds everything - with Bazel -* [ci-kubernetes-bazel-test](http://k8s-testgrid.appspot.com/google-unit#bazel-test): runs unit tests in - with Bazel - -Similar jobs are run on all PRs; additionally, several of the e2e jobs use -Bazel-built binaries when launching and testing Kubernetes clusters. - -## Updating `BUILD` files - -To update `BUILD` files, run: - -```console -$ ./hack/update-bazel.sh -``` - -To prevent Go rules from being updated, consult the [gazelle -documentation](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle). - -Note that much like Go files and `gofmt`, `BUILD` files have standardized, -opinionated style rules, and running `hack/update-bazel.sh` will format them for you. - -If you want to auto-format `BUILD` files in your editor, use of -[Buildifier](https://github.com/bazelbuild/buildtools/blob/master/buildifier/README.md) -is recommended. - -Updating the `BUILD` file for a package will be required when: -* Files are added to or removed from a package -* Import dependencies change for a package -* A `BUILD` file has been updated and needs to be reformatted -* A new `BUILD` file has been added (parent `BUILD` files will be updated) - -## Known issues and limitations - -### [Cross-compilation of cgo is not currently natively supported](https://github.com/bazelbuild/rules_go/issues/1020) -All binaries are currently built for the host OS and architecture running Bazel. -(For example, you can't currently target linux/amd64 from macOS or linux/s390x -from an amd64 machine.) - -The Go rules support cross-compilation of pure Go code using the `--platforms` -flag, and this is being used successfully in the kubernetes/test-infra repo. - -It may already be possible to cross-compile cgo code if a custom CC toolchain is -set up, possibly reusing the kube-cross Docker image, but this area needs -further exploration. - -### The CC toolchain is not fully hermetic -Bazel requires several tools and development packages to be installed in the system, including `gcc`, `g++`, `glibc and libstdc++ development headers` and `glibc static development libraries`. Please check your distribution for exact names of the packages. Examples for some commonly used distributions are below: - -| Dependency | Debian/Ubuntu | CentOS | OpenSuSE | -|:---------------------:|-------------------------------|--------------------------------|-----------------------------------------| -| Build essentials | `apt install build-essential` | `yum groupinstall development` | `zypper install -t pattern devel_C_C++` | -| GCC C++ | `apt install g++` | `yum install gcc-c++` | `zypper install gcc-c++` | -| GNU Libc static files | `apt install libc6-dev` | `yum install glibc-static` | `zypper install glibc-devel-static` | - -If any of these packages change, they may also cause spurious build failures -as described in [this issue](https://github.com/bazelbuild/bazel/issues/4907). 
- -An example error might look something like -``` -ERROR: undeclared inclusion(s) in rule '//vendor/golang.org/x/text/cases:go_default_library.cgo_c_lib': -this rule is missing dependency declarations for the following files included by 'vendor/golang.org/x/text/cases/linux_amd64_stripped/go_default_library.cgo_codegen~/_cgo_export.c': - '/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h' -``` - -The only way to recover from this error is to force Bazel to regenerate its -automatically-generated CC toolchain configuration by running `bazel clean ---expunge`. - -Improving cgo cross-compilation may help with all of this. - -### Changes to Go imports requires updating BUILD files -The Go rules in `BUILD` and `BUILD.bazel` files must be updated any time files -are added or removed or Go imports are changed. These rules are automatically -maintained by `gazelle`, which is run via `hack/update-bazel.sh`, but this is -still a source of friction. - -[Autogazelle](https://github.com/bazelbuild/bazel-gazelle/tree/master/cmd/autogazelle) -is a new experimental tool which may reduce or remove the need for developers -to run `hack/update-bazel.sh`, but no work has yet been done to support it in -kubernetes/kubernetes. - -### Code coverage support is incomplete for Go -Bazel and the Go rules have limited support for code coverage. Running something -like `bazel coverage -- //... -//vendor/...` will run tests in coverage mode, -but no report summary is currently generated. It may be possible to combine -`bazel coverage` with -[Gopherage](https://github.com/kubernetes/test-infra/tree/master/gopherage), -however. - -### Kubernetes code generators are not fully supported -The make-based build system in kubernetes/kubernetes runs several code -generators at build time: -* [conversion-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/conversion-gen) -* [deepcopy-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/deepcopy-gen) -* [defaulter-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/defaulter-gen) -* [openapi-gen](https://github.com/kubernetes/kube-openapi/tree/master/cmd/openapi-gen) -* [go-bindata](https://github.com/jteeuwen/go-bindata/tree/master/go-bindata) - -Of these, only `openapi-gen` and `go-bindata` are currently supported when -building Kubernetes with Bazel. - -The `go-bindata` generated code is produced by hand-written genrules. - -The other code generators use special build tags of the form `// -+k8s:generator-name=arg`; for example, input files to the openapi-gen tool are -specified with `// +k8s:openapi-gen=true`. - -`kazel` is used to find all packages that require OpenAPI generation, and then a -handwritten genrule consumes this list of packages to run `openapi-gen`. - -For `openapi-gen`, a single output file is produced in a single Go package, which -makes this fairly compatible with Bazel. -All other Kubernetes code generators generally produce one output file per input -package, which is less compatible with the Bazel workflow. - -The make-based build system batches up all input packages into one call to the -code generator binary, but this is inefficient for Bazel's incrementality, as a -change in one package may result in unnecessarily recompiling many other -packages. -On the other hand, calling the code generator binary multiple times is less -efficient than calling it once, since many of the generators parse the tree for -Go type information and other metadata. 
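As a tiny illustration of that tag mechanism, the marker is an ordinary comment that `kazel` scans for; the package below is invented for the example and is not a real Kubernetes package:

```go
// pkg/apis/frobbing/v1alpha1/doc.go (hypothetical)

// +k8s:openapi-gen=true

// Package v1alpha1 is a placeholder; the marker comment above is what kazel
// looks for when building the list of packages that need OpenAPI generation.
package v1alpha1
```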
- -One additional challenge is that many of the code generators add additional -Go imports which `gazelle` (and `autogazelle`) cannot infer, and so they must be -explicitly added as dependencies in the `BUILD` files. - -Kubernetes has even more code generators than this limited list, but the rest -are generally run as `hack/update-*.sh` scripts and checked into the repository, -and so are not immediately needed for Bazel parity. - -## Contacts -For help or discussion, join the [#bazel](https://kubernetes.slack.com/messages/bazel) -channel on Kubernetes Slack. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/cherry-picks.md b/contributors/devel/cherry-picks.md index 7769f970..5e587d47 100644 --- a/contributors/devel/cherry-picks.md +++ b/contributors/devel/cherry-picks.md @@ -1,73 +1,3 @@ -# Overview +This file has moved to https://git.k8s.io/community/contributors/devel/sig-release/cherry-picks.md. -This document explains how cherry-picks are managed on release branches within -the kubernetes/kubernetes repository. -A common use case for this task is backporting PRs from master to release -branches. - -## Prerequisites - * [Contributor License Agreement](http://git.k8s.io/community/CLA.md) is - considered implicit for all code within cherry-pick pull requests, - **unless there is a large conflict**. - * A pull request merged against the master branch. - * [Release branch](https://git.k8s.io/release/docs/branching.md) exists. - * The normal git and GitHub configured shell environment for pushing to your - kubernetes `origin` fork on GitHub and making a pull request against a - configured remote `upstream` that tracks - "https://github.com/kubernetes/kubernetes.git", including `GITHUB_USER`. - * Have `hub` installed, which is most easily installed via `go get - github.com/github/hub` assuming you have a standard golang development - environment. - -## Initiate a Cherry-pick - * Run the [cherry-pick - script](https://git.k8s.io/kubernetes/hack/cherry_pick_pull.sh). - This example applies a master branch PR #98765 to the remote branch - `upstream/release-3.14`: `hack/cherry_pick_pull.sh upstream/release-3.14 - 98765` - * Be aware the cherry-pick script assumes you have a git remote called - `upstream` that points at the Kubernetes github org. - Please see our [recommended Git workflow](https://git.k8s.io/community/contributors/guide/github-workflow.md#workflow). - * You will need to run the cherry-pick script separately for each patch release you want to cherry-pick to. - - * Your cherry-pick PR will immediately get the `do-not-merge/cherry-pick-not-approved` label. - The [Branch Manager](https://git.k8s.io/sig-release/release-team/role-handbooks/branch-manager) - will triage PRs targeted to the next .0 minor release branch up until the - release, while the [Patch Release Team](https://git.k8s.io/sig-release/release-team/role-handbooks/patch-release-manager) - will handle all cherry-picks to patch releases. - Normal rules apply for code merge. - * Reviewers `/lgtm` and owners `/approve` as they deem appropriate. - * Milestones on cherry-pick PRs should be the milestone for the target - release branch (for example, milestone 1.11 for a cherry-pick onto - release-1.11). - * You can find the current release team members in the - [appropriate release folder](https://git.k8s.io/sig-release/releases) for the target release. 
- You may cc them with `<@githubusername>` on your cherry-pick PR. - -## Cherry-pick Review - -Cherry-pick pull requests have an additional requirement compared to normal pull -requests. -They must be approved specifically for cherry-pick by Approvers. -The [Branch Manager](https://git.k8s.io/sig-release/release-team/role-handbooks/branch-manager) -or the [Patch Release Team](https://git.k8s.io/sig-release/release-team/role-handbooks/patch-release-manager) -are the final authority on removing the `do-not-merge/cherry-pick-not-approved` -label and triggering a merge into the target branch. - -## Searching for Cherry-picks - -- [A sample search on kubernetes/kubernetes pull requests that are labeled as `cherry-pick-approved`](https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen+is%3Apr+label%3Acherry-pick-approved) - -- [A sample search on kubernetes/kubernetes pull requests that are labeled as `do-not-merge/cherry-pick-not-approved`](https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen+is%3Apr+label%3Ado-not-merge%2Fcherry-pick-not-approved) - - -## Troubleshooting Cherry-picks - -Contributors may encounter some of the following difficulties when initiating a cherry-pick. - -- A cherry-pick PR does not apply cleanly against an old release branch. -In that case, you will need to manually fix conflicts. - -- The cherry-pick PR includes code that does not pass CI tests. -In such a case you will have to fetch the auto-generated branch from your fork, amend the problematic commit and force push to the auto-generated branch. -Alternatively, you can create a new PR, which is noisier. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/collab.md b/contributors/devel/collab.md index 62cb85eb..cdf234d2 100644 --- a/contributors/devel/collab.md +++ b/contributors/devel/collab.md @@ -1,3 +1,3 @@ -This document has moved to: [here](https://git.k8s.io/community/contributors/guide/collab.md). +This file has moved to https://git.k8s.io/community/contributors/guide/collab.md. -*This file is a redirect stub. It should be deleted within 3 months from the current date.*
\ No newline at end of file +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/component-config-conventions.md b/contributors/devel/component-config-conventions.md index d3fe1000..3273f3b8 100644 --- a/contributors/devel/component-config-conventions.md +++ b/contributors/devel/component-config-conventions.md @@ -1,221 +1,3 @@ -# Component Configuration Conventions +This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/component-config-conventions.md. -# Objective - -This document concerns the configuration of Kubernetes system components (as -opposed to the configuration of user workloads running on Kubernetes). -Component configuration is a major operational burden for operators of -Kubernetes clusters. To date, much literature has been written on and much -effort expended to improve component configuration. Despite this, the state of -component configuration remains dissonant. This document attempts to aggregate -that literature and propose a set of guidelines that component owners can -follow to improve consistency across the project. - -# Background - -Currently, component configuration is primarily driven through command line -flags. Command line driven configuration poses certain problems which are -discussed below. Attempts to improve component configuration as a whole have -been slow to make progress and have petered out (ref componentconfig api group, -configmap driven config issues). Some component owners have made use case -specific improvements on a per-need basis. Various comments in issues recommend -subsets of best design practice but no coherent, complete story exists. - -## Pain Points of Current Configuration - -Flag based configuration has poor qualities such as: - -1. Flags exist in a flat namespace, hampering the ability to organize them and expose them in helpful documentation. --help becomes useless as a reference as the number of knobs grows. It's impossible to distinguish useful knobs from cruft. -1. Flags can't easily have different values for different instances of a class. To adjust the resync period in the informers of O(n) controllers requires O(n) different flags in a global namespace. -1. Changing a process's command line necessitates a binary restart. This negatively impacts availability. -1. Flags are unsuitable for passing confidential configuration. The command line of a process is available to unprivileged process running in the host pid namespace. -1. Flags are a public API but are unversioned and unversionable. -1. Many arguments against using global variables apply to flags. - -Configuration in general has poor qualities such as: - -1. Configuration changes have the same forward/backward compatibility requirements as releases but rollout/rollback of configuration largely untested. Examples of configuration changes that might break a cluster: kubelet CNI plugin, etcd storage version. -1. Configuration options often exist only to test a specific feature where the default is reasonable for all real use cases. Examples: many sync periods. -1. Configuration options often exist to defer a "hard" design decision and to pay forward the "TODO(someone-else): think critically". -1. Configuration options are often used to workaround deficiencies of the API. For example `--register-with-labels` and `--register-with-taints` could be solved with a node initializer, if initializers existed. -1. 
Configuration options often exist to take testing shortcuts. There is a mentality that because a feature is opt-in, it can be released as a flag without robust testing. -1. Configuration accumulates new knobs, knobs accumulate new behaviors, knobs are forgotten and bitrot reducing code quality over time. -1. Number of configuration options is inversely proportional to test coverage. The size of the configuration state space grows >O(2^n) with the number of configuration bits. A handful of states in that space are ever tested. -1. Configuration options hamper troubleshooting efforts. On github, users frequently file tickets from environments that are neither consistent nor reproducible. - -## Types Of Configuration - -Configuration can only come from three sources: - -1. Command line flags. -1. API types serialized and stored on disk. -1. API types serialized and stored in the kubernetes API. - -Configuration options can be partitioned along certain lines. To name a few -important partitions: - -1. Bootstrap: This is configuration that is required before the component can contact the API. Examples include the kubeconfig and the filepath to the kubeconfig. -1. Dynamic vs Static: Dynamic config is config that is expected to change as part of normal operations such as a scheduler configuration or a node entering maintenance mode. Static config is config that is unlikely to change over subsequent deployments and even releases of a component. -1. Shared vs Per-Instance: Per-Instance configuration is configuration whose value is unique to the instance that the node runs on (e.g. Kubelet's `--hostname-override`). -1. Feature Gates: Feature gates are configuration options that enable a feature that has been deemed unsafe to enable by default. -1. Request context dependent: Request context dependent config is config that should probably be scoped to an attribute of the request (such as the user). We do a pretty good job of keeping these out of config and in policy objects (e.g. Quota, RBAC) but we could do more (e.g. rate limits). -1. Environment information: This is configuration that is available through downwards and OS APIs, e.g. node name, pod name, number of cpus, IP address. - -# Requirements - -Desired qualities of a configuration solution: - -1. Secure: We need to control who can change configuration. We need to control who can read sensitive configuration. -1. Manageable: We need to control which instances of a component uses which configuration, especially when those instances differ in version. -1. Reliable: Configuration pushes should just work. If they fail, they should fail early in the rollout, rollback config if possible, and alert noisily. -1. Recoverable: We need to be able to update (e.g. rollback) configuration when a component is down. -1. Monitorable: Both humans and computers need to monitor configuration; humans through json interfaces like /configz, computers through interfaces like prometheus /streamz. Confidential configuration needs to be accounted for, but can also be useful to monitor in an unredacted or partially redacted (i.e. hashed) form. -1. Verifiable: We need to be able to verify that a configuration is good. We need to verify the integrity of the received configuration and we need to validate that the encoded configuration state is sensible. -1. Auditable: We need to be able to trace the origin of a configuration change. -1. Accountable: We need to correlate a configuration push with its impact to the system. 
We need to be able to do this at the time of the push and later when analyzing logs. -1. Available: We should avoid high frequency configuration updates that require service disruption. We need to take into account system component SLA. -1. Scalable: We need to support distributing configuration to O(10,000) components at our current supported scalability limits. -1. Consistent: There should exist conventions that hold across components. -1. Composable: We should favor composition of configuration sources over layering/templating/inheritance. -1. Normalized: Redundant specification of configuration data should be avoided. -1. Testable: We need to be able to test the system under many different configurations. We also need to test configuration changes, both dynamic changes and those that require process restarts. -1. Maintainable: We need to push back on ever increasing cyclomatic complexity in our codebase. Each if statement and function argument added to support a configuration option negatively impacts the maintainability of our code. -1. Evolvable: We need to be able to extend our configuration API like we extend our other user facing APIs. We need to hold our configuration API to the same SLA and deprecation policy of public facing APIs. (e.g. [dynamic admission control](https://github.com/kubernetes/community/pull/611) and [hooks](https://github.com/kubernetes/kubernetes/issues/3585)) - -These don't need to be implemented immediately but are good to keep in mind. At -some point these should be ranked by priority and implemented. - -# Two Part Solution: - -## Part 1: Don't Make It Configuration - -The most effective way to reduce the operational burden of configuration is to -minimize the amount of configuration. When adding a configuration option, ask -whether alternatives might be a better fit. - -1. Policy objects: Create first class Kubernetes objects to encompass how the system should behave. These are especially useful for request context dependent configuration. We do this already in places such as RBAC and ResourceQuota but we could do more such as rate limiting. We should never hardcode groups or usermaps in configuration. -1. API features: Use (or implement) functionality of the API (e.g. think through and implement initializers instead of --register-with-label). Allowing for extension in the right places is a better way to give users control. -1. Feature discovery: Write components that introspect the existing API to decide whether to enable a feature or not. E.g. controller-manager should start an app controller if the app API is available, kubelet should enable zram if zram is set in the node spec. -1. Downwards API: Use the APIs that the OS and pod environment expose directly before opting to pass in new configuration options. -1. const's: If you don't know whether tweaking a value will be useful, make the value const. Only give it a configuration option once there becomes a need to tweak the value at runtime. -1. Autotuning: Build systems that incorporate feedback and do the best thing under the given circumstances. This makes the system more robust. (e.g. prefer congestion control, load shedding, backoff rather than explicit limiting). -1. Avoid feature flags: Turn on features when they are tested and ready for production. Don't use feature flags as a fallback for poorly tested code. -1. Configuration profiles: Instead of allowing individual configuration options to be modified, try to encompass a broader desire as a configuration profile. 
For example: instead of enabling individual alpha features, have an EnableAlpha option that enables all. Instead of allowing individual controller knobs to be modified, have a TestMode option that sets a broad number of parameters to be suitable for tests. - -## Part 2: Component Configuration - -### Versioning Configuration - -We create configuration API groups per component that live in the source tree of -the component. Each component has its own API group for configuration. -Components will use the same API machinery that we use for other API groups. -Configuration API serialization doesn't have the same performance requirements -as other APIs so much of the codegen can be avoided (e.g. ugorji, generated -conversions) and we can instead fallback to the reflection based implementations -where they exist. - -Configuration API groups for component config should be named according to the -scheme `<component>.config.k8s.io`. The `.config.k8s.io` suffix serves to -disambiguate types of config API groups from served APIs. - -### Retrieving Configuration - -The primary mechanism for retrieving static configuration should be -deserialization from files. For the majority of components (with the possible -exception of the kubelet, see -[here](https://github.com/kubernetes/kubernetes/pull/29459)), these files will -be source from the configmap API and managed by the kubelet. Reliability of -this mechanism is predicated on kubelet checkpointing of pod dependencies. - - -### Structuring Configuration - -Group related options into distinct objects and subobjects. Instead of writing: - - -```yaml -kind: KubeProxyConfiguration -apiVersion: kubeproxy.config.k8s.io/v1beta3 -ipTablesSyncPeriod: 2 -ipTablesConntrackHashSize: 2 -ipTablesConntrackTableSize: 2 -``` - -Write: - -```yaml -kind: KubeProxyConfiguration -apiVersion: kubeproxy.config.k8s.io/v1beta3 -ipTables: - syncPeriod: 2 - conntrack: - hashSize: 2 - tableSize: 2 -``` - -We should avoid passing around full configuration options to deeply constructed -modules. For example, instead of calling NewSomethingController in the -controller-manager with the full controller-manager config, group relevant -config into a subobject and only pass the subobject. We should expose the -smallest possible necessary configuration to the SomethingController. - - -### Handling Different Types Of Configuration - -Above in "Type Of Configuration" we introduce a few ways to partition -configuration options. Environment information, request context depending -configuration, feature gates, and static configuration should be avoided if at -all possible using a configuration alternative. We should maintain separate -objects along these partitions and consider retrieving these configurations -from separate source (i.e. files). For example: kubeconfig (which falls into -the bootstrapping category) should not be part of the main config option (nor -should the filepath to the kubeconfig), per-instance config should be stored -separately from shared config. This allows for composition and obviates the -need for layering/templating solutions. - - -### In-Process Representation Of Configuration - -We should separate structs for flags, serializable config, and runtime config. - -1. Structs for flags should have enough information for the process startup to retrieve its full configuration. Examples include: path the kubeconfig, path to configuration file, namespace and name of configmap to use for configuration. -1. 
Structs for serializable configuration: This struct contains the full set of options in a serializable form (e.g. to represent an ip address instead of `net.IP`, use `string`). This is the struct that is versioned and serialized to disk using API machinery. -1. Structs for runtime: This struct holds data in the most appropriate format for execution. This field can hold non-serializable types (e.g. have a `kubeClient` field instead of a `kubeConfig` field, store ip addresses as `net.IP`). - -The flag struct is transformed into the configuration struct which is -transformed into the runtime struct. - - -### Migrating Away From Flags - -Migrating to component configuration can happen incrementally (per component). -By versioning each component's API group separately, we can allow each API -group to advance to beta and GA independently. APIs should be approved by -component owners and reviewers familiar with the component configuration -conventions. We can incentivize operators to migrate away from flags by making -new configuration options only available through the component configuration -APIs. - -# Caveats - -Proposed are not laws but guidelines and as such we've favored completeness -over consistency. There will thus be need for exceptions. - -1. Components (especially those that are not self hosted such as the kubelet) will require custom rollout strategies of new config. -1. Pod checkpointing by kubelet would allow this strategy to be simpler to make reliable. - - -# Miscellaneous Consideration - -1. **This document takes intentionally a very zealous stance against configuration.** Often configuration alternatives are not possible in Kubernetes as they are in proprietary software because Kubernetes has to run in diverse environments, with diverse users, managed by diverse operators. -1. More frequent releases of kubernetes would make "skipping the config knob" more enticing because fixing a bad guess at a const wouldn't take O(4 months) best case to rollout. Factoring in our support for old versions, it takes closer to a year. -1. Self-hosting resolves much of the distribution issue (except for maybe the Kubelet) but reliability is predicated on to-be-implemented features such as kubelet checkpointing of pod dependencies and sound operational practices such as incremental rollout of new configuration using Deployments/DaemonSets. -1. Validating config is hard. Fatal logs lead to crash loops and error logs are ignored. Both options are suboptimal. -1. Configuration needs to be updatable when components are down. -1. Naming style guide: - 1. No negatives, e.g. prefer --enable-foo over --disable-foo - 1. Use the active voice -1. We should actually enforce deprecation. Can we have a test that fails when a comment exists beyond its deadline to be removed? See [#44248](https://github.com/kubernetes/kubernetes/issues/44248) -1. Use different implementations of the same interface rather than if statements to toggle features. This makes deprecation and deletion easy, improving maintainability. -1. How does the proposed solution meet the requirements? Which desired qualities are missed? -1. Configuration changes should trigger predictable and reproducible actions. From a given system state and a given component configuration, we should be able to simulate the actions that the system will take. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. 
diff --git a/contributors/devel/conformance-tests.md b/contributors/devel/conformance-tests.md index 46ca318d..414e9727 100644 --- a/contributors/devel/conformance-tests.md +++ b/contributors/devel/conformance-tests.md @@ -1,216 +1,3 @@ -# Conformance Testing in Kubernetes +This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md. -The Kubernetes Conformance test suite is a subset of e2e tests that SIG -Architecture has approved to define the core set of interoperable features that -all conformant Kubernetes clusters must support. The tests verify that the -expected behavior works as a user might encounter it in the wild. - -The process to add new conformance tests is intended to decouple the development -of useful tests from their promotion to conformance: -- Contributors write and submit e2e tests, to be approved by owning SIGs -- Tests are proven to meet the [conformance test requirements] by review - and by accumulation of data on flakiness and reliability -- A follow up PR is submitted to [promote the test to conformance](#promoting-tests-to-conformance) - -NB: This should be viewed as a living document in a few key areas: -- The desired set of conformant behaviors is not adequately expressed by the - current set of e2e tests, as such this document is currently intended to - guide us in the addition of new e2e tests than can fill this gap -- This document currently focuses solely on the requirements for GA, - non-optional features or APIs. The list of requirements will be refined over - time to the point where it as concrete and complete as possible. -- There are currently conformance tests that violate some of the requirements - (e.g., require privileged access), we will be categorizing these tests and - deciding what to do once we have a better understanding of the situation -- Once we resolve the above issues, we plan on identifying the appropriate areas - to relax requirements to allow for the concept of conformance Profiles that - cover optional or additional behaviors - -## Conformance Test Requirements - -Conformance tests currently test only GA, non-optional features or APIs. More -specifically, a test is eligible for promotion to conformance if: - -- it tests only GA, non-optional features or APIs (e.g., no alpha or beta - endpoints, no feature flags required, no deprecated features) -- it works for all providers (e.g., no `SkipIfProviderIs`/`SkipUnlessProviderIs` - calls) -- it is non-privileged (e.g., does not require root on nodes, access to raw - network interfaces, or cluster admin permissions) -- it works without access to the public internet (short of whatever is required - to pre-pull images for conformance tests) -- it works without non-standard filesystem permissions granted to pods -- it does not rely on any binaries that would not be required for the linux - kernel or kubelet to run (e.g., can't rely on git) -- any container images used within the test support all architectures for which - kubernetes releases are built -- it passes against the appropriate versions of kubernetes as spelled out in - the [conformance test version skew policy] -- it is stable and runs consistently (e.g., no flakes) - -Examples of features which are not currently eligible for conformance tests: - -- node/platform-reliant features, eg: multiple disk mounts, GPUs, high density, - etc. -- optional features, eg: policy enforcement -- cloud-provider-specific features, eg: GCE monitoring, S3 Bucketing, etc. 
-- anything that requires a non-default admission plugin - -Examples of tests which are not eligible for promotion to conformance: -- anything that checks specific Events are generated, as we make no guarantees - about the contents of events, nor their delivery -- anything that checks optional Condition fields, such as Reason or Message, as - these may change over time (however it is reasonable to verify these fields - exist or are non-empty) - -Examples of areas we may want to relax these requirements once we have a -sufficient corpus of tests that define out of the box functionality in all -reasonable production worthy environments: -- tests may need to create or set objects or fields that are alpha or beta that - bypass policies that are not yet GA, but which may reasonably be enabled on a - conformant cluster (e.g., pod security policy, non-GA scheduler annotations) - -## Conformance Test Version Skew Policy - -As each new release of Kubernetes provides new functionality, the subset of -tests necessary to demonstrate conformance grows with each release. Conformance -is thus considered versioned, with the same backwards compatibility guarantees -as laid out in the [kubernetes versioning policy] - -To quote: - -> For example, a v1.3 master should work with v1.1, v1.2, and v1.3 nodes, and -> should work with v1.2, v1.3, and v1.4 clients. - -Conformance tests for a given version should be run off of the release branch -that corresponds to that version. Thus `v1.2` conformance tests would be run -from the head of the `release-1.2` branch. - -For example, suppose we're in the midst of developing kubernetes v1.3. Clusters -with the following versions must pass conformance tests built from the -following branches: - -| cluster version | master | release-1.3 | release-1.2 | release-1.1 | -| --------------- | ----- | ----------- | ----------- | ----------- | -| v1.3.0-alpha | yes | yes | yes | no | -| v1.2.x | no | no | yes | yes | -| v1.1.x | no | no | no | yes | - -## Running Conformance Tests - -Conformance tests are designed to be run even when there is no cloud provider -configured. Conformance tests must be able to be run against clusters that have -not been created with `hack/e2e.go`, just provide a kubeconfig with the -appropriate endpoint and credentials. - -These commands are intended to be run within a kubernetes directory, either -cloned from source, or extracted from release artifacts such as -`kubernetes.tar.gz`. They assume you have a valid golang installation. - -```sh -# ensure kubetest is installed -go get -u k8s.io/test-infra/kubetest - -# build test binaries, ginkgo, and kubectl first: -make WHAT="test/e2e/e2e.test vendor/github.com/onsi/ginkgo/ginkgo cmd/kubectl" - -# setup for conformance tests -export KUBECONFIG=/path/to/kubeconfig -export KUBERNETES_CONFORMANCE_TEST=y - -# Option A: run all conformance tests serially -kubetest --provider=skeleton --test --test_args="--ginkgo.focus=\[Conformance\]" - -# Option B: run parallel conformance tests first, then serial conformance tests serially -GINKGO_PARALLEL=y kubetest --provider=skeleton --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]" -kubetest --provider=skeleton --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]" -``` - -## Kubernetes Conformance Document - -For each Kubernetes release, a Conformance Document will be generated that lists -all of the tests that comprise the conformance test suite, along with the formal -specification of each test. 
For an example, see the [v1.9 conformance doc]. -This document will help people understand what features are being tested without -having to look through the testcase's code directly. - - -## Promoting Tests to Conformance - -To promote a test to the conformance test suite, open a PR as follows: -- is titled "Promote xxx e2e test to Conformance" -- includes information and metadata in the description as follows: - - "/area conformance" on a newline - - "@kubernetes/sig-architecture-pr-reviews @kubernetes/sig-foo-pr-reviews - @kubernetes/cncf-conformance-wg" on a new line, where sig-foo is whichever - sig owns this test - - any necessary information in the description to verify that the test meets - [conformance test requirements], such as links to reports or dashboards that - prove lack of flakiness -- contains no other modifications to test source code other than the following: - - modifies the testcase to use the `framework.ConformanceIt()` function rather - than the `framework.It()` function - - adds a comment immediately before the `ConformanceIt()` call that includes - all of the required [conformance test comment metadata] -- add the PR to SIG Architecture's [Conformance Test Review board] - - -### Conformance Test Comment Metadata - -Each conformance test must include the following piece of metadata -within its associated comment: - -- `Release`: indicates the Kubernetes release that the test was added to the - conformance test suite. If the test was modified in subsequent releases - then those releases should be included as well (comma separated) -- `Testname`: a human readable short name of the test -- `Description`: a detailed description of the test. This field must describe - the required behaviour of the Kubernetes components being tested using - [RFC2119](https://tools.ietf.org/html/rfc2119) keywords. This field - is meant to be a "specification" of the tested Kubernetes features, as - such, it must be detailed enough so that readers can fully understand - the aspects of Kubernetes that are being tested without having to read - the test's code directly. Additionally, this test should provide a clear - distinction between the parts of the test that are there for the purpose - of validating Kubernetes rather than simply infrastructure logic that - is necessary to setup, or clean up, the test. - -### Sample Conformance Test - -The following snippet of code shows a sample conformance test's metadata: - -``` -/* - Release : v1.9 - Testname: Kubelet: log output - Description: By default the stdout and stderr from the process being - executed in a pod MUST be sent to the pod's logs. -*/ -framework.ConformanceIt("it should print the output to logs", func() { - ... -}) -``` - -The corresponding portion of the Kubernetes Conformance Documentfor this test -would then look like this: - -> ## [Kubelet: log output](https://github.com/kubernetes/kubernetes/tree/release-1.9/test/e2e_node/kubelet_test.go#L47) -> -> Release : v1.9 -> -> By default the stdout and stderr from the process being executed in a pod MUST be sent to the pod's logs. - -### Reporting Conformance Test Results - -Conformance test results, by provider and releases, can be viewed in the -[testgrid conformance dashboard]. 
If you wish to contribute test results -for your provider, please see the [testgrid conformance README] - -[kubernetes versioning policy]: /contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew -[Conformance Test Review board]: https://github.com/kubernetes-sigs/architecture-tracking/projects/1 -[conformance test requirements]: #conformance-test-requirements -[conformance test metadata]: #conformance-test-metadata -[conformance test version skew policy]: #conformance-test-version-skew-policy -[testgrid conformance dashboard]: https://testgrid.k8s.io/conformance-all -[testgrid conformance README]: https://github.com/kubernetes/test-infra/blob/master/testgrid/conformance/README.md -[v1.9 conformance doc]: https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.9.md +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/container-runtime-interface.md b/contributors/devel/container-runtime-interface.md index 1a121c9e..6b12b564 100644 --- a/contributors/devel/container-runtime-interface.md +++ b/contributors/devel/container-runtime-interface.md @@ -1,136 +1,3 @@ -# CRI: the Container Runtime Interface - -## What is CRI? - -CRI (_Container Runtime Interface_) consists of a -[protobuf API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto), -specifications/requirements (to-be-added), -and [libraries](https://git.k8s.io/kubernetes/pkg/kubelet/server/streaming) -for container runtimes to integrate with kubelet on a node. CRI is currently in Alpha. - -In the future, we plan to add more developer tools such as the CRI validation -tests. - -## Why develop CRI? - -Prior to the existence of CRI, container runtimes (e.g., `docker`, `rkt`) were -integrated with kubelet through implementing an internal, high-level interface -in kubelet. The entrance barrier for runtimes was high because the integration -required understanding the internals of kubelet and contributing to the main -Kubernetes repository. More importantly, this would not scale because every new -addition incurs a significant maintenance overhead in the main Kubernetes -repository. - -Kubernetes aims to be extensible. CRI is one small, yet important step to enable -pluggable container runtimes and build a healthier ecosystem. - -## How to use CRI? - -For Kubernetes 1.6+: - -1. Start the image and runtime services on your node. You can have a single - service acting as both image and runtime services. -2. Set the kubelet flags - - Pass the unix socket(s) to which your services listen to kubelet: - `--container-runtime-endpoint` and `--image-service-endpoint`. - - Use the "remote" runtime by `--container-runtime=remote`. - -CRI is still young and we are actively incorporating feedback from developers -to improve the API. Although we strive to maintain backward compatibility, -developers should expect occasional API breaking changes. - -*For Kubernetes 1.5, additional flags are required:* - - Set apiserver flag `--feature-gates=StreamingProxyRedirects=true`. - - Set kubelet flag `--experimental-cri=true`. - -## Does Kubelet use CRI today? - -Yes, Kubelet always uses CRI except for using the rktnetes integration. - -The old, pre-CRI Docker integration was removed in 1.7. 
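To make the kubelet flag wiring from the "How to use CRI?" steps above concrete, a kubelet invocation against a remote CRI runtime could look like the following sketch. The socket path is a hypothetical example; use whatever endpoint your runtime's shim actually listens on, and note that a single service may serve both the runtime and image endpoints.

```sh
# Hypothetical socket path; substitute the endpoint your CRI shim listens on.
kubelet \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///run/example-cri/cri.sock \
  --image-service-endpoint=unix:///run/example-cri/cri.sock
```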
- -## Specifications, design documents and proposals - -The Kubernetes 1.5 [blog post on CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/) -serves as a general introduction. - - -Below is a mixed list of CRI specifications/requirements, design docs and -proposals. We are working on adding more documentation for the API. - - - [Original proposal](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/container-runtime-interface-v1.md) - - [Networking](/contributors/devel/kubelet-cri-networking.md) - - [Container metrics](/contributors/devel/cri-container-stats.md) - - [Exec/attach/port-forward streaming requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit?usp=sharing) - - [Container stdout/stderr logs](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/kubelet-cri-logging.md) - -## Work-In-Progress CRI runtimes - - - [cri-o](https://github.com/kubernetes-incubator/cri-o) - - [rktlet](https://github.com/kubernetes-incubator/rktlet) - - [frakti](https://github.com/kubernetes/frakti) - - [cri-containerd](https://github.com/kubernetes-incubator/cri-containerd) - -## [Status update](#status-update) -### Kubernetes v1.7 release (Docker-CRI integration GA, container metrics API) - - The Docker CRI integration has been promoted to GA. - - The legacy, non-CRI Docker integration has been completely removed from - Kubelet. The deprecated `--enable-cri` flag has been removed. - - CRI has been extended to support collecting container metrics from the - runtime. - -### Kubernetes v1.6 release (Docker-CRI integration Beta) - **The Docker CRI integration has been promoted to Beta, and been enabled by -default in Kubelet**. - - **Upgrade**: It is recommended to drain your node before upgrading the - Kubelet. If you choose to perform in-place upgrade, the Kubelet will - restart all Kubernetes-managed containers on the node. - - **Resource usage and performance**: There is no performance regression - in our measurement. The memory usage of Kubelet increases slightly - (~0.27MB per pod) due to the additional gRPC serialization for CRI. - - **Disable**: To disable the Docker CRI integration and fall back to the - old implementation, set `--enable-cri=false`. Note that the old - implementation has been *deprecated* and is scheduled to be removed in - the next release. You are encouraged to migrate to CRI as early as - possible. - - **Others**: The Docker container naming/labeling scheme has changed - significantly in 1.6. This is perceived as implementation detail and - should not be relied upon by any external tools or scripts. - -### Kubernetes v1.5 release (CRI v1alpha1) - - - [v1alpha1 version](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto) of CRI is released. - -#### [CRI known issues](#cri-1.5-known-issues): - - - [#27097](https://github.com/kubernetes/kubernetes/issues/27097): Container - metrics are not yet defined in CRI. - - [#36401](https://github.com/kubernetes/kubernetes/issues/36401): The new - container log path/format is not yet supported by the logging pipeline - (e.g., fluentd, GCL). - - CRI may not be compatible with other experimental features (e.g., Seccomp). - - Streaming server needs to be hardened. - - [#36666](https://github.com/kubernetes/kubernetes/issues/36666): - Authentication. - - [#36187](https://github.com/kubernetes/kubernetes/issues/36187): Avoid - including user data in the redirect URL. 
- -#### [Docker CRI integration known issues](#docker-cri-1.5-known-issues) - - - Docker compatibility: Support only Docker v1.11 and v1.12. - - Network: - - [#35457](https://github.com/kubernetes/kubernetes/issues/35457): Does - not support host ports. - - [#37315](https://github.com/kubernetes/kubernetes/issues/37315): Does - not support bandwidth shaping. - - Exec/attach/port-forward (streaming requests): - - [#35747](https://github.com/kubernetes/kubernetes/issues/35747): Does - not support `nsenter` as the exec handler (`--exec-handler=nsenter`). - - Also see [CRI 1.5 known issues](#cri-1.5-known-issues) for limitations - on CRI streaming. - -## Contacts - - - Email: sig-node (kubernetes-sig-node@googlegroups.com) - - Slack: https://kubernetes.slack.com/messages/sig-node +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/container-runtime-interface.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/controllers.md b/contributors/devel/controllers.md index 268e0d10..725c3dde 100644 --- a/contributors/devel/controllers.md +++ b/contributors/devel/controllers.md @@ -1,191 +1,3 @@ -# Writing Controllers +This file has moved to https://git.k8s.io/community/contributors/devel/sig-api-machinery/controllers.md. -A Kubernetes controller is an active reconciliation process. That is, it watches some object for the world's desired state, and it watches the world's actual state, too. Then, it sends instructions to try and make the world's current state be more like the desired state. - -The simplest implementation of this is a loop: - -```go -for { - desired := getDesiredState() - current := getCurrentState() - makeChanges(desired, current) -} -``` - -Watches, etc, are all merely optimizations of this logic. - -## Guidelines - -When you're writing controllers, there are few guidelines that will help make sure you get the results and performance you're looking for. - -1. Operate on one item at a time. If you use a `workqueue.Interface`, you'll be able to queue changes for a particular resource and later pop them in multiple “worker” gofuncs with a guarantee that no two gofuncs will work on the same item at the same time. - - Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers can collapse those into a queue of “check this X” based on relationships. For instance, a ReplicaSet controller needs to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing those. - -1. Random ordering between resources. When controllers queue off multiple types of resources, there is no guarantee of ordering amongst those resources. - - Distinct watches are updated independently. Even with an objective ordering of “created resourceA/X” and “created resourceB/Y”, your controller could observe “created resourceB/Y” and “created resourceA/X”. - -1. Level driven, not edge driven. Just like having a shell script that isn't running all the time, your controller may be off for an indeterminate amount of time before running again. - - If an API object appears with a marker value of `true`, you can't count on having seen it turn from `false` to `true`, only that you now observe it being `true`. 
Even an API watch suffers from this problem, so be sure that you're not counting on seeing a change unless your controller is also marking the information it last made the decision on in the object's status. - -1. Use `SharedInformers`. `SharedInformers` provide hooks to receive notifications of adds, updates, and deletes for a particular resource. They also provide convenience functions for accessing shared caches and determining when a cache is primed. - - Use the factory methods down in https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/informers/factory.go to ensure that you are sharing the same instance of the cache as everyone else. - - This saves us connections against the API server, duplicate serialization costs server-side, duplicate deserialization costs controller-side, and duplicate caching costs controller-side. - - You may see other mechanisms like reflectors and deltafifos driving controllers. Those were older mechanisms that we later used to build the `SharedInformers`. You should avoid using them in new controllers. - -1. Never mutate original objects! Caches are shared across controllers, this means that if you mutate your "copy" (actually a reference or shallow copy) of an object, you'll mess up other controllers (not just your own). - - The most common point of failure is making a shallow copy, then mutating a map, like `Annotations`. Use `api.Scheme.Copy` to make a deep copy. - -1. Wait for your secondary caches. Many controllers have primary and secondary resources. Primary resources are the resources that you'll be updating `Status` for. Secondary resources are resources that you'll be managing (creating/deleting) or using for lookups. - - Use the `framework.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync functions. This will make sure that things like a Pod count for a ReplicaSet isn't working off of known out of date information that results in thrashing. - -1. There are other actors in the system. Just because you haven't changed an object doesn't mean that somebody else hasn't. - - Don't forget that the current state may change at any moment--it's not sufficient to just watch the desired state. If you use the absence of objects in the desired state to indicate that things in the current state should be deleted, make sure you don't have a bug in your observation code (e.g., act before your cache has filled). - -1. Percolate errors to the top level for consistent re-queuing. We have a `workqueue.RateLimitingInterface` to allow simple requeuing with reasonable backoffs. - - Your main controller func should return an error when requeuing is necessary. When it isn't, it should use `utilruntime.HandleError` and return nil instead. This makes it very easy for reviewers to inspect error handling cases and to be confident that your controller doesn't accidentally lose things it should retry for. - -1. Watches and Informers will “sync”. Periodically, they will deliver every matching object in the cluster to your `Update` method. This is good for cases where you may need to take additional action on the object, but sometimes you know there won't be more work to do. - - In cases where you are *certain* that you don't need to requeue items when there are no new changes, you can compare the resource version of the old and new objects. If they are the same, you skip requeuing the work. Be careful when you do this. 
If you ever skip requeuing your item on failures, you could fail, not requeue, and then never retry that item again. - -1. If the primary resource your controller is reconciling supports ObservedGeneration in its status, make sure you correctly set it to metadata.Generation whenever the values between the two fields mismatches. - - This lets clients know that the controller has processed a resource. Make sure that your controller is the main controller that is responsible for that resource, otherwise if you need to communicate observation via your own controller, you will need to create a different kind of ObservedGeneration in the Status of the resource. - -1. Consider using owner references for resources that result in the creation of other resources (eg. a ReplicaSet results in creating Pods). Thus you ensure that children resources are going to be garbage-collected once a resource managed by your controller is deleted. For more information on owner references, read more [here](/contributors/design-proposals/api-machinery/controller-ref.md). - - Pay special attention in the way you are doing adoption. You shouldn't adopt children for a resource when either the parent or the children are marked for deletion. If you are using a cache for your resources, you will likely need to bypass it with a direct API read in case you observe that an owner reference has been updated for one of the children. Thus, you ensure your controller is not racing with the garbage collector. - - See [k8s.io/kubernetes/pull/42938](https://github.com/kubernetes/kubernetes/pull/42938) for more information. - -## Rough Structure - -Overall, your controller should look something like this: - -```go -type Controller struct { - // pods gives cached access to pods. - pods informers.PodLister - podsSynced cache.InformerSynced - - // queue is where incoming work is placed to de-dup and to allow "easy" - // rate limited requeues on errors - queue workqueue.RateLimitingInterface -} - -func NewController(pods informers.PodInformer) *Controller { - c := &Controller{ - pods: pods.Lister(), - podsSynced: pods.Informer().HasSynced, - queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "controller-name"), - } - - // register event handlers to fill the queue with pod creations, updates and deletions - pods.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{ - AddFunc: func(obj interface{}) { - key, err := cache.MetaNamespaceKeyFunc(obj) - if err == nil { - c.queue.Add(key) - } - }, - UpdateFunc: func(old interface{}, new interface{}) { - key, err := cache.MetaNamespaceKeyFunc(new) - if err == nil { - c.queue.Add(key) - } - }, - DeleteFunc: func(obj interface{}) { - // IndexerInformer uses a delta nodeQueue, therefore for deletes we have to use this - // key function. - key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj) - if err == nil { - c.queue.Add(key) - } - }, - },) - - return c -} - -func (c *Controller) Run(threadiness int, stopCh chan struct{}) { - // don't let panics crash the process - defer utilruntime.HandleCrash() - // make sure the work queue is shutdown which will trigger workers to end - defer c.queue.ShutDown() - - glog.Infof("Starting <NAME> controller") - - // wait for your secondary caches to fill before starting your work - if !cache.WaitForCacheSync(stopCh, c.podsSynced) { - return - } - - // start up your worker threads based on threadiness. 
Some controllers - // have multiple kinds of workers - for i := 0; i < threadiness; i++ { - // runWorker will loop until "something bad" happens. The .Until will - // then rekick the worker after one second - go wait.Until(c.runWorker, time.Second, stopCh) - } - - // wait until we're told to stop - <-stopCh - glog.Infof("Shutting down <NAME> controller") -} - -func (c *Controller) runWorker() { - // hot loop until we're told to stop. processNextWorkItem will - // automatically wait until there's work available, so we don't worry - // about secondary waits - for c.processNextWorkItem() { - } -} - -// processNextWorkItem deals with one key off the queue. It returns false -// when it's time to quit. -func (c *Controller) processNextWorkItem() bool { - // pull the next work item from queue. It should be a key we use to lookup - // something in a cache - key, quit := c.queue.Get() - if quit { - return false - } - // you always have to indicate to the queue that you've completed a piece of - // work - defer c.queue.Done(key) - - // do your work on the key. This method will contains your "do stuff" logic - err := c.syncHandler(key.(string)) - if err == nil { - // if you had no error, tell the queue to stop tracking history for your - // key. This will reset things like failure counts for per-item rate - // limiting - c.queue.Forget(key) - return true - } - - // there was a failure so be sure to report it. This method allows for - // pluggable error handling which can be used for things like - // cluster-monitoring - utilruntime.HandleError(fmt.Errorf("%v failed with : %v", key, err)) - - // since we failed, we should requeue the item to work on later. This - // method will add a backoff to avoid hotlooping on particular items - // (they're probably still not going to work right away) and overall - // controller protection (everything I've done is broken, this controller - // needs to calm down or it can starve other useful work) cases. - c.queue.AddRateLimited(key) - - return true -} -``` +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/cri-container-stats.md b/contributors/devel/cri-container-stats.md index c1176f05..10ea700c 100644 --- a/contributors/devel/cri-container-stats.md +++ b/contributors/devel/cri-container-stats.md @@ -1,121 +1,3 @@ -# Container Runtime Interface: Container Metrics - -[Container runtime interface -(CRI)](/contributors/devel/container-runtime-interface.md) -provides an abstraction for container runtimes to integrate with Kubernetes. -CRI expects the runtime to provide resource usage statistics for the -containers. - -## Background - -Historically Kubelet relied on the [cAdvisor](https://github.com/google/cadvisor) -library, an open-source project hosted in a separate repository, to retrieve -container metrics such as CPU and memory usage. These metrics are then aggregated -and exposed through Kubelet's [Summary -API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1/types.go) -for the monitoring pipeline (and other components) to consume. Any container -runtime (e.g., Docker and Rkt) integrated with Kubernetes needed to add a -corresponding package in cAdvisor to support tracking container and image file -system metrics. - -With CRI being the new abstraction for integration, it was a natural -progression to augment CRI to serve container metrics to eliminate a separate -integration point. 
- -*See the [core metrics design -proposal](/contributors/design-proposals/instrumentation/core-metrics-pipeline.md) -for more information on metrics exposed by Kubelet, and [monitoring -architecture](/contributors/design-proposals/instrumentation/monitoring_architecture.md) -for the evolving monitoring pipeline in Kubernetes.* - -# Container Metrics - -Kubelet is responsible for creating pod-level cgroups based on the Quality of -Service class to which the pod belongs, and passes this as a parent cgroup to the -runtime so that it can ensure all resources used by the pod (e.g., pod sandbox, -containers) will be charged to the cgroup. Therefore, Kubelet has the ability -to track resource usage at the pod level (using the built-in cAdvisor), and the -API enhancement focuses on the container-level metrics. - - -We include the only a set of metrics that are necessary to fulfill the needs of -Kubelet. As the requirements evolve over time, we may extend the API to support -more metrics. Below is the API with the metrics supported today. - -```go -// ContainerStats returns stats of the container. If the container does not -// exist, the call returns an error. -rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {} -// ListContainerStats returns stats of all running containers. -rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {} -``` - -```go -// ContainerStats provides the resource usage statistics for a container. -message ContainerStats { - // Information of the container. - ContainerAttributes attributes = 1; - // CPU usage gathered from the container. - CpuUsage cpu = 2; - // Memory usage gathered from the container. - MemoryUsage memory = 3; - // Usage of the writable layer. - FilesystemUsage writable_layer = 4; -} - -// CpuUsage provides the CPU usage information. -message CpuUsage { - // Timestamp in nanoseconds at which the information were collected. Must be > 0. - int64 timestamp = 1; - // Cumulative CPU usage (sum across all cores) since object creation. - UInt64Value usage_core_nano_seconds = 2; -} - -// MemoryUsage provides the memory usage information. -message MemoryUsage { - // Timestamp in nanoseconds at which the information were collected. Must be > 0. - int64 timestamp = 1; - // The amount of working set memory in bytes. - UInt64Value working_set_bytes = 2; -} - -// FilesystemUsage provides the filesystem usage information. -message FilesystemUsage { - // Timestamp in nanoseconds at which the information were collected. Must be > 0. - int64 timestamp = 1; - // The underlying storage of the filesystem. - StorageIdentifier storage_id = 2; - // UsedBytes represents the bytes used for images on the filesystem. - // This may differ from the total bytes used on the filesystem and may not - // equal CapacityBytes - AvailableBytes. - UInt64Value used_bytes = 3; - // InodesUsed represents the inodes used by the images. - // This may not equal InodesCapacity - InodesAvailable because the underlying - // filesystem may also be used for purposes other than storing images. - UInt64Value inodes_used = 4; -} -``` - -There are three categories or resources: CPU, memory, and filesystem. Each of -the resource usage message includes a timestamp to indicate when the usage -statistics is collected. This is necessary because some resource usage (e.g., -filesystem) are inherently more expensive to collect and may be updated less -frequently than others. 
Having the timestamp allows the consumer to know how -stale/fresh the data is, while giving the runtime flexibility to adjust. - -Although CRI does not dictate the frequency of the stats update, Kubelet needs -a minimum guarantee of freshness of the stats for certain resources so that it -can reclaim them timely when under pressure. We will formulate the requirements -for any of such resources and include them in CRI in the near future. - - -*For more details on why we request cached stats with timestamps as opposed to -requesting stats on-demand, here is the [rationale](https://github.com/kubernetes/kubernetes/pull/45614#issuecomment-302258090) -behind it.* - -## Status - -The container metrics calls are added to CRI in Kubernetes 1.7, but Kubelet does not -yet use it to gather metrics from the runtime. We plan to enable Kubelet to -optionally consume the container metrics from the API in 1.8. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/cri-container-stats.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/cri-testing-policy.md b/contributors/devel/cri-testing-policy.md index 73b48c5e..e0dec073 100644 --- a/contributors/devel/cri-testing-policy.md +++ b/contributors/devel/cri-testing-policy.md @@ -1,118 +1,3 @@ -# Container Runtime Interface: Testing Policy +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/cri-testing-policy.md. -**Owner: SIG-Node** - -This document describes testing policy and process for runtimes implementing the -[Container Runtime Interface (CRI)](/contributors/devel/container-runtime-interface.md) -to publish test results in a federated dashboard. The objective is to provide -the Kubernetes community an easy way to track the conformance, stability, and -supported features of a CRI runtime. - -This document focuses on Kubernetes node/cluster end-to-end (E2E) testing -because many features require integration of runtime, OS, or even the cloud -provider. A higher-level integration tests provider better signals on vertical -stack compatibility to the Kubernetes community. On the other hand, runtime -developers are strongly encouraged to run low-level -[CRI validation test suite](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/validation.md) -for validation as part of their development process. - -## Required and optional tests - -Runtime maintainers are **required** to submit the tests listed below. - 1. Node conformance test suite - 2. Node feature test suite - -Node E2E tests qualify an OS image with a pre-installed CRI runtime. The -runtime maintainers are free to choose any OS distribution, packaging, and -deployment mechanism. Please see the -[tutorial](https://github.com/kubernetes/community/blob/master/contributors/devel/e2e-node-tests.md) -to know more about the Node E2E test framework and tests for validating a -compatible OS image. - -The conformance suite is a set of platform-agnostic (e.g., OS, runtime, and -cloud provider) tests that validate the conformance of the OS image. The feature -suite allows the runtime to demonstrate what features are supported with the OS -distribution. - -In addition to the required tests, the runtime maintainers are *strongly -recommended to run and submit results from the Kubernetes conformance test -suite*. 
This cluster-level E2E test suite provides extra test signal for areas -such as Networking, which cannot be covered by CRI, or Node-level -tests. Because networking requires deep integration between the runtime, the -cloud provider, and/or other cluster components, runtime maintainers are -recommended to reach out to other relevant SIGs (e.g., SIG-GCP or SIG-AWS) for -guidance and/or sponsorship. - -## Process for publishing test results - -To publish tests results, please submit a proposal in the -[Kubernetes community repository](https://github.com/kubernetes/community) -briefly explaining your runtime, providing at least two maintainers, and -assigning the proposal to the leads of SIG-Node. - -These test results should be published under the `sig-node` tab, organized -as follows. - -``` -sig-node -> sig-node-cri-{Kubernetes-version} -> [page containing the required jobs] -``` - -Only the last three most recent Kubernetes versions and the master branch are -kept at any time. This is consistent with the Kubernetes release schedule and -policy. - -## Test job maintenance - -Tests are required to run at least nightly. - -The runtime maintainers are responsible for keeping the tests healthy. If the -tests are deemed not actively maintained, SIG-Node may remove the tests from -the test grid at their discretion. - -## Process for adding pre-submit testing - -If the tests are in good standing (i.e., consistently passing for more than 2 -weeks), the runtime maintainers may request that the tests to be included in the -pre-submit Pull Request (PR) tests. Please note that the pre-submit tests -require significantly higher testing capacity, and are held at a higher standard -since they directly affect the development velocity. - -If the tests are flaky or failing, and the maintainers are unable to respond and -fix the issues in a timely manner, the SIG leads may remove the runtime from -the presubmit tests until the issues are resolved. - -As of now, SIG-Node only accepts promotion of Node conformance tests to -pre-submit because Kubernetes conformance tests involve a wider scope and may -need co-sponsorships from other SIGs. - -## FAQ - - *1. Can runtime maintainers publish results from other E2E tests?* - -Yes, runtime maintainers can publish additional Node E2E tests results. These -test jobs will be displayed in the `sig-node-{runtime-name}` page. The same -policy for test maintenance applies. - -As for additional Cluster E2E tests, SIG-Node may agree to host the -results. However, runtime maintainers are strongly encouraged to seek for a more -appropriate SIG to sponsor or host the results. - - *2. Can these runtime-specific test jobs be considered release blocking?* - -This is beyond the authority of SIG-Node, and requires agreement and consensus -across multiple SIGs (e.g., Release, the relevant cloud provider SIG, etc). - - *3. How to run the aforementioned tests?* - -It is hard to keep instructions are even links to them up-to-date in one -document. Please contact the relevant SIGs for assistance. - - *4. How can I change the test-grid to publish the test results?* - -Please contact SIG-Node for the detailed instructions. - - *5. How does this policy apply to Windows containers?* - -Windows containers are still in the early development phase and the features -they support change rapidly. Therefore, it is suggested to treat it as a -feature with select, whitelisted tests to run. +This file is a placeholder to preserve links. 
Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/cri-validation.md b/contributors/devel/cri-validation.md index 84842c9b..23a04f02 100644 --- a/contributors/devel/cri-validation.md +++ b/contributors/devel/cri-validation.md @@ -1,53 +1,3 @@ -# Container Runtime Interface (CRI) Validation Testing +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/cri-validation.md. -CRI validation testing provides a test framework and a suite of tests to validate that the Container Runtime Interface (CRI) server implementation meets all the requirements. This allows the CRI runtime developers to verify that their runtime conforms to CRI, without needing to set up Kubernetes components or run Kubernetes end-to-end tests. - -CRI validation testing is GA since v1.11.0 and is hosted at the [cri-tools](https://github.com/kubernetes-sigs/cri-tools) repository. We encourage the CRI developers to report bugs or help extend the test coverage by adding more tests. - -## Install - -The test suites can be downloaded from cri-tools [release page](https://github.com/kubernetes-sigs/cri-tools/releases): - -```sh -VERSION="v1.11.0" -wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/critest-$VERSION-linux-amd64.tar.gz -sudo tar zxvf critest-$VERSION-linux-amd64.tar.gz -C /usr/local/bin -rm -f critest-$VERSION-linux-amd64.tar.gz -``` - -critest requires [ginkgo](https://github.com/onsi/ginkgo) to run parallel tests. It could be installed by - -```sh -go get -u github.com/onsi/ginkgo/ginkgo -``` - -*Note: ensure GO is installed and GOPATH is set before installing ginkgo.* - -## Running tests - -### Prerequisite - -Before running the test, you need to _ensure that the CRI server under test is running and listening on a Unix socket_. Because the validation tests are designed to request changes (e.g., create/delete) to the containers and verify that correct status is reported, it expects to be the only user of the CRI server. Please make sure that 1) there are no existing CRI-managed containers running on the node, and 2) no other processes (e.g., Kubelet) will interfere with the tests. - -### Run - -```sh -critest -``` - -This will - -- Connect to the shim of CRI container runtime -- Run the tests using `ginkgo` -- Output the test results to STDOUT - -critest connects to `unix:///var/run/dockershim.sock` by default. For other runtimes, the endpoint can be set by flags `-runtime-endpoint` and `-image-endpoint`. - -## Additional options - -- `-ginkgo.focus`: Only run the tests that match the regular expression. -- `-image-endpoint`: Set the endpoint of image service. Same with runtime-endpoint if not specified. -- `-runtime-endpoint`: Set the endpoint of runtime service. Default to `unix:///var/run/dockershim.sock`. -- `-ginkgo.skip`: Skip the tests that match the regular expression. -- `-parallel`: The number of parallel test nodes to run (default 1). ginkgo must be installed to run parallel tests. -- `-h`: Show help and all supported options. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/development.md b/contributors/devel/development.md index 60bb883c..8c305b14 100644 --- a/contributors/devel/development.md +++ b/contributors/devel/development.md @@ -167,7 +167,7 @@ Kubernetes uses [`godep`](https://github.com/tools/godep) to manage dependencies. 
Developers who need to manage dependencies in the `vendor/` tree should read -the docs on [using godep to manage dependencies](godep.md). +the docs on [using godep to manage dependencies](sig-architecture/godep.md). ## Build with Bazel/Gazel @@ -186,7 +186,7 @@ To check out code to work on, please refer to [this guide](/contributors/guide/g [build/common.sh]: https://git.k8s.io/kubernetes/build/common.sh [e2e-image]: https://git.k8s.io/test-infra/jenkins/e2e-image [etcd-latest]: https://coreos.com/etcd/docs/latest -[etcd-install]: testing.md#install-etcd-dependency +[etcd-install]: sig-testing/testing.md#install-etcd-dependency <!-- https://github.com/coreos/etcd/releases --> [go-workspace]: https://golang.org/doc/code.html#Workspaces [issue]: https://github.com/kubernetes/kubernetes/issues @@ -194,4 +194,4 @@ To check out code to work on, please refer to [this guide](/contributors/guide/g [kubernetes.io]: https://kubernetes.io [mercurial]: http://mercurial.selenic.com/wiki/Download [test-image]: https://git.k8s.io/test-infra/jenkins/test-image -[Build with Bazel]: bazel.md +[Build with Bazel]: sig-testing/bazel.md diff --git a/contributors/devel/e2e-node-tests.md b/contributors/devel/e2e-node-tests.md index 4f3327cb..815fe0b8 100644 --- a/contributors/devel/e2e-node-tests.md +++ b/contributors/devel/e2e-node-tests.md @@ -1,229 +1,3 @@ -# Node End-To-End tests - -Node e2e tests are component tests meant for testing the Kubelet code on a custom host environment. - -Tests can be run either locally or against a host running on GCE. - -Node e2e tests are run as both pre- and post- submit tests by the Kubernetes project. - -*Note: Linux only. Mac and Windows unsupported.* - -*Note: There is no scheduler running. The e2e tests have to do manual scheduling, e.g. by using `framework.PodClient`.* - -# Running tests - -## Locally - -Why run tests *Locally*? Much faster than running tests Remotely. - -Prerequisites: -- [Install etcd](https://github.com/coreos/etcd/releases) on your PATH - - Verify etcd is installed correctly by running `which etcd` - - Or make etcd binary available and executable at `/tmp/etcd` -- [Install ginkgo](https://github.com/onsi/ginkgo) on your PATH - - Verify ginkgo is installed correctly by running `which ginkgo` - -From the Kubernetes base directory, run: - -```sh -make test-e2e-node -``` - -This will: run the *ginkgo* binary against the subdirectory *test/e2e_node*, which will in turn: -- Ask for sudo access (needed for running some of the processes) -- Build the Kubernetes source code -- Pre-pull docker images used by the tests -- Start a local instance of *etcd* -- Start a local instance of *kube-apiserver* -- Start a local instance of *kubelet* -- Run the test using the locally started processes -- Output the test results to STDOUT -- Stop *kubelet*, *kube-apiserver*, and *etcd* - -## Remotely - -Why Run tests *Remotely*? Tests will be run in a customized pristine environment. Closely mimics what will be done -as pre- and post- submit testing performed by the project. 
- -Prerequisites: -- [join the googlegroup](https://groups.google.com/forum/#!forum/kubernetes-dev) -`kubernetes-dev@googlegroups.com` - - *This provides read access to the node test images.* -- Setup a [Google Cloud Platform](https://cloud.google.com/) account and project with Google Compute Engine enabled -- Install and setup the [gcloud sdk](https://cloud.google.com/sdk/downloads) - - Verify the sdk is setup correctly by running `gcloud compute instances list` and `gcloud compute images list --project kubernetes-node-e2e-images` - -Run: - -```sh -make test-e2e-node REMOTE=true -``` - -This will: -- Build the Kubernetes source code -- Create a new GCE instance using the default test image - - Instance will be called **test-e2e-node-containervm-v20160321-image** -- Lookup the instance public ip address -- Copy a compressed archive file to the host containing the following binaries: - - ginkgo - - kubelet - - kube-apiserver - - e2e_node.test (this binary contains the actual tests to be run) -- Unzip the archive to a directory under **/tmp/gcloud** -- Run the tests using the `ginkgo` command - - Starts etcd, kube-apiserver, kubelet - - The ginkgo command is used because this supports more features than running the test binary directly -- Output the remote test results to STDOUT -- `scp` the log files back to the local host under /tmp/_artifacts/e2e-node-containervm-v20160321-image -- Stop the processes on the remote host -- **Leave the GCE instance running** - -**Note: Subsequent tests run using the same image will *reuse the existing host* instead of deleting it and -provisioning a new one. To delete the GCE instance after each test see -*[DELETE_INSTANCE](#delete-instance-after-tests-run)*.** - - -# Additional Remote Options - -## Run tests using different images - -This is useful if you want to run tests against a host using a different OS distro or container runtime than -provided by the default image. - -List the available test images using gcloud. - -```sh -make test-e2e-node LIST_IMAGES=true -``` - -This will output a list of the available images for the default image project. - -Then run: - -```sh -make test-e2e-node REMOTE=true IMAGES="<comma-separated-list-images>" -``` - -## Run tests against a running GCE instance (not an image) - -This is useful if you have an host instance running already and want to run the tests there instead of on a new instance. - -```sh -make test-e2e-node REMOTE=true HOSTS="<comma-separated-list-of-hostnames>" -``` - -## Delete instance after tests run - -This is useful if you want recreate the instance for each test run to trigger flakes related to starting the instance. - -```sh -make test-e2e-node REMOTE=true DELETE_INSTANCES=true -``` - -## Keep instance, test binaries, and *processes* around after tests run - -This is useful if you want to manually inspect or debug the kubelet process run as part of the tests. - -```sh -make test-e2e-node REMOTE=true CLEANUP=false -``` - -## Run tests using an image in another project - -This is useful if you want to create your own host image in another project and use it for testing. - -```sh -make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMAGES="<image-name>" -``` - -Setting up your own host image may require additional steps such as installing etcd or docker. See -[setup_host.sh](https://git.k8s.io/kubernetes/test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests. 
- -## Create instances using a different instance name prefix - -This is useful if you want to create instances using a different name so that you can run multiple copies of the -test in parallel against different instances of the same image. - -```sh -make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix" -``` - -# Additional Test Options for both Remote and Local execution - -## Only run a subset of the tests - -To run tests matching a regex: - -```sh -make test-e2e-node REMOTE=true FOCUS="<regex-to-match>" -``` - -To run tests NOT matching a regex: - -```sh -make test-e2e-node REMOTE=true SKIP="<regex-to-match>" -``` - -## Run tests continually until they fail - -This is useful if you are trying to debug a flaky test failure. This will cause ginkgo to continually -run the tests until they fail. **Note: this will only perform test setup once (e.g. creating the instance) and is -less useful for catching flakes related creating the instance from an image.** - -```sh -make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true -``` - -## Run tests in parallel - -Running test in parallel can usually shorten the test duration. By default node -e2e test runs with`--nodes=8` (see ginkgo flag -[--nodes](https://onsi.github.io/ginkgo/#parallel-specs)). You can use the -`PARALLELISM` option to change the parallelism. - -```sh -make test-e2e-node PARALLELISM=4 # run test with 4 parallel nodes -make test-e2e-node PARALLELISM=1 # run test sequentially -``` - -## Run tests with kubenet network plugin - -[kubenet](http://kubernetes.io/docs/admin/network-plugins/#kubenet) is -the default network plugin used by kubelet since Kubernetes 1.3. The -plugin requires [CNI](https://github.com/containernetworking/cni) and -[nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html). - -Currently, kubenet is enabled by default for Remote execution `REMOTE=true`, -but disabled for Local execution. **Note: kubenet is not supported for -local execution currently. This may cause network related test result to be -different for Local and Remote execution. So if you want to run network -related test, Remote execution is recommended.** - -To enable/disable kubenet: - -```sh -# enable kubenet -make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"' -# disable kubenet -make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="' -``` - -## Additional QoS Cgroups Hierarchy level testing - -For testing with the QoS Cgroup Hierarchy enabled, you can pass --cgroups-per-qos flag as an argument into Ginkgo using TEST_ARGS - -```sh -make test_e2e_node TEST_ARGS="--cgroups-per-qos=true" -``` - -# Notes on tests run by the Kubernetes project during pre-, post- submit. - -The node e2e tests are run by the PR builder for each Pull Request and the results published at -the bottom of the comments section. To re-run just the node e2e tests from the PR builder add the comment -`@k8s-bot node e2e test this issue: #<Flake-Issue-Number or IGNORE>` and **include a link to the test -failure logs if caused by a flake.** - -The PR builder runs tests against the images listed in [jenkins-pull.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-pull.properties) - -The post submit tests run against the images listed in [jenkins-ci.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-ci.properties) +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/e2e-node-tests.md. 
+This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/e2e-tests.md b/contributors/devel/e2e-tests.md index 20698c49..f8427634 100644 --- a/contributors/devel/e2e-tests.md +++ b/contributors/devel/e2e-tests.md @@ -1,759 +1,3 @@ -# End-to-End Testing in Kubernetes +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/e2e-tests.md. -**Table of Contents** - -- [End-to-End Testing in Kubernetes](#end-to-end-testing-in-kubernetes) - - [Overview](#overview) - - [Building Kubernetes and Running the Tests](#building-kubernetes-and-running-the-tests) - - [Cleaning up](#cleaning-up) - - [Advanced testing](#advanced-testing) - - [Extracting a specific version of kubernetes](#extracting-a-specific-version-of-kubernetes) - - [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing) - - [Federation e2e tests](#federation-e2e-tests) - - [Configuring federation e2e tests](#configuring-federation-e2e-tests) - - [Image Push Repository](#image-push-repository) - - [Build](#build) - - [Deploy federation control plane](#deploy-federation-control-plane) - - [Run the Tests](#run-the-tests) - - [Teardown](#teardown) - - [Shortcuts for test developers](#shortcuts-for-test-developers) - - [Debugging clusters](#debugging-clusters) - - [Local clusters](#local-clusters) - - [Testing against local clusters](#testing-against-local-clusters) - - [Version-skewed and upgrade testing](#version-skewed-and-upgrade-testing) - - [Test jobs naming convention](#test-jobs-naming-convention) - - [Kinds of tests](#kinds-of-tests) - - [Viper configuration and hierarchichal test parameters.](#viper-configuration-and-hierarchichal-test-parameters) - - [Conformance tests](#conformance-tests) - - [Continuous Integration](#continuous-integration) - - [What is CI?](#what-is-ci) - - [What runs in CI?](#what-runs-in-ci) - - [Non-default tests](#non-default-tests) - - [The PR-builder](#the-pr-builder) - - [Adding a test to CI](#adding-a-test-to-ci) - - [Moving a test out of CI](#moving-a-test-out-of-ci) - - [Performance Evaluation](#performance-evaluation) - - [One More Thing](#one-more-thing) - - -## Overview - -End-to-end (e2e) tests for Kubernetes provide a mechanism to test end-to-end -behavior of the system, and is the last signal to ensure end user operations -match developer specifications. Although unit and integration tests provide a -good signal, in a distributed system like Kubernetes it is not uncommon that a -minor change may pass all unit and integration tests, but cause unforeseen -changes at the system level. - -The primary objectives of the e2e tests are to ensure a consistent and reliable -behavior of the kubernetes code base, and to catch hard-to-test bugs before -users do, when unit and integration tests are insufficient. - -The e2e tests in kubernetes are built atop of -[Ginkgo](http://onsi.github.io/ginkgo/) and -[Gomega](http://onsi.github.io/gomega/). There are a host of features that this -Behavior-Driven Development (BDD) testing framework provides, and it is -recommended that the developer read the documentation prior to diving into the - tests. - -The purpose of *this* document is to serve as a primer for developers who are -looking to execute or add tests using a local development environment. 
- -Before writing new tests or making substantive changes to existing tests, you -should also read [Writing Good e2e Tests](writing-good-e2e-tests.md) - -## Building Kubernetes and Running the Tests - -There are a variety of ways to run e2e tests, but we aim to decrease the number -of ways to run e2e tests to a canonical way: `kubetest`. - -You can install `kubetest` as follows: -```sh -go get -u k8s.io/test-infra/kubetest -``` - -You can run an end-to-end test which will bring up a master and nodes, perform -some tests, and then tear everything down. Make sure you have followed the -getting started steps for your chosen cloud platform (which might involve -changing the --provider flag value to something other than "gce"). - -You can quickly recompile the e2e testing framework via `go install ./test/e2e`. -This will not do anything besides allow you to verify that the go code compiles. -If you want to run your e2e testing framework without re-provisioning the e2e setup, -you can do so via `make WHAT=test/e2e/e2e.test`, and then re-running the ginkgo tests. - -To build Kubernetes, up a cluster, run tests, and tear everything down, use: - -```sh -kubetest --build --up --test --down -``` - -If you'd like to just perform one of these steps, here are some examples: - -```sh -# Build binaries for testing -kubetest --build - -# Create a fresh cluster. Deletes a cluster first, if it exists -kubetest --up - -# Run all tests -kubetest --test - -# Run tests matching the regex "\[Feature:Performance\]" against a local cluster -# Specify "--provider=local" flag when running the tests locally -kubetest --test --test_args="--ginkgo.focus=\[Feature:Performance\]" --provider=local - -# Conversely, exclude tests that match the regex "Pods.*env" -kubetest --test --test_args="--ginkgo.skip=Pods.*env" - -# Run tests in parallel, skip any that must be run serially -GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\]" - -# Run tests in parallel, skip any that must be run serially and keep the test namespace if test failed -GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\] --delete-namespace-on-failure=false" - -# Flags can be combined, and their actions will take place in this order: -# --build, --up, --test, --down -# -# You can also specify an alternative provider, such as 'aws' -# -# e.g.: -kubetest --provider=aws --build --up --test --down - -# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for -# cleaning up after a failed test or viewing logs. -# kubectl output is default on, you can use --verbose-commands=false to suppress output. -kubetest -ctl='get events' -kubetest -ctl='delete pod foobar' -``` - -The tests are built into a single binary which can be used to deploy a -Kubernetes system or run tests against an already-deployed Kubernetes system. -See `kubetest --help` (or the flag definitions in `hack/e2e.go`) for -more options, such as reusing an existing cluster. - -### Cleaning up - -During a run, pressing `control-C` should result in an orderly shutdown, but if -something goes wrong and you still have some VMs running you can force a cleanup -with this command: - -```sh -kubetest --down -``` - -## Advanced testing - -### Extracting a specific version of kubernetes - -The `kubetest` binary can download and extract a specific version of kubernetes, -both the server, client and test binaries. The `--extract=E` flag enables this -functionality. 
- -There are a variety of values to pass this flag: - -```sh -# Official builds: <ci|release>/<latest|stable>[-N.N] -kubetest --extract=ci/latest --up # Deploy the latest ci build. -kubetest --extract=ci/latest-1.5 --up # Deploy the latest 1.5 CI build. -kubetest --extract=release/latest --up # Deploy the latest RC. -kubetest --extract=release/stable-1.5 --up # Deploy the 1.5 release. - -# A specific version: -kubetest --extract=v1.5.1 --up # Deploy 1.5.1 -kubetest --extract=v1.5.2-beta.0 --up # Deploy 1.5.2-beta.0 -kubetest --extract=gs://foo/bar --up # --stage=gs://foo/bar - -# Whatever GKE is using (gke, gke-staging, gke-test): -kubetest --extract=gke --up # Deploy whatever GKE prod uses - -# Using a GCI version: -kubetest --extract=gci/gci-canary --up # Deploy the version for next gci release -kubetest --extract=gci/gci-57 # Deploy the version bound to gci m57 -kubetest --extract=gci/gci-57/ci/latest # Deploy the latest CI build using gci m57 for the VM image - -# Reuse whatever is already built -kubetest --up # Most common. Note, no extract flag -kubetest --build --up # Most common. Note, no extract flag -kubetest --build --stage=gs://foo/bar --extract=local --up # Extract the staged version -``` - -### Bringing up a cluster for testing - -If you want, you may bring up a cluster in some other manner and run tests -against it. To do so, or to do other non-standard test things, you can pass -arguments into Ginkgo using `--test_args` (e.g. see above). For the purposes of -brevity, we will look at a subset of the options, which are listed below: - -``` ---ginkgo.dryRun=false: If set, ginkgo will walk the test hierarchy without -actually running anything. - ---ginkgo.failFast=false: If set, ginkgo will stop running a test suite after a -failure occurs. - ---ginkgo.failOnPending=false: If set, ginkgo will mark the test suite as failed -if any specs are pending. - ---ginkgo.focus="": If set, ginkgo will only run specs that match this regular -expression. - ---ginkgo.noColor="n": If set to "y", ginkgo will not use color in the output - ---ginkgo.skip="": If set, ginkgo will only run specs that do not match this -regular expression. - ---ginkgo.trace=false: If set, default reporter prints out the full stack trace -when a failure occurs - ---ginkgo.v=false: If set, default reporter print out all specs as they begin. - ---host="": The host, or api-server, to connect to - ---kubeconfig="": Path to kubeconfig containing embedded authinfo. - ---provider="": The name of the Kubernetes provider (gce, gke, local, vagrant, -etc.) - ---repo-root="../../": Root directory of kubernetes repository, for finding test -files. -``` - -Prior to running the tests, you may want to first create a simple auth file in -your home directory, e.g. `$HOME/.kube/config`, with the following: - -``` -{ - "User": "root", - "Password": "" -} -``` - -As mentioned earlier there are a host of other options that are available, but -they are left to the developer. - -**NOTE:** If you are running tests on a local cluster repeatedly, you may need -to periodically perform some manual cleanup: - - - `rm -rf /var/run/kubernetes`, clear kube generated credentials, sometimes -stale permissions can cause problems. - - - `sudo iptables -F`, clear ip tables rules left by the kube-proxy. - -### Reproducing failures in flaky tests -You can run a test repeatedly until it fails. This is useful when debugging -flaky tests. 
In order to do so, you need to set the following environment -variable: -```sh -$ export GINKGO_UNTIL_IT_FAILS=true -``` - -After setting the environment variable, you can run the tests as before. The e2e -script adds `--untilItFails=true` to ginkgo args if the environment variable is -set. The flags asks ginkgo to run the test repeatedly until it fails. - -### Federation e2e tests - -By default, `e2e.go` provisions a single Kubernetes cluster, and any `Feature:Federation` ginkgo tests will be skipped. - -Federation e2e testing involve bringing up multiple "underlying" Kubernetes clusters, -and deploying the federation control plane as a Kubernetes application on the underlying clusters. - -The federation e2e tests are still managed via `e2e.go`, but require some extra configuration items. - -#### Configuring federation e2e tests - -The following environment variables will enable federation e2e building, provisioning and testing. - -```sh -$ export FEDERATION=true -$ export E2E_ZONES="us-central1-a us-central1-b us-central1-f" -``` - -A Kubernetes cluster will be provisioned in each zone listed in `E2E_ZONES`. A zone can only appear once in the `E2E_ZONES` list. - -#### Image Push Repository - -Next, specify the docker repository where your ci images will be pushed. - -* **If `--provider=gce` or `--provider=gke`**: - - If you use the same GCP project where you to run the e2e tests as the container image repository, - FEDERATION_PUSH_REPO_BASE environment variable will be defaulted to "gcr.io/${DEFAULT_GCP_PROJECT_NAME}". - You can skip ahead to the **Build** section. - - You can simply set your push repo base based on your project name, and the necessary repositories will be - auto-created when you first push your container images. - - ```sh - $ export FEDERATION_PUSH_REPO_BASE="gcr.io/${GCE_PROJECT_NAME}" - ``` - - Skip ahead to the **Build** section. - -* **For all other providers**: - - You'll be responsible for creating and managing access to the repositories manually. - - ```sh - $ export FEDERATION_PUSH_REPO_BASE="quay.io/colin_hom" - ``` - - Given this example, the `federation-apiserver` container image will be pushed to the repository - `quay.io/colin_hom/federation-apiserver`. - - The docker client on the machine running `e2e.go` must have push access for the following pre-existing repositories: - - * `${FEDERATION_PUSH_REPO_BASE}/federation-apiserver` - * `${FEDERATION_PUSH_REPO_BASE}/federation-controller-manager` - - These repositories must allow public read access, as the e2e node docker daemons will not have any credentials. If you're using - GCE/GKE as your provider, the repositories will have read-access by default. - -#### Build - -* Compile the binaries and build container images: - - ```sh - $ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true kubetest -build - ``` - -* Push the federation container images - - ```sh - $ federation/develop/push-federation-images.sh - ``` - -#### Deploy federation control plane - -The following command will create the underlying Kubernetes clusters in each of `E2E_ZONES`, and then provision the -federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list. - -```sh -$ kubetest --up -``` - -#### Run the Tests - -This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite. 
- -```sh -$ kubetest --test --test_args="--ginkgo.focus=\[Feature:Federation\]" -``` - -#### Teardown - -```sh -$ kubetest --down -``` - -#### Shortcuts for test developers - -* To speed up `--up`, provision a single-node kubernetes cluster in a single e2e zone: - - `NUM_NODES=1 E2E_ZONES="us-central1-f"` - - Keep in mind that some tests may require multiple underlying clusters and/or minimum compute resource availability. - -* If you're hacking around with the federation control plane deployment itself, - you can quickly re-deploy the federation control plane Kubernetes manifests without tearing any resources down. - To re-deploy the federation control plane after running `--up` for the first time: - - ```sh - $ federation/cluster/federation-up.sh - ``` - -### Debugging clusters - -If a cluster fails to initialize, or you'd like to better understand cluster -state to debug a failed e2e test, you can use the `cluster/log-dump.sh` script -to gather logs. - -This script requires that the cluster provider supports ssh. Assuming it does, -running: - -```sh -$ federation/cluster/log-dump.sh <directory> -``` - -will ssh to the master and all nodes and download a variety of useful logs to -the provided directory (which should already exist). - -The Google-run Jenkins builds automatically collected these logs for every -build, saving them in the `artifacts` directory uploaded to GCS. - -### Local clusters - -It can be much faster to iterate on a local cluster instead of a cloud-based -one. To start a local cluster, you can run: - -```sh -# The PATH construction is needed because PATH is one of the special-cased -# environment variables not passed by sudo -E -sudo PATH=$PATH hack/local-up-cluster.sh -``` - -This will start a single-node Kubernetes cluster than runs pods using the local -docker daemon. Press Control-C to stop the cluster. - -You can generate a valid kubeconfig file by following instructions printed at the -end of aforementioned script. - -#### Testing against local clusters - -In order to run an E2E test against a locally running cluster, first make sure -to have a local build of the tests: - -```sh -kubetest --build -``` - -Then point the tests at a custom host directly: - -```sh -export KUBECONFIG=/path/to/kubeconfig -kubetest --provider=local --test -``` - -To control the tests that are run: - -```sh -kubetest --provider=local --test --test_args="--ginkgo.focus=Secrets" -``` - -You will also likely need to specify `minStartupPods` to match the number of -nodes in your cluster. If you're testing against a cluster set up by -`local-up-cluster.sh`, you will need to do the following: - -```sh -kubetest --provider=local --test --test_args="--minStartupPods=1 --ginkgo.focus=Secrets" -``` - -### Version-skewed and upgrade testing - -We run version-skewed tests to check that newer versions of Kubernetes work -similarly enough to older versions. The general strategy is to cover the following cases: - -1. One version of `kubectl` with another version of the cluster and tests (e.g. - that v1.2 and v1.4 `kubectl` doesn't break v1.3 tests running against a v1.3 - cluster). -1. A newer version of the Kubernetes master with older nodes and tests (e.g. - that upgrading a master to v1.3 with nodes at v1.2 still passes v1.2 tests). -1. A newer version of the whole cluster with older tests (e.g. that a cluster - upgraded---master and nodes---to v1.3 still passes v1.2 tests). -1. That an upgraded cluster functions the same as a brand-new cluster of the - same version (e.g. 
a cluster upgraded to v1.3 passes the same v1.3 tests as - a newly-created v1.3 cluster). - -[kubetest](https://git.k8s.io/test-infra/kubetest) is -the authoritative source on how to run version-skewed tests, but below is a -quick-and-dirty tutorial. - -```sh -# Assume you have two copies of the Kubernetes repository checked out, at -# ./kubernetes and ./kubernetes_old - -# If using GKE: -export CLUSTER_API_VERSION=${OLD_VERSION} - -# Deploy a cluster at the old version; see above for more details -cd ./kubernetes_old -kubetest --up - -# Upgrade the cluster to the new version -# -# If using GKE, add --upgrade-target=${NEW_VERSION} -# -# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade -cd ../kubernetes -kubetest --provider=gke --test --check-version-skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]" - -# Run old tests with new kubectl -cd ../kubernetes_old -kubetest --provider=gke --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" -``` - -If you are just testing version-skew, you may want to just deploy at one -version and then test at another version, instead of going through the whole -upgrade process: - -```sh -# With the same setup as above - -# Deploy a cluster at the new version -cd ./kubernetes -kubetest --up - -# Run new tests with old kubectl -kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh" - -# Run old tests with new kubectl -cd ../kubernetes_old -kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" -``` - -#### Test jobs naming convention - -**Version skew tests** are named as -`<cloud-provider>-<master&node-version>-<kubectl-version>-<image-name>-kubectl-skew` -e.g: `gke-1.5-1.6-cvm-kubectl-skew` means cloud provider is GKE; -master and nodes are built from `release-1.5` branch; -`kubectl` is built from `release-1.6` branch; -image name is cvm (container_vm). -The test suite is always the older one in version skew tests. e.g. from release-1.5 in this case. - -**Upgrade tests**: - -If a test job name ends with `upgrade-cluster`, it means we first upgrade -the cluster (i.e. master and nodes) and then run the old test suite with new kubectl. - -If a test job name ends with `upgrade-cluster-new`, it means we first upgrade -the cluster (i.e. master and nodes) and then run the new test suite with new kubectl. - -If a test job name ends with `upgrade-master`, it means we first upgrade -the master and keep the nodes in old version and then run the old test suite with new kubectl. - -There are some examples in the table, -where `->` means upgrading; container_vm (cvm) and gci are image names. - -| test name | test suite | master version (image) | node version (image) | kubectl -| --------- | :--------: | :----: | :---:| :---: -| gce-1.5-1.6-upgrade-cluster | 1.5 | 1.5->1.6 | 1.5->1.6 | 1.6 -| gce-1.5-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 | 1.5->1.6 | 1.6 -| gce-1.5-1.6-upgrade-master | 1.5 | 1.5->1.6 | 1.5 | 1.6 -| gke-container_vm-1.5-container_vm-1.6-upgrade-cluster | 1.5 | 1.5->1.6 (cvm) | 1.5->1.6 (cvm) | 1.6 -| gke-gci-1.5-container_vm-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 (gci) | 1.5->1.6 (cvm) | 1.6 -| gke-gci-1.5-container_vm-1.6-upgrade-master | 1.5 | 1.5->1.6 (gci) | 1.5 (cvm) | 1.6 - -## Kinds of tests - -We are working on implementing clearer partitioning of our e2e tests to make -running a known set of tests easier (#10548). 
Tests can be labeled with any of -the following labels, in order of increasing precedence (that is, each label -listed below supersedes the previous ones): - - - If a test has no labels, it is expected to run fast (under five minutes), be -able to be run in parallel, and be consistent. - - - `[Slow]`: If a test takes more than five minutes to run (by itself or in -parallel with many other tests), it is labeled `[Slow]`. This partition allows -us to run almost all of our tests quickly in parallel, without waiting for the -stragglers to finish. - - - `[Serial]`: If a test cannot be run in parallel with other tests (e.g. it -takes too many resources or restarts nodes), it is labeled `[Serial]`, and -should be run in serial as part of a separate suite. - - - `[Disruptive]`: If a test restarts components that might cause other tests -to fail or break the cluster completely, it is labeled `[Disruptive]`. Any -`[Disruptive]` test is also assumed to qualify for the `[Serial]` label, but -need not be labeled as both. These tests are not run against soak clusters to -avoid restarting components. - - - `[Flaky]`: If a test is found to be flaky and we have decided that it's too -hard to fix in the short term (e.g. it's going to take a full engineer-week), it -receives the `[Flaky]` label until it is fixed. The `[Flaky]` label should be -used very sparingly, and should be accompanied with a reference to the issue for -de-flaking the test, because while a test remains labeled `[Flaky]`, it is not -monitored closely in CI. `[Flaky]` tests are by default not run, unless a -`focus` or `skip` argument is explicitly given. - - - `[Feature:.+]`: If a test has non-default requirements to run or targets -some non-core functionality, and thus should not be run as part of the standard -suite, it receives a `[Feature:.+]` label, e.g. `[Feature:Performance]` or -`[Feature:Ingress]`. `[Feature:.+]` tests are not run in our core suites, -instead running in custom suites. If a feature is experimental or alpha and is -not enabled by default due to being incomplete or potentially subject to -breaking changes, it does *not* block PR merges, and thus should run in -some separate test suites owned by the feature owner(s) -(see [Continuous Integration](#continuous-integration) below). - - - `[Conformance]`: Designate that this test is included in the Conformance -test suite for [Conformance Testing](conformance-tests.md). This test must -meet a number of [requirements](conformance-tests.md#conformance-test-requirements) -to be eligible for this tag. This tag does not supersed any other labels. - - - The following tags are not considered to be exhaustively applied, but are -intended to further categorize existing `[Conformance]` tests, or tests that are -being considered as candidate for promotion to `[Conformance]` as we work to -refine requirements: - - `[Privileged]`: This is a test that requires privileged access - - `[Internet]`: This is a test that assumes access to the public internet - - `[Deprecated]`: This is a test that exercises a deprecated feature - - `[Alpha]`: This is a test that exercises an alpha feature - - `[Beta]`: This is a test that exercises a beta feature - -Every test should be owned by a [SIG](/sig-list.md), -and have a corresponding `[sig-<name>]` label. - -### Viper configuration and hierarchichal test parameters. - -The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags. 
- -Flags in general fall apart once tests become sufficiently complicated. So, even if we could use another flag library, it wouldn't be ideal. - -To use viper, rather than flags, to configure your tests: - -- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`. - -Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](https://git.k8s.io/kubernetes/test/e2e/framework/test_context.go). - -In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes. - -### Conformance tests - -For more information on Conformance tests please see the [Conformance Testing](conformance-tests.md) - -## Continuous Integration - -A quick overview of how we run e2e CI on Kubernetes. - -### What is CI? - -We run a battery of [release-blocking jobs](https://k8s-testgrid.appspot.com/sig-release-master-blocking) -against `HEAD` of the master branch on a continuous basis, and block merges -via [Tide](https://git.k8s.io/test-infra/prow/cmd/tide) on a subset of those -tests if they fail. - -CI results can be found at [ci-test.k8s.io](http://ci-test.k8s.io), e.g. -[ci-test.k8s.io/kubernetes-e2e-gce/10594](http://ci-test.k8s.io/kubernetes-e2e-gce/10594). - -### What runs in CI? - -We run all default tests (those that aren't marked `[Flaky]` or `[Feature:.+]`) -against GCE and GKE. To minimize the time from regression-to-green-run, we -partition tests across different jobs: - - - `kubernetes-e2e-<provider>` runs all non-`[Slow]`, non-`[Serial]`, -non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. - - - `kubernetes-e2e-<provider>-slow` runs all `[Slow]`, non-`[Serial]`, -non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. - - - `kubernetes-e2e-<provider>-serial` runs all `[Serial]` and `[Disruptive]`, -non-`[Flaky]`, non-`[Feature:.+]` tests in serial. - -We also run non-default tests if the tests exercise general-availability ("GA") -features that require a special environment to run in, e.g. -`kubernetes-e2e-gce-scalability` and `kubernetes-kubemark-gce`, which test for -Kubernetes performance. - -#### Non-default tests - -Many `[Feature:.+]` tests we don't run in CI. These tests are for features that -are experimental (often in the `experimental` API), and aren't enabled by -default. - -### The PR-builder - -We also run a battery of tests against every PR before we merge it. These tests -are equivalent to `kubernetes-gce`: it runs all non-`[Slow]`, non-`[Serial]`, -non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. These -tests are considered "smoke tests" to give a decent signal that the PR doesn't -break most functionality. Results for your PR can be found at -[pr-test.k8s.io](http://pr-test.k8s.io), e.g. -[pr-test.k8s.io/20354](http://pr-test.k8s.io/20354) for #20354. - -### Adding a test to CI - -As mentioned above, prior to adding a new test, it is a good idea to perform a -`-ginkgo.dryRun=true` on the system, in order to see if a behavior is already -being tested, or to determine if it may be possible to augment an existing set -of tests for a specific use case. - -If a behavior does not currently have coverage and a developer wishes to add a -new e2e test, navigate to the ./test/e2e directory and create a new test using -the existing suite as a guide. 
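As a concrete illustration of the dry-run check mentioned above (a sketch that only reuses the `kubetest` invocation style and ginkgo flags already shown in this document; the `Secrets` focus is just an example), you can ask ginkgo to walk the matching specs without executing anything:

```sh
# Walk the test hierarchy without running anything; --ginkgo.v makes the
# reporter print each spec it visits, which helps confirm whether the
# behavior you care about is already covered by an existing test.
kubetest --provider=local --test --test_args="--ginkgo.dryRun=true --ginkgo.v=true --ginkgo.focus=Secrets"
```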
- -**NOTE:** To build/run with tests in a new directory within ./test/e2e, add the -directory to import list in ./test/e2e/e2e_test.go - -TODO(#20357): Create a self-documented example which has been disabled, but can -be copied to create new tests and outlines the capabilities and libraries used. - -When writing a test, consult #kinds-of-tests above to determine how your test -should be marked, (e.g. `[Slow]`, `[Serial]`; remember, by default we assume a -test can run in parallel with other tests!). - -When first adding a test it should *not* go straight into CI, because failures -block ordinary development. A test should only be added to CI after is has been -running in some non-CI suite long enough to establish a track record showing -that the test does not fail when run against *working* software. Note also that -tests running in CI are generally running on a well-loaded cluster, so must -contend for resources; see above about [kinds of tests](#kinds_of_tests). - -Generally, a feature starts as `experimental`, and will be run in some suite -owned by the team developing the feature. If a feature is in beta or GA, it -*should* block PR merges and releases. In moving from experimental to beta or GA, tests -that are expected to pass by default should simply remove the `[Feature:.+]` -label, and will be incorporated into our core suites. If tests are not expected -to pass by default, (e.g. they require a special environment such as added -quota,) they should remain with the `[Feature:.+]` label. - -Occasionally, we'll want to add tests to better exercise features that are -already GA. These tests also shouldn't go straight to CI. They should begin by -being marked as `[Flaky]` to be run outside of CI, and once a track-record for -them is established, they may be promoted out of `[Flaky]`. - -### Moving a test out of CI - -If we have determined that a test is known-flaky and cannot be fixed in the -short-term, we may move it out of CI indefinitely. This move should be used -sparingly, as it effectively means that we have no coverage of that test. When a -test is demoted, it should be marked `[Flaky]` with a comment accompanying the -label with a reference to an issue opened to fix the test. - -## Performance Evaluation - -Another benefit of the e2e tests is the ability to create reproducible loads on -the system, which can then be used to determine the responsiveness, or analyze -other characteristics of the system. For example, the density tests load the -system to 30,50,100 pods per/node and measures the different characteristics of -the system, such as throughput, api-latency, etc. - -For a good overview of how we analyze performance data, please read the -following [post](https://kubernetes.io/blog/2015/09/kubernetes-performance-measurements-and/) - -For developers who are interested in doing their own performance analysis, we -recommend setting up [prometheus](http://prometheus.io/) for data collection, -and using [grafana](https://prometheus.io/docs/visualization/grafana/) to -visualize the data. There also exists the option of pushing your own metrics in -from the tests using a -[prom-push-gateway](http://prometheus.io/docs/instrumenting/pushing/). -Containers for all of these components can be found -[here](https://hub.docker.com/u/prom/). - -For more accurate measurements, you may wish to set up prometheus external to -kubernetes in an environment where it can access the major system components -(api-server, controller-manager, scheduler). 
This is especially useful when -attempting to gather metrics in a load-balanced api-server environment, because -all api-servers can be analyzed independently as well as collectively. On -startup, configuration file is passed to prometheus that specifies the endpoints -that prometheus will scrape, as well as the sampling interval. - -``` -#prometheus.conf -job: { - name: "kubernetes" - scrape_interval: "1s" - target_group: { - # apiserver(s) - target: "http://localhost:8080/metrics" - # scheduler - target: "http://localhost:10251/metrics" - # controller-manager - target: "http://localhost:10252/metrics" - } -} -``` - -Once prometheus is scraping the kubernetes endpoints, that data can then be -plotted using promdash, and alerts can be created against the assortment of -metrics that kubernetes provides. - -## One More Thing - -You should also know the [testing conventions](../guide/coding-conventions.md#testing-conventions). - -**HAPPY TESTING!** +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/event-style-guide.md b/contributors/devel/event-style-guide.md index bc4ba22b..d91c62ac 100644 --- a/contributors/devel/event-style-guide.md +++ b/contributors/devel/event-style-guide.md @@ -1,51 +1,3 @@ -# Event style guide - -Status: During Review - -Author: Marek Grabowski (gmarek@) - -## Why the guide? - -The Event API change proposal is the first step towards having useful Events in the system. Another step is to formalize the Event style guide, i.e. set of properties that developers need to ensure when adding new Events to the system. This is necessary to ensure that we have a system in which all components emit consistently structured Events. - -## When to emit an Event? - -Events are expected to provide important insights for the application developer/operator on the state of their application. Events relevant to cluster administrators are acceptable, as well, though they usually also have the option of looking at component logs. Events are much more expensive than logs, thus they're not expected to provide in-depth system debugging information. Instead concentrate on things that are important from the application developer's perspective. Events need to be either actionable, or be useful to understand past or future system's behavior. Events are not intended to drive automation. Watching resource status should be sufficient for controllers. - -Following are the guidelines for adding Events to the system. Those are not hard-and-fast rules, but should be considered by all contributors adding new Events and members doing reviews. -1. Emit events only when state of the system changes/attempts to change. Events "it's still running" are not interesting. Also, changes that do not add information beyond what is observable by watching the altered resources should not be duplicated as events. Note that adding a reason for some action that can't be inferred from the state change is considered additional information. -1. Limit Events to no more than one per change/attempt. There's no need for Events on "About to do X" AND "Did X"/"Failed to do X". Result is more interesting and implies an attempt. - 1. It may give impression that this gets tricky with scale events, e.g. Deployment scales ReplicaSet which creates/deletes Pods. For us those are 3 (or more) separate Events (3 different objects are affected) so it's fine to emit multiple Events. -1. 
When an error occurs that prevents a user application from starting or from enacting other normal system behavior, such as object creation, an Event should be emitted (e.g. invalid image). - 1. Note that Events are garbage collected so every user-actionable error needs to be surfaced via resource status as well. - 1. It's usually OK to emit failure Events for each failure. The dedup mechanism will deal with that. The exception is failures that are frequent but typically ephemeral and automatically repairable/recoverable, such as broken socket connections, in which case they should only be reported if persistent and unrepairable, in order to mitigate event spam. -1. When a user application stops running for any reason, an Event should be emitted (e.g. Pod evicted because Node is under memory pressure). -1. If it's a system-wide change of state that may impact currently running applications or may have a severe impact on future workload schedulability, an Event should be emitted (e.g. Node became unreachable, Failed to create route for Node). -1. If it doesn't fit any of the above scenarios, you should consider not emitting an Event. - -## How to structure an Event? -The new Event API tries to use more descriptive field names to influence how Events are structured. An Event has the following fields: -* Regarding -* Related -* ReportingController -* ReportingInstance -* Action -* Reason -* Type -* Note - -The Event should be structured in a way that the following sentence "makes sense": -"Regarding <Event.Regarding>: <Event.Action> <Event.Related> - <Event.Reason>", e.g. -* Regarding Node X: BecameNotReady - NodeUnreachable -* Regarding Pod X: ScheduledOnNode Node Y - <nil> -* Regarding PVC X: BoundToNode Node Y - <nil> -* Regarding Pod X: KilledContainer Container Y - NodeMemoryPressure - -1. ReportingController is the type of the Controller reporting an Event, e.g. k8s.io/node-controller, k8s.io/kubelet. There will be a standard list of controller names for Kubernetes components. Third-party components must namespace themselves in the same manner as label keys. Validation ensures it's a proper qualified name. This shouldn’t be needed in order for users to understand the event, but is provided in case the controller’s logs need to be accessed for further debugging. -1. ReportingInstance is an identifier of the instance of the ReportingController which needs to uniquely identify it. I.e. the host name can be used only for controllers that are guaranteed to be unique on the host. This requirement isn't met e.g. for the scheduler, so it may need a secondary index. For singleton controllers use the Node name (or hostname if the controller is not running on the Node). Can have at most 128 alpha-numeric characters. -1. Regarding and Related are ObjectReferences. Regarding should represent the object that's implemented by the ReportingController, Related can contain additional information about another object that takes part in or is affected by the Action (see examples). -1. Action is a low-cardinality (meaning that there's a restricted, predefined set of values allowed) CamelCase string field (i.e. its value has to be determined at compile time) that explains what happened with Regarding/what action the ReportingController took in Regarding's name. The tuple of {ReportingController, Action, Reason} must be unique, such that a user could look up documentation. Can have at most 128 characters. -1. Reason is a low-cardinality CamelCase string field (i.e.
its value has to be determined at compile time) that explains why ReportingController took Action. Can have at most 128 characters. -1. Type can be either "Normal" or "Warning". "Warning" types are reserved for Events that represent a situation that's not expected in a healthy cluster and/or healthy workload: something unexpected and/or undesirable, at least if it occurs frequently enough and/or for a long enough duration. -1. Note can contain an arbitrary, high-cardinality, user readable summary of the Event. This field can lose data if deduplication is triggered. Can have at most 1024 characters. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-instrumentation/event-style-guide.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/flaky-tests.md b/contributors/devel/flaky-tests.md index 14302592..13eb57fb 100644 --- a/contributors/devel/flaky-tests.md +++ b/contributors/devel/flaky-tests.md @@ -1,201 +1,3 @@ -# Flaky tests - -Any test that fails occasionally is "flaky". Since our merges only proceed when -all tests are green, and we have a number of different CI systems running the -tests in various combinations, even a small percentage of flakes results in a -lot of pain for people waiting for their PRs to merge. - -Therefore, it's very important that we write tests defensively. Situations that -"almost never happen" happen with some regularity when run thousands of times in -resource-constrained environments. Since flakes can often be quite hard to -reproduce while still being common enough to block merges occasionally, it's -additionally important that the test logs be useful for narrowing down exactly -what caused the failure. - -Note that flakes can occur in unit tests, integration tests, or end-to-end -tests, but probably occur most commonly in end-to-end tests. - -## Hunting Flakes - -You may notice lots of your PRs or ones you watch are having a common -pre-submit failure, but less frequent issues that are still of concern take -more analysis over time. There are metrics recorded and viewable in: -- [TestGrid](https://k8s-testgrid.appspot.com/presubmits-kubernetes-blocking#Summary) -- [Velodrome](http://velodrome.k8s.io/dashboard/db/bigquery-metrics?orgId=1) - -It is worth noting tests are going to fail in presubmit a lot due -to unbuildable code, but that wont happen as much on the same commit unless -there's a true issue in the code or a broader problem like a dep failed to -pull in. - -## Filing issues for flaky tests - -Because flakes may be rare, it's very important that all relevant logs be -discoverable from the issue. - -1. Search for the test name. If you find an open issue and you're 90% sure the - flake is exactly the same, add a comment instead of making a new issue. -2. If you make a new issue, you should title it with the test name, prefixed by - "e2e/unit/integration flake:" (whichever is appropriate) -3. Reference any old issues you found in step one. Also, make a comment in the - old issue referencing your new issue, because people monitoring only their - email do not see the backlinks github adds. Alternatively, tag the person or - people who most recently worked on it. -4. Paste, in block quotes, the entire log of the individual failing test, not - just the failure line. -5. Link to durable storage with the rest of the logs. This means (for all the - tests that Google runs) the GCS link is mandatory! 
The Jenkins test result - link is nice but strictly optional: not only does it expire more quickly, - it's not accessible to non-Googlers. - -## Finding failed flaky test cases - -Find flaky tests issues on GitHub under the [kind/flake issue label][flake]. -There are significant numbers of flaky tests reported on a regular basis and P2 -flakes are under-investigated. Fixing flakes is a quick way to gain expertise -and community goodwill. - -[flake]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake - -## Expectations when a flaky test is assigned to you - -Note that we won't randomly assign these issues to you unless you've opted in or -you're part of a group that has opted in. We are more than happy to accept help -from anyone in fixing these, but due to the severity of the problem when merges -are blocked, we need reasonably quick turn-around time on test flakes. Therefore -we have the following guidelines: - -1. If a flaky test is assigned to you, it's more important than anything else - you're doing unless you can get a special dispensation (in which case it will - be reassigned). If you have too many flaky tests assigned to you, or you - have such a dispensation, then it's *still* your responsibility to find new - owners (this may just mean giving stuff back to the relevant Team or SIG Lead). -2. You should make a reasonable effort to reproduce it. Somewhere between an - hour and half a day of concentrated effort is "reasonable". It is perfectly - reasonable to ask for help! -3. If you can reproduce it (or it's obvious from the logs what happened), you - should then be able to fix it, or in the case where someone is clearly more - qualified to fix it, reassign it with very clear instructions. -4. Once you have made a change that you believe fixes a flake, it is conservative - to keep the issue for the flake open and see if it manifests again after the - change is merged. -5. If you can't reproduce a flake: __don't just close it!__ Every time a flake comes - back, at least 2 hours of merge time is wasted. So we need to make monotonic - progress towards narrowing it down every time a flake occurs. If you can't - figure it out from the logs, add log messages that would have help you figure - it out. If you make changes to make a flake more reproducible, please link - your pull request to the flake you're working on. -6. If a flake has been open, could not be reproduced, and has not manifested in - 3 months, it is reasonable to close the flake issue with a note saying - why. - -# Reproducing unit test flakes - -Try the [stress command](https://godoc.org/golang.org/x/tools/cmd/stress). - -Just - -``` -$ go install golang.org/x/tools/cmd/stress -``` - -Then build your test binary - -``` -$ go test -c -race -``` - -Then run it under stress - -``` -$ stress ./package.test -test.run=FlakyTest -``` - -It runs the command and writes output to `/tmp/gostress-*` files when it fails. -It periodically reports with run counts. Be careful with tests that use the -`net/http/httptest` package; they could exhaust the available ports on your -system! - -# Hunting flaky unit tests in Kubernetes - -Sometimes unit tests are flaky. This means that due to (usually) race -conditions, they will occasionally fail, even though most of the time they pass. - -We have a goal of 99.9% flake free tests. This means that there is only one -flake in one thousand runs of a test. - -Running a test 1000 times on your own machine can be tedious and time consuming. 
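For comparison, the brute-force local run alluded to above can be approximated with the standard `go test -count` flag (a hedged sketch, not from the original text; `TestFlaky` and `./pkg/tools/` are placeholder names):

```sh
# Run a single test 1000 times in-process with the race detector enabled;
# the command exits non-zero if any iteration fails.
go test -race -count=1000 -run TestFlaky ./pkg/tools/
```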
-Fortunately, there is a better way to achieve this using Kubernetes. - -_Note: these instructions are mildly hacky for now, as we get run once semantics -and logging they will get better_ - -There is a testing image `brendanburns/flake` up on the docker hub. We will use -this image to test our fix. - -Create a replication controller with the following config: - -```yaml -apiVersion: v1 -kind: ReplicationController -metadata: - name: flakecontroller -spec: - replicas: 24 - template: - metadata: - labels: - name: flake - spec: - containers: - - name: flake - image: brendanburns/flake - env: - - name: TEST_PACKAGE - value: pkg/tools - - name: REPO_SPEC - value: https://github.com/kubernetes/kubernetes -``` - -Note that we omit the labels and the selector fields of the replication -controller, because they will be populated from the labels field of the pod -template by default. - -```sh -kubectl create -f ./controller.yaml -``` - -This will spin up 24 instances of the test. They will run to completion, then -exit, and the kubelet will restart them, accumulating more and more runs of the -test. - -You can examine the recent runs of the test by calling `docker ps -a` and -looking for tasks that exited with non-zero exit codes. Unfortunately, docker -ps -a only keeps around the exit status of the last 15-20 containers with the -same image, so you have to check them frequently. - -You can use this script to automate checking for failures, assuming your cluster -is running on GCE and has four nodes: - -```sh -echo "" > output.txt -for i in {1..4}; do - echo "Checking kubernetes-node-${i}" - echo "kubernetes-node-${i}:" >> output.txt - gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt -done -grep "Exited ([^0])" output.txt -``` - -Eventually you will have sufficient runs for your purposes. At that point you -can delete the replication controller by running: - -```sh -kubectl delete replicationcontroller flakecontroller -``` - -If you do a final check for flakes with `docker ps -a`, ignore tasks that -exited -1, since that's what happens when you stop the replication controller. - -Happy flake hunting! +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/flaky-tests.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/flexvolume.md b/contributors/devel/flexvolume.md index 12c46382..f731c6df 100644 --- a/contributors/devel/flexvolume.md +++ b/contributors/devel/flexvolume.md @@ -1,155 +1,3 @@ -# Flexvolume +This file has moved to https://git.k8s.io/community/contributors/devel/sig-storage/flexvolume.md. -Flexvolume enables users to write their own drivers and add support for their volumes in Kubernetes. Vendor drivers should be installed in the volume plugin path on every node, and on master if the driver requires attach capability (unless `--enable-controller-attach-detach` Kubelet option is set to false, but this is highly discouraged because it is a legacy mode of operation). - -Flexvolume is a GA feature from Kubernetes 1.8 release onwards. - -## Prerequisites - -Install the vendor driver on all nodes (also on master nodes if "--enable-controller-attach-detach" Kubelet option is enabled) in the plugin path. Path for installing the plugin: `<plugindir>/<vendor~driver>/<driver>`. The default plugin directory is `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`. 
It can be changed in kubelet via the `--volume-plugin-dir` flag, and in controller manager via the `--flex-volume-plugin-dir` flag. - -For example to add a `cifs` driver, by vendor `foo` install the driver at: `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/foo~cifs/cifs` - -The vendor and driver names must match flexVolume.driver in the volume spec, with '~' replaced with '/'. For example, if `flexVolume.driver` is set to `foo/cifs`, then the vendor is `foo`, and driver is `cifs`. - -## Dynamic Plugin Discovery -Beginning in v1.8, Flexvolume supports the ability to detect drivers on the fly. Instead of requiring drivers to exist at system initialization time or having to restart kubelet or controller manager, drivers can be installed, upgraded/downgraded, and uninstalled while the system is running. -For more information, please refer to the [design document](/contributors/design-proposals/storage/flexvolume-deployment.md). - -## Automated Plugin Installation/Upgrade -One possible way to install and upgrade your Flexvolume drivers is by using a DaemonSet. See [Recommended Driver Deployment Method](/contributors/design-proposals/storage/flexvolume-deployment.md#recommended-driver-deployment-method) for details, and see [here](https://git.k8s.io/examples/staging/volumes/flexvolume/deploy/) for an example. - -## Plugin details -The plugin expects the following call-outs are implemented for the backend drivers. Some call-outs are optional. Call-outs are invoked from Kubelet and Controller Manager. - -### Driver invocation model: - -#### Init: -Initializes the driver. Called during Kubelet & Controller manager initialization. On success, the function returns a capabilities map showing whether each Flexvolume capability is supported by the driver. -Current capabilities: -* `attach` - a boolean field indicating whether the driver requires attach and detach operations. This field is *required*, although for backward-compatibility the default value is set to `true`, i.e. requires attach and detach. -See [Driver output](#driver-output) for the capabilities map format. -``` -<driver executable> init -``` - -#### Attach: -Attach the volume specified by the given spec on the given node. On success, returns the device path where the device is attached on the node. Called from Controller Manager. - -This call-out does not pass "secrets" specified in Flexvolume spec. If your driver requires secrets, do not implement this call-out and instead use "mount" call-out and implement attach and mount in that call-out. - -``` -<driver executable> attach <json options> <node name> -``` - -#### Detach: -Detach the volume from the node. Called from Controller Manager. -``` -<driver executable> detach <mount device> <node name> -``` - -#### Wait for attach: -Wait for the volume to be attached on the remote node. On success, the path to the device is returned. Called from Controller Manager. The timeout should be 10m (based on https://git.k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go#L88 ) - -``` -<driver executable> waitforattach <mount device> <json options> -``` - -#### Volume is Attached: -Check the volume is attached on the node. Called from Controller Manager. - -``` -<driver executable> isattached <json options> <node name> -``` - -#### Mount device: -Mount device mounts the device to a global path which individual pods can then bind mount. Called only from Kubelet. - -This call-out does not pass "secrets" specified in Flexvolume spec. 
If your driver requires secrets, do not implement this call-out and instead use "mount" call-out and implement attach and mount in that call-out. - -``` -<driver executable> mountdevice <mount dir> <mount device> <json options> -``` - -#### Unmount device: -Unmounts the global mount for the device. This is called once all bind mounts have been unmounted. Called only from Kubelet. - -``` -<driver executable> unmountdevice <mount device> -``` -In addition to the user-specified options and [default JSON options](#default-json-options), the following options capturing information about the pod are passed through and generated automatically. - -``` -kubernetes.io/pod.name -kubernetes.io/pod.namespace -kubernetes.io/pod.uid -kubernetes.io/serviceAccount.name -``` - -#### Mount: -Mount the volume at the mount dir. This call-out defaults to bind mount for drivers which implement attach & mount-device call-outs. Called only from Kubelet. - -``` -<driver executable> mount <mount dir> <json options> -``` - -#### Unmount: -Unmount the volume. This call-out defaults to bind mount for drivers which implement attach & mount-device call-outs. Called only from Kubelet. - -``` -<driver executable> unmount <mount dir> -``` - -See [lvm] & [nfs] for a quick example on how to write a simple flexvolume driver. - -### Driver output: - -Flexvolume expects the driver to reply with the status of the operation in the -following format. - -``` -{ - "status": "<Success/Failure/Not supported>", - "message": "<Reason for success/failure>", - "device": "<Path to the device attached. This field is valid only for attach & waitforattach call-outs>" - "volumeName": "<Cluster wide unique name of the volume. Valid only for getvolumename call-out>" - "attached": <True/False (Return true if volume is attached on the node. Valid only for isattached call-out)> - "capabilities": <Only included as part of the Init response> - { - "attach": <True/False (Return true if the driver implements attach and detach)> - } -} -``` - -### Default Json options - -In addition to the flags specified by the user in the Options field of the FlexVolumeSource, the following flags (set through their corresponding FlexVolumeSource fields) are also passed to the executable. -Note: Secrets are passed only to "mount/unmount" call-outs. - -``` -"kubernetes.io/fsType":"<FS type>", -"kubernetes.io/readwrite":"<rw>", -"kubernetes.io/fsGroup":"<FS group>", -"kubernetes.io/mountsDir":"<string>", -"kubernetes.io/pvOrVolumeName":"<Volume name if the volume is in-line in the pod spec; PV name if the volume is a PV>" - -"kubernetes.io/pod.name":"<string>", -"kubernetes.io/pod.namespace":"<string>", -"kubernetes.io/pod.uid":"<string>", -"kubernetes.io/serviceAccount.name":"<string>", - -"kubernetes.io/secret/key1":"<secret1>" -... -"kubernetes.io/secret/keyN":"<secretN>" -``` - -### Example of Flexvolume - -Please refer to the [Flexvolume example directory]. See [nginx-lvm.yaml] & [nginx-nfs.yaml] for a quick example on how to use Flexvolume in a pod. - - -[lvm]: https://git.k8s.io/examples/staging/volumes/flexvolume/lvm -[nfs]: https://git.k8s.io/examples/staging/volumes/flexvolume/nfs -[nginx-lvm.yaml]: https://git.k8s.io/examples/staging/volumes/flexvolume/nginx-lvm.yaml -[nginx-nfs.yaml]: https://git.k8s.io/examples/staging/volumes/flexvolume/nginx-nfs.yaml -[Flexvolume example directory]: https://git.k8s.io/examples/staging/volumes/flexvolume/ +This file is a placeholder to preserve links. 
Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/generating-clientset.md b/contributors/devel/generating-clientset.md index bf12e92c..4141df61 100644 --- a/contributors/devel/generating-clientset.md +++ b/contributors/devel/generating-clientset.md @@ -1,50 +1,3 @@ -# Generation and release cycle of clientset +This file has moved to https://git.k8s.io/community/contributors/devel/sig-api-machinery/generating-clientset.md. -Client-gen is an automatic tool that generates [clientset](../design-proposals/api-machinery/client-package-structure.md#high-level-client-sets) based on API types. This doc introduces the use of client-gen, and the release cycle of the generated clientsets. - -## Using client-gen - -The workflow includes three steps: - -**1.** Marking API types with tags: in `pkg/apis/${GROUP}/${VERSION}/types.go`, mark the types (e.g., Pods) that you want to generate clients for with the `// +genclient` tag. If the resource associated with the type is not namespace scoped (e.g., PersistentVolume), you need to append the `// +genclient:nonNamespaced` tag as well. - -The following `// +genclient` are supported: - -- `// +genclient` - generate default client verb functions (*create*, *update*, *delete*, *get*, *list*, *update*, *patch*, *watch* and depending on the existence of `.Status` field in the type the client is generated for also *updateStatus*). -- `// +genclient:nonNamespaced` - all verb functions are generated without namespace. -- `// +genclient:onlyVerbs=create,get` - only listed verb functions will be generated. -- `// +genclient:skipVerbs=watch` - all default client verb functions will be generated **except** *watch* verb. -- `// +genclient:noStatus` - skip generation of *updateStatus* verb even thought the `.Status` field exists. - -In some cases you want to generate non-standard verbs (eg. for sub-resources). To do that you can use the following generator tag: - -- `// +genclient:method=Scale,verb=update,subresource=scale,input=k8s.io/api/extensions/v1beta1.Scale,result=k8s.io/api/extensions/v1beta1.Scale` - in this case a new function `Scale(string, *v1beta.Scale) *v1beta.Scale` will be added to the default client and the body of the function will be based on the *update* verb. The optional *subresource* argument will make the generated client function use subresource `scale`. Using the optional *input* and *result* arguments you can override the default type with a custom type. If the import path is not given, the generator will assume the type exists in the same package. - -In addition, the following optional tags influence the client generation: - -- `// +groupName=policy.authorization.k8s.io` – used in the fake client as the full group name (defaults to the package name), -- `// +groupGoName=AuthorizationPolicy` – a CamelCase Golang identifier to de-conflict groups with non-unique prefixes like `policy.authorization.k8s.io` and `policy.k8s.io`. These would lead to two `Policy()` methods in the clientset otherwise (defaults to the upper-case first segement of the group name). - -**2a.** If you are developing in the k8s.io/kubernetes repository, you just need to run hack/update-codegen.sh. 
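For example (a minimal sketch; in the common case the script needs no arguments), step **2a** boils down to:

```sh
# From the root of the k8s.io/kubernetes checkout: regenerate the clientsets
# and other generated code after changing // +genclient tags.
./hack/update-codegen.sh
```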
- -**2b.** If you are running client-gen outside of k8s.io/kubernetes, you need to use the command line argument `--input` to specify the groups and versions of the APIs you want to generate clients for, client-gen will then look into `pkg/apis/${GROUP}/${VERSION}/types.go` and generate clients for the types you have marked with the `genclient` tags. For example, to generated a clientset named "my_release" including clients for api/v1 objects and extensions/v1beta1 objects, you need to run: - -``` -$ client-gen --input="api/v1,extensions/v1beta1" --clientset-name="my_release" -``` - -**3.** ***Adding expansion methods***: client-gen only generates the common methods, such as CRUD. You can manually add additional methods through the expansion interface. For example, this [file](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/internalversion/pod_expansion.go) adds additional methods to Pod's client. As a convention, we put the expansion interface and its methods in file ${TYPE}_expansion.go. In most cases, you don't want to remove existing expansion files. So to make life easier, instead of creating a new clientset from scratch, ***you can copy and rename an existing clientset (so that all the expansion files are copied)***, and then run client-gen. - -## Output of client-gen - -- clientset: the clientset will be generated at `pkg/client/clientset_generated/` by default, and you can change the path via the `--clientset-path` command line argument. - -- Individual typed clients and client for group: They will be generated at `pkg/client/clientset_generated/${clientset_name}/typed/generated/${GROUP}/${VERSION}/` - -## Released clientsets - -If you are contributing code to k8s.io/kubernetes, try to use the generated clientset [here](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset). - -If you need a stable Go client to build your own project, please refer to the [client-go repository](https://github.com/kubernetes/client-go). - -We are migrating k8s.io/kubernetes to use client-go as well, see issue [#35159](https://github.com/kubernetes/kubernetes/issues/35159). +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/getting-builds.md b/contributors/devel/getting-builds.md index 0ae7031b..bbbcfa44 100644 --- a/contributors/devel/getting-builds.md +++ b/contributors/devel/getting-builds.md @@ -1,48 +1,3 @@ -# Getting Kubernetes Builds +This file has moved to https://git.k8s.io/community/contributors/devel/sig-release/getting-builds.md. -You can use [hack/get-build.sh](http://releases.k8s.io/HEAD/hack/get-build.sh) -to get a build or to use as a reference on how to get the most recent builds -with curl. With `get-build.sh` you can grab the most recent stable build, the -most recent release candidate, or the most recent build to pass our ci and gce -e2e tests (essentially a nightly build). - -Run `./hack/get-build.sh -h` for its usage. - -To get a build at a specific version (v1.1.1) use: - -```console -./hack/get-build.sh v1.1.1 -``` - -To get the latest stable release: - -```console -./hack/get-build.sh release/stable -``` - -Use the "-v" option to print the version number of a build without retrieving -it. 
For example, the following prints the version number for the latest ci -build: - -```console -./hack/get-build.sh -v ci/latest -``` - -You can also use the gsutil tool to explore the Google Cloud Storage release -buckets. Here are some examples: - -```sh -gsutil cat gs://kubernetes-release-dev/ci/latest.txt # output the latest ci version number -gsutil cat gs://kubernetes-release-dev/ci/latest-green.txt # output the latest ci version number that passed gce e2e -gsutil ls gs://kubernetes-release-dev/ci/v0.20.0-29-g29a55cc/ # list the contents of a ci release -gsutil ls gs://kubernetes-release/release # list all official releases and rcs -``` - -## Install `gsutil` - -Example installation: - -```console -$ curl -sSL https://storage.googleapis.com/pub/gsutil.tar.gz | sudo tar -xz -C /usr/local/src -$ sudo ln -s /usr/local/src/gsutil/gsutil /usr/bin/gsutil -``` +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/godep.md b/contributors/devel/godep.md index 4b10a7d5..6e896b94 100644 --- a/contributors/devel/godep.md +++ b/contributors/devel/godep.md @@ -1,251 +1,3 @@ -# Using godep to manage dependencies +This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/godep.md. -This document is intended to show a way for managing `vendor/` tree dependencies -in Kubernetes. If you do not need to manage vendored dependencies, you probably -do not need to read this. - -## Background - -As a tool, `godep` leaves much to be desired. It builds on `go get`, and adds -the ability to pin dependencies to exact git version. The `go get` tool itself -doesn't have any concept of versions, and tends to blow up if it finds a git -repo synced to anything but `master`, but that is exactly the state that -`godep` leaves repos. This is a recipe for frustration when people try to use -the tools. - -This doc will focus on predictability and reproducibility. - -## Justifications for an update - -Before you update a dependency, take a moment to consider why it should be -updated. Valid reasons include: - 1. We need new functionality that is in a later version. - 2. New or improved APIs in the dependency significantly improve Kubernetes code. - 3. Bugs were fixed that impact Kubernetes. - 4. Security issues were fixed even if they don't impact Kubernetes yet. - 5. Performance, scale, or efficiency was meaningfully improved. - 6. We need dependency A and there is a transitive dependency B. - 7. Kubernetes has an older level of a dependency that is precluding being able -to work with other projects in the ecosystem. - -## Theory of operation - -The `go` toolchain assumes a global workspace that hosts all of your Go code. - -The `godep` tool operates by first "restoring" dependencies into your `$GOPATH`. -This reads the `Godeps.json` file, downloads all of the dependencies from the -internet, and syncs them to the specified revisions. You can then make -changes - sync to different revisions or edit Kubernetes code to use new -dependencies (and satisfy them with `go get`). When ready, you tell `godep` to -"save" everything, which it does by walking the Kubernetes code, finding all -required dependencies, copying them from `$GOPATH` into the `vendor/` directory, -and rewriting `Godeps.json`. - -This does not work well, when combined with a global Go workspace. Instead, we -will set up a private workspace for this process. 
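Concretely, the restore/save cycle described above corresponds to plain `godep` usage along these lines (a sketch of the underlying commands only; the Kubernetes wrapper scripts introduced below run the equivalent steps inside that private workspace for you):

```sh
# Download every dependency pinned in Godeps/Godeps.json into $GOPATH
# at the recorded revisions.
godep restore

# ...edit code and/or sync dependencies in $GOPATH to new revisions...

# Re-walk the project, copy what is actually imported into vendor/,
# and rewrite Godeps.json.
godep save ./...
```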
-
-The Kubernetes build process uses this same technique, and offers a tool called
-`run-in-gopath.sh` which sets up and switches to a local, private workspace,
-including setting up `$GOPATH` and `$PATH`. If you wrap commands with this
-tool, they will use the private workspace, which will not conflict with other
-projects and is easily cleaned up and recreated.
-
-To see this in action, you can run an interactive shell in this environment:
-
-```sh
-# Run a shell, but don't run your own shell initializations.
-hack/run-in-gopath.sh bash --norc --noprofile
-```
-
-## Restoring deps
-
-To extract and download dependencies into `$GOPATH` we provide a script:
-`hack/godep-restore.sh`. If you run this tool, it will restore into your own
-`$GOPATH`. If you wrap it in `run-in-gopath.sh` it will restore into your
-`_output/` directory.
-
-```sh
-hack/run-in-gopath.sh hack/godep-restore.sh
-```
-
-This script will try to optimize what it needs to download, and if it seems the
-dependencies are all present already, it will return very quickly.
-
-If there's ever any doubt about the correctness of your dependencies, you can
-simply `make clean` or `rm -rf _output`, and run it again.
-
-Now you should have a clean copy of all of the Kubernetes dependencies.
-
-Downloading dependencies might take a while, so if you want to see progress
-information use the `-v` flag:
-
-```sh
-hack/run-in-gopath.sh hack/godep-restore.sh -v
-```
-
-## Making changes
-
-The most common things people need to do with deps are add and update them.
-These are similar but different.
-
-### Adding a dep
-
-For the sake of examples, consider that we have discovered a wonderful Go
-library at `example.com/go/frob`. The first thing you need to do is get that
-code into your workspace:
-
-```sh
-hack/run-in-gopath.sh go get -d example.com/go/frob
-```
-
-This will fetch, but not compile (omit the `-d` if you want to compile it now),
-the library into your private `$GOPATH`. It will pull whatever the default
-revision of that library is, typically the `master` branch for git repositories.
-If this is not the revision you need, you can change it, for example to
-`v1.0.0`:
-
-```sh
-hack/run-in-gopath.sh bash -c 'git -C $GOPATH/src/example.com/go/frob checkout v1.0.0'
-```
-
-Now that the code is present, you can start to use it in Kubernetes code.
-Because it is in your private workspace's `$GOPATH`, it might not be part of
-your own `$GOPATH`, so tools like `goimports` might not find it. This is an
-unfortunate side-effect of this process. You can either add the whole private
-workspace to your own `$GOPATH` or you can `go get` the library into your own
-`$GOPATH` until it is properly vendored into Kubernetes.
-
-Another possible complication is a dep that uses `godep` itself. In that case,
-you need to restore its dependencies, too:
-
-```sh
-hack/run-in-gopath.sh bash -c 'cd $GOPATH/src/example.com/go/frob && godep restore'
-```
-
-If the transitive deps collide with Kubernetes deps, you may have to manually
-resolve things. This is where the ability to run a shell in this environment
-comes in handy:
-
-```sh
-hack/run-in-gopath.sh bash --norc --noprofile
-```
-
-### Updating a dep
-
-Sometimes we already have a dep, but the version of it is wrong.
Because of the -way that `godep` and `go get` interact (badly) it's generally easiest to hit it -with a big hammer: - -```sh -hack/run-in-gopath.sh bash -c 'rm -rf $GOPATH/src/example.com/go/frob' -hack/run-in-gopath.sh go get -d example.com/go/frob -hack/run-in-gopath.sh bash -c 'git -C $GOPATH/src/example.com/go/frob checkout v2.0.0' -``` - -This will remove the code, re-fetch it, and sync to your desired version. - -### Removing a dep - -This happens almost for free. If you edit Kubernetes code and remove the last -use of a given dependency, you only need to restore and save the deps, and the -`godep` tool will figure out that you don't need that dep any more: - -## Saving deps - -Now that you have made your changes - adding, updating, or removing the use of a -dep - you need to rebuild the dependency database and make changes to the -`vendor/` directory. - -```sh -hack/run-in-gopath.sh hack/godep-save.sh -``` - -This will run through all of the primary targets for the Kubernetes project, -calculate which deps are needed, and rebuild the database. It will also -regenerate other metadata files which the project needs, such as BUILD files and -the LICENSE database. - -Commit the changes before updating deps in staging repos. - -## Saving deps in staging repos - -Kubernetes stores some code in a directory called `staging` which is handled -specially, and is not covered by the above. If you modified any code under -staging, or if you changed a dependency of code under staging (even -transitively), you'll also need to update deps there: - -```sh -./hack/update-staging-godeps.sh -``` - -Then commit the changes generated by the above script. - -## Commit messages - -Terse messages like "Update foo.org/bar to 0.42" are problematic -for maintainability. Please include in your commit message the -detailed reason why the dependencies were modified. - -Too commonly dependency changes have a ripple effect where something -else breaks unexpectedly. The first instinct during issue triage -is to revert a change. If the change was made to fix some other -issue and that issue was not documented, then a revert simply -continues the ripple by fixing one issue and reintroducing another -which then needs refixed. This can needlessly span multiple days -as CI results bubble in and subsequent patches fix and refix and -rerefix issues. This may be avoided if the original modifications -recorded artifacts of the change rationale. - -## Sanity checking - -After all of this is done, `git status` should show you what files have been -modified and added/removed. Make sure to sanity-check them with `git diff`, and -to `git add` and `git rm` them, as needed. It is commonly advised to make one -`git commit` which includes just the dependencies and Godeps files, and -another `git commit` that includes changes to Kubernetes code to use (or stop -using) the new/updated/removed dependency. These commits can go into a single -pull request. - -Before sending your PR, it's a good idea to sanity check that your -Godeps.json file and the contents of `vendor/ `are ok: - -```sh -hack/run-in-gopath.sh hack/verify-godeps.sh -``` - -All this script will do is a restore, followed by a save, and then look for -changes. If you followed the above instructions, it should be clean. If it is -not, you get to figure out why. - -## Manual updates - -It is sometimes expedient to manually fix the `Godeps.json` file to -minimize the changes. However, without great care this can lead to failures -with the verifier scripts. 
The kubernetes codebase does "interesting things" -with symlinks between `vendor/` and `staging/` to allow multiple Go import -paths to coexist in the same git repo. - -The verifiers, including `hack/verify-godeps.sh` *must* pass for every pull -request. - -## Reviewing and approving dependency changes - -Particular attention to detail should be exercised when reviewing and approving -PRs that add/remove/update dependencies. Importing a new dependency should bring -a certain degree of value as there is a maintenance overhead for maintaining -dependencies into the future. - -When importing a new dependency, be sure to keep an eye out for the following: -- Is the dependency maintained? -- Does the dependency bring value to the project? Could this be done without - adding a new dependency? -- Is the target dependency the original source, or a fork? -- Is there already a dependency in the project that does something similar? -- Does the dependency have a license that is compatible with the Kubernetes - project? - -All new dependency licenses should be reviewed by either Tim Hockin (@thockin) -or the Steering Committee (@kubernetes/steering-committee) to ensure that they -are compatible with the Kubernetes project license. It is also important to note -and flag if a license has changed when updating a dependency, so that these can -also be reviewed. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/gubernator.md b/contributors/devel/gubernator.md index b03d11a1..c5361697 100644 --- a/contributors/devel/gubernator.md +++ b/contributors/devel/gubernator.md @@ -1,136 +1,3 @@ -# Gubernator +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/gubernator.md. -*This document is oriented at developers who want to use Gubernator to debug while developing for Kubernetes.* - - -- [Gubernator](#gubernator) - - [What is Gubernator?](#what-is-gubernator) - - [Gubernator Features](#gubernator-features) - - [Test Failures list](#test-failures-list) - - [Log Filtering](#log-filtering) - - [Gubernator for Local Tests](#gubernator-for-local-tests) - - [Future Work](#future-work) - - -## What is Gubernator? - -[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes -test results. - -Gubernator simplifies the debugging process and makes it easier to track down failures by automating many -steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines. -Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the -failed pods and the corresponding pod UID, namespace, and container ID. -It also allows for filtering of the log files to display relevant lines based on selected keywords, and -allows for multiple logs to be woven together by timestamp. - -Gubernator runs on Google App Engine and fetches logs stored on Google Cloud Storage. - -## Gubernator Features - -### Test Failures list - -Comments made by k8s-ci-robot will post a link to a page listing the failed tests. -Each failed test comes with the corresponding error log from a junit file and a link -to filter logs for that test. - -Based on the message logged in the junit file, the pod name may be displayed. 
- - - -[Test Failures List Example](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721) - -### Log Filtering - -The log filtering page comes with checkboxes and textboxes to aid in filtering. Filtered keywords will be bolded -and lines including keywords will be highlighted. Up to four lines around the line of interest will also be displayed. - - - -If less than 100 lines are skipped, the "... skipping xx lines ..." message can be clicked to expand and show -the hidden lines. - -Before expansion: - -After expansion: - - -If the pod name was displayed in the Test Failures list, it will automatically be included in the filters. -If it is not found in the error message, it can be manually entered into the textbox. Once a pod name -is entered, the Pod UID, Namespace, and ContainerID may be automatically filled in as well. These can be -altered as well. To apply the filter, check off the options corresponding to the filter. - - - -To add a filter, type the term to be filtered into the textbox labeled "Add filter:" and press enter. -Additional filters will be displayed as checkboxes under the textbox. - - - -To choose which logs to view check off the checkboxes corresponding to the logs of interest. If multiple logs are -included, the "Weave by timestamp" option can weave the selected logs together based on the timestamp in each line. - - - -[Log Filtering Example 1](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/5535/nodelog?pod=pod-configmaps-b5b876cb-3e1e-11e6-8956-42010af0001d&junit=junit_03.xml&wrap=on&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkube-apiserver.log&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkubelet.log&UID=on&poduid=b5b8a59e-3e1e-11e6-b358-42010af0001d&ns=e2e-tests-configmap-oi12h&cID=tmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image) - -[Log Filtering Example 2](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721/nodelog?pod=client-containers-a53f813c-503e-11e6-88dd-0242ac110003&junit=junit_19.xml&wrap=on) - - -### Gubernator for Local Tests - -*Currently Gubernator can only be used with remote node e2e tests.* - -**NOTE: Using Gubernator with local tests will publicly upload your test logs to Google Cloud Storage** - -To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true. -A URL link to view the test results will be printed to the console. -Please note that running with the Gubernator tag will bypass the user confirmation for uploading to GCS. - -```console - -$ make test-e2e-node REMOTE=true GUBERNATOR=true -... -================================================================ -Running gubernator.sh - -Gubernator linked below: -k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp -``` - -The gubernator.sh script can be run after running a remote node e2e test for the same effect. - -```console -$ ./test/e2e_node/gubernator.sh -Do you want to run gubernator.sh and upload logs publicly to GCS? [y/n]y -... -Gubernator linked below: -k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp -``` - -## Future Work - -Gubernator provides a framework for debugging failures and introduces useful features. 
-There is still a lot of room for more features and growth to make the debugging process more efficient. - -How to contribute (see https://git.k8s.io/test-infra/gubernator/README.md) - -* Extend GUBERNATOR flag to all local tests - -* More accurate identification of pod name, container ID, etc. - * Change content of logged strings for failures to include more information - * Better regex in Gubernator - -* Automate discovery of more keywords - * Volume Name - * Disk Name - * Pod IP - -* Clickable API objects in the displayed lines in order to add them as filters - -* Construct story of pod's lifetime - * Have concise view of what a pod went through from when pod was started to failure - -* Improve UI - * Have separate folders of logs in rows instead of in one long column - * Improve interface for adding additional features (maybe instead of textbox and checkbox, have chips) +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/help-wanted.md b/contributors/devel/help-wanted.md index bd923b3c..984deb6a 100644 --- a/contributors/devel/help-wanted.md +++ b/contributors/devel/help-wanted.md @@ -1,3 +1,3 @@ -This document has moved [here](https://git.k8s.io/community/tree/master/contributors/guide/help-wanted.md). +This file has moved to https://git.k8s.io/community/contributors/guide/help-wanted.md. -*This file is a redirect stub. It should be deleted within 90 days from the current date.*
\ No newline at end of file +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/instrumentation.md b/contributors/devel/instrumentation.md index b0a11193..6681740e 100644 --- a/contributors/devel/instrumentation.md +++ b/contributors/devel/instrumentation.md @@ -1,215 +1,3 @@ -## Instrumenting Kubernetes - -The following references and outlines general guidelines for metric instrumentation -in Kubernetes components. Components are instrumented using the -[Prometheus Go client library](https://github.com/prometheus/client_golang). For non-Go -components. [Libraries in other languages](https://prometheus.io/docs/instrumenting/clientlibs/) -are available. - -The metrics are exposed via HTTP in the -[Prometheus metric format](https://prometheus.io/docs/instrumenting/exposition_formats/), -which is open and well-understood by a wide range of third party applications and vendors -outside of the Prometheus eco-system. - -The [general instrumentation advice](https://prometheus.io/docs/practices/instrumentation/) -from the Prometheus documentation applies. This document reiterates common pitfalls and some -Kubernetes specific considerations. - -Prometheus metrics are cheap as they have minimal internal memory state. Set and increment -operations are thread safe and take 10-25 nanoseconds (Go & Java). -Thus, instrumentation can and should cover all operationally relevant aspects of an application, -internal and external. - -## Quick Start - -The following describes the basic steps required to add a new metric (in Go). - -1. Import "github.com/prometheus/client_golang/prometheus". - -2. Create a top-level var to define the metric. For this, you have to: - - 1. Pick the type of metric. Use a Gauge for things you want to set to a -particular value, a Counter for things you want to increment, or a Histogram or -Summary for histograms/distributions of values (typically for latency). -Histograms are better if you're going to aggregate the values across jobs, while -summaries are better if you just want the job to give you a useful summary of -the values. - 2. Give the metric a name and description. - 3. Pick whether you want to distinguish different categories of things using -labels on the metric. If so, add "Vec" to the name of the type of metric you -want and add a slice of the label names to the definition. - - [Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53) - ```go - requestCounter = prometheus.NewCounterVec( - prometheus.CounterOpts{ - Name: "apiserver_request_count", - Help: "Counter of apiserver requests broken out for each verb, API resource, client, and HTTP response code.", - }, - []string{"verb", "resource", "client", "code"}, - ) - ``` - -3. Register the metric so that prometheus will know to export it. - - [Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78) - ```go - func init() { - prometheus.MustRegister(requestCounter) - prometheus.MustRegister(requestLatencies) - prometheus.MustRegister(requestLatenciesSummary) - } - ``` - -4. 
Use the metric by calling the appropriate method for your metric type (Set, -Inc/Add, or Observe, respectively for Gauge, Counter, or Histogram/Summary), -first calling WithLabelValues if your metric has any labels - - [Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87) - ```go - requestCounter.WithLabelValues(*verb, *resource, client, strconv.Itoa(*httpCode)).Inc() - ``` - - -## Instrumentation types - -Components have metrics capturing events and states that are inherent to their -application logic. Examples are request and error counters, request latency -histograms, or internal garbage collection cycles. Those metrics are instrumented -directly in the application code. - -Secondly, there are business logic metrics. Those are not about observed application -behavior but abstract system state, such as desired replicas for a deployment. -They are not directly instrumented but collected from otherwise exposed data. - -In Kubernetes they are generally captured in the [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) -component, which reads them from the API server. -For this types of metric exposition, the -[exporter guidelines](https://prometheus.io/docs/instrumenting/writing_exporters/) -apply additionally. - -## Naming - -Metrics added directly by application or package code should have a unique name. -This avoids collisions of metrics added via dependencies. They also clearly -distinguish metrics collected with different semantics. This is solved through -prefixes: - -``` -<component_name>_<metric> -``` - -For example, suppose the kubelet instrumented its HTTP requests but also uses -an HTTP router providing its own implementation. Both expose metrics on total -http requests. They should be distinguishable as in: - -``` -kubelet_http_requests_total{path=”/some/path”,status=”200”} -routerpkg_http_requests_total{path=”/some/path”,status=”200”,method=”GET”} -``` - -As we can see they expose different labels and thus a naming collision would -not have been possible to resolve even if both metrics counted the exact same -requests. - -Resource objects that occur in names should inherit the spelling that is used -in kubectl, i.e. daemon sets are `daemonset` rather than `daemon_set`. - -## Dimensionality & Cardinality - -Metrics can often replace more expensive logging as they are time-aggregated -over a sampling interval. The [multidimensional data model](https://prometheus.io/docs/concepts/data_model/) -enables deep insights and all metrics should use those label dimensions -where appropriate. - -A common error that often causes performance issues in the ingesting metric -system is considering dimensions that inhibit or eliminate time aggregation -by being too specific. Typically those are user IDs or error messages. -More generally: one should know a comprehensive list of all possible values -for a label at instrumentation time. - -Notable exceptions are exporters like kube-state-metrics, which expose per-pod -or per-deployment metrics, which are theoretically unbound over time as one could -constantly create new ones, with new names. However, they have -a reasonable upper bound for a given size of infrastructure they refer to and -its typical frequency of changes. - -In general, “external” labels like pod or node name do not belong in the -instrumentation itself. 
They are to be attached to metrics by the collecting
-system that has the external knowledge ([blog post](https://www.robustperception.io/target-labels-are-for-life-not-just-for-christmas/)).
-
-## Normalization
-
-Metrics should be normalized with respect to their dimensions. They should
-expose the minimal set of labels, each of which provides additional information.
-Labels that are composed from values of different labels are not desirable.
-For example:
-
-```
-example_metric{pod=”abc”,container=”proxy”,container_long=”abc/proxy”}
-```
-
-It often seems feasible to add additional meta information about an object
-to all metrics about that object, e.g.:
-
-```
-kube_pod_container_restarts{namespace=...,pod=...,container=...}
-```
-
-A common use case is wanting to look at such metrics with respect to the node the
-pod is scheduled on. So it seems convenient to add a “node” label.
-
-```
-kube_pod_container_restarts{namespace=...,pod=...,container=...,node=...}
-```
-
-This however only caters to one specific query use case. There are many more
-pieces of metadata that could be added, effectively blowing up the instrumentation.
-They are also not guaranteed to be stable over time. What if pods at some
-point can be live migrated?
-Those pieces of information should be normalized into an info-level metric
-([blog post](https://www.robustperception.io/exposing-the-software-version-to-prometheus/)),
-which is always set to 1. For example:
-
-```
-kube_pod_info{pod=...,namespace=...,pod_ip=...,host_ip=..,node=..., ...}
-```
-
-The metric system can later denormalize those along the identifying
-“pod” and “namespace” labels. This leads to...
-
-## Resource Referencing
-
-It is often desirable to correlate different metrics about a common object,
-such as a pod. Label dimensions can be used to match up different metrics.
-This is easiest if label names and values follow a common pattern.
-For metrics exposed by the same application, that often happens naturally.
-
-For a system composed of several independent, and also pluggable components,
-it makes sense to set cross-component standards to allow easy querying in
-metric systems without extensive post-processing of data.
-In Kubernetes, those are the resource objects such as deployments,
-pods, or services and the namespace they belong to.
-
-The following should be consistently used:
-
-```
-example_metric_ccc{pod=”example-app-5378923”, namespace=”default”}
-```
-
-An object is referenced by its unique name in a label named after the resource
-itself (i.e. `pod`/`deployment`/... and not `pod_name`/`deployment_name`)
-and the namespace it belongs to in the `namespace` label.
-
-Note: namespace/name combinations are only unique at a certain point in time.
-For time series this is given by the timestamp associated with any data point.
-UUIDs are truly unique but not convenient to use in user-facing time series
-queries.
-They can still be incorporated using an info-level metric as described above for
-`kube_pod_info`. A query to a metric system selecting by UUID via the info-level
-metric could look as follows:
-
-```
-kube_pod_restarts and on(namespace, pod) kube_pod_info{uuid=”ABC”}
-```
+This file has moved to https://git.k8s.io/community/contributors/devel/sig-instrumentation/instrumentation.md.
+This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first.
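As a concrete illustration of the info-level metric pattern described in the instrumentation guidance above, the following is a minimal sketch using the Prometheus Go client library; the label values are invented for the example and the snippet is not taken from kube-state-metrics or any other real exporter.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// podInfo is an info-level metric: it carries descriptive labels and is always
// set to 1, so other metrics can be joined to it on the identifying
// "namespace" and "pod" labels instead of duplicating metadata everywhere.
var podInfo = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kube_pod_info",
		Help: "Information about pod.",
	},
	[]string{"namespace", "pod", "node", "pod_ip", "uuid"},
)

func init() {
	prometheus.MustRegister(podInfo)
}

func main() {
	// One sample per pod; the value is always 1.
	podInfo.WithLabelValues("default", "example-app-5378923", "node-1", "10.0.0.7", "abc-123").Set(1)

	// Expose the metrics in the Prometheus exposition format.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```

A query system can then join a metric such as `kube_pod_restarts` to this series on the shared `namespace` and `pod` labels, as in the example query above, without the restart metric itself ever carrying node or UUID labels.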
diff --git a/contributors/devel/kubectl-conventions.md b/contributors/devel/kubectl-conventions.md index 5b009657..4cc1c7e0 100644 --- a/contributors/devel/kubectl-conventions.md +++ b/contributors/devel/kubectl-conventions.md @@ -1,458 +1,3 @@ -# Kubectl Conventions - -Updated: 3/23/2017 - -**Table of Contents** - -- [Kubectl Conventions](#kubectl-conventions) - - [Principles](#principles) - - [Command conventions](#command-conventions) - - [Create commands](#create-commands) - - [Rules for extending special resource alias - "all"](#rules-for-extending-special-resource-alias---all) - - [Flag conventions](#flag-conventions) - - [Output conventions](#output-conventions) - - [Documentation conventions](#documentation-conventions) - - [kubectl Factory conventions](#kubectl-Factory-conventions) - - [Command implementation conventions](#command-implementation-conventions) - - [Exit code conventions](#exit-code-conventions) - - [Generators](#generators) - - -## Principles - -* Strive for consistency across commands - -* Explicit should always override implicit - - * Environment variables should override default values - - * Command-line flags should override default values and environment variables - - * `--namespace` should also override the value specified in a specified -resource - -* Most kubectl commands should be able to operate in bulk on resources, of mixed types. - -* Kubectl should not make any decisions based on its nor the server's release version string. Instead, API - discovery and/or OpenAPI should be used to determine available features. - -* We currently only guarantee one release of version skew is supported, but we strive to make old releases of kubectl - continue to work with newer servers in compliance with our API compatibility guarantees. This means, for instance, that - kubectl should not fully parse objects returned by the server into full Go types and then re-encode them, since that - would drop newly added fields. ([#3955](https://github.com/kubernetes/kubernetes/issues/3955)) - -* General-purpose kubectl commands (e.g., get, delete, create -f, replace, patch, apply) should work for all resource types, - even those not present when that release of kubectl was built, such as APIs added in newer releases, aggregated APIs, - and third-party resources. - -* While functionality may be added to kubectl out of expedience, commonly needed functionality should be provided by - the server to make it easily accessible to all API clients. ([#12143](https://github.com/kubernetes/kubernetes/issues/12143)) - -* Remaining non-trivial functionality remaining in kubectl should be made available to other clients via libraries - ([#7311](https://github.com/kubernetes/kubernetes/issues/7311)) - -## Command conventions - -* Command names are all lowercase, and hyphenated if multiple words. - -* kubectl VERB NOUNs for commands that apply to multiple resource types. - -* Command itself should not have built-in aliases. - -* NOUNs may be specified as `TYPE name1 name2` or `TYPE/name1 TYPE/name2` or -`TYPE1,TYPE2,TYPE3/name1`; TYPE is omitted when only a single type is expected. - -* Resource types are all lowercase, with no hyphens; both singular and plural -forms are accepted. - -* NOUNs may also be specified by one or more file arguments: `-f file1 -f file2 -...` - -* Resource types may have 2- or 3-letter aliases. - -* Business logic should be decoupled from the command framework, so that it can -be reused independently of kubectl, cobra, etc. 
- * Ideally, commonly needed functionality would be implemented server-side in -order to avoid problems typical of "fat" clients and to make it readily -available to non-Go clients. - -* Commands that generate resources, such as `run` or `expose`, should obey -specific conventions, see [generators](#generators). - -* A command group (e.g., `kubectl config`) may be used to group related -non-standard commands, such as custom generators, mutations, and computations. - - -### Create commands - -`kubectl create <resource>` commands fill the gap between "I want to try -Kubernetes, but I don't know or care what gets created" (`kubectl run`) and "I -want to create exactly this" (author yaml and run `kubectl create -f`). They -provide an easy way to create a valid object without having to know the vagaries -of particular kinds, nested fields, and object key typos that are ignored by the -yaml/json parser. Because editing an already created object is easier than -authoring one from scratch, these commands only need to have enough parameters -to create a valid object and set common immutable fields. It should default as -much as is reasonably possible. Once that valid object is created, it can be -further manipulated using `kubectl edit` or the eventual `kubectl set` commands. - -`kubectl create <resource> <special-case>` commands help in cases where you need -to perform non-trivial configuration generation/transformation tailored for a -common use case. `kubectl create secret` is a good example, there's a `generic` -flavor with keys mapping to files, then there's a `docker-registry` flavor that -is tailored for creating an image pull secret, and there's a `tls` flavor for -creating tls secrets. You create these as separate commands to get distinct -flags and separate help that is tailored for the particular usage. - - -### Rules for extending special resource alias - "all" - -Here are the rules to add a new resource to the `kubectl get all` output. 
- -* No cluster scoped resources - -* No namespace admin level resources (limits, quota, policy, authorization -rules) - -* No resources that are potentially unrecoverable (secrets and pvc) - -* Resources that are considered "similar" to #3 should be grouped -the same (configmaps) - - -## Flag conventions - -* Flags are all lowercase, with words separated by hyphens - -* Flag names and single-character aliases should have the same meaning across -all commands - -* Flag descriptions should start with an uppercase letter and not have a -period at the end of a sentence - -* Command-line flags corresponding to API fields should accept API enums -exactly (e.g., `--restart=Always`) - -* Do not reuse flags for different semantic purposes, and do not use different -flag names for the same semantic purpose -- grep for `"Flags()"` before adding a -new flag - -* Use short flags sparingly, only for the most frequently used options, prefer -lowercase over uppercase for the most common cases, try to stick to well known -conventions for UNIX commands and/or Docker, where they exist, and update this -list when adding new short flags - - * `-f`: Resource file - * also used for `--follow` in `logs`, but should be deprecated in favor of `-F` - * `-n`: Namespace scope - * `-l`: Label selector - * also used for `--labels` in `expose`, but should be deprecated - * `-L`: Label columns - * `-c`: Container - * also used for `--client` in `version`, but should be deprecated - * `-i`: Attach stdin - * `-t`: Allocate TTY - * `-w`: Watch (currently also used for `--www` in `proxy`, but should be deprecated) - * `-p`: Previous - * also used for `--pod` in `exec`, but deprecated - * also used for `--patch` in `patch`, but should be deprecated - * also used for `--port` in `proxy`, but should be deprecated - * `-P`: Static file prefix in `proxy`, but should be deprecated - * `-r`: Replicas - * `-u`: Unix socket - * `-v`: Verbose logging level - - -* `--dry-run`: Don't modify the live state; simulate the mutation and display -the output. All mutations should support it. - -* `--local`: Don't contact the server; just do local read, transformation, -generation, etc., and display the output - -* `--output-version=...`: Convert the output to a different API group/version - -* `--short`: Output a compact summary of normal output; the format is subject -to change and is optimized for reading not parsing. - -* `--validate`: Validate the resource schema - -## Output conventions - -* By default, output is intended for humans rather than programs - * However, affordances are made for simple parsing of `get` output - -* Only errors should be directed to stderr - -* `get` commands should output one row per resource, and one resource per row - - * Column titles and values should not contain spaces in order to facilitate -commands that break lines into fields: cut, awk, etc. Instead, use `-` as the -word separator. 
-
-  * By default, `get` output should fit within about 80 columns
-
-    * Eventually we could perhaps auto-detect width
-    * `-o wide` may be used to display additional columns
-
-
-  * The first column should be the resource name, titled `NAME` (may change this
-to an abbreviation of resource type)
-
-  * NAMESPACE should be displayed as the first column when --all-namespaces is
-specified
-
-  * The last default column should be time since creation, titled `AGE`
-
-  * `-Lkey` should append a column containing the value of label with key `key`,
-with `<none>` if not present
-
-  * json, yaml, Go template, and jsonpath template formats should be supported
-and encouraged for subsequent processing
-
-  * Users should use --api-version or --output-version to ensure the output
-uses the version they expect
-
-
-* `describe` commands may output on multiple lines and may include information
-from related resources, such as events. Describe should add additional
-information from related resources that a normal user may need to know - if a
-user would always run "describe resource1" and then immediately want to run a
-"get type2" or "describe resource2", consider including that info. Examples:
-persistent volume claims for pods that reference claims, events for most
-resources, nodes and the pods scheduled on them. When fetching related
-resources, a targeted field selector should be used in favor of client side
-filtering of related resources.
-
-* For fields that can be explicitly unset (booleans, integers, structs), the
-output should say `<unset>`. Likewise, for arrays `<none>` should be used; for
-external IP, `<nodes>` should be used; for load balancer, `<pending>` should be
-used. Lastly `<unknown>` should be used where an unrecognized field type was
-specified.
-
-* Mutations should output TYPE/name verbed by default, where TYPE is singular;
-`-o name` may be used to just display TYPE/name, which may be used to specify
-resources in other commands
-
-## Documentation conventions
-
-* Commands are documented using Cobra; docs are then auto-generated by
-`hack/update-generated-docs.sh`.
-
-  * Use should contain a short usage string for the most common use case(s), not
-an exhaustive specification
-
-  * Short should contain a one-line explanation of what the command does
-    * Short descriptions should start with an uppercase letter and not
-      have a period at the end of a sentence
-    * Short descriptions should (if possible) start with a first person
-      (singular present tense) verb
-
-  * Long may contain multiple lines, including additional information about
-input, output, commonly used flags, etc.
-    * Long descriptions should use proper grammar, start with an uppercase
-      letter and have a period at the end of a sentence
-
-
-  * Example should contain examples
-    * A comment should precede each example command. Comment should start with
-      an uppercase letter
-    * Command examples should not include a `$` prefix
-
-* Use "FILENAME" for filenames
-
-* Use "TYPE" for the particular flavor of resource type accepted by kubectl,
-rather than "RESOURCE" or "KIND"
-
-* Use "NAME" for resource names
-
-## kubectl Factory conventions
-
-The kubectl `Factory` is a large interface which is used to provide access to clients,
-polymorphic inspection, and polymorphic mutation. The `Factory` is layered in
-"rings" in which one ring may reference inner rings, but not peers or outer rings.
-This is done to allow composition by extenders.
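To illustrate the ring layering just described, here is a simplified sketch in Go; the interface names, method sets, and return types are placeholders chosen for illustration, not the real `Factory` definition, and the point is only that each outer ring embeds (and may call) the ring inside it while inner rings never reference outward.

```go
package factory

// Ring 0: raw access to clients and configuration.
type ClientAccessFactory interface {
	// ClientConfig is a simplified stand-in for returning a client config.
	ClientConfig() (map[string]string, error)
}

// Ring 1: polymorphic inspection; may call into ring 0, never into ring 2.
type ObjectMappingFactory interface {
	ClientAccessFactory
	// ResourceFor maps a user-supplied alias to a canonical resource name.
	ResourceFor(alias string) (string, error)
}

// Ring 2: higher-level helpers built on top of both inner rings.
type BuilderFactory interface {
	ObjectMappingFactory
	// NewBuilder is a stand-in for constructing a resource builder.
	NewBuilder() interface{}
}
```

An extender that only needs to change how clients are obtained can supply its own ring-0 implementation and reuse the outer rings unchanged, which is the composition property the layering is meant to enable.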
-
-In order for composers to be able to provide alternative factory implementations
-they need to provide low level pieces of *certain* functions so that when the factory
-calls back into itself it uses the custom version of the function. Rather than try
-to enumerate everything that someone would want to override we split the factory into
-rings, where each ring can depend on methods of an earlier ring, but cannot depend upon
-peer methods in its own ring.
-
-
-## Command implementation conventions
-
-For every command there should be a `NewCmd<CommandName>` function that creates
-the command and returns a pointer to a `cobra.Command`, which can later be added
-to other parent commands to compose the structure tree. There should also be a
-`<CommandName>Options` struct with a variable for every flag and argument
-declared by the command (and any other variable required for the command to
-run). This makes tests and mocking easier. The struct ideally exposes three
-methods:
-
-* `Complete`: Completes the struct fields with values that may or may not be
-directly provided by the user, for example, by flags pointers, by the `args`
-slice, by using the Factory, etc.
-
-* `Validate`: performs validation on the struct fields and returns appropriate
-errors.
-
-* `Run<CommandName>`: runs the actual logic of the command, taking as assumption
-that the struct is complete with all required values to run, and they are valid.
-
-Sample command skeleton:
-
-```go
-// MineRecommendedName is the recommended command name for kubectl mine.
-const MineRecommendedName = "mine"
-
-// Long command description and examples.
-var (
-	mineLong = templates.LongDesc(`
-		mine which is described here
-		with lots of details.`)
-
-	mineExample = templates.Examples(`
-		# Run my command's first action
-		kubectl mine first_action
-
-		# Run my command's second action on latest stuff
-		kubectl mine second_action --flag`)
-)
-
-// MineOptions contains all the options for running the mine cli command.
-type MineOptions struct {
-	mineLatest bool
-}
-
-// NewCmdMine implements the kubectl mine command.
-func NewCmdMine(parent, name string, f *cmdutil.Factory, out io.Writer) *cobra.Command {
-	opts := &MineOptions{}
-
-	cmd := &cobra.Command{
-		Use:     fmt.Sprintf("%s [--latest]", name),
-		Short:   "Run my command",
-		Long:    mineLong,
-		Example: fmt.Sprintf(mineExample, parent+" "+name),
-		Run: func(cmd *cobra.Command, args []string) {
-			if err := opts.Complete(f, cmd, args, out); err != nil {
-				cmdutil.CheckErr(err)
-			}
-			if err := opts.Validate(); err != nil {
-				cmdutil.CheckErr(cmdutil.UsageError(cmd, err.Error()))
-			}
-			if err := opts.RunMine(); err != nil {
-				cmdutil.CheckErr(err)
-			}
-		},
-	}
-
-	// Bind the flag to the options struct created above.
-	cmd.Flags().BoolVar(&opts.mineLatest, "latest", false, "Use latest stuff")
-	return cmd
-}
-
-// Complete completes all the required options for mine.
-func (o *MineOptions) Complete(f *cmdutil.Factory, cmd *cobra.Command, args []string, out io.Writer) error {
-	return nil
-}
-
-// Validate validates all the required options for mine.
-func (o MineOptions) Validate() error {
-	return nil
-}
-
-// RunMine implements all the necessary functionality for mine.
-func (o MineOptions) RunMine() error {
-	return nil
-}
-```
-
-The `Run<CommandName>` method should contain the business logic of the command
-and as noted in [command conventions](#command-conventions), ideally that logic
-should exist server-side so any client could take advantage of it.
Notice that -this is not a mandatory structure and not every command is implemented this way, -but this is a nice convention so try to be compliant with it. As an example, -have a look at how [kubectl logs](https://git.k8s.io/kubernetes/pkg/kubectl/cmd/logs.go) is implemented. - -## Exit code conventions - -Generally, for all the command exit code, result of `zero` means success and `non-zero` means errors. - -For idempotent ("make-it-so") commands, we should return `zero` when success even if no changes were provided, user can request treating "make-it-so" as "already-so" via flag `--error-unchanged` to make it return `non-zero` exit code. - -For non-idempotent ("already-so") commands, we should return `non-zero` by default, user can request treating "already-so" as "make-it-so" via flag `--ignore-unchanged` to make it return `zero` exit code. - - -| Exit Code Number | Meaning | Enable | -| :--- | :--- | :--- | -| 0 | Command exited success | By default, By flag `--ignore-unchanged` | -| 1 | Command exited for general errors | By default | -| 3 | Command was successful, but the user requested a distinct exit code when no change was made | By flag `--error-unchanged`| - -## Generators - -Generators are kubectl commands that generate resources based on a set of inputs -(other resources, flags, or a combination of both). - -The point of generators is: - -* to enable users using kubectl in a scripted fashion to pin to a particular -behavior which may change in the future. Explicit use of a generator will always -guarantee that the expected behavior stays the same. - -* to enable potential expansion of the generated resources for scenarios other -than just creation, similar to how -f is supported for most general-purpose -commands. - -Generator commands should obey the following conventions: - -* A `--generator` flag should be defined. Users then can choose between -different generators, if the command supports them (for example, `kubectl run` -currently supports generators for pods, jobs, replication controllers, and -deployments), or between different versions of a generator so that users -depending on a specific behavior may pin to that version (for example, `kubectl -expose` currently supports two different versions of a service generator). - -* Generation should be decoupled from creation. A generator should implement the -`kubectl.StructuredGenerator` interface and have no dependencies on cobra or the -Factory. See, for example, how the first version of the namespace generator is -defined: - -```go -// NamespaceGeneratorV1 supports stable generation of a namespace -type NamespaceGeneratorV1 struct { - // Name of namespace - Name string -} - -// Ensure it supports the generator pattern that uses parameters specified during construction -var _ StructuredGenerator = &NamespaceGeneratorV1{} - -// StructuredGenerate outputs a namespace object using the configured fields -func (g *NamespaceGeneratorV1) StructuredGenerate() (runtime.Object, error) { - if err := g.validate(); err != nil { - return nil, err - } - namespace := &api.Namespace{} - namespace.Name = g.Name - return namespace, nil -} - -// validate validates required fields are set to support structured generation -func (g *NamespaceGeneratorV1) validate() error { - if len(g.Name) == 0 { - return fmt.Errorf("name must be specified") - } - return nil -} -``` - -The generator struct (`NamespaceGeneratorV1`) holds the necessary fields for -namespace generation. 
It also satisfies the `kubectl.StructuredGenerator` -interface by implementing the `StructuredGenerate() (runtime.Object, error)` -method which configures the generated namespace that callers of the generator -(`kubectl create namespace` in our case) need to create. - -* `--dry-run` should output the resource that would be created, without -creating it. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-cli/kubectl-conventions.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first.
\ No newline at end of file diff --git a/contributors/devel/kubelet-cri-networking.md b/contributors/devel/kubelet-cri-networking.md index 1446e6d3..148d9ae6 100644 --- a/contributors/devel/kubelet-cri-networking.md +++ b/contributors/devel/kubelet-cri-networking.md @@ -1,56 +1,3 @@ -# Container Runtime Interface (CRI) Networking Specifications +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/kubelet-cri-networking.md. -## Introduction -[Container Runtime Interface (CRI)](container-runtime-interface.md) is -an ongoing project to allow container -runtimes to integrate with kubernetes via a newly-defined API. This document -specifies the network requirements for container runtime -interface (CRI). CRI networking requirements expand upon kubernetes pod -networking requirements. This document does not specify requirements -from upper layers of kubernetes network stack, such as `Service`. More -background on k8s networking could be found -[here](http://kubernetes.io/docs/admin/networking/) - -## Requirements -1. Kubelet expects the runtime shim to manage pod's network life cycle. Pod -networking should be handled accordingly along with pod sandbox operations. - * `RunPodSandbox` must set up pod's network. This includes, but is not limited -to allocating a pod IP, configuring the pod's network interfaces and default -network route. Kubelet expects the pod sandbox to have an IP which is -routable within the k8s cluster, if `RunPodSandbox` returns successfully. -`RunPodSandbox` must return an error if it fails to set up the pod's network. -If the pod's network has already been set up, `RunPodSandbox` must skip -network setup and proceed. - * `StopPodSandbox` must tear down the pod's network. The runtime shim -must return error on network tear down failure. If pod's network has -already been torn down, `StopPodSandbox` must skip network tear down and proceed. - * `RemovePodSandbox` may tear down pod's network, if the networking has -not been torn down already. `RemovePodSandbox` must return error on -network tear down failure. - * Response from `PodSandboxStatus` must include pod sandbox network status. -The runtime shim must return an empty network status if it failed -to construct a network status. - -2. User supplied pod networking configurations, which are NOT directly -exposed by the kubernetes API, should be handled directly by runtime -shims. For instance, `hairpin-mode`, `cni-bin-dir`, `cni-conf-dir`, `network-plugin`, -`network-plugin-mtu` and `non-masquerade-cidr`. Kubelet will no longer handle -these configurations after the transition to CRI is complete. -3. Network configurations that are exposed through the kubernetes API -are communicated to the runtime shim through `UpdateRuntimeConfig` -interface, e.g. `podCIDR`. For each runtime and network implementation, -some configs may not be applicable. The runtime shim may handle or ignore -network configuration updates from `UpdateRuntimeConfig` interface. - -## Extensibility -* Kubelet is oblivious to how the runtime shim manages networking, i.e -runtime shim is free to use [CNI](https://github.com/containernetworking/cni), -[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md) or -any other implementation as long as the CRI networking requirements and -k8s networking requirements are satisfied. -* Runtime shims have full visibility into pod networking configurations. -* As more network feature arrives, CRI will evolve. 
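A rough sketch of how a runtime shim might satisfy the sandbox networking requirements above is shown below. The types, method signatures, and the network-plugin interface are hypothetical simplifications for illustration only, not the generated CRI API; the point is the required behavior (set up on `RunPodSandbox`, tear down on `StopPodSandbox`, fail loudly on errors, and skip work that has already been done).

```go
package shim

import "fmt"

// networkPlugin is a hypothetical stand-in for whatever mechanism the shim
// uses (CNI, CNM, or anything else); kubelet does not care which.
type networkPlugin interface {
	SetUpPod(namespace, name, sandboxID string) error
	TearDownPod(namespace, name, sandboxID string) error
}

type runtimeShim struct {
	net       networkPlugin
	networkUp map[string]bool // sandboxID -> network already configured?
}

// RunPodSandbox must leave the sandbox with a cluster-routable IP and must
// return an error if network setup fails. If networking is already up, the
// setup step is skipped and the call proceeds.
func (s *runtimeShim) RunPodSandbox(namespace, name, sandboxID string) error {
	if s.networkUp[sandboxID] {
		return nil // already set up; do not reconfigure
	}
	if err := s.net.SetUpPod(namespace, name, sandboxID); err != nil {
		return fmt.Errorf("failed to set up network for sandbox %q: %v", sandboxID, err)
	}
	s.networkUp[sandboxID] = true
	return nil
}

// StopPodSandbox must tear the network down and report failures; teardown is
// skipped if it has already happened.
func (s *runtimeShim) StopPodSandbox(namespace, name, sandboxID string) error {
	if !s.networkUp[sandboxID] {
		return nil
	}
	if err := s.net.TearDownPod(namespace, name, sandboxID); err != nil {
		return fmt.Errorf("failed to tear down network for sandbox %q: %v", sandboxID, err)
	}
	delete(s.networkUp, sandboxID)
	return nil
}
```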
- -## Related Issues -* Kubelet network plugin for client/server container runtimes [#28667](https://github.com/kubernetes/kubernetes/issues/28667) -* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316) +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/kubemark-guide.md b/contributors/devel/kubemark-guide.md index ce5727e8..719e2c76 100644 --- a/contributors/devel/kubemark-guide.md +++ b/contributors/devel/kubemark-guide.md @@ -1,256 +1,3 @@ -# Kubemark User Guide - -## Introduction - -Kubemark is a performance testing tool which allows users to run experiments on -simulated clusters. The primary use case is scalability testing, as simulated -clusters can be much bigger than the real ones. The objective is to expose -problems with the master components (API server, controller manager or -scheduler) that appear only on bigger clusters (e.g. small memory leaks). - -This document serves as a primer to understand what Kubemark is, what it is not, -and how to use it. - -## Architecture - -On a very high level, Kubemark cluster consists of two parts: a real master -and a set of “Hollow” Nodes. The prefix “Hollow” to any component means an -implementation/instantiation of the actual component with all “moving” -parts mocked out. The best example is HollowKubelet, which pretends to be an -ordinary Kubelet, but does not start anything, nor mount any volumes - it just -lies it does. More detailed design and implementation details are at the end -of this document. - -Currently, master components run on a dedicated machine as pods that are -created/managed by kubelet, which itself runs as either a systemd or a supervisord -service on the master VM depending on the VM distro (though currently it is -only systemd as we use a GCI image). Having a dedicated machine for the master -has a slight advantage over running the master components on an external cluster, -which is being able to completely isolate master resources from everything else. -The HollowNodes on the other hand are run on an ‘external’ Kubernetes cluster -as pods in an isolated namespace (named kubemark). This idea of using pods on a -real cluster behave (or act) as nodes on the kubemark cluster lies at the heart of -kubemark's design. - -## Requirements - -To run Kubemark you need a Kubernetes cluster (called `external cluster`) -for running all your HollowNodes and a dedicated machine for a master. -Master machine has to be directly routable from HollowNodes. You also need -access to a Docker repository (which is gcr.io in the case of GCE) that has the -container images for etcd, hollow-node and node-problem-detector. - -Currently, scripts are written to be easily usable by GCE, but it should be -relatively straightforward to port them to different providers or bare metal. -There is an ongoing effort to refactor kubemark code into provider-specific (gce) -and provider-independent code, which should make it relatively simple to run -kubemark clusters on other cloud providers as well. - -## Common use cases and helper scripts - -Common workflow for Kubemark is: -- starting a Kubemark cluster (on GCE) -- running e2e tests on Kubemark cluster -- monitoring test execution and debugging problems -- turning down Kubemark cluster - -(For now) Included in descriptions there will be comments helpful for anyone who’ll -want to port Kubemark to different providers. 
-(Later) When the refactoring mentioned in the above section finishes, we would replace -these comments with a clean API that would allow kubemark to run on top of any provider. - -### Starting a Kubemark cluster - -To start a Kubemark cluster on GCE you need to create an external kubernetes -cluster (it can be GCE, GKE or anything else) by yourself, make sure that kubeconfig -points to it by default, build a kubernetes release (e.g. by running -`make quick-release`) and run `test/kubemark/start-kubemark.sh` script. -This script will create a VM for master (along with mounted PD and firewall rules set), -then start kubelet and run the pods for the master components. Following this, it -sets up the HollowNodes as Pods on the external cluster and do all the setup necessary -to let them talk to the kubemark apiserver. It will use the configuration stored in -`cluster/kubemark/config-default.sh` - you can tweak it however you want, but note that -some features may not be implemented yet, as implementation of Hollow components/mocks -will probably be lagging behind ‘real’ one. For performance tests interesting variables -are `NUM_NODES` and `KUBEMARK_MASTER_SIZE`. After start-kubemark script is finished, -you’ll have a ready Kubemark cluster, and a kubeconfig file for talking to the Kubemark -cluster is stored in `test/kubemark/resources/kubeconfig.kubemark`. - -Currently we're running HollowNode with a limit of 0.09 CPU core/pod and 220MB of memory. -However, if we also take into account the resources absorbed by default cluster addons -and fluentD running on the 'external' cluster, this limit becomes ~0.1 CPU core/pod, -thus allowing ~10 HollowNodes to run per core (on an "n1-standard-8" VM node). - -#### Behind the scene details: - -start-kubemark.sh script does quite a lot of things: - -- Prepare a master machine named MASTER_NAME (this variable's value should be set by this point): - (*the steps below use gcloud, and should be easy to do outside of GCE*) - 1. Creates a Persistent Disk for use by the master (one more for etcd-events, if flagged) - 2. Creates a static IP address for the master in the cluster and assign it to variable MASTER_IP - 3. Creates a VM instance for the master, configured with the PD and IP created above. - 4. Set firewall rule in the master to open port 443\* for all TCP traffic by default. - -<sub>\* Port 443 is a secured port on the master machine which is used for all -external communication with the API server. In the last sentence *external* -means all traffic coming from other machines, including all the Nodes, not only -from outside of the cluster. Currently local components, i.e. ControllerManager -and Scheduler talk with API server using insecure port 8080.</sub> - -- [Optional to read] Establish necessary certs/keys required for setting up the PKI for kubemark cluster: - (*the steps below are independent of GCE and work for all providers*) - 1. Generate a randomly named temporary directory for storing PKI certs/keys which is delete-trapped on EXIT. - 2. Create a bearer token for 'admin' in master. - 3. Generate certificate for CA and (certificate + private-key) pair for each of master, kubelet and kubecfg. - 4. Generate kubelet and kubeproxy tokens for master. - 5. Write a kubeconfig locally to `test/kubemark/resources/kubeconfig.kubemark` for enabling local kubectl use. 
- -- Set up environment and start master components (through `start-kubemark-master.sh` script): - (*the steps below use gcloud for SSH and SCP to master, and should be easy to do outside of GCE*) - 1. SSH to the master machine and create a new directory (`/etc/srv/kubernetes`) and write all the - certs/keys/tokens/passwords to it. - 2. SCP all the master pod manifests, shell scripts (`start-kubemark-master.sh`, `configure-kubectl.sh`, etc), - config files for passing env variables (`kubemark-master-env.sh`) from the local machine to the master. - 3. SSH to the master machine and run the startup script `start-kubemark-master.sh` (and possibly others). - - Note: The directory structure and the functions performed by the startup script(s) can vary based on master distro. - We currently support the GCI image `gci-dev-56-8977-0-0` in GCE. - -- Set up and start HollowNodes (as pods) on the external cluster: - (*the steps below (except 2nd and 3rd) are independent of GCE and work for all providers*) - 1. Identify the right kubemark binary from the current kubernetes repo for the platform linux/amd64. - 2. Create a Docker image for HollowNode using this binary and upload it to a remote Docker repository. - (We use gcr.io/ as our remote docker repository in GCE, should be different for other providers) - 3. [One-off] Create and upload a Docker image for NodeProblemDetector (see kubernetes/node-problem-detector repo), - which is one of the containers in the HollowNode pod, besides HollowKubelet and HollowProxy. However we - use it with a hollow config that essentially has an empty set of rules and conditions to be detected. - This step is required only for other cloud providers, as the docker image for GCE already exists on GCR. - 4. Create secret which stores kubeconfig for use by HollowKubelet/HollowProxy, addons, and configMaps - for the HollowNode and the HollowNodeProblemDetector. - 5. Create a ReplicationController for HollowNodes that starts them up, after replacing all variables in - the hollow-node_template.json resource. - 6. Wait until all HollowNodes are in the Running phase. - -### Running e2e tests on Kubemark cluster - -To run standard e2e test on your Kubemark cluster created in the previous step -you execute `test/kubemark/run-e2e-tests.sh` script. It will configure ginkgo to -use Kubemark cluster instead of something else and start an e2e test. This -script should not need any changes to work on other cloud providers. - -By default (if nothing will be passed to it) the script will run a Density '30 -test. If you want to run a different e2e test you just need to provide flags you want to be -passed to `hack/ginkgo-e2e.sh` script, e.g. `--ginkgo.focus="Load"` to run the -Load test. - -By default, at the end of each test, it will delete namespaces and everything -under it (e.g. events, replication controllers) on Kubemark master, which takes -a lot of time. Such work aren't needed in most cases: if you delete your -Kubemark cluster after running `run-e2e-tests.sh`; you don't care about -namespace deletion performance, specifically related to etcd; etc. There is a -flag that enables you to avoid namespace deletion: `--delete-namespace=false`. 
-Adding the flag should let you see the following in the logs: `Found DeleteNamespace=false, skipping namespace deletion!`
-
-### Monitoring test execution and debugging problems
-
-The run-e2e-tests script prints the same output on Kubemark as on an ordinary e2e cluster, but if you need to dig deeper you need to learn how to debug HollowNodes and how the master machine (currently) differs from an ordinary one.
-
-If you need to debug the master machine you can do similar things as you would on an ordinary master. The difference between the Kubemark setup and an ordinary setup is that in Kubemark etcd is run as a plain docker container, and all master components are run as normal processes. There's no Kubelet overseeing them. Logs are stored in exactly the same place, i.e. the `/var/logs/` directory. Because the binaries are not supervised by anything, they won't be restarted if they crash.
-
-To help you with debugging from inside the cluster, the startup script puts a `~/configure-kubectl.sh` script on the master. It downloads `gcloud` and the `kubectl` tool, and configures kubectl to work on the unsecured master port (useful if there are problems with security). After the script is run you can use the kubectl command from the master machine to play with the cluster.
-
-Debugging HollowNodes is a bit trickier: if you experience a problem on one of them, you need to work out which hollow-node pod corresponds to a given HollowNode known by the master. During self-registration HollowNodes provide their cluster IPs as Names, which means that if you need to find a HollowNode named `10.2.4.5` you just need to find a Pod in the external cluster with this cluster IP. There's a helper script `test/kubemark/get-real-pod-for-hollow-node.sh` that does this for you.
-
-When you have a Pod name you can use `kubectl logs` on the external cluster to get logs, or use a `kubectl describe pod` call to find the external Node on which this particular HollowNode is running so you can ssh to it.
-
-For example, suppose you want to see the logs of the HollowKubelet on which pod `my-pod` is running. To do so you can execute:
-
-```
-$ kubectl --kubeconfig=kubernetes/test/kubemark/resources/kubeconfig.kubemark describe pod my-pod
-```
-
-This outputs the pod description, which includes a line like:
-
-```
-Node: 1.2.3.4/1.2.3.4
-```
-
-To find the `hollow-node` pod corresponding to node `1.2.3.4`, use the aforementioned script:
-
-```
-$ kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh 1.2.3.4
-```
-
-which will output a line like:
-
-```
-hollow-node-1234
-```
-
-Now you just use an ordinary kubectl command to get the logs:
-
-```
-kubectl --namespace=kubemark logs hollow-node-1234
-```
-
-All those things should work exactly the same on all cloud providers.
-
-### Turning down Kubemark cluster
-
-On GCE you just need to execute the `test/kubemark/stop-kubemark.sh` script, which will delete the HollowNode ReplicationController and all the other resources for you. On other providers you'll need to delete everything yourself. As part of the effort mentioned above to refactor kubemark into provider-independent and provider-specific parts, the resource deletion logic specific to the provider would move out into a clean API.
-
-## Some current implementation details and future roadmap
-
-The Kubemark master uses exactly the same binaries as ordinary Kubernetes does. This means that it will never be out of date. On the other hand, HollowNodes use an existing fake for the Kubelet (called SimpleKubelet), which mocks its runtime manager with `pkg/kubelet/dockertools/fake_manager.go`, where most of the logic sits.
-Because there's no easy way of mocking other managers (e.g. VolumeManager), they -are not supported in Kubemark (e.g. we can't schedule Pods with volumes in them -yet). - -We currently plan to extend kubemark along the following directions: -- As you would have noticed at places above, we aim to make kubemark more structured - and easy to run across various providers without having to tweak the setup scripts, - using a well-defined kubemark-provider API. -- Allow kubemark to run on various distros (GCI, debian, redhat, etc) for any - given provider. -- Make Kubemark performance on ci-tests mimic real cluster ci-tests on metrics such as - CPU, memory and network bandwidth usage and realizing this goal through measurable - objectives (like the kubemark metric should vary no more than X% with the real - cluster metric). We could also use metrics reported by Prometheus. -- Improve logging of CI-test metrics (such as aggregated API call latencies, scheduling - call latencies, %ile for CPU/mem usage of different master components in density/load - tests) by packing them into well-structured artifacts instead of the (current) dumping - to logs. -- Create a Dashboard that lets easy viewing and comparison of these metrics across tests. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-scalability/kubemark-guide.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/logging.md b/contributors/devel/logging.md index c4da6829..a22fc799 100644 --- a/contributors/devel/logging.md +++ b/contributors/devel/logging.md @@ -1,34 +1,3 @@ -## Logging Conventions +This file has moved to https://git.k8s.io/community/contributors/devel/sig-instrumentation/logging.md. -The following conventions for the klog levels to use. -[klog](http://godoc.org/github.com/kubernetes/klog) is globally preferred to -[log](http://golang.org/pkg/log/) for better runtime control. - -* klog.Errorf() - Always an error - -* klog.Warningf() - Something unexpected, but probably not an error - -* klog.Infof() has multiple levels: - * klog.V(0) - Generally useful for this to ALWAYS be visible to an operator - * Programmer errors - * Logging extra info about a panic - * CLI argument handling - * klog.V(1) - A reasonable default log level if you don't want verbosity. - * Information about config (listening on X, watching Y) - * Errors that repeat frequently that relate to conditions that can be corrected (pod detected as unhealthy) - * klog.V(2) - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems. - * Logging HTTP requests and their exit code - * System state changing (killing pod) - * Controller state change events (starting pods) - * Scheduler log messages - * klog.V(3) - Extended information about changes - * More info about system state changes - * klog.V(4) - Debug level verbosity - * Logging in particularly thorny parts of code where you may want to come back later and check it - * klog.V(5) - Trace level verbosity - * Context to understand the steps leading up to errors and warnings - * More information for troubleshooting reported issues - -As per the comments, the practical default level is V(2). Developers and QE -environments may wish to run at V(3) or V(4). 
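-
-As a quick illustration of these conventions, here is a minimal, self-contained sketch (assuming the `k8s.io/klog` package; the component names and messages are invented purely for illustration):
-
-```go
-package main
-
-import (
-	"flag"
-
-	"k8s.io/klog"
-)
-
-func main() {
-	// Register klog's flags (including -v) on the default flag set, then parse them.
-	klog.InitFlags(nil)
-	flag.Parse()
-
-	klog.V(1).Infof("watching namespace %q", "kube-system") // config information
-	klog.V(2).Infof("killing pod %q", "example-pod")        // significant state change
-	klog.V(4).Infof("retrying sync, attempt %d", 3)         // debug-level detail
-	klog.Warningf("pod %q not found in cache, requeueing", "example-pod")
-	klog.Errorf("failed to update status: %v", "connection refused")
-}
-```
-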
If you wish to change the log -level, you can pass in `-v=X` where X is the desired maximum level to log. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/node-performance-testing.md b/contributors/devel/node-performance-testing.md index d43737a8..e51417ed 100644 --- a/contributors/devel/node-performance-testing.md +++ b/contributors/devel/node-performance-testing.md @@ -1,121 +1,3 @@ -# Measuring Node Performance - -This document outlines the issues and pitfalls of measuring Node performance, as -well as the tools available. - -## Cluster Set-up - -There are lots of factors which can affect node performance numbers, so care -must be taken in setting up the cluster to make the intended measurements. In -addition to taking the following steps into consideration, it is important to -document precisely which setup was used. For example, performance can vary -wildly from commit-to-commit, so it is very important to **document which commit -or version** of Kubernetes was used, which Docker version was used, etc. - -### Addon pods - -Be aware of which addon pods are running on which nodes. By default Kubernetes -runs 8 addon pods, plus another 2 per node (`fluentd-elasticsearch` and -`kube-proxy`) in the `kube-system` namespace. The addon pods can be disabled for -more consistent results, but doing so can also have performance implications. - -For example, Heapster polls each node regularly to collect stats data. Disabling -Heapster will hide the performance cost of serving those stats in the Kubelet. - -#### Disabling Add-ons - -Disabling addons is simple. Just ssh into the Kubernetes master and move the -addon from `/etc/kubernetes/addons/` to a backup location. More details -[here](https://git.k8s.io/kubernetes/cluster/addons/). - -### Which / how many pods? - -Performance will vary a lot between a node with 0 pods and a node with 100 pods. -In many cases you'll want to make measurements with several different amounts of -pods. On a single node cluster scaling a replication controller makes this easy, -just make sure the system reaches a steady-state before starting the -measurement. E.g. `kubectl scale replicationcontroller pause --replicas=100` - -In most cases pause pods will yield the most consistent measurements since the -system will not be affected by pod load. However, in some special cases -Kubernetes has been tuned to optimize pods that are not doing anything, such as -the cAdvisor housekeeping (stats gathering). In these cases, performing a very -light task (such as a simple network ping) can make a difference. - -Finally, you should also consider which features yours pods should be using. For -example, if you want to measure performance with probing, you should obviously -use pods with liveness or readiness probes configured. Likewise for volumes, -number of containers, etc. - -### Other Tips - -**Number of nodes** - On the one hand, it can be easier to manage logs, pods, -environment etc. with a single node to worry about. On the other hand, having -multiple nodes will let you gather more data in parallel for more robust -sampling. - -## E2E Performance Test - -There is an end-to-end test for collecting overall resource usage of node -components: [kubelet_perf.go](https://git.k8s.io/kubernetes/test/e2e/node/kubelet_perf.go). To -run the test, simply make sure you have an e2e cluster running (`go run -hack/e2e.go -- -up`) and [set up](#cluster-set-up) correctly. 
- -Run the test with `go run hack/e2e.go -- -v -test ---test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to -customise the number of pods or other parameters of the test (remember to rerun -`make WHAT=test/e2e/e2e.test` after you do). - -## Profiling - -Kubelet installs the [go pprof handlers](https://golang.org/pkg/net/http/pprof/), which can be queried for CPU profiles: - -```console -$ kubectl proxy & -Starting to serve on 127.0.0.1:8001 -$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT -$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet -$ go tool pprof -web $KUBELET_BIN $OUTPUT -``` - -`pprof` can also provide heap usage, from the `/debug/pprof/heap` endpoint -(e.g. `http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap`). - -More information on go profiling can be found -[here](http://blog.golang.org/profiling-go-programs). - -## Benchmarks - -Before jumping through all the hoops to measure a live Kubernetes node in a real -cluster, it is worth considering whether the data you need can be gathered -through a Benchmark test. Go provides a really simple benchmarking mechanism, -just add a unit test of the form: - -```go -// In foo_test.go -func BenchmarkFoo(b *testing.B) { - b.StopTimer() - setupFoo() // Perform any global setup - b.StartTimer() - for i := 0; i < b.N; i++ { - foo() // Functionality to measure - } -} -``` - -Then: - -```console -$ go test -bench=. -benchtime=${SECONDS}s foo_test.go -``` - -More details on benchmarking [here](https://golang.org/pkg/testing/). - -## TODO - -- (taotao) Measuring docker performance -- Expand cluster set-up section -- (vishh) Measuring disk usage -- (yujuhong) Measuring memory usage -- Add section on monitoring kubelet metrics (e.g. with prometheus) +This file has moved to https://git.k8s.io/community/contributors/devel/sig-node/node-performance-testing.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/on-call-federation-build-cop.md b/contributors/devel/on-call-federation-build-cop.md deleted file mode 100644 index c153b02a..00000000 --- a/contributors/devel/on-call-federation-build-cop.md +++ /dev/null @@ -1,109 +0,0 @@ -# Federation Buildcop Guide and Playbook - -Federation runs two classes of tests: CI and Pre-submits. - -## CI - -* These tests run on the HEADs of master and release branches (starting - from Kubernetes v1.7). -* As a result, they run on code that's already merged. -* As the name suggests, they run continuously. Currently, they are - configured to run at least once every 30 minutes. -* Federation CI tests run as periodic jobs on prow. -* CI jobs always run sequentially. In other words, no single CI job - can have two instances of the job running at the same time. -* Latest build results can be viewed in [testgrid](https://k8s-testgrid.appspot.com/sig-multicluster) - -### Configuration - -Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job. 
-Federation CI e2e job names are as below: -* master branch - `ci-federation-e2e-gce` and `ci-federation-e2e-gce-serial` -* 1.8 release branch - `ci-kubernetes-e2e-gce-federation-release-1-8` -* 1.7 release branch - `ci-kubernetes-e2e-gce-federation-release-1-7` - -Search for the above job names in various configuration files as below: - -* Prow config: https://git.k8s.io/test-infra/prow/config.yaml -* Test job/bootstrap config: https://git.k8s.io/test-infra/jobs/config.json -* Test grid config: https://git.k8s.io/test-infra/testgrid/config.yaml -* Job specific config: https://git.k8s.io/test-infra/jobs/env - -### Results - -Results of all the federation CI tests are listed in the corresponding -tabs on the Cluster Federation page in the testgrid. -https://k8s-testgrid.appspot.com/sig-multicluster - -### Playbook - -#### Triggering a new run - -Please ping someone who has access to the prow project and ask -them to click the `rerun` button from, for example -http://prow.k8s.io/?type=periodic&job=ci-federation-e2e-gce, -and execute the kubectl command. - -#### Quota cleanup - -Please ping someone who has access to the GCP project. Ask them to -look at the quotas and delete the leaked resources by clicking the -delete button corresponding to those leaked resources on Google Cloud -Console. - - -## Pre-submit - -* The pre-submit test is currently configured to run on the master - branch and any release branch that's 1.9 or newer. -* Multiple pre-submit jobs could be running in parallel(one per pr). -* Latest build results can be viewed in [testgrid](https://k8s-testgrid.appspot.com/presubmits-federation) -* We have following pre-submit jobs in federation - * bazel-test - Runs all the bazel test targets in federation. - * e2e-gce - Runs federation e2e tests on gce. - * verify - Runs federation unit, integration tests and few verify scripts. - -### Configuration - -Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job. -Federation pre-submit jobs have following names. -* bazel-test - `pull-federation-bazel-test` -* verify - `pull-federation-verify` -* e2e-gce - `pull-federation-e2e-gce` - -Search for the above job names in various configuration files as below: - -* Prow config: https://git.k8s.io/test-infra/prow/config.yaml -* Test job/bootstrap config: https://git.k8s.io/test-infra/jobs/config.json -* Test grid config: https://git.k8s.io/test-infra/testgrid/config.yaml -* Job specific config: https://git.k8s.io/test-infra/jobs/env - -### Results - -Aggregated results are available on the Gubernator dashboard page for -the federation pre-submit tests. - -https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-federation-e2e-gce - -### Metrics - -We track the flakiness metrics of all the pre-submit jobs and -individual tests that run against PRs in -[kubernetes/federation](https://github.com/kubernetes/federation). - -* The metrics that we track are documented in https://git.k8s.io/test-infra/metrics/README.md#metrics. -* Job-level metrics are available in http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json. - -### Playbook - -#### Triggering a new run - -Use the `/test` command on the PR to re-trigger the test. The exact -incantation is: `/test pull-federation-e2e-gce` - -#### Quota cleanup - -Please ping someone who has access to `k8s-jkns-pr-bldr-e2e-gce-fdrtn` -GCP project. 
Ask them to look at the quotas and delete the leaked resources by clicking the delete button next to those resources in the Google Cloud Console.
diff --git a/contributors/devel/profiling.md b/contributors/devel/profiling.md
index f7c8b2e5..5c7e5a2e 100644
--- a/contributors/devel/profiling.md
+++ b/contributors/devel/profiling.md
@@ -1,76 +1,3 @@
-# Profiling Kubernetes
+This file has moved to https://git.k8s.io/community/contributors/devel/sig-scalability/profiling.md.
-
-This document explains how to plug in the profiler and how to profile Kubernetes services. To get familiar with the tools mentioned below, it is strongly recommended to read [Profiling Go Programs](https://blog.golang.org/profiling-go-programs).
-
-## Profiling library
-
-Go comes with the built-in 'net/http/pprof' profiling library and profiling web service. The service works by binding the debug/pprof/ subtree on a running webserver to the profiler. Reading from subpages of debug/pprof returns pprof-formatted profiles of the running binary. The output can be processed offline by the tool of choice, or used as input to the handy 'go tool pprof', which can represent the result graphically.
-
-## Adding profiling to the API server
-
-TL;DR: Add lines:
-
-```go
-m.mux.HandleFunc("/debug/pprof/", pprof.Index)
-m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
-m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
-```
-
-to the `init(c *Config)` method in 'pkg/master/master.go' and import the 'net/http/pprof' package.
-
-In most use cases it's enough to do 'import _ net/http/pprof', which automatically registers a handler in the default http.Server. A slight inconvenience is that the API server uses the default server for intra-cluster communication, so plugging the profiler into it is not really useful. In 'pkg/kubelet/server/server.go' more servers are created and started as separate goroutines. The one that usually serves external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in the Handler variable. It is created from an HTTP multiplexer, so the only thing that needs to be done is adding the profiler handler functions to this multiplexer. This is exactly what the lines after the TL;DR do.
-
-## Connecting to the profiler
-
-Even with the profiler running, it is not entirely straightforward to use 'go tool pprof' with it. The problem is that, at least for dev purposes, the certificates generated for the API server are not signed by anyone trusted, and because secureServer serves only secure traffic it isn't straightforward to connect to the service. The best workaround I found is to create an ssh tunnel from the open unsecured port on kubernetes_master to some external server, and to use this server as a proxy. To save everyone from looking up the correct ssh flags, it is done by running:
-
-```sh
-ssh kubernetes_master -L<local_port>:localhost:8080
-```
-
-or the analogous command for your cloud provider. Afterwards you can, e.g., run
-
-```sh
-go tool pprof http://localhost:<local_port>/debug/pprof/profile
-```
-
-to get a 30-second CPU profile.
-
-## Contention profiling
-
-To enable contention profiling you need to add the line `rt.SetBlockProfileRate(1)` in addition to the `m.mux.HandleFunc(...)` lines added before (`rt` stands for `runtime` in `master.go`). This enables the 'debug/pprof/block' subpage, which can be used as an input to `go tool pprof`.
-
-## Profiling in tests
-
-To gather a profile from a test, the HTTP interface is probably not suitable.
Instead, you can add the `-cpuprofile` flag to your KUBE_TEST_ARGS, e.g. - -```sh -make test-integration WHAT="./test/integration/scheduler" KUBE_TEST_ARGS="-cpuprofile cpu.out" -go tool pprof cpu.out -``` - -See the ['go test' flags](https://golang.org/cmd/go/#hdr-Description_of_testing_flags) for how to capture other types of profiles. - -## Profiling in a benchmark test - -Gathering a profile from a benchmark test works in the same way as regular tests, but sometimes there may be expensive setup that you want excluded from the profile. (i.e. any time you would use `b.ResetTimer()`) - -To solve this problem, you can explicitly start the profile in your test code like so. - -```go -func BenchmarkMyFeature(b *testing.B) { - // Expensive test setup... - b.ResetTimer() - f, err := os.Create("bench_profile.out") - if err != nil { - log.Fatal("could not create profile file: ", err) - } - if err := pprof.StartCPUProfile(f); err != nil { - log.Fatal("could not start CPU profile: ", err) - } - defer pprof.StopCPUProfile() - // Rest of the test... -} -``` - -> Note: Code added to a test to gather CPU profiles should not be merged. It is meant to be temporary while you create an analyze profiles. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/release.md b/contributors/devel/release.md index b4e9224e..129efc96 100644 --- a/contributors/devel/release.md +++ b/contributors/devel/release.md @@ -1,307 +1,3 @@ -# Targeting Features, Issues and PRs to Release Milestones +This file has moved to https://git.k8s.io/community/contributors/devel/sig-release/release.md. -This document is focused on Kubernetes developers and contributors -who need to create a feature, issue, or pull request which targets a specific -release milestone. - -- [TL;DR](#tldr) -- [Definitions](#definitions) -- [The Release Cycle](#the-release-cycle) -- [Removal Of Items From The Milestone](#removal-of-items-from-the-milestone) -- [Adding An Item To The Milestone](#adding-an-item-to-the-milestone) - - [Milestone Maintainers](#milestone-maintainers) - - [Feature additions](#feature-additions) - - [Issue additions](#issue-additions) - - [PR Additions](#pr-additions) -- [Other Required Labels](#other-required-labels) - - [SIG Owner Label](#sig-owner-label) - - [Priority Label](#priority-label) - - [Issue Kind Label](#issue-kind-label) - -The process for shepherding features, issues, and pull requests -into a Kubernetes release spans multiple stakeholders: -* the feature, issue, or pull request owner -* SIG leadership -* the release team - -Information on workflows and interactions are described below. - -As the owner of a feature, issue, or pull request (PR), it is your -responsibility to ensure release milestone requirements are met. -Automation and the release team will be in contact with you if -updates are required, but inaction can result in your work being -removed from the milestone. Additional requirements exist when the -target milestone is a prior release (see [cherry pick -process](cherry-picks.md) for more information). 
- -## TL;DR - -If you want your PR to get merged, it needs the following required labels and milestones, represented here by the Prow /commands it would take to add them: -<table> -<tr> -<td></td> -<td>Normal Dev</td> -<td>Code Freeze</td> -<td>Post-Release</td> -</tr> -<tr> -<td></td> -<td>Weeks 1-8</td> -<td>Weeks 9-11</td> -<td>Weeks 11+</td> -</tr> -<tr> -<td>Required Labels</td> -<td> -<ul> -<!--Weeks 1-8--> -<li>/sig {name}</li> -<li>/kind {type}</li> -<li>/lgtm</li> -<li>/approved</li> -</ul> -</td> -<td> -<ul> -<!--Weeks 9-11--> -<li>/milestone {v1.y}</li> -<li>/sig {name}</li> -<li>/kind {bug, failing-test}</li> -<li>/priority critical-urgent</li> -<li>/lgtm</li> -<li>/approved</li> -</ul> -</td> -<td> -<!--Weeks 11+--> -Return to 'Normal Dev' phase requirements: -<ul> -<li>/sig {name}</li> -<li>/kind {type}</li> -<li>/lgtm</li> -<li>/approved</li> -</ul> - -Merges into the 1.y branch are now [via cherrypicks](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md), approved by release branch manager. -</td> -<td> -<ul> -</td> -</tr> -</table> - -In the past there was a requirement for a milestone targeted pull -request to have an associated GitHub issue opened, but this is no -longer the case. Features are effectively GitHub issues or -[KEPs](https://git.k8s.io/community/keps) -which lead to subsequent PRs. The general labeling process should -be consistent across artifact types. - ---- - -## Definitions - -- *issue owners*: Creator, assignees, and user who moved the issue into a release milestone. -- *release team*: Each Kubernetes release has a team doing project - management tasks described - [here](https://git.k8s.io/sig-release/release-team/README.md). The - contact info for the team associated with any given release can be - found [here](https://git.k8s.io/sig-release/releases/). -- *Y days*: Refers to business days (using the location local to the release-manager M-F). -- *feature*: see "[Is My Thing a Feature?](http://git.k8s.io/features/README.md#is-my-thing-a-feature) -- *release milestone*: semantic version string or [GitHub milestone](https://help.github.com/articles/associating-milestones-with-issues-and-pull-requests/) referring to a release MAJOR.MINOR vX.Y version. See also [release versioning](http://git.k8s.io/community/contributors/design-proposals/release/versioning.md) -- *release branch*: Git branch "release-X.Y" created for the vX.Y milestone. Created at the time of the vX.Y-beta.0 release and maintained after the release for approximately 9 months with vX.Y.Z patch releases. - -## The Release Cycle - - - -Kubernetes releases currently happen four times per year. The release -process can be thought of as having three main phases: -* Feature Definition -* Implementation -* Stabilization - -But in reality this is an open source and agile project, with feature -planning and implementation happening at all times. Given the -project scale and globally distributed developer base, it is critical -to project velocity to not rely on a trailing stabilization phase and -rather have continuous integration testing which ensures the -project is always stable so that individual commits can be -flagged as having broken something. - -With ongoing feature definition through the year, some set of items -will bubble up as targeting a given release. The **enhancement freeze** -starts ~4 weeks into release cycle. 
By this point all intended -feature work for the given release has been defined in suitable -planning artifacts in conjunction with the Release Team's [enhancements -lead](https://git.k8s.io/sig-release/release-team/role-handbooks/enhancements/README.md). - -Implementation and bugfixing is ongoing across the cycle, but -culminates in a code freeze period: -* The **code freeze** starts in week ~10 and continues for ~2 weeks. - Only critical bug fixes are accepted into the release codebase. - -There are approximately two weeks following code freeze, and preceding -release, during which all remaining critical issues must be resolved -before release. This also gives time for documentation finalization. - -When the code base is sufficiently stable, the master branch re-opens -for general development and work begins there for the next release -milestone. Any remaining modifications for the current release are cherry -picked from master back to the release branch. The release is built from -the release branch. - -Following release, the [Release Branch -Manager](https://git.k8s.io/sig-release/release-team/role-handbooks/branch-manager/README.md) -cherry picks additional critical fixes from the master branch for -a period of around 9 months, leaving an overlap of three release -versions forward support. Thus, each release is part of a broader -Kubernetes lifecycle: - - - -## Removal Of Items From The Milestone - -Before getting too far into the process for adding an item to the -milestone, please note: - -Members of the Release Team may remove Issues from the milestone -if they or the responsible SIG determine that the issue is not -actually blocking the release and is unlikely to be resolved in a -timely fashion. - -Members of the Release Team may remove PRs from the milestone for -any of the following, or similar, reasons: - -* PR is potentially de-stabilizing and is not needed to resolve a blocking issue; -* PR is a new, late feature PR and has not gone through the features process or the exception process; -* There is no responsible SIG willing to take ownership of the PR and resolve any follow-up issues with it; -* PR is not correctly labelled; -* Work has visibly halted on the PR and delivery dates are uncertain or late. - -While members of the Release Team will help with labelling and -contacting SIG(s), it is the responsibility of the submitter to -categorize PRs, and to secure support from the relevant SIG to -guarantee that any breakage caused by the PR will be rapidly resolved. - -Where additional action is required, an attempt at human to human -escalation will be made by the release team through the following -channels: - -- Comment in GitHub mentioning the SIG team and SIG members as appropriate for the issue type -- Emailing the SIG mailing list - - bootstrapped with group email addresses from the [community sig list](/sig-list.md) - - optionally also directly addressing SIG leadership or other SIG members -- Messaging the SIG's Slack channel - - bootstrapped with the slackchannel and SIG leadership from the [community sig list](/sig-list.md) - - optionally directly "@" mentioning SIG leadership or others by handle - -## Adding An Item To The Milestone - -### Milestone Maintainers - -The members of the GitHub [“kubernetes-milestone-maintainers” -team](https://github.com/orgs/kubernetes/teams/kubernetes-milestone-maintainers/members) -are entrusted with the responsibility of specifying the release milestone on -GitHub artifacts. 
This group is [maintained by -SIG-Release](https://git.k8s.io/sig-release/release-team/README.md#milestone-maintainers) -and has representation from the various SIGs' leadership. - -### Feature additions - -Feature planning and definition takes many forms today, but a typical -example might be a large piece of work described in a -[KEP](https://git.k8s.io/community/keps), with associated -task issues in GitHub. When the plan has reached an implementable state and -work is underway, the feature or parts thereof are targeted for an upcoming -milestone by creating GitHub issues and marking them with the Prow "/milestone" -command. - -For the first ~4 weeks into the release cycle, the release team's -Enhancements Lead will interact with SIGs and feature owners via GitHub, -Slack, and SIG meetings to capture all required planning artifacts. - -If you have a feature to target for an upcoming release milestone, begin a -conversation with your SIG leadership and with that release's Enhancements -Lead. - -### Issue additions - -Issues are marked as targeting a milestone via the Prow -"/milestone" command. - -The release team's [Bug Triage -Lead](https://git.k8s.io/sig-release/release-team/role-handbooks/bug-triage/README.md) and overall community watch -incoming issues and triage them, as described in the contributor -guide section on [issue triage](/contributors/guide/issue-triage.md). - -Marking issues with the milestone provides the community better -visibility regarding when an issue was observed and by when the community -feels it must be resolved. During code freeze, to merge a PR it is required -that a release milestone is set. - -An open issue is no longer required for a PR, but open issues and -associated PRs should have synchronized labels. For example a high -priority bug issue might not have its associated PR merged if the PR is -only marked as lower priority. - -### PR Additions - -PRs are marked as targeting a milestone via the Prow -"/milestone" command. - -This is a blocking requirement during code freeze as described above. - -## Other Required Labels - -*Note* [Here is the list of labels and their use and purpose.](https://git.k8s.io/test-infra/label_sync/labels.md#labels-that-apply-to-all-repos-for-both-issues-and-prs) - -### SIG Owner Label - -The SIG owner label defines the SIG to which we escalate if a -milestone issue is languishing or needs additional attention. If -there are no updates after escalation, the issue may be automatically -removed from the milestone. - -These are added with the Prow "/sig" command. For example to add -the label indicating SIG Storage is responsible, comment with `/sig -storage`. - -### Priority Label - -Priority labels are used to determine an escalation path before -moving issues out of the release milestone. They are also used to -determine whether or not a release should be blocked on the resolution -of the issue. - -- `priority/critical-urgent`: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels. - - considered a release blocking issue - - code freeze: issue owner update frequency: daily - - would require a patch release if left undiscovered until after the minor release. -- `priority/important-soon`: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts. 
- - not considered a release blocking issue - - would not require a patch release - - will automatically be moved out of the release milestone at code freeze after a 4 day grace period -- `priority/important-longterm`: Escalate to the issue owners; move out of the milestone after 1 attempt. - - even less urgent / critical than `priority/important-soon` - - moved out of milestone more aggressively than `priority/important-soon` - -### Issue/PR Kind Label - -The issue kind is used to help identify the types of changes going -into the release over time. This may allow the release team to -develop a better understanding of what sorts of issues we would -miss with a faster release cadence. - -For release targeted issues, including pull requests, one of the following -issue kind labels must be set: - -- `kind/api-change`: Adds, removes, or changes an API -- `kind/bug`: Fixes a newly discovered bug. -- `kind/cleanup`: Adding tests, refactoring, fixing old bugs. -- `kind/design`: Related to design -- `kind/documentation`: Adds documentation -- `kind/failing-test`: CI test case is failing consistently. -- `kind/feature`: New functionality. -- `kind/flake`: CI test case is showing intermittent failures. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/scheduler.md b/contributors/devel/scheduler.md index 486b04a9..479f5136 100644 --- a/contributors/devel/scheduler.md +++ b/contributors/devel/scheduler.md @@ -1,90 +1,3 @@ -# The Kubernetes Scheduler - -The Kubernetes scheduler runs as a process alongside the other master components such as the API server. -Its interface to the API server is to watch for Pods with an empty PodSpec.NodeName, -and for each Pod, it posts a binding indicating where the Pod should be scheduled. - -## Exploring the code - -We are dividing scheduler into three layers from high level: -- [cmd/kube-scheduler/scheduler.go](http://releases.k8s.io/HEAD/cmd/kube-scheduler/scheduler.go): - This is the main() entry that does initialization before calling the scheduler framework. -- [pkg/scheduler/scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/scheduler.go): - This is the scheduler framework that handles stuff (e.g. binding) beyond the scheduling algorithm. -- [pkg/scheduler/core/generic_scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/core/generic_scheduler.go): - The scheduling algorithm that assigns nodes for pods. - -## The scheduling algorithm - -``` -For given pod: - - +---------------------------------------------+ - | Schedulable nodes: | - | | - | +--------+ +--------+ +--------+ | - | | node 1 | | node 2 | | node 3 | | - | +--------+ +--------+ +--------+ | - | | - +-------------------+-------------------------+ - | - | - v - +-------------------+-------------------------+ - - Pred. filters: node 3 doesn't have enough resource - - +-------------------+-------------------------+ - | - | - v - +-------------------+-------------------------+ - | remaining nodes: | - | +--------+ +--------+ | - | | node 1 | | node 2 | | - | +--------+ +--------+ | - | | - +-------------------+-------------------------+ - | - | - v - +-------------------+-------------------------+ - - Priority function: node 1: p=2 - node 2: p=5 - - +-------------------+-------------------------+ - | - | - v - select max{node priority} = node 2 -``` - -The Scheduler tries to find a node for each Pod, one at a time. 
-- First it applies a set of "predicates" to filter out inappropriate nodes. For example, if the PodSpec specifies resource requests, then the scheduler will filter out nodes that don't have at least that much resources available (computed as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node). -- Second, it applies a set of "priority functions" -that rank the nodes that weren't filtered out by the predicate check. For example, it tries to spread Pods across nodes and zones while at the same time favoring the least (theoretically) loaded nodes (where "load" - in theory - is measured as the sum of the resource requests of the containers running on the node, divided by the node's capacity). -- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in [pkg/scheduler/core/generic_scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/core/generic_scheduler.go) - -### Predicates and priorities policies - -Predicates are a set of policies applied one by one to filter out inappropriate nodes. -Priorities are a set of policies applied one by one to rank nodes (that made it through the filter of the predicates). -By default, Kubernetes provides built-in predicates and priorities policies documented in [scheduler_algorithm.md](scheduler_algorithm.md). -The predicates and priorities code are defined in [pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/predicates/predicates.go) and [pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/priorities/) , respectively. - - -## Scheduler extensibility - -The scheduler is extensible: the cluster administrator can choose which of the pre-defined -scheduling policies to apply, and can add new ones. - -### Modifying policies - -The policies that are applied when scheduling can be chosen in one of two ways. -The default policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in -[pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithmprovider/defaults/defaults.go). -However, the choice of policies can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON file specifying which scheduling policies to use. See [examples/scheduler-policy-config.json](https://git.k8s.io/examples/staging/scheduler-policy-config.json) for an example -config file. (Note that the config file format is versioned; the API is defined in [pkg/scheduler/api](http://releases.k8s.io/HEAD/pkg/scheduler/api/)). -Thus to add a new scheduling policy, you should modify [pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/predicates/predicates.go) or add to the directory [pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/priorities/), and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-scheduling/scheduler.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. 
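As a short aside, the filter-then-rank flow described above (predicates eliminate infeasible nodes, priority functions score the remainder, and the highest-scoring node wins, with ties broken at random) can be sketched in a few lines of self-contained Go. The types, the numbers, and the single least-requested style priority below are invented for illustration and are not the real scheduler interfaces:

```go
package main

import (
	"fmt"
	"math/rand"
)

type node struct {
	name        string
	capacityCPU int // millicores
	requested   int // millicores already requested by pods on the node
}

const podRequest = 500 // millicores requested by the pod being scheduled

// Predicate: filter out nodes that cannot fit the pod's resource request.
func podFitsResources(n node) bool {
	return n.capacityCPU-n.requested >= podRequest
}

// Priority: score 0-10, higher is better (most free capacity after placement).
func leastRequested(n node) int {
	return 10 * (n.capacityCPU - n.requested - podRequest) / n.capacityCPU
}

func main() {
	nodes := []node{
		{"node-1", 2000, 1800}, // filtered out: not enough free CPU
		{"node-2", 2000, 1000},
		{"node-3", 4000, 1000},
	}

	// Phase 1: apply predicates to drop infeasible nodes.
	var feasible []node
	for _, n := range nodes {
		if podFitsResources(n) {
			feasible = append(feasible, n)
		}
	}

	// Phase 2: apply weighted priority functions and keep the best-scoring nodes.
	const weight = 1
	best, bestScore := []node(nil), -1
	for _, n := range feasible {
		score := weight * leastRequested(n)
		switch {
		case score > bestScore:
			best, bestScore = []node{n}, score
		case score == bestScore:
			best = append(best, n)
		}
	}

	// Ties are broken by picking one of the best nodes at random.
	fmt.Println("scheduled on", best[rand.Intn(len(best))].name)
}
```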
diff --git a/contributors/devel/scheduler_algorithm.md b/contributors/devel/scheduler_algorithm.md index e6596b47..a64a2c3b 100644 --- a/contributors/devel/scheduler_algorithm.md +++ b/contributors/devel/scheduler_algorithm.md @@ -1,40 +1,3 @@ -# Scheduler Algorithm in Kubernetes - -For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. - -## Filtering the nodes - -The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource requests of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: - -- `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. Currently supported volumes are: AWS EBS, GCE PD, ISCSI and Ceph RBD. Only Persistent Volume Claims for those supported types are checked. Persistent Volumes added directly to pods are not evaluated and are not constrained by this policy. -- `NoVolumeZoneConflict`: Evaluate if the volumes a pod requests are available on the node, given the Zone restrictions. -- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../design-proposals/node/resource-qos.md). -- `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node. -- `HostName`: Filter out all nodes except the one specified in the PodSpec's NodeName field. -- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `nodeAffinity` if present. See [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details on both. -- `MaxEBSVolumeCount`: Ensure that the number of attached ElasticBlockStore volumes does not exceed a maximum value (by default, 39, since Amazon recommends a maximum of 40 with one of those 40 reserved for the root volume -- see [Amazon's documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html#linux-specific-volume-limits)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable. -- `MaxGCEPDVolumeCount`: Ensure that the number of attached GCE PersistentDisk volumes does not exceed a maximum value (by default, 16, which is the maximum GCE allows -- see [GCE's documentation](https://cloud.google.com/compute/docs/disks/persistent-disks#limits_for_predefined_machine_types)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable. -- `CheckNodeMemoryPressure`: Check if a pod can be scheduled on a node reporting memory pressure condition. 
Currently, no ``BestEffort`` pods should be placed on a node under memory pressure as it gets automatically evicted by kubelet. -- `CheckNodeDiskPressure`: Check if a pod can be scheduled on a node reporting disk pressure condition. Currently, no pods should be placed on a node under disk pressure as it gets automatically evicted by kubelet. - -The details of the above predicates can be found in [pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. You can see which ones are used by default in [pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithmprovider/defaults/defaults.go). - -## Ranking the nodes - -The filtered nodes are considered suitable to host the Pod, and it is often that there are more than one nodes remaining. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score which scales from 0-10 with 10 representing for "most preferred" and 0 for "least preferred". Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2` with weighting factors `weight1` and `weight2` respectively, the final score of some NodeA is: - - finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) - -After the scores of all nodes are calculated, the node with highest score is chosen as the host of the Pod. If there are more than one nodes with equal highest scores, a random one among them is chosen. - -Currently, Kubernetes scheduler provides some practical priority functions, including: - -- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of requests of all Pods already on the node - request of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. -- `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed. -- `SelectorSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service, replication controller, or replica set on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes. -- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label. -- `ImageLocalityPriority`: Nodes are prioritized based on locality of images requested by a pod. Nodes with larger size of already-installed packages required by the pod will be preferred over nodes with no already-installed packages required by the pod or a small total size of already-installed packages required by the pod. 
-- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details. - -The details of the above priority functions can be found in [pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [scheduler.md](scheduler.md) for how to customize). +This file has moved to https://git.k8s.io/community/contributors/devel/sig-scheduling/scheduler_algorithm.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/sig-api-machinery/OWNERS b/contributors/devel/sig-api-machinery/OWNERS new file mode 100644 index 00000000..ef142b0f --- /dev/null +++ b/contributors/devel/sig-api-machinery/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-api-machinery-leads +approvers: + - sig-api-machinery-leads +labels: + - sig/api-machinery diff --git a/contributors/devel/sig-api-machinery/controllers.md b/contributors/devel/sig-api-machinery/controllers.md new file mode 100644 index 00000000..268e0d10 --- /dev/null +++ b/contributors/devel/sig-api-machinery/controllers.md @@ -0,0 +1,191 @@ +# Writing Controllers + +A Kubernetes controller is an active reconciliation process. That is, it watches some object for the world's desired state, and it watches the world's actual state, too. Then, it sends instructions to try and make the world's current state be more like the desired state. + +The simplest implementation of this is a loop: + +```go +for { + desired := getDesiredState() + current := getCurrentState() + makeChanges(desired, current) +} +``` + +Watches, etc, are all merely optimizations of this logic. + +## Guidelines + +When you're writing controllers, there are few guidelines that will help make sure you get the results and performance you're looking for. + +1. Operate on one item at a time. If you use a `workqueue.Interface`, you'll be able to queue changes for a particular resource and later pop them in multiple “worker” gofuncs with a guarantee that no two gofuncs will work on the same item at the same time. + + Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers can collapse those into a queue of “check this X” based on relationships. For instance, a ReplicaSet controller needs to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing those. + +1. Random ordering between resources. When controllers queue off multiple types of resources, there is no guarantee of ordering amongst those resources. + + Distinct watches are updated independently. Even with an objective ordering of “created resourceA/X” and “created resourceB/Y”, your controller could observe “created resourceB/Y” and “created resourceA/X”. + +1. Level driven, not edge driven. 
Just like having a shell script that isn't running all the time, your controller may be off for an indeterminate amount of time before running again. + + If an API object appears with a marker value of `true`, you can't count on having seen it turn from `false` to `true`, only that you now observe it being `true`. Even an API watch suffers from this problem, so be sure that you're not counting on seeing a change unless your controller is also marking the information it last made the decision on in the object's status. + +1. Use `SharedInformers`. `SharedInformers` provide hooks to receive notifications of adds, updates, and deletes for a particular resource. They also provide convenience functions for accessing shared caches and determining when a cache is primed. + + Use the factory methods down in https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/informers/factory.go to ensure that you are sharing the same instance of the cache as everyone else. + + This saves us connections against the API server, duplicate serialization costs server-side, duplicate deserialization costs controller-side, and duplicate caching costs controller-side. + + You may see other mechanisms like reflectors and deltafifos driving controllers. Those were older mechanisms that we later used to build the `SharedInformers`. You should avoid using them in new controllers. + +1. Never mutate original objects! Caches are shared across controllers, this means that if you mutate your "copy" (actually a reference or shallow copy) of an object, you'll mess up other controllers (not just your own). + + The most common point of failure is making a shallow copy, then mutating a map, like `Annotations`. Use `api.Scheme.Copy` to make a deep copy. + +1. Wait for your secondary caches. Many controllers have primary and secondary resources. Primary resources are the resources that you'll be updating `Status` for. Secondary resources are resources that you'll be managing (creating/deleting) or using for lookups. + + Use the `framework.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync functions. This will make sure that things like a Pod count for a ReplicaSet isn't working off of known out of date information that results in thrashing. + +1. There are other actors in the system. Just because you haven't changed an object doesn't mean that somebody else hasn't. + + Don't forget that the current state may change at any moment--it's not sufficient to just watch the desired state. If you use the absence of objects in the desired state to indicate that things in the current state should be deleted, make sure you don't have a bug in your observation code (e.g., act before your cache has filled). + +1. Percolate errors to the top level for consistent re-queuing. We have a `workqueue.RateLimitingInterface` to allow simple requeuing with reasonable backoffs. + + Your main controller func should return an error when requeuing is necessary. When it isn't, it should use `utilruntime.HandleError` and return nil instead. This makes it very easy for reviewers to inspect error handling cases and to be confident that your controller doesn't accidentally lose things it should retry for. + +1. Watches and Informers will “sync”. Periodically, they will deliver every matching object in the cluster to your `Update` method. This is good for cases where you may need to take additional action on the object, but sometimes you know there won't be more work to do. 
+ + In cases where you are *certain* that you don't need to requeue items when there are no new changes, you can compare the resource version of the old and new objects. If they are the same, you skip requeuing the work. Be careful when you do this. If you ever skip requeuing your item on failures, you could fail, not requeue, and then never retry that item again. + +1. If the primary resource your controller is reconciling supports ObservedGeneration in its status, make sure you correctly set it to metadata.Generation whenever the values between the two fields mismatches. + + This lets clients know that the controller has processed a resource. Make sure that your controller is the main controller that is responsible for that resource, otherwise if you need to communicate observation via your own controller, you will need to create a different kind of ObservedGeneration in the Status of the resource. + +1. Consider using owner references for resources that result in the creation of other resources (eg. a ReplicaSet results in creating Pods). Thus you ensure that children resources are going to be garbage-collected once a resource managed by your controller is deleted. For more information on owner references, read more [here](/contributors/design-proposals/api-machinery/controller-ref.md). + + Pay special attention in the way you are doing adoption. You shouldn't adopt children for a resource when either the parent or the children are marked for deletion. If you are using a cache for your resources, you will likely need to bypass it with a direct API read in case you observe that an owner reference has been updated for one of the children. Thus, you ensure your controller is not racing with the garbage collector. + + See [k8s.io/kubernetes/pull/42938](https://github.com/kubernetes/kubernetes/pull/42938) for more information. + +## Rough Structure + +Overall, your controller should look something like this: + +```go +type Controller struct { + // pods gives cached access to pods. + pods informers.PodLister + podsSynced cache.InformerSynced + + // queue is where incoming work is placed to de-dup and to allow "easy" + // rate limited requeues on errors + queue workqueue.RateLimitingInterface +} + +func NewController(pods informers.PodInformer) *Controller { + c := &Controller{ + pods: pods.Lister(), + podsSynced: pods.Informer().HasSynced, + queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "controller-name"), + } + + // register event handlers to fill the queue with pod creations, updates and deletions + pods.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{ + AddFunc: func(obj interface{}) { + key, err := cache.MetaNamespaceKeyFunc(obj) + if err == nil { + c.queue.Add(key) + } + }, + UpdateFunc: func(old interface{}, new interface{}) { + key, err := cache.MetaNamespaceKeyFunc(new) + if err == nil { + c.queue.Add(key) + } + }, + DeleteFunc: func(obj interface{}) { + // IndexerInformer uses a delta nodeQueue, therefore for deletes we have to use this + // key function. 
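+			// (DeletionHandlingMetaNamespaceKeyFunc first checks for a
+			// cache.DeletedFinalStateUnknown tombstone and, if one is found,
+			// returns the key of the object's last known state.)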
+ key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj) + if err == nil { + c.queue.Add(key) + } + }, + },) + + return c +} + +func (c *Controller) Run(threadiness int, stopCh chan struct{}) { + // don't let panics crash the process + defer utilruntime.HandleCrash() + // make sure the work queue is shutdown which will trigger workers to end + defer c.queue.ShutDown() + + glog.Infof("Starting <NAME> controller") + + // wait for your secondary caches to fill before starting your work + if !cache.WaitForCacheSync(stopCh, c.podsSynced) { + return + } + + // start up your worker threads based on threadiness. Some controllers + // have multiple kinds of workers + for i := 0; i < threadiness; i++ { + // runWorker will loop until "something bad" happens. The .Until will + // then rekick the worker after one second + go wait.Until(c.runWorker, time.Second, stopCh) + } + + // wait until we're told to stop + <-stopCh + glog.Infof("Shutting down <NAME> controller") +} + +func (c *Controller) runWorker() { + // hot loop until we're told to stop. processNextWorkItem will + // automatically wait until there's work available, so we don't worry + // about secondary waits + for c.processNextWorkItem() { + } +} + +// processNextWorkItem deals with one key off the queue. It returns false +// when it's time to quit. +func (c *Controller) processNextWorkItem() bool { + // pull the next work item from queue. It should be a key we use to lookup + // something in a cache + key, quit := c.queue.Get() + if quit { + return false + } + // you always have to indicate to the queue that you've completed a piece of + // work + defer c.queue.Done(key) + + // do your work on the key. This method will contains your "do stuff" logic + err := c.syncHandler(key.(string)) + if err == nil { + // if you had no error, tell the queue to stop tracking history for your + // key. This will reset things like failure counts for per-item rate + // limiting + c.queue.Forget(key) + return true + } + + // there was a failure so be sure to report it. This method allows for + // pluggable error handling which can be used for things like + // cluster-monitoring + utilruntime.HandleError(fmt.Errorf("%v failed with : %v", key, err)) + + // since we failed, we should requeue the item to work on later. This + // method will add a backoff to avoid hotlooping on particular items + // (they're probably still not going to work right away) and overall + // controller protection (everything I've done is broken, this controller + // needs to calm down or it can starve other useful work) cases. + c.queue.AddRateLimited(key) + + return true +} +``` diff --git a/contributors/devel/sig-api-machinery/generating-clientset.md b/contributors/devel/sig-api-machinery/generating-clientset.md new file mode 100644 index 00000000..bf12e92c --- /dev/null +++ b/contributors/devel/sig-api-machinery/generating-clientset.md @@ -0,0 +1,50 @@ +# Generation and release cycle of clientset + +Client-gen is an automatic tool that generates [clientset](../design-proposals/api-machinery/client-package-structure.md#high-level-client-sets) based on API types. This doc introduces the use of client-gen, and the release cycle of the generated clientsets. + +## Using client-gen + +The workflow includes three steps: + +**1.** Marking API types with tags: in `pkg/apis/${GROUP}/${VERSION}/types.go`, mark the types (e.g., Pods) that you want to generate clients for with the `// +genclient` tag. 
If the resource associated with the type is not namespace scoped (e.g., PersistentVolume), you need to append the `// +genclient:nonNamespaced` tag as well.
+
+The following `// +genclient` tags are supported:
+
+- `// +genclient` - generate the default client verb functions (*create*, *update*, *delete*, *get*, *list*, *patch*, *watch*, and, if the type has a `.Status` field, also *updateStatus*).
+- `// +genclient:nonNamespaced` - all verb functions are generated without namespace.
+- `// +genclient:onlyVerbs=create,get` - only the listed verb functions will be generated.
+- `// +genclient:skipVerbs=watch` - all default client verb functions will be generated **except** the *watch* verb.
+- `// +genclient:noStatus` - skip generation of the *updateStatus* verb even though the `.Status` field exists.
+
+In some cases you want to generate non-standard verbs (e.g., for sub-resources). To do that you can use the following generator tag:
+
+- `// +genclient:method=Scale,verb=update,subresource=scale,input=k8s.io/api/extensions/v1beta1.Scale,result=k8s.io/api/extensions/v1beta1.Scale` - in this case a new function `Scale(string, *v1beta1.Scale) *v1beta1.Scale` will be added to the default client and the body of the function will be based on the *update* verb. The optional *subresource* argument will make the generated client function use the subresource `scale`. Using the optional *input* and *result* arguments you can override the default type with a custom type. If the import path is not given, the generator will assume the type exists in the same package.
+
+In addition, the following optional tags influence the client generation:
+
+- `// +groupName=policy.authorization.k8s.io` – used in the fake client as the full group name (defaults to the package name),
+- `// +groupGoName=AuthorizationPolicy` – a CamelCase Golang identifier to de-conflict groups with non-unique prefixes like `policy.authorization.k8s.io` and `policy.k8s.io`. These would otherwise lead to two `Policy()` methods in the clientset (defaults to the upper-case first segment of the group name).
+
+**2a.** If you are developing in the k8s.io/kubernetes repository, you just need to run `hack/update-codegen.sh`.
+
+**2b.** If you are running client-gen outside of k8s.io/kubernetes, you need to use the command line argument `--input` to specify the groups and versions of the APIs you want to generate clients for. client-gen will then look into `pkg/apis/${GROUP}/${VERSION}/types.go` and generate clients for the types you have marked with the `genclient` tags. For example, to generate a clientset named "my_release" including clients for api/v1 objects and extensions/v1beta1 objects, you need to run:
+
+```
+$ client-gen --input="api/v1,extensions/v1beta1" --clientset-name="my_release"
+```
+
+**3.** ***Adding expansion methods***: client-gen only generates the common methods, such as CRUD. You can manually add additional methods through the expansion interface. For example, this [file](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/internalversion/pod_expansion.go) adds additional methods to Pod's client. As a convention, we put the expansion interface and its methods in the file `${TYPE}_expansion.go`. In most cases, you don't want to remove existing expansion files. So to make life easier, instead of creating a new clientset from scratch, ***you can copy and rename an existing clientset (so that all the expansion files are copied)***, and then run client-gen.
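+
+To make step 1 concrete, a hypothetical `pkg/apis/example/v1/types.go` marked up for client-gen might look like the sketch below. The `Example` and `ClusterExample` types (and their spec/status) are made up for illustration; real API types usually also carry other generator tags (such as deepcopy-gen tags) that are out of scope here.
+
+```go
+package v1
+
+import (
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+)
+
+// +genclient
+
+// Example is a hypothetical namespaced type; client-gen produces the default
+// verbs for it, plus UpdateStatus because it has a Status field.
+type Example struct {
+	metav1.TypeMeta   `json:",inline"`
+	metav1.ObjectMeta `json:"metadata,omitempty"`
+
+	Spec   ExampleSpec   `json:"spec,omitempty"`
+	Status ExampleStatus `json:"status,omitempty"`
+}
+
+// +genclient
+// +genclient:nonNamespaced
+
+// ClusterExample is a hypothetical cluster-scoped type, so its client is
+// generated without namespace arguments.
+type ClusterExample struct {
+	metav1.TypeMeta   `json:",inline"`
+	metav1.ObjectMeta `json:"metadata,omitempty"`
+
+	Spec ExampleSpec `json:"spec,omitempty"`
+}
+
+// ExampleSpec and ExampleStatus are placeholders for the desired and observed
+// state of the hypothetical type.
+type ExampleSpec struct{}
+type ExampleStatus struct{}
+```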
+ +## Output of client-gen + +- clientset: the clientset will be generated at `pkg/client/clientset_generated/` by default, and you can change the path via the `--clientset-path` command line argument. + +- Individual typed clients and client for group: They will be generated at `pkg/client/clientset_generated/${clientset_name}/typed/generated/${GROUP}/${VERSION}/` + +## Released clientsets + +If you are contributing code to k8s.io/kubernetes, try to use the generated clientset [here](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset). + +If you need a stable Go client to build your own project, please refer to the [client-go repository](https://github.com/kubernetes/client-go). + +We are migrating k8s.io/kubernetes to use client-go as well, see issue [#35159](https://github.com/kubernetes/kubernetes/issues/35159). diff --git a/contributors/devel/sig-api-machinery/strategic-merge-patch.md b/contributors/devel/sig-api-machinery/strategic-merge-patch.md new file mode 100644 index 00000000..4f45ef8e --- /dev/null +++ b/contributors/devel/sig-api-machinery/strategic-merge-patch.md @@ -0,0 +1,449 @@ +Strategic Merge Patch +===================== + +# Background + +Kubernetes supports a customized version of JSON merge patch called strategic merge patch. This +patch format is used by `kubectl apply`, `kubectl edit` and `kubectl patch`, and contains +specialized directives to control how specific fields are merged. + +In the standard JSON merge patch, JSON objects are always merged but lists are +always replaced. Often that isn't what we want. Let's say we start with the +following Pod: + +```yaml +spec: + containers: + - name: nginx + image: nginx-1.0 +``` + +and we POST that to the server (as JSON). Then let's say we want to *add* a +container to this Pod. + +```yaml +PATCH /api/v1/namespaces/default/pods/pod-name +spec: + containers: + - name: log-tailer + image: log-tailer-1.0 +``` + +If we were to use standard Merge Patch, the entire container list would be +replaced with the single log-tailer container. However, our intent is for the +container lists to merge together based on the `name` field. + +To solve this problem, Strategic Merge Patch uses the go struct tag of the API +objects to determine what lists should be merged and which ones should not. +The metadata is available as struct tags on the API objects +themselves and also available to clients as [OpenAPI annotations](https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/README.md#x-kubernetes-patch-strategy-and-x-kubernetes-patch-merge-key). +In the above example, the `patchStrategy` metadata for the `containers` +field would be `merge` and the `patchMergeKey` would be `name`. + + +# Basic Patch Format + +Strategic Merge Patch supports special operations through directives. + +There are multiple directives: + +- replace +- merge +- delete +- delete from primitive list + +`replace`, `merge` and `delete` are mutual exclusive. + +## `replace` Directive + +### Purpose + +`replace` directive indicates that the element that contains it should be replaced instead of being merged. + +### Syntax + +`replace` directive is used in both patch with directive marker and go struct tags. + +Example usage in the patch: + +``` +$patch: replace +``` + +### Example + +`replace` directive can be used on both map and list. 
+ +#### Map + +To indicate that a map should not be merged and instead should be taken literally: + +```yaml +$patch: replace # recursive and applies to all fields of the map it's in +containers: +- name: nginx + image: nginx-1.0 +``` + +#### List of Maps + +To override the container list to be strictly replaced, regardless of the default: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: replace # any further $patch operations nested in this list will be ignored +``` + + +## `delete` Directive + +### Purpose + +`delete` directive indicates that the element that contains it should be deleted. + +### Syntax + +`delete` directive is used only in the patch with directive marker. +It can be used on both map and list of maps. +``` +$patch: delete +``` + +### Example + +#### List of Maps + +To delete an element of a list that should be merged: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: delete + name: log-tailer # merge key and value goes here +``` + +Note: Delete operation will delete all entries in the list that match the merge key. + +#### Maps + +One way to delete a map is using `delete` directive. +Applying this patch will delete the rollingUpdate map. +```yaml +rollingUpdate: + $patch: delete +``` + +An equivalent way to delete this map is +```yaml +rollingUpdate: null +``` + +## `merge` Directive + +### Purpose + +`merge` directive indicates that the element that contains it should be merged instead of being replaced. + +### Syntax + +`merge` directive is used only in the go struct tags. + + +## `deleteFromPrimitiveList` Directive + +### Purpose + +We have two patch strategies for lists of primitives: replace and merge. +Replace is the default patch strategy for list, which will replace the whole list on update and it will preserve the order; +while merge strategy works as an unordered set. We call a primitive list with merge strategy an unordered set. +The patch strategy is defined in the go struct tag of the API objects. + +`deleteFromPrimitiveList` directive indicates that the elements in this list should be deleted from the original primitive list. + +### Syntax + +It is used only as the prefix of the key in the patch. +``` +$deleteFromPrimitiveList/<keyOfPrimitiveList>: [a primitive list] +``` + +### Example + +##### List of Primitives (Unordered Set) + +`finalizers` uses `merge` as patch strategy. +```go +Finalizers []string `json:"finalizers,omitempty" patchStrategy:"merge" protobuf:"bytes,14,rep,name=finalizers"` +``` + +Suppose we have defined a `finalizers` and we call it the original finalizers: + +```yaml +finalizers: + - a + - b + - c +``` + +To delete items "b" and "c" from the original finalizers, the patch will be: + +```yaml +# The directive includes the prefix $deleteFromPrimitiveList and +# followed by a '/' and the name of the list. +# The values in this list will be deleted after applying the patch. +$deleteFromPrimitiveList/finalizers: + - b + - c +``` + +After applying the patch on the original finalizers, it will become: + +```yaml +finalizers: + - a +``` + +Note: When merging two set, the primitives are first deduplicated and then merged. +In an erroneous case, the set may be created with duplicates. Deleting an +item that has duplicates will delete all matching items. + +## `setElementOrder` Directive + +### Purpose + +`setElementOrder` directive provides a way to specify the order of a list. +The relative order specified in this directive will be retained. 
+Please refer to [proposal](/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md) for more information. + +### Syntax + +It is used only as the prefix of the key in the patch. +``` +$setElementOrder/<keyOfList>: [a list] +``` + +### Example + +#### List of Primitives + +Suppose we have a list of `finalizers`: +```yaml +finalizers: + - a + - b + - c +``` + +To reorder the elements order in the list, we can send a patch: +```yaml +# The directive includes the prefix $setElementOrder and +# followed by a '/' and the name of the list. +$setElementOrder/finalizers: + - b + - c + - a +``` + +After applying the patch, it will be: +```yaml +finalizers: + - b + - c + - a +``` + +#### List of Maps + +Suppose we have a list of `containers` whose `mergeKey` is `name`: +```yaml +containers: + - name: a + ... + - name: b + ... + - name: c + ... +``` + +To reorder the elements order in the list, we can send a patch: +```yaml +# each map in the list should only include the mergeKey +$setElementOrder/containers: + - name: b + - name: c + - name: a +``` + +After applying the patch, it will be: +```yaml +containers: + - name: b + ... + - name: c + ... + - name: a + ... +``` + + +## `retainKeys` Directive + +### Purpose + +`retainKeys` directive provides a mechanism for union types to clear mutual exclusive fields. +When this directive is present in the patch, all the fields not in this directive will be cleared. +Please refer to [proposal](/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md) for more information. + +### Syntax + +``` +$retainKeys: [a list of field keys] +``` + +### Example + +#### Map + +Suppose we have a union type: +``` +union: + foo: a + other: b +``` + +And we have a patch: +``` +union: + retainKeys: + - another + - bar + another: d + bar: c +``` + +After applying this patch, we get: +``` +union: + # Field foo and other have been cleared w/o explicitly set them to null. + another: d + bar: c +``` + +# Changing patch format + +As issues and limitations have been discovered with the strategic merge +patch implementation, it has been necessary to change the patch format +to support additional semantics - such as merging lists of +primitives and defining order when merging lists. + +## Requirements for any changes to the patch format + +**Note:** Changes to the strategic merge patch must be backwards compatible such +that patch requests valid in previous versions continue to be valid. +That is, old patch formats sent by old clients to new servers with +must continue to function correctly. + +Previously valid patch requests do not need to keep the exact same +behavior, but do need to behave correctly. + +**Example:** if a patch request previously randomized the order of elements +in a list and we want to provide a deterministic order, we must continue +to support old patch format but we can make the ordering deterministic +for the old format. + +### Client version skew + +Because the server does not publish which patch versions it supports, +and it silently ignores patch directives that it does not recognize, +new patches should behave correctly when sent to old servers that +may not support all of the patch directives. + +While the patch API must be backwards compatible, it must also +be forward compatible for 1 version. 
This is needed because `kubectl` must +support talking to older and newer server versions without knowing what +parts of patch are supported on each, and generate patches that work correctly on both. + +## Strategies for introducing new patch behavior + +#### 1. Add optional semantic meaning to the existing patch format. + +**Note:** Must not require new data or elements to be present that was not required before. Meaning must not break old interpretation of old patches. + +**Good Example:** + +Old format + - ordering of elements in patch had no meaning and the final ordering was arbitrary + +New format + - ordering of elements in patch has meaning and the final ordering is deterministic based on the ordering in the patch + +**Bad Example:** + +Old format + - fields not present in a patch for Kind foo are ignored + - unmodified fields for Kind foo are optional in patch request + +New format + - fields not present in a patch for Kind foo are cleared + - unmodified fields for Kind foo are required in patch request + +This example won't work, because old patch formats will contain data that is now +considered required. To support this, introduce a new directive to guard the +new patch format. + +#### 2. Add support for new directives in the patch format + +- Optional directives may be introduced to change how the patch is applied by the server - **backwards compatible** (old patch against newer server). + - May control how the patch is applied + - May contain patch information - such as elements to delete from a list + - Must NOT impose new requirements on the old patch format + +- New patch requests should be a superset of old patch requests - **forwards compatible** (newer patch against older server) + - *Old servers will ignore directives they do not recognize* + - Must include the full patch that would have been sent before the new directives were added. + - Must NOT rely on the directive being supported by the server + +**Good Example:** + +Old format + - fields not present in a patch for Kind foo are ignored + - unmodified fields for Kind foo are optional in patch request + +New format *without* directive + - Same as old + +New format *with* directive + - fields not present in a patch for Kind foo are cleared + - unmodified fields for Kind foo are required in patch request + +In this example, the behavior was unchanged when the directive was missing, +retaining the old behavior for old patch requests. + +**Bad Example:** + +Old format + - fields not present in a patch for Kind foo are ignored + - unmodified fields for Kind foo are optional in patch request + +New format *with* directive + - Same as old + +New format *without* directive + - fields not present in a patch for Kind foo are cleared + - unmodified fields for Kind foo are required in patch request + +In this example, the behavior was changed when the directive was missing, +breaking compatibility. + +## Alternatives + +The previous strategy is necessary because there is no notion of +patch versions. Having the client negotiate the patch version +with the server would allow changing the patch format, but at +the cost of supporting multiple patch formats in the server and client. +Using client provided directives to evolve how a patch is merged +provides some limited support for multiple versions. 
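+
+As a closing illustration of the forward-compatibility guidance above, the sketch below builds a strategic merge patch that carries a `$setElementOrder` directive for newer servers while still including the full container change that an older server understands; an old server silently ignores the directive and applies the same image update. The pod, namespace, container names, and image are hypothetical, and the client-go call shown in the comment is indicative only, since its exact signature varies by release.
+
+```go
+package main
+
+import (
+	"fmt"
+
+	"k8s.io/apimachinery/pkg/types"
+)
+
+func main() {
+	// The directive gives new servers a deterministic container order; old servers
+	// drop the unrecognized "$setElementOrder/containers" key and still merge the
+	// "containers" entry below, so the same request works against both.
+	patch := []byte(`{
+  "spec": {
+    "$setElementOrder/containers": [{"name": "log-tailer"}, {"name": "nginx"}],
+    "containers": [{"name": "nginx", "image": "nginx-1.1"}]
+  }
+}`)
+
+	// The patch is sent with Content-Type application/strategic-merge-patch+json,
+	// for example via client-go (exact Patch signature varies by release):
+	//   clientset.CoreV1().Pods("default").Patch("pod-name", types.StrategicMergePatchType, patch)
+	fmt.Printf("%s\n%s\n", types.StrategicMergePatchType, patch)
+}
+```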
+ diff --git a/contributors/devel/sig-architecture/OWNERS b/contributors/devel/sig-architecture/OWNERS new file mode 100644 index 00000000..3baa861d --- /dev/null +++ b/contributors/devel/sig-architecture/OWNERS @@ -0,0 +1,10 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-architecture-leads + - jbeda +approvers: + - sig-architecture-leads + - jbeda +labels: + - sig/architecture diff --git a/contributors/devel/sig-architecture/api-conventions.md b/contributors/devel/sig-architecture/api-conventions.md new file mode 100644 index 00000000..2e0bd7ad --- /dev/null +++ b/contributors/devel/sig-architecture/api-conventions.md @@ -0,0 +1,1367 @@ +API Conventions +=============== + +Updated: 3/7/2017 + +*This document is oriented at users who want a deeper understanding of the +Kubernetes API structure, and developers wanting to extend the Kubernetes API. +An introduction to using resources with kubectl can be found in [the object management overview](https://kubernetes.io/docs/tutorials/object-management-kubectl/object-management/).* + +**Table of Contents** + + + - [Types (Kinds)](#types-kinds) + - [Resources](#resources) + - [Objects](#objects) + - [Metadata](#metadata) + - [Spec and Status](#spec-and-status) + - [Typical status properties](#typical-status-properties) + - [References to related objects](#references-to-related-objects) + - [Lists of named subobjects preferred over maps](#lists-of-named-subobjects-preferred-over-maps) + - [Primitive types](#primitive-types) + - [Constants](#constants) + - [Unions](#unions) + - [Lists and Simple kinds](#lists-and-simple-kinds) + - [Differing Representations](#differing-representations) + - [Verbs on Resources](#verbs-on-resources) + - [PATCH operations](#patch-operations) + - [Strategic Merge Patch](#strategic-merge-patch) + - [Idempotency](#idempotency) + - [Optional vs. Required](#optional-vs-required) + - [Defaulting](#defaulting) + - [Late Initialization](#late-initialization) + - [Concurrency Control and Consistency](#concurrency-control-and-consistency) + - [Serialization Format](#serialization-format) + - [Units](#units) + - [Selecting Fields](#selecting-fields) + - [Object references](#object-references) + - [HTTP Status codes](#http-status-codes) + - [Success codes](#success-codes) + - [Error codes](#error-codes) + - [Response Status Kind](#response-status-kind) + - [Events](#events) + - [Naming conventions](#naming-conventions) + - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) + - [WebSockets and SPDY](#websockets-and-spdy) + - [Validation](#validation) + + +The conventions of the [Kubernetes API](https://kubernetes.io/docs/api/) (and related APIs in the +ecosystem) are intended to ease client development and ensure that configuration +mechanisms can be implemented that work across a diverse set of use cases +consistently. + +The general style of the Kubernetes API is RESTful - clients create, update, +delete, or retrieve a description of an object via the standard HTTP verbs +(POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return +JSON. Kubernetes also exposes additional endpoints for non-standard verbs and +allows alternative content types. All of the JSON accepted and returned by the +server has a schema, identified by the "kind" and "apiVersion" fields. Where +relevant HTTP header fields exist, they should mirror the content of JSON +fields, but the information should not be represented only in the HTTP header. 
+ +The following terms are defined: + +* **Kind** the name of a particular object schema (e.g. the "Cat" and "Dog" +kinds would have different attributes and properties) +* **Resource** a representation of a system entity, sent or retrieved as JSON +via HTTP to the server. Resources are exposed via: + * Collections - a list of resources of the same type, which may be queryable + * Elements - an individual resource, addressable via a URL +* **API Group** a set of resources that are exposed together. Along +with the version is exposed in the "apiVersion" field as "GROUP/VERSION", e.g. +"policy.k8s.io/v1". + +Each resource typically accepts and returns data of a single kind. A kind may be +accepted or returned by multiple resources that reflect specific use cases. For +instance, the kind "Pod" is exposed as a "pods" resource that allows end users +to create, update, and delete pods, while a separate "pod status" resource (that +acts on "Pod" kind) allows automated processes to update a subset of the fields +in that resource. + +Resources are bound together in API groups - each group may have one or more +versions that evolve independent of other API groups, and each version within +the group has one or more resources. Group names are typically in domain name +form - the Kubernetes project reserves use of the empty group, all single +word names ("extensions", "apps"), and any group name ending in "*.k8s.io" for +its sole use. When choosing a group name, we recommend selecting a subdomain +your group or organization owns, such as "widget.mycompany.com". + +Resource collections should be all lowercase and plural, whereas kinds are +CamelCase and singular. Group names must be lower case and be valid DNS +subdomains. + + +## Types (Kinds) + +Kinds are grouped into three categories: + +1. **Objects** represent a persistent entity in the system. + + Creating an API object is a record of intent - once created, the system will +work to ensure that resource exists. All API objects have common metadata. + + An object may have multiple resources that clients can use to perform +specific actions that create, update, delete, or get. + + Examples: `Pod`, `ReplicationController`, `Service`, `Namespace`, `Node`. + +2. **Lists** are collections of **resources** of one (usually) or more +(occasionally) kinds. + + The name of a list kind must end with "List". Lists have a limited set of +common metadata. All lists use the required "items" field to contain the array +of objects they return. Any kind that has the "items" field must be a list kind. + + Most objects defined in the system should have an endpoint that returns the +full set of resources, as well as zero or more endpoints that return subsets of +the full list. Some objects may be singletons (the current user, the system +defaults) and may not have lists. + + In addition, all lists that return objects with labels should support label +filtering (see [the labels documentation](https://kubernetes.io/docs/user-guide/labels/)), and most +lists should support filtering by fields. + + Examples: `PodLists`, `ServiceLists`, `NodeLists`. + + TODO: Describe field filtering below or in a separate doc. + +3. **Simple** kinds are used for specific actions on objects and for +non-persistent entities. + + Given their limited scope, they have the same set of limited common metadata +as lists. + + For instance, the "Status" kind is returned when errors occur and is not +persisted in the system. 
+ + Many simple resources are "subresources", which are rooted at API paths of +specific resources. When resources wish to expose alternative actions or views +that are closely coupled to a single resource, they should do so using new +sub-resources. Common subresources include: + + * `/binding`: Used to bind a resource representing a user request (e.g., Pod, +PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node, +PersistentVolume). + * `/status`: Used to write just the status portion of a resource. For +example, the `/pods` endpoint only allows updates to `metadata` and `spec`, +since those reflect end-user intent. An automated process should be able to +modify status for users to see by sending an updated Pod kind to the server to +the "/pods/<name>/status" endpoint - the alternate endpoint allows +different rules to be applied to the update, and access to be appropriately +restricted. + * `/scale`: Used to read and write the count of a resource in a manner that +is independent of the specific resource schema. + + Two additional subresources, `proxy` and `portforward`, provide access to +cluster resources as described in +[accessing the cluster](https://kubernetes.io/docs/user-guide/accessing-the-cluster/). + +The standard REST verbs (defined below) MUST return singular JSON objects. Some +API endpoints may deviate from the strict REST pattern and return resources that +are not singular JSON objects, such as streams of JSON objects or unstructured +text log data. + +A common set of "meta" API objects are used across all API groups and are +thus considered part of the server group named `meta.k8s.io`. These types may +evolve independent of the API group that uses them and API servers may allow +them to be addressed in their generic form. Examples are `ListOptions`, +`DeleteOptions`, `List`, `Status`, `WatchEvent`, and `Scale`. For historical +reasons these types are part of each existing API group. Generic tools like +quota, garbage collection, autoscalers, and generic clients like kubectl +leverage these types to define consistent behavior across different resource +types, like the interfaces in programming languages. + +The term "kind" is reserved for these "top-level" API types. The term "type" +should be used for distinguishing sub-categories within objects or subobjects. + +### Resources + +All JSON objects returned by an API MUST have the following fields: + +* kind: a string that identifies the schema this object should have +* apiVersion: a string that identifies the version of the schema the object +should have + +These fields are required for proper decoding of the object. They may be +populated by the server by default from the specified URL path, but the client +likely needs to know the values in order to construct the URL path. + +### Objects + +#### Metadata + +Every object kind MUST have the following metadata in a nested object field +called "metadata": + +* namespace: a namespace is a DNS compatible label that objects are subdivided +into. The default namespace is 'default'. See +[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more. +* name: a string that uniquely identifies this object within the current +namespace (see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)). +This value is used in the path when retrieving an individual object. 
+* uid: a unique in time and space value (typically an RFC 4122 generated +identifier, see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)) +used to distinguish between objects with the same name that have been deleted +and recreated + +Every object SHOULD have the following metadata in a nested object field called +"metadata": + +* resourceVersion: a string that identifies the internal version of this object +that can be used by clients to determine when objects have changed. This value +MUST be treated as opaque by clients and passed unmodified back to the server. +Clients should not assume that the resource version has meaning across +namespaces, different kinds of resources, or different servers. (See +[concurrency control](#concurrency-control-and-consistency), below, for more +details.) +* generation: a sequence number representing a specific generation of the +desired state. Set by the system and monotonically increasing, per-resource. May +be compared, such as for RAW and WAW consistency. +* creationTimestamp: a string representing an RFC 3339 date of the date and time +an object was created +* deletionTimestamp: a string representing an RFC 3339 date of the date and time +after which this resource will be deleted. This field is set by the server when +a graceful deletion is requested by the user, and is not directly settable by a +client. The resource will be deleted (no longer visible from resource lists, and +not reachable by name) after the time in this field except when the object has +a finalizer set. In case the finalizer is set the deletion of the object is +postponed at least until the finalizer is removed. +Once the deletionTimestamp is set, this value may not be unset or be set further +into the future, although it may be shortened or the resource may be deleted +prior to this time. +* labels: a map of string keys and values that can be used to organize and +categorize objects (see [the labels docs](https://kubernetes.io/docs/user-guide/labels/)) +* annotations: a map of string keys and values that can be used by external +tooling to store and retrieve arbitrary metadata about this object (see +[the annotations docs](https://kubernetes.io/docs/user-guide/annotations/)) + +Labels are intended for organizational purposes by end users (select the pods +that match this label query). Annotations enable third-party automation and +tooling to decorate objects with additional metadata for their own use. + +#### Spec and Status + +By convention, the Kubernetes API makes a distinction between the specification +of the desired state of an object (a nested object field called "spec") and the +status of the object at the current time (a nested object field called +"status"). The specification is a complete description of the desired state, +including configuration settings provided by the user, +[default values](#defaulting) expanded by the system, and properties initialized +or otherwise changed after creation by other ecosystem components (e.g., +schedulers, auto-scalers), and is persisted in stable storage with the API +object. If the specification is deleted, the object will be purged from the +system. The status summarizes the current state of the object in the system, and +is usually persisted with the object by an automated processes but may be +generated on the fly. At some cost and perhaps some temporary degradation in +behavior, the status could be reconstructed by observation if it were lost. 
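+
+As a minimal sketch of this convention, a hypothetical object kind `Foo` (not a real Kubernetes type; all fields here are illustrative) would be laid out roughly as follows:
+
+```go
+package v1alpha1
+
+import (
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+)
+
+// Foo pairs a user-owned spec with a system-owned status.
+type Foo struct {
+	metav1.TypeMeta   `json:",inline"`
+	metav1.ObjectMeta `json:"metadata,omitempty"`
+
+	// Spec is the desired state written by the user (and filled in by
+	// defaulting and late initialization).
+	Spec FooSpec `json:"spec,omitempty"`
+
+	// Status is the most recently observed state, written by controllers,
+	// typically via a /status subresource.
+	Status FooStatus `json:"status,omitempty"`
+}
+
+// FooSpec describes what the user wants.
+type FooSpec struct {
+	// Replicas is the desired number of instances.
+	// +optional
+	Replicas *int32 `json:"replicas,omitempty"`
+}
+
+// FooStatus describes what the system observed.
+type FooStatus struct {
+	// ObservedGeneration is the metadata.generation most recently acted upon.
+	// +optional
+	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
+
+	// Replicas is the number of instances currently observed.
+	Replicas int32 `json:"replicas"`
+}
+```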
+ +When a new version of an object is POSTed or PUT, the "spec" is updated and +available immediately. Over time the system will work to bring the "status" into +line with the "spec". The system will drive toward the most recent "spec" +regardless of previous versions of that stanza. In other words, if a value is +changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system +is not required to 'touch base' at 5 before changing the "status" to 3. In other +words, the system's behavior is *level-based* rather than *edge-based*. This +enables robust behavior in the presence of missed intermediate state changes. + +The Kubernetes API also serves as the foundation for the declarative +configuration schema for the system. In order to facilitate level-based +operation and expression of declarative configuration, fields in the +specification should have declarative rather than imperative names and +semantics -- they represent the desired state, not actions intended to yield the +desired state. + +The PUT and POST verbs on objects MUST ignore the "status" values, to avoid +accidentally overwriting the status in read-modify-write scenarios. A `/status` +subresource MUST be provided to enable system components to update statuses of +resources they manage. + +Otherwise, PUT expects the whole object to be specified. Therefore, if a field +is omitted it is assumed that the client wants to clear that field's value. The +PUT verb does not accept partial updates. Modification of just part of an object +may be achieved by GETting the resource, modifying part of the spec, labels, or +annotations, and then PUTting it back. See +[concurrency control](#concurrency-control-and-consistency), below, regarding +read-modify-write consistency when using this pattern. Some objects may expose +alternative resource representations that allow mutation of the status, or +performing custom actions on the object. + +All objects that represent a physical resource whose state may vary from the +user's desired intent SHOULD have a "spec" and a "status". Objects whose state +cannot vary from the user's desired intent MAY have only "spec", and MAY rename +"spec" to a more appropriate name. + +Objects that contain both spec and status should not contain additional +top-level fields other than the standard metadata fields. + +Some objects which are not persisted in the system - such as `SubjectAccessReview` +and other webhook style calls - may choose to add spec and status to encapsulate +a "call and response" pattern. The spec is the request (often a request for +information) and the status is the response. For these RPC like objects the only +operation may be POST, but having a consistent schema between submission and +response reduces the complexity of these clients. + + +##### Typical status properties + +**Conditions** represent the latest available observations of an object's +state. They are an extension mechanism intended to be used when the details of +an observation are not a priori known or would not apply to all instances of a +given Kind. For observations that are well known and apply to all instances, a +regular field is preferred. An example of a Condition that probably should +have been a regular field is Pod's "Ready" condition - it is managed by core +controllers, it is well understood, and it applies to all Pods. + +Objects may report multiple conditions, and new types of conditions may be +added in the future or by 3rd party controllers. 
Therefore, conditions are +represented using a list/slice, where all have similar structure. + +The `FooCondition` type for some resource type `Foo` may include a subset of the +following fields, but must contain at least `type` and `status` fields: + +```go + Type FooConditionType `json:"type" description:"type of Foo condition"` + Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` + + // +optional + Reason *string `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"` + // +optional + Message *string `json:"message,omitempty" description:"human-readable message indicating details about last transition"` + + // +optional + LastHeartbeatTime *unversioned.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` + // +optional + LastTransitionTime *unversioned.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"` +``` + +Additional fields may be added in the future. + +Do not use fields that you don't need - simpler is better. + +Use of the `Reason` field is encouraged. + +Use the `LastHeartbeatTime` with great caution - frequent changes to this field +can cause a large fan-out effect for some resources. + +Conditions should be added to explicitly convey properties that users and +components care about rather than requiring those properties to be inferred from +other observations. Once defined, the meaning of a Condition can not be +changed arbitrarily - it becomes part of the API, and has the same backwards- +and forwards-compatibility concerns of any other part of the API. + +Condition status values may be `True`, `False`, or `Unknown`. The absence of a +condition should be interpreted the same as `Unknown`. How controllers handle +`Unknown` depends on the Condition in question. + +Condition types should indicate state in the "abnormal-true" polarity. For +example, if the condition indicates when a policy is invalid, the "is valid" +case is probably the norm, so the condition should be called "Invalid". + +The thinking around conditions has evolved over time, so there are several +non-normative examples in wide use. + +In general, condition values may change back and forth, but some condition +transitions may be monotonic, depending on the resource and condition type. +However, conditions are observations and not, themselves, state machines, nor do +we define comprehensive state machines for objects, nor behaviors associated +with state transitions. The system is level-based rather than edge-triggered, +and should assume an Open World. + +An example of an oscillating condition type is `Ready` (despite it running +afoul of current guidance), which indicates the object was believed to be fully +operational at the time it was last probed. A possible monotonic condition +could be `Failed`. A `True` status for `Failed` would imply failure with no +retry. An object that was still active would generally not have a `Failed` +condition. + +Some resources in the v1 API contain fields called **`phase`**, and associated +`message`, `reason`, and other status fields. The pattern of using `phase` is +deprecated. Newer API types should use conditions instead. 
Phase was +essentially a state-machine enumeration field, that contradicted [system-design +principles](../design-proposals/architecture/principles.md#control-logic) and +hampered evolution, since [adding new enum values breaks backward +compatibility](api_changes.md). Rather than encouraging clients to infer +implicit properties from phases, we prefer to explicitly expose the individual +conditions that clients need to monitor. Conditions also have the benefit that +it is possible to create some conditions with uniform meaning across all +resource types, while still exposing others that are unique to specific +resource types. See [#7856](http://issues.k8s.io/7856) for more details and +discussion. + +In condition types, and everywhere else they appear in the API, **`Reason`** is +intended to be a one-word, CamelCase representation of the category of cause of +the current status, and **`Message`** is intended to be a human-readable phrase +or sentence, which may contain specific details of the individual occurrence. +`Reason` is intended to be used in concise output, such as one-line +`kubectl get` output, and in summarizing occurrences of causes, whereas +`Message` is intended to be presented to users in detailed status explanations, +such as `kubectl describe` output. + +Historical information status (e.g., last transition time, failure counts) is +only provided with reasonable effort, and is not guaranteed to not be lost. + +Status information that may be large (especially proportional in size to +collections of other resources, such as lists of references to other objects -- +see below) and/or rapidly changing, such as +[resource usage](../design-proposals/scheduling/resources.md#usage-data), should be put into separate +objects, with possibly a reference from the original object. This helps to +ensure that GETs and watch remain reasonably efficient for the majority of +clients, which may not need that data. + +Some resources report the `observedGeneration`, which is the `generation` most +recently observed by the component responsible for acting upon changes to the +desired state of the resource. This can be used, for instance, to ensure that +the reported status reflects the most recent desired status. + +#### References to related objects + +References to loosely coupled sets of objects, such as +[pods](https://kubernetes.io/docs/user-guide/pods/) overseen by a +[replication controller](https://kubernetes.io/docs/user-guide/replication-controller/), are usually +best referred to using a [label selector](https://kubernetes.io/docs/user-guide/labels/). In order to +ensure that GETs of individual objects remain bounded in time and space, these +sets may be queried via separate API queries, but will not be expanded in the +referring object's status. + +References to specific objects, especially specific resource versions and/or +specific fields of those objects, are specified using the `ObjectReference` type +(or other types representing strict subsets of it). Unlike partial URLs, the +ObjectReference type facilitates flexible defaulting of fields from the +referring object or other contextual information. + +References in the status of the referee to the referrer may be permitted, when +the references are one-to-one and do not need to be frequently updated, +particularly in an edge-based manner. + +#### Lists of named subobjects preferred over maps + +Discussed in [#2004](http://issue.k8s.io/2004) and elsewhere. There are no maps +of subobjects in any API objects. 
Instead, the convention is to use a list of +subobjects containing name fields. + +For example: + +```yaml +ports: + - name: www + containerPort: 80 +``` + +vs. + +```yaml +ports: + www: + containerPort: 80 +``` + +This rule maintains the invariant that all JSON/YAML keys are fields in API +objects. The only exceptions are pure maps in the API (currently, labels, +selectors, annotations, data), as opposed to sets of subobjects. + +#### Primitive types + +* Avoid floating-point values as much as possible, and never use them in spec. + Floating-point values cannot be reliably round-tripped (encoded and + re-decoded) without changing, and have varying precision and representations + across languages and architectures. +* All numbers (e.g., uint32, int64) are converted to float64 by Javascript and + some other languages, so any field which is expected to exceed that either in + magnitude or in precision (specifically integer values > 53 bits) should be + serialized and accepted as strings. +* Do not use unsigned integers, due to inconsistent support across languages and + libraries. Just validate that the integer is non-negative if that's the case. +* Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). +* Look at similar fields in the API (e.g., ports, durations) and follow the + conventions of existing fields. +* All public integer fields MUST use the Go `(u)int32` or Go `(u)int64` types, + not `(u)int` (which is ambiguous depending on target platform). Internal + types may use `(u)int`. +* Think twice about `bool` fields. Many ideas start as boolean but eventually + trend towards a small set of mutually exclusive options. Plan for future + expansions by describing the policy options explicitly as a string type + alias (e.g. `TerminationMessagePolicy`). + +#### Constants + +Some fields will have a list of allowed values (enumerations). These values will +be strings, and they will be in CamelCase, with an initial uppercase letter. +Examples: `ClusterFirst`, `Pending`, `ClientIP`. + +#### Unions + +Sometimes, at most one of a set of fields can be set. For example, the +[volumes] field of a PodSpec has 17 different volume type-specific fields, such +as `nfs` and `iscsi`. All fields in the set should be +[Optional](#optional-vs-required). + +Sometimes, when a new type is created, the api designer may anticipate that a +union will be needed in the future, even if only one field is allowed initially. +In this case, be sure to make the field [Optional](#optional-vs-required) +optional. In the validation, you may still return an error if the sole field is +unset. Do not set a default value for that field. + +### Lists and Simple kinds + +Every list or simple kind SHOULD have the following metadata in a nested object +field called "metadata": + +* resourceVersion: a string that identifies the common version of the objects +returned by in a list. This value MUST be treated as opaque by clients and +passed unmodified back to the server. A resource version is only valid within a +single namespace on a single kind of resource. + +Every simple kind returned by the server, and any simple kind sent to the server +that must support idempotency or optimistic concurrency should return this +value. Since simple resources are often used as input alternate actions that +modify objects, the resource version of the simple resource should correspond to +the resource version of the object. 
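+
+Tying the list conventions together, the list kind for the hypothetical `Foo` type sketched earlier (same package and imports as that sketch) carries the common list metadata, including `resourceVersion`, and the required `items` array:
+
+```go
+// FooList is the list kind for the hypothetical Foo objects; metav1.ListMeta
+// supplies the common list metadata, and Items is the required "items" field.
+type FooList struct {
+	metav1.TypeMeta `json:",inline"`
+	metav1.ListMeta `json:"metadata,omitempty"`
+
+	Items []Foo `json:"items"`
+}
+```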
+ + +## Differing Representations + +An API may represent a single entity in different ways for different clients, or +transform an object after certain transitions in the system occur. In these +cases, one request object may have two representations available as different +resources, or different kinds. + +An example is a Service, which represents the intent of the user to group a set +of pods with common behavior on common ports. When Kubernetes detects a pod +matches the service selector, the IP address and port of the pod are added to an +Endpoints resource for that Service. The Endpoints resource exists only if the +Service exists, but exposes only the IPs and ports of the selected pods. The +full service is represented by two distinct resources - under the original +Service resource the user created, as well as in the Endpoints resource. + +As another example, a "pod status" resource may accept a PUT with the "pod" +kind, with different rules about what fields may be changed. + +Future versions of Kubernetes may allow alternative encodings of objects beyond +JSON. + + +## Verbs on Resources + +API resources should use the traditional REST pattern: + +* GET /<resourceNamePlural> - Retrieve a list of type +<resourceName>, e.g. GET /pods returns a list of Pods. +* POST /<resourceNamePlural> - Create a new resource from the JSON object +provided by the client. +* GET /<resourceNamePlural>/<name> - Retrieves a single resource +with the given name, e.g. GET /pods/first returns a Pod named 'first'. Should be +constant time, and the resource should be bounded in size. +* DELETE /<resourceNamePlural>/<name> - Delete the single resource +with the given name. DeleteOptions may specify gracePeriodSeconds, the optional +duration in seconds before the object should be deleted. Individual kinds may +declare fields which provide a default grace period, and different kinds may +have differing kind-wide default grace periods. A user provided grace period +overrides a default grace period, including the zero grace period ("now"). +* PUT /<resourceNamePlural>/<name> - Update or create the resource +with the given name with the JSON object provided by the client. +* PATCH /<resourceNamePlural>/<name> - Selectively modify the +specified fields of the resource. See more information [below](#patch-operations). +* GET /<resourceNamePlural>&watch=true - Receive a stream of JSON +objects corresponding to changes made to any resource of the given kind over +time. + +### PATCH operations + +The API supports three different PATCH operations, determined by their +corresponding Content-Type header: + +* JSON Patch, `Content-Type: application/json-patch+json` + * As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is +a sequence of operations that are executed on the resource, e.g. `{"op": "add", +"path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use +JSON Patch, see the RFC. +* Merge Patch, `Content-Type: application/merge-patch+json` + * As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch +is essentially a partial representation of the resource. The submitted JSON is +"merged" with the current resource to create a new one, then the new one is +saved. For more details on how to use Merge Patch, see the RFC. +* Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json` + * Strategic Merge Patch is a custom implementation of Merge Patch. For a +detailed explanation of how it works and why it needed to be introduced, see +below. 
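+
+To make the three formats concrete, the sketch below expresses the same intent (bumping a container image) as each patch type; the pod and container names are hypothetical, and the client-go `Patch` call mentioned in the comment is indicative only, as its signature differs between releases.
+
+```go
+package main
+
+import (
+	"fmt"
+
+	"k8s.io/apimachinery/pkg/types"
+)
+
+func main() {
+	patches := map[types.PatchType][]byte{
+		// JSON Patch (RFC 6902): an explicit list of operations.
+		types.JSONPatchType: []byte(`[{"op": "replace", "path": "/spec/containers/0/image", "value": "nginx-1.1"}]`),
+
+		// Merge Patch (RFC 7386): a partial document; lists are replaced wholesale.
+		types.MergePatchType: []byte(`{"spec": {"containers": [{"name": "nginx", "image": "nginx-1.1"}]}}`),
+
+		// Strategic Merge Patch: same shape as Merge Patch, but lists tagged with a
+		// patchMergeKey (here "name") are merged element-by-element instead of replaced.
+		types.StrategicMergePatchType: []byte(`{"spec": {"containers": [{"name": "nginx", "image": "nginx-1.1"}]}}`),
+	}
+
+	// Each body is sent with its matching Content-Type header, e.g. with client-go:
+	//   clientset.CoreV1().Pods(ns).Patch(name, patchType, body)
+	for patchType, body := range patches {
+		fmt.Printf("%s:\n%s\n\n", patchType, body)
+	}
+}
+```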
+ +#### Strategic Merge Patch + +Details of Strategic Merge Patch are covered [here](strategic-merge-patch.md). + +## Idempotency + +All compatible Kubernetes APIs MUST support "name idempotency" and respond with +an HTTP status code 409 when a request is made to POST an object that has the +same name as an existing object in the system. See +[the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/) for details. + +Names generated by the system may be requested using `metadata.generateName`. +GenerateName indicates that the name should be made unique by the server prior +to persisting it. A non-empty value for the field indicates the name will be +made unique (and the name returned to the client will be different than the name +passed). The value of this field will be combined with a unique suffix on the +server if the Name field has not been provided. The provided value must be valid +within the rules for Name, and may be truncated by the length of the suffix +required to make the value unique on the server. If this field is specified, and +Name is not present, the server will NOT return a 409 if the generated name +exists - instead, it will either return 201 Created or 504 with Reason +`ServerTimeout` indicating a unique name could not be found in the time +allotted, and the client should retry (optionally after the time indicated in +the Retry-After header). + +## Optional vs. Required + +Fields must be either optional or required. + +Optional fields have the following properties: + +- They have the `+optional` comment tag in Go. +- They are a pointer type in the Go definition (e.g. `bool *awesomeFlag`) or +have a built-in `nil` value (e.g. maps and slices). +- The API server should allow POSTing and PUTing a resource with this field +unset. + +In most cases, optional fields should also have the `omitempty` struct tag (the +`omitempty` option specifies that the field should be omitted from the json +encoding if the field has an empty value). However, If you want to have +different logic for an optional field which is not provided vs. provided with +empty values, do not use `omitempty` (e.g. https://github.com/kubernetes/kubernetes/issues/34641). + +Note that for backward compatibility, any field that has the `omitempty` struct +tag will considered to be optional but this may change in future and having +the `+optional` comment tag is highly recommended. + +Required fields have the opposite properties, namely: + +- They do not have an `+optional` comment tag. +- They do not have an `omitempty` struct tag. +- They are not a pointer type in the Go definition (e.g. `bool otherFlag`). +- The API server should not allow POSTing or PUTing a resource with this field +unset. + +Using the `+optional` or the `omitempty` tag causes OpenAPI documentation to +reflect that the field is optional. + +Using a pointer allows distinguishing unset from the zero value for that type. +There are some cases where, in principle, a pointer is not needed for an +optional field since the zero value is forbidden, and thus implies unset. There +are examples of this in the codebase. 
However: + +- it can be difficult for implementors to anticipate all cases where an empty +value might need to be distinguished from a zero value +- structs are not omitted from encoder output even where omitempty is specified, +which is messy; +- having a pointer consistently imply optional is clearer for users of the Go +language client, and any other clients that use corresponding types + +Therefore, we ask that pointers always be used with optional fields that do not +have a built-in `nil` value. + + +## Defaulting + +Default resource values are API version-specific, and they are applied during +the conversion from API-versioned declarative configuration to internal objects +representing the desired state (`Spec`) of the resource. Subsequent GETs of the +resource will include the default values explicitly. + +Incorporating the default values into the `Spec` ensures that `Spec` depicts the +full desired state so that it is easier for the system to determine how to +achieve the state, and for the user to know what to anticipate. + +API version-specific default values are set by the API server. + +## Late Initialization + +Late initialization is when resource fields are set by a system controller +after an object is created/updated. + +For example, the scheduler sets the `pod.spec.nodeName` field after the pod is +created. + +Late-initializers should only make the following types of modifications: + - Setting previously unset fields + - Adding keys to maps + - Adding values to arrays which have mergeable semantics +(`patchStrategy:"merge"` attribute in the type definition). + +These conventions: + 1. allow a user (with sufficient privilege) to override any system-default + behaviors by setting the fields that would otherwise have been defaulted. + 1. enables updates from users to be merged with changes made during late +initialization, using strategic merge patch, as opposed to clobbering the +change. + 1. allow the component which does the late-initialization to use strategic +merge patch, which facilitates composition and concurrency of such components. + +Although the apiserver Admission Control stage acts prior to object creation, +Admission Control plugins should follow the Late Initialization conventions +too, to allow their implementation to be later moved to a 'controller', or to +client libraries. + +## Concurrency Control and Consistency + +Kubernetes leverages the concept of *resource versions* to achieve optimistic +concurrency. All Kubernetes resources have a "resourceVersion" field as part of +their metadata. This resourceVersion is a string that identifies the internal +version of an object that can be used by clients to determine when objects have +changed. When a record is about to be updated, it's version is checked against a +pre-saved value, and if it doesn't match, the update fails with a StatusConflict +(HTTP status code 409). + +The resourceVersion is changed by the server every time an object is modified. +If resourceVersion is included with the PUT operation the system will verify +that there have not been other successful mutations to the resource during a +read/modify/write cycle, by verifying that the current value of resourceVersion +matches the specified value. + +The resourceVersion is currently backed by [etcd's +modifiedIndex](https://coreos.com/etcd/docs/latest/v2/api.html). +However, it's important to note that the application should *not* rely on the +implementation details of the versioning system maintained by Kubernetes. 
We may +change the implementation of resourceVersion in the future, such as to change it +to a timestamp or per-object counter. + +The only way for a client to know the expected value of resourceVersion is to +have received it from the server in response to a prior operation, typically a +GET. This value MUST be treated as opaque by clients and passed unmodified back +to the server. Clients should not assume that the resource version has meaning +across namespaces, different kinds of resources, or different servers. +Currently, the value of resourceVersion is set to match etcd's sequencer. You +could think of it as a logical clock the API server can use to order requests. +However, we expect the implementation of resourceVersion to change in the +future, such as in the case we shard the state by kind and/or namespace, or port +to another storage system. + +In the case of a conflict, the correct client action at this point is to GET the +resource again, apply the changes afresh, and try submitting again. This +mechanism can be used to prevent races like the following: + +``` +Client #1 Client #2 +GET Foo GET Foo +Set Foo.Bar = "one" Set Foo.Baz = "two" +PUT Foo PUT Foo +``` + +When these sequences occur in parallel, either the change to Foo.Bar or the +change to Foo.Baz can be lost. + +On the other hand, when specifying the resourceVersion, one of the PUTs will +fail, since whichever write succeeds changes the resourceVersion for Foo. + +resourceVersion may be used as a precondition for other operations (e.g., GET, +DELETE) in the future, such as for read-after-write consistency in the presence +of caching. + +"Watch" operations specify resourceVersion using a query parameter. It is used +to specify the point at which to begin watching the specified resources. This +may be used to ensure that no mutations are missed between a GET of a resource +(or list of resources) and a subsequent Watch, even if the current version of +the resource is more recent. This is currently the main reason that list +operations (GET on a collection) return resourceVersion. + + +## Serialization Format + +APIs may return alternative representations of any resource in response to an +Accept header or under alternative endpoints, but the default serialization for +input and output of API responses MUST be JSON. + +A protobuf encoding is also accepted for built-in resources. As proto is not +self-describing, there is an envelope wrapper which describes the type of +the contents. + +All dates should be serialized as RFC3339 strings. + +## Units + +Units must either be explicit in the field name (e.g., `timeoutSeconds`), or +must be specified as part of the value (e.g., `resource.Quantity`). Which +approach is preferred is TBD, though currently we use the `fooSeconds` +convention for durations. + +Duration fields must be represented as integer fields with units being +part of the field name (e.g. `leaseDurationSeconds`). We don't use Duration +in the API since that would require clients to implement go-compatible parsing. + +## Selecting Fields + +Some APIs may need to identify which field in a JSON object is invalid, or to +reference a value to extract from a separate resource. The current +recommendation is to use standard JavaScript syntax for accessing that field, +assuming the JSON object was transformed into a JavaScript object, without the +leading dot, such as `metadata.name`. 
+
+Examples:
+
+* Find the field "current" in the object "state" in the second item in the array
+"fields": `fields[1].state.current`
+
+## Object references
+
+Object references should either be called `fooName` if referring to an object of
+kind `Foo` by just the name (within the current namespace, if a namespaced
+resource), or should be called `fooRef`, and should contain a subset of the
+fields of the `ObjectReference` type.
+
+
+TODO: Plugins, extensions, nested kinds, headers
+
+
+## HTTP Status codes
+
+The server will respond with HTTP status codes that match the HTTP spec. See the
+section below for a breakdown of the types of status codes the server will send.
+
+The following HTTP status codes may be returned by the API.
+
+#### Success codes
+
+* `200 StatusOK`
+  * Indicates that the request completed successfully.
+* `201 StatusCreated`
+  * Indicates that the request to create the kind completed successfully.
+* `204 StatusNoContent`
+  * Indicates that the request completed successfully, and the response contains
+no body.
+  * Returned in response to HTTP OPTIONS requests.
+
+#### Error codes
+
+* `307 StatusTemporaryRedirect`
+  * Indicates that the address for the requested resource has changed.
+  * Suggested client recovery behavior:
+    * Follow the redirect.
+
+
+* `400 StatusBadRequest`
+  * Indicates that the request is invalid.
+  * Suggested client recovery behavior:
+    * Do not retry. Fix the request.
+
+
+* `401 StatusUnauthorized`
+  * Indicates that the server can be reached and understood the request, but
+refuses to take any further action, because the client must provide
+authorization. If the client has provided authorization, the server is
+indicating the provided authorization is unsuitable or invalid.
+  * Suggested client recovery behavior:
+    * If the user has not supplied authorization information, prompt them for
+the appropriate credentials. If the user has supplied authorization information,
+inform them their credentials were rejected and optionally prompt them again.
+
+
+* `403 StatusForbidden`
+  * Indicates that the server can be reached and understood the request, but
+refuses to take any further action, because it is configured to deny access for
+some reason to the requested resource by the client.
+  * Suggested client recovery behavior:
+    * Do not retry. Fix the request.
+
+
+* `404 StatusNotFound`
+  * Indicates that the requested resource does not exist.
+  * Suggested client recovery behavior:
+    * Do not retry. Fix the request.
+
+
+* `405 StatusMethodNotAllowed`
+  * Indicates that the action the client attempted to perform on the resource
+was not supported by the code.
+  * Suggested client recovery behavior:
+    * Do not retry. Fix the request.
+
+
+* `409 StatusConflict`
+  * Indicates that either the resource the client attempted to create already
+exists or the requested update operation cannot be completed due to a conflict.
+  * Suggested client recovery behavior:
+    * If creating a new resource:
+      * Either change the identifier and try again, or GET and compare the
+fields in the pre-existing object and issue a PUT/update to modify the existing
+object.
+    * If updating an existing resource:
+      * See `Conflict` from the `status` response section below on how to
+retrieve more information about the nature of the conflict.
+      * GET and compare the fields in the pre-existing object, merge changes (if
+still valid according to preconditions), and retry with the updated request
+(including `ResourceVersion`).
+
+
+* `410 StatusGone`
+  * Indicates that the item is no longer available at the server and no
+forwarding address is known.
+  * Suggested client recovery behavior:
+    * Do not retry. Fix the request.
+
+
+* `422 StatusUnprocessableEntity`
+  * Indicates that the requested create or update operation cannot be completed
+due to invalid data provided as part of the request.
+  * Suggested client recovery behavior:
+    * Do not retry. Fix the request.
+
+
+* `429 StatusTooManyRequests`
+  * Indicates that either the client rate limit has been exceeded or the
+server has received more requests than it can process.
+  * Suggested client recovery behavior:
+    * Read the `Retry-After` HTTP header from the response, and wait at least
+that long before retrying.
+
+
+* `500 StatusInternalServerError`
+  * Indicates that the server can be reached and understood the request, but
+either an unexpected internal error occurred and the outcome of the call is
+unknown, or the server cannot complete the action in a reasonable time (this may
+be due to temporary server load or a transient communication issue with another
+server).
+  * Suggested client recovery behavior:
+    * Retry with exponential backoff.
+
+
+* `503 StatusServiceUnavailable`
+  * Indicates that a required service is unavailable.
+  * Suggested client recovery behavior:
+    * Retry with exponential backoff.
+
+
+* `504 StatusServerTimeout`
+  * Indicates that the request could not be completed within the given time.
+Clients can get this response ONLY when they specified a timeout param in the
+request.
+  * Suggested client recovery behavior:
+    * Increase the value of the timeout param and retry with exponential
+backoff.
+
+## Response Status Kind
+
+Kubernetes will always return the `Status` kind from any API endpoint when an
+error occurs. Clients SHOULD handle these types of objects when appropriate.
+
+A `Status` kind will be returned by the API in two cases:
+  * When an operation is not successful (i.e. when the server would return a non
+2xx HTTP status code).
+  * When an HTTP `DELETE` call is successful.
+
+The status object is encoded as JSON and provided as the body of the response.
+The status object contains fields for humans and machine consumers of the API to
+get more detailed information about the cause of the failure. The information in
+the status object supplements, but does not override, the HTTP status code's
+meaning. When fields in the status object have the same meaning as generally
+defined HTTP headers and that header is returned with the response, the header
+should be considered as having higher priority.
+ +**Example:** + +```console +$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana + +> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1 +> User-Agent: curl/7.26.0 +> Host: 10.240.122.184 +> Accept: */* +> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc +> + +< HTTP/1.1 404 Not Found +< Content-Type: application/json +< Date: Wed, 20 May 2015 18:10:42 GMT +< Content-Length: 232 +< +{ + "kind": "Status", + "apiVersion": "v1", + "metadata": {}, + "status": "Failure", + "message": "pods \"grafana\" not found", + "reason": "NotFound", + "details": { + "name": "grafana", + "kind": "pods" + }, + "code": 404 +} +``` + +`status` field contains one of two possible values: +* `Success` +* `Failure` + +`message` may contain human-readable description of the error + +`reason` may contain a machine-readable, one-word, CamelCase description of why +this operation is in the `Failure` status. If this value is empty there is no +information available. The `reason` clarifies an HTTP status code but does not +override it. + +`details` may contain extended data associated with the reason. Each reason may +define its own extended details. This field is optional and the data returned is +not guaranteed to conform to any schema except that defined by the reason type. + +Possible values for the `reason` and `details` fields: +* `BadRequest` + * Indicates that the request itself was invalid, because the request doesn't +make any sense, for example deleting a read-only object. + * This is different than `status reason` `Invalid` above which indicates that +the API call could possibly succeed, but the data was invalid. + * API calls that return BadRequest can never succeed. + * Http status code: `400 StatusBadRequest` + + +* `Unauthorized` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action without the client providing appropriate +authorization. If the client has provided authorization, this error indicates +the provided credentials are insufficient or invalid. + * Details (optional): + * `kind string` + * The kind attribute of the unauthorized resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the unauthorized resource. + * HTTP status code: `401 StatusUnauthorized` + + +* `Forbidden` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because it is configured to deny access for +some reason to the requested resource by the client. + * Details (optional): + * `kind string` + * The kind attribute of the forbidden resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the forbidden resource. + * HTTP status code: `403 StatusForbidden` + + +* `NotFound` + * Indicates that one or more resources required for this operation could not +be found. + * Details (optional): + * `kind string` + * The kind attribute of the missing resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the missing resource. + * HTTP status code: `404 StatusNotFound` + + +* `AlreadyExists` + * Indicates that the resource you are creating already exists. + * Details (optional): + * `kind string` + * The kind attribute of the conflicting resource. + * `name string` + * The identifier of the conflicting resource. 
  * HTTP status code: `409 StatusConflict`
+
+* `Conflict`
+  * Indicates that the requested update operation cannot be completed due to a
+conflict. The client may need to alter the request. Each resource may define
+custom details that indicate the nature of the conflict.
+  * HTTP status code: `409 StatusConflict`
+
+
+* `Invalid`
+  * Indicates that the requested create or update operation cannot be completed
+due to invalid data provided as part of the request.
+  * Details (optional):
+    * `kind string`
+      * the kind attribute of the invalid resource
+    * `name string`
+      * the identifier of the invalid resource
+    * `causes`
+      * One or more `StatusCause` entries indicating the data in the provided
+resource that was invalid. The `reason`, `message`, and `field` attributes will
+be set.
+  * HTTP status code: `422 StatusUnprocessableEntity`
+
+
+* `Timeout`
+  * Indicates that the request could not be completed within the given time.
+Clients may receive this response if the server has decided to rate limit the
+client, or if the server is overloaded and cannot process the request at this
+time.
+  * HTTP status code: `429 TooManyRequests`
+  * The server should set the `Retry-After` HTTP header and return
+`retryAfterSeconds` in the details field of the object. A value of `0` is the
+default.
+
+
+* `ServerTimeout`
+  * Indicates that the server can be reached and understood the request, but
+cannot complete the action in a reasonable time. This may be due to temporary
+server load or a transient communication issue with another server.
+  * Details (optional):
+    * `kind string`
+      * The kind attribute of the resource being acted on.
+    * `name string`
+      * The operation that is being attempted.
+  * The server should set the `Retry-After` HTTP header and return
+`retryAfterSeconds` in the details field of the object. A value of `0` is the
+default.
+  * HTTP status code: `504 StatusServerTimeout`
+
+
+* `MethodNotAllowed`
+  * Indicates that the action the client attempted to perform on the resource
+was not supported by the code.
+  * For instance, attempting to delete a resource that can only be created.
+  * API calls that return MethodNotAllowed can never succeed.
+  * HTTP status code: `405 StatusMethodNotAllowed`
+
+
+* `InternalError`
+  * Indicates that an unexpected internal error occurred; the outcome of the
+call is unknown.
+  * Details (optional):
+    * `causes`
+      * The original error.
+  * HTTP status code: `500 StatusInternalServerError`
+
+`code` may contain the suggested HTTP return code for this status.
+
+
+## Events
+
+Events are complementary to status information, since they can provide some
+historical information about status and occurrences in addition to current or
+previous status. Generate events for situations users or administrators should
+be alerted about.
+
+Choose a unique, specific, short, CamelCase reason for each event category. For
+example, `FreeDiskSpaceInvalid` is a good event reason because it is likely to
+refer to just one situation, but `Started` is not a good reason because it
+doesn't sufficiently indicate what started, even when combined with other event
+fields.
+
+`Error creating foo` or `Error creating foo %s` would be appropriate for an
+event message, with the latter being preferable, since it is more informational.
+
+Accumulate repeated events in the client, especially for frequent events, to
+reduce data volume, load on the system, and noise exposed to users.
+
+## Naming conventions
+
+* Go field names must be CamelCase.
JSON field names must be camelCase. Other +than capitalization of the initial letter, the two should almost always match. +No underscores nor dashes in either. +* Field and resource names should be declarative, not imperative (DoSomething, +SomethingDoer, DoneBy, DoneAt). +* Use `Node` where referring to +the node resource in the context of the cluster. Use `Host` where referring to +properties of the individual physical/virtual system, such as `hostname`, +`hostPath`, `hostNetwork`, etc. +* `FooController` is a deprecated kind naming convention. Name the kind after +the thing being controlled instead (e.g., `Job` rather than `JobController`). +* The name of a field that specifies the time at which `something` occurs should +be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). +* We use the `fooSeconds` convention for durations, as discussed in the [units +subsection](#units). + * `fooPeriodSeconds` is preferred for periodic intervals and other waiting +periods (e.g., over `fooIntervalSeconds`). + * `fooTimeoutSeconds` is preferred for inactivity/unresponsiveness deadlines. + * `fooDeadlineSeconds` is preferred for activity completion deadlines. +* Do not use abbreviations in the API, except where they are extremely commonly +used, such as "id", "args", or "stdin". +* Acronyms should similarly only be used when extremely commonly known. All +letters in the acronym should have the same case, using the appropriate case for +the situation. For example, at the beginning of a field name, the acronym should +be all lowercase, such as "httpGet". Where used as a constant, all letters +should be uppercase, such as "TCP" or "UDP". +* The name of a field referring to another resource of kind `Foo` by name should +be called `fooName`. The name of a field referring to another resource of kind +`Foo` by ObjectReference (or subset thereof) should be called `fooRef`. +* More generally, include the units and/or type in the field name if they could +be ambiguous and they are not specified by the value or value type. +* The name of a field expressing a boolean property called 'fooable' should be +called `Fooable`, not `IsFooable`. + +### Namespace Names +* The name of a namespace must be a +[DNS_LABEL](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/identifiers.md). +* The `kube-` prefix is reserved for Kubernetes system namespaces, e.g. `kube-system` and `kube-public`. +* See +[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more information. + +## Label, selector, and annotation conventions + +Labels are the domain of users. They are intended to facilitate organization and +management of API resources using attributes that are meaningful to users, as +opposed to meaningful to the system. Think of them as user-created mp3 or email +inbox labels, as opposed to the directory structure used by a program to store +its data. The former enables the user to apply an arbitrary ontology, whereas +the latter is implementation-centric and inflexible. Users will use labels to +select resources to operate on, display label values in CLI/UI columns, etc. +Users should always retain full power and flexibility over the label schemas +they apply to labels in their namespaces. + +However, we should support conveniences for common cases by default. For +example, what we now do in ReplicationController is automatically set the RC's +selector and labels to the labels in the pod template by default, if they are +not already set. 
That ensures that the selector will match the template, and +that the RC can be managed using the same labels as the pods it creates. Note +that once we generalize selectors, it won't necessarily be possible to +unambiguously generate labels that match an arbitrary selector. + +If the user wants to apply additional labels to the pods that it doesn't select +upon, such as to facilitate adoption of pods or in the expectation that some +label values will change, they can set the selector to a subset of the pod +labels. Similarly, the RC's labels could be initialized to a subset of the pod +template's labels, or could include additional/different labels. + +For disciplined users managing resources within their own namespaces, it's not +that hard to consistently apply schemas that ensure uniqueness. One just needs +to ensure that at least one value of some label key in common differs compared +to all other comparable resources. We could/should provide a verification tool +to check that. However, development of conventions similar to the examples in +[Labels](https://kubernetes.io/docs/user-guide/labels/) make uniqueness straightforward. Furthermore, +relatively narrowly used namespaces (e.g., per environment, per application) can +be used to reduce the set of resources that could potentially cause overlap. + +In cases where users could be running misc. examples with inconsistent schemas, +or where tooling or components need to programmatically generate new objects to +be selected, there needs to be a straightforward way to generate unique label +sets. A simple way to ensure uniqueness of the set is to ensure uniqueness of a +single label value, such as by using a resource name, uid, resource hash, or +generation number. + +Problems with uids and hashes, however, include that they have no semantic +meaning to the user, are not memorable nor readily recognizable, and are not +predictable. Lack of predictability obstructs use cases such as creation of a +replication controller from a pod, such as people want to do when exploring the +system, bootstrapping a self-hosted cluster, or deletion and re-creation of a +new RC that adopts the pods of the previous one, such as to rename it. +Generation numbers are more predictable and much clearer, assuming there is a +logical sequence. Fortunately, for deployments that's the case. For jobs, use of +creation timestamps is common internally. Users should always be able to turn +off auto-generation, in order to permit some of the scenarios described above. +Note that auto-generated labels will also become one more field that needs to be +stripped out when cloning a resource, within a namespace, in a new namespace, in +a new cluster, etc., and will need to be ignored around when updating a resource +via patch or read-modify-write sequence. + +Inclusion of a system prefix in a label key is fairly hostile to UX. A prefix is +only necessary in the case that the user cannot choose the label key, in order +to avoid collisions with user-defined labels. However, I firmly believe that the +user should always be allowed to select the label keys to use on their +resources, so it should always be possible to override default label keys. + +Therefore, resources supporting auto-generation of unique labels should have a +`uniqueLabelKey` field, so that the user could specify the key if they wanted +to, but if unspecified, it could be set by default, such as to the resource +type, like job, deployment, or replicationController. 
The value would need to be +at least spatially unique, and perhaps temporally unique in the case of job. + +Annotations have very different intended usage from labels. They are +primarily generated and consumed by tooling and system extensions, or are used +by end-users to engage non-standard behavior of components. For example, an +annotation might be used to indicate that an instance of a resource expects +additional handling by non-kubernetes controllers. Annotations may carry +arbitrary payloads, including JSON documents. Like labels, annotation keys can +be prefixed with a governing domain (e.g. `example.com/key-name`). Unprefixed +keys (e.g. `key-name`) are reserved for end-users. Third-party components must +use prefixed keys. Key prefixes under the "kubernetes.io" and "k8s.io" domains +are reserved for use by the kubernetes project and must not be used by +third-parties. + +In early versions of Kubernetes, some in-development features represented new +API fields as annotations, generally with the form `something.alpha.kubernetes.io/name` or +`something.beta.kubernetes.io/name` (depending on our confidence in it). This +pattern is deprecated. Some such annotations may still exist, but no new +annotations may be defined. New API fields are now developed as regular fields. + +Other advice regarding use of labels, annotations, taints, and other generic map keys by +Kubernetes components and tools: + - Key names should be all lowercase, with words separated by dashes instead of camelCase + - For instance, prefer `foo.kubernetes.io/foo-bar` over `foo.kubernetes.io/fooBar`, prefer + `desired-replicas` over `DesiredReplicas` + - Unprefixed keys are reserved for end-users. All other labels and annotations must be prefixed. + - Key prefixes under "kubernetes.io" and "k8s.io" are reserved for the Kubernetes + project. + - Such keys are effectively part of the kubernetes API and may be subject + to deprecation and compatibility policies. + - Key names, including prefixes, should be precise enough that a user could + plausibly understand where it came from and what it is for. + - Key prefixes should carry as much context as possible. + - For instance, prefer `subsystem.kubernetes.io/parameter` over `kubernetes.io/subsystem-parameter` + - Use annotations to store API extensions that the controller responsible for +the resource doesn't need to know about, experimental fields that aren't +intended to be generally used API fields, etc. Beware that annotations aren't +automatically handled by the API conversion machinery. + +## WebSockets and SPDY + +Some of the API operations exposed by Kubernetes involve transfer of binary +streams between the client and a container, including attach, exec, portforward, +and logging. The API therefore exposes certain operations over upgradeable HTTP +connections ([described in RFC 2817](https://tools.ietf.org/html/rfc2817)) via +the WebSocket and SPDY protocols. These actions are exposed as subresources with +their associated verbs (exec, log, attach, and portforward) and are requested +via a GET (to support JavaScript in a browser) and POST (semantically accurate). + +There are two primary protocols in use today: + +1. Streamed channels + + When dealing with multiple independent binary streams of data such as the +remote execution of a shell command (writing to STDIN, reading from STDOUT and +STDERR) or forwarding multiple ports the streams can be multiplexed onto a +single TCP connection. 
Kubernetes supports a SPDY based framing protocol that +leverages SPDY channels and a WebSocket framing protocol that multiplexes +multiple channels onto the same stream by prefixing each binary chunk with a +byte indicating its channel. The WebSocket protocol supports an optional +subprotocol that handles base64-encoded bytes from the client and returns +base64-encoded bytes from the server and character based channel prefixes ('0', +'1', '2') for ease of use from JavaScript in a browser. + +2. Streaming response + + The default log output for a channel of streaming data is an HTTP Chunked +Transfer-Encoding, which can return an arbitrary stream of binary data from the +server. Browser-based JavaScript is limited in its ability to access the raw +data from a chunked response, especially when very large amounts of logs are +returned, and in future API calls it may be desirable to transfer large files. +The streaming API endpoints support an optional WebSocket upgrade that provides +a unidirectional channel from the server to the client and chunks data as binary +WebSocket frames. An optional WebSocket subprotocol is exposed that base64 +encodes the stream before returning it to the client. + +Clients should use the SPDY protocols if their clients have native support, or +WebSockets as a fallback. Note that WebSockets is susceptible to Head-of-Line +blocking and so clients must read and process each message sequentially. In +the future, an HTTP/2 implementation will be exposed that deprecates SPDY. + + +## Validation + +API objects are validated upon receipt by the apiserver. Validation errors are +flagged and returned to the caller in a `Failure` status with `reason` set to +`Invalid`. In order to facilitate consistent error messages, we ask that +validation logic adheres to the following guidelines whenever possible (though +exceptional cases will exist). + +* Be as precise as possible. +* Telling users what they CAN do is more useful than telling them what they +CANNOT do. +* When asserting a requirement in the positive, use "must". Examples: "must be +greater than 0", "must match regex '[a-z]+'". Words like "should" imply that +the assertion is optional, and must be avoided. +* When asserting a formatting requirement in the negative, use "must not". +Example: "must not contain '..'". Words like "should not" imply that the +assertion is optional, and must be avoided. +* When asserting a behavioral requirement in the negative, use "may not". +Examples: "may not be specified when otherField is empty", "only `name` may be +specified". +* When referencing a literal string value, indicate the literal in +single-quotes. Example: "must not contain '..'". +* When referencing another field name, indicate the name in back-quotes. +Example: "must be greater than `request`". +* When specifying inequalities, use words rather than symbols. Examples: "must +be less than 256", "must be greater than or equal to 0". Do not use words +like "larger than", "bigger than", "more than", "higher than", etc. +* When specifying numeric ranges, use inclusive ranges when possible. + diff --git a/contributors/devel/sig-architecture/api_changes.md b/contributors/devel/sig-architecture/api_changes.md new file mode 100644 index 00000000..1f46d298 --- /dev/null +++ b/contributors/devel/sig-architecture/api_changes.md @@ -0,0 +1,1007 @@ +*This document is oriented at developers who want to change existing APIs. 
+A set of API conventions, which applies to new APIs and to changes, can be +found at [API Conventions](api-conventions.md). + +**Table of Contents** + +- [So you want to change the API?](#so-you-want-to-change-the-api) + - [Operational overview](#operational-overview) + - [On compatibility](#on-compatibility) + - [Backward compatibility gotchas](#backward-compatibility-gotchas) + - [Incompatible API changes](#incompatible-api-changes) + - [Changing versioned APIs](#changing-versioned-apis) + - [Edit types.go](#edit-typesgo) + - [Edit defaults.go](#edit-defaultsgo) + - [Edit conversion.go](#edit-conversiongo) + - [Changing the internal structures](#changing-the-internal-structures) + - [Edit types.go](#edit-typesgo-1) + - [Edit validation.go](#edit-validationgo) + - [Edit version conversions](#edit-version-conversions) + - [Generate protobuf objects](#generate-protobuf-objects) + - [Edit json (un)marshaling code](#edit-json-unmarshaling-code) + - [Making a new API Version](#making-a-new-api-version) + - [Making a new API Group](#making-a-new-api-group) + - [Update the fuzzer](#update-the-fuzzer) + - [Update the semantic comparisons](#update-the-semantic-comparisons) + - [Implement your change](#implement-your-change) + - [Write end-to-end tests](#write-end-to-end-tests) + - [Examples and docs](#examples-and-docs) + - [Alpha, Beta, and Stable Versions](#alpha-beta-and-stable-versions) + - [Adding Unstable Features to Stable Versions](#adding-unstable-features-to-stable-versions) + + +# So you want to change the API? + +Before attempting a change to the API, you should familiarize yourself with a +number of existing API types and with the [API conventions](api-conventions.md). +If creating a new API type/resource, we also recommend that you first send a PR +containing just a proposal for the new API types. + +The Kubernetes API has two major components - the internal structures and +the versioned APIs. The versioned APIs are intended to be stable, while the +internal structures are implemented to best reflect the needs of the Kubernetes +code itself. + +What this means for API changes is that you have to be somewhat thoughtful in +how you approach changes, and that you have to touch a number of pieces to make +a complete change. This document aims to guide you through the process, though +not all API changes will need all of these steps. + +## Operational overview + +It is important to have a high level understanding of the API system used in +Kubernetes in order to navigate the rest of this document. + +As mentioned above, the internal representation of an API object is decoupled +from any one API version. This provides a lot of freedom to evolve the code, +but it requires robust infrastructure to convert between representations. There +are multiple steps in processing an API operation - even something as simple as +a GET involves a great deal of machinery. + +The conversion process is logically a "star" with the internal form at the +center. Every versioned API can be converted to the internal form (and +vice-versa), but versioned APIs do not convert to other versioned APIs directly. +This sounds like a heavy process, but in reality we do not intend to keep more +than a small number of versions alive at once. While all of the Kubernetes code +operates on the internal structures, they are always converted to a versioned +form before being written to storage (disk or etcd) or being sent over a wire. +Clients should consume and operate on the versioned APIs exclusively. 
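+
+The hub-and-spoke shape is easiest to see in code. The sketch below is purely
+illustrative -- hand-rolled types and functions standing in for the generated
+conversion machinery -- but it shows the key property: every versioned form
+converts to and from the internal form, and converting between two versions
+always passes through that hub.
+
+```go
+package main
+
+import "fmt"
+
+// Hypothetical shapes of the same resource: two versioned forms and the
+// internal "hub" form.
+type V1beta1Widget struct{ DisplayName string }
+type V1Widget struct{ Name string }
+type InternalWidget struct{ Name string }
+
+// Spokes: versioned <-> internal conversions only.
+func v1beta1ToInternal(in V1beta1Widget) InternalWidget {
+	return InternalWidget{Name: in.DisplayName}
+}
+func internalToV1(in InternalWidget) V1Widget {
+	return V1Widget{Name: in.Name}
+}
+
+func main() {
+	// There is no direct v1beta1 -> v1 conversion; the internal form is
+	// always the intermediate step.
+	old := V1beta1Widget{DisplayName: "example"}
+	fmt.Printf("%+v\n", internalToV1(v1beta1ToInternal(old)))
+}
+```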
+ +To demonstrate the general process, here is a (hypothetical) example: + + 1. A user POSTs a `Pod` object to `/api/v7beta1/...` + 2. The JSON is unmarshalled into a `v7beta1.Pod` structure + 3. Default values are applied to the `v7beta1.Pod` + 4. The `v7beta1.Pod` is converted to an `api.Pod` structure + 5. The `api.Pod` is validated, and any errors are returned to the user + 6. The `api.Pod` is converted to a `v6.Pod` (because v6 is the latest stable +version) + 7. The `v6.Pod` is marshalled into JSON and written to etcd + +Now that we have the `Pod` object stored, a user can GET that object in any +supported api version. For example: + + 1. A user GETs the `Pod` from `/api/v5/...` + 2. The JSON is read from etcd and unmarshalled into a `v6.Pod` structure + 3. Default values are applied to the `v6.Pod` + 4. The `v6.Pod` is converted to an `api.Pod` structure + 5. The `api.Pod` is converted to a `v5.Pod` structure + 6. The `v5.Pod` is marshalled into JSON and sent to the user + +The implication of this process is that API changes must be done carefully and +backward-compatibly. + +## On compatibility + +Before talking about how to make API changes, it is worthwhile to clarify what +we mean by API compatibility. Kubernetes considers forwards and backwards +compatibility of its APIs a top priority. Compatibility is *hard*, especially +handling issues around rollback-safety. This is something every API change +must consider. + +An API change is considered compatible if it: + + * adds new functionality that is not required for correct behavior (e.g., +does not add a new required field) + * does not change existing semantics, including: + * the semantic meaning of default values *and behavior* + * interpretation of existing API types, fields, and values + * which fields are required and which are not + * mutable fields do not become immutable + * valid values do not become invalid + * explicitly invalid values do not become valid + +Put another way: + +1. Any API call (e.g. a structure POSTed to a REST endpoint) that succeeded +before your change must succeed after your change. +2. Any API call that does not use your change must behave the same as it did +before your change. +3. Any API call that uses your change must not cause problems (e.g. crash or +degrade behavior) when issued against an API servers that do not include your +change. +4. It must be possible to round-trip your change (convert to different API +versions and back) with no loss of information. +5. Existing clients need not be aware of your change in order for them to +continue to function as they did previously, even when your change is in use. +6. It must be possible to rollback to a previous version of API server that +does not include your change and have no impact on API objects which do not use +your change. API objects that use your change will be impacted in case of a +rollback. + +If your change does not meet these criteria, it is not considered compatible, +and may break older clients, or result in newer clients causing undefined +behavior. Such changes are generally disallowed, though exceptions have been +made in extreme cases (e.g. security or obvious bugs). + +Let's consider some examples. + +In a hypothetical API (assume we're at version v6), the `Frobber` struct looks +something like this: + +```go +// API v6. +type Frobber struct { + Height int `json:"height"` + Param string `json:"param"` +} +``` + +You want to add a new `Width` field. 
It is generally allowed to add new fields +without changing the API version, so you can simply change it to: + +```go +// Still API v6. +type Frobber struct { + Height int `json:"height"` + Width int `json:"width"` + Param string `json:"param"` +} +``` + +The onus is on you to define a sane default value for `Width` such that rules +#1 and #2 above are true - API calls and stored objects that used to work must +continue to work. + +For your next change you want to allow multiple `Param` values. You can not +simply remove `Param string` and add `Params []string` (without creating a +whole new API version) - that fails rules #1, #2, #3, and #6. Nor can you +simply add `Params []string` and use it instead - that fails #2 and #6. + +You must instead define a new field and the relationship between that field and +the existing field(s). Start by adding the new plural field: + +```go +// Still API v6. +type Frobber struct { + Height int `json:"height"` + Width int `json:"width"` + Param string `json:"param"` // the first param + Params []string `json:"params"` // all of the params +} +``` + +This new field must be inclusive of the singular field. In order to satisfy +the compatibility rules you must handle all the cases of version skew, multiple +clients, and rollbacks. This can be handled by defaulting or admission control +logic linking the fields together with context from the API operation to get as +close as possible to the user's intentions. + +Upon any mutating API operation: + * If only the singular field is specified (e.g. an older client), API logic + must populate plural[0] from the singular value, and de-dup the plural + field. + * If only the plural field is specified (e.g. a newer client), API logic must + populate the singular value from plural[0]. + * If both the singular and plural fields are specified, API logic must + validate that the singular value matches plural[0]. + * Any other case is an error and must be rejected. + +For this purpose "is specified" means the following: + * On a create or patch operation: the field is present in the user-provided input + * On an update operation: the field is present and has changed from the + current value + +Older clients that only know the singular field will continue to succeed and +produce the same results as before the change. Newer clients can use your +change without impacting older clients. The API server can be rolled back and +only objects that use your change will be impacted. + +Part of the reason for versioning APIs and for using internal types that are +distinct from any one version is to handle growth like this. The internal +representation can be implemented as: + +```go +// Internal, soon to be v7beta1. +type Frobber struct { + Height int + Width int + Params []string +} +``` + +The code that converts to/from versioned APIs can decode this into the +compatible structure. Eventually, a new API version, e.g. v7beta1, +will be forked and it can drop the singular field entirely. + +We've seen how to satisfy rules #1, #2, and #3. Rule #4 means that you can not +extend one versioned API without also extending the others. For example, an +API call might POST an object in API v7beta1 format, which uses the cleaner +`Params` field, but the API server might store that object in trusty old v6 +form (since v7beta1 is "beta"). When the user reads the object back in the +v7beta1 API it would be unacceptable to have lost all but `Params[0]`. 
This
+means that, even though it is ugly, a compatible change must be made to the v6
+API, as above.
+
+For some changes, this can be challenging to do correctly. It may require multiple
+representations of the same information in the same API resource, which need to
+be kept in sync if either one is changed.
+
+For example, let's say you decide to rename a field within the same API
+version. In this case, you add units to `height` and `width`. You implement
+this by adding new fields:
+
+```go
+type Frobber struct {
+    Height *int `json:"height"`
+    Width *int `json:"width"`
+    HeightInInches *int `json:"heightInInches"`
+    WidthInInches *int `json:"widthInInches"`
+}
+```
+
+You convert all of the fields to pointers in order to distinguish between unset
+and set to 0, and then set each corresponding field from the other in the
+defaulting logic (e.g. `heightInInches` from `height`, and vice versa). That
+works fine when the user creates and sends a hand-written configuration --
+clients can write either field and read either field.
+
+But what about creation or update from the output of a GET, or update via PATCH
+(see [In-place updates](https://kubernetes.io/docs/user-guide/managing-deployments/#in-place-updates-of-resources))?
+In these cases, the two fields will conflict, because only one field would be
+updated in the case of an old client that was only aware of the old field
+(e.g. `height`).
+
+Suppose the client creates:
+
+```json
+{
+  "height": 10,
+  "width": 5
+}
+```
+
+and GETs:
+
+```json
+{
+  "height": 10,
+  "heightInInches": 10,
+  "width": 5,
+  "widthInInches": 5
+}
+```
+
+then PUTs back:
+
+```json
+{
+  "height": 13,
+  "heightInInches": 10,
+  "width": 5,
+  "widthInInches": 5
+}
+```
+
+As per the compatibility rules, the update must not fail, because it would have
+worked before the change.
+
+## Backward compatibility gotchas
+
+* A single feature/property cannot be represented using multiple spec fields
+  simultaneously within an API version. Only one representation can be
+  populated at a time, and the client needs to be able to specify which field
+  they expect to use (typically via API version), on both mutation and read. As
+  above, older clients must continue to function properly.
+
+* A new representation, even in a new API version, that is more expressive than an
+  old one breaks backward compatibility, since clients that only understood the
+  old representation would not be aware of the new representation nor its
+  semantics. Examples of proposals that have run into this challenge include
+  [generalized label selectors](http://issues.k8s.io/341) and [pod-level security context](http://prs.k8s.io/12823).
+
+* Enumerated values cause similar challenges. Adding a new value to an enumerated set
+  is *not* a compatible change. Clients which assume they know how to handle all possible
+  values of a given field will not be able to handle the new values. However, removing a
+  value from an enumerated set *can* be a compatible change, if handled properly (treat the
+  removed value as deprecated but allowed). For enumeration-like fields that expect to add
+  new values in the future, such as `reason` fields, please document that expectation clearly
+  in the API field descriptions. Clients should treat such sets of values as potentially
+  open-ended.
+ +* For [Unions](api-conventions.md#unions), sets of fields where at most one should + be set, it is acceptable to add a new option to the union if the [appropriate + conventions](api-conventions.md#objects) were followed in the original object. + Removing an option requires following the [deprecation process](https://kubernetes.io/docs/reference/deprecation-policy/). + +* Changing any validation rules always has the potential of breaking some client, since it changes the + assumptions about part of the API, similar to adding new enum values. Validation rules on spec fields can + neither be relaxed nor strengthened. Strengthening cannot be permitted because any requests that previously + worked must continue to work. Weakening validation has the potential to break other consumers and generators + of the API resource. Status fields whose writers are under our control (e.g., written by non-pluggable + controllers), may potentially tighten validation, since that would cause a subset of previously valid + values to be observable by clients. + +* Do not add a new API version of an existing resource and make it the preferred version in the same + release, and do not make it the storage version. The latter is necessary so that a rollback of the + apiserver doesn't render resources in etcd undecodable after rollback. + +* Any field with a default value in one API version must have a *non-nil* default + value in all API versions. This can be split into 2 cases: + * Adding a new API version with a default value for an existing non-defaulted + field: it is required to add a default value semantically equivalent to + being unset in all previous API versions, to preserve the semantic meaning + of the value being unset. + * Adding a new field with a default value: the default values must be + semantically equivalent in all currently supported API versions. + +## Incompatible API changes + +There are times when incompatible changes might be OK, but mostly we want +changes that meet the above definitions. If you think you need to break +compatibility, you should talk to the Kubernetes API reviewers first. + +Breaking compatibility of a beta or stable API version, such as v1, is +unacceptable. Compatibility for experimental or alpha APIs is not strictly +required, but breaking compatibility should not be done lightly, as it disrupts +all users of the feature. Alpha and beta API versions may be deprecated and +eventually removed wholesale, as described in the [deprecation policy](https://kubernetes.io/docs/reference/deprecation-policy/). + +If your change is going to be backward incompatible or might be a breaking +change for API consumers, please send an announcement to +`kubernetes-dev@googlegroups.com` before the change gets in. If you are unsure, +ask. Also make sure that the change gets documented in the release notes for the +next release by labeling the PR with the "release-note-action-required" github label. + +If you found that your change accidentally broke clients, it should be reverted. + +In short, the expected API evolution is as follows: + +* `newapigroup/v1alpha1` -> ... -> `newapigroup/v1alphaN` -> +* `newapigroup/v1beta1` -> ... -> `newapigroup/v1betaN` -> +* `newapigroup/v1` -> +* `newapigroup/v2alpha1` -> ... + +While in alpha we expect to move forward with it, but may break it. + +Once in beta we will preserve forward compatibility, but may introduce new +versions and delete old ones. + +v1 must be backward-compatible for an extended length of time. 
+ +## Changing versioned APIs + +For most changes, you will probably find it easiest to change the versioned +APIs first. This forces you to think about how to make your change in a +compatible way. Rather than doing each step in every version, it's usually +easier to do each versioned API one at a time, or to do all of one version +before starting "all the rest". + +### Edit types.go + +The struct definitions for each API are in +`staging/src/k8s.io/api/<group>/<version>/types.go`. Edit those files to reflect +the change you want to make. Note that all types and non-inline fields in +versioned APIs must be preceded by descriptive comments - these are used to +generate documentation. Comments for types should not contain the type name; API +documentation is generated from these comments and end-users should not be +exposed to golang type names. + +For types that need the generated +[DeepCopyObject](https://github.com/kubernetes/kubernetes/commit/8dd0989b395b29b872e1f5e06934721863e4a210#diff-6318847735efb6fae447e7dbf198c8b2R3767) +methods, usually only required by the top-level types like `Pod`, add this line +to the comment +([example](https://github.com/kubernetes/kubernetes/commit/39d95b9b065fffebe5b6f233d978fe1723722085#diff-ab819c2e7a94a3521aecf6b477f9b2a7R30)): + +```golang + // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object +``` + +Optional fields should have the `,omitempty` json tag; fields are interpreted as +being required otherwise. + +### Edit defaults.go + +If your change includes new fields for which you will need default values, you +need to add cases to `pkg/apis/<group>/<version>/defaults.go`. + +**Note:** When adding default values to new fields, you *must* also add default +values in all API versions, instead of leaving new fields unset (e.g. `nil`) in +old API versions. This is required because defaulting happens whenever a +serialized version is read (see [#66135]). When possible, pick meaningful values +as sentinels for unset values. + +In the past the core v1 API +was special. Its `defaults.go` used to live at `pkg/api/v1/defaults.go`. +If you see code referencing that path, you can be sure its outdated. Now the core v1 api lives at +`pkg/apis/core/v1/defaults.go` which follows the above convention. + +Of course, since you have added code, you have to add a test: +`pkg/apis/<group>/<version>/defaults_test.go`. + +Do use pointers to scalars when you need to distinguish between an unset value +and an automatic zero value. For example, +`PodSpec.TerminationGracePeriodSeconds` is defined as `*int64` the go type +definition. A zero value means 0 seconds, and a nil value asks the system to +pick a default. + +Don't forget to run the tests! + +[#66135]: https://github.com/kubernetes/kubernetes/issues/66135 + +### Edit conversion.go + +Given that you have not yet changed the internal structs, this might feel +premature, and that's because it is. You don't yet have anything to convert to +or from. We will revisit this in the "internal" section. If you're doing this +all in a different order (i.e. you started with the internal structs), then you +should jump to that topic below. In the very rare case that you are making an +incompatible change you might or might not want to do this now, but you will +have to do more later. The files you want are +`pkg/apis/<group>/<version>/conversion.go` and +`pkg/apis/<group>/<version>/conversion_test.go`. 
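+
+When you do get to writing them (see "Edit version conversions" below), a
+hand-written conversion for the hypothetical `Frobber` change described earlier
+might look roughly like the sketch below. Everything here is illustrative: in
+real code the versioned and internal structs live in their own packages, and
+the exact function name and signature are dictated by the conversion generator,
+not by this example.
+
+```go
+package v6
+
+import "k8s.io/apimachinery/pkg/conversion"
+
+// In real code these two structs live in the versioned package
+// (staging/src/k8s.io/api/<group>/<version>) and the internal package
+// (pkg/apis/<group>); they are inlined here only to keep the sketch
+// self-contained.
+type Frobber struct {
+	Height int
+	Width  int
+	Param  string
+	Params []string
+}
+
+type internalFrobber struct {
+	Height int
+	Width  int
+	Params []string
+}
+
+// convert_v6_Frobber_To_internal_Frobber keeps the deprecated singular Param
+// consistent with the plural Params while converting to the internal form.
+func convert_v6_Frobber_To_internal_Frobber(in *Frobber, out *internalFrobber, s conversion.Scope) error {
+	out.Height = in.Height
+	out.Width = in.Width
+	out.Params = append([]string(nil), in.Params...)
+	if in.Param != "" && len(out.Params) == 0 {
+		// An older client may have set only the singular field.
+		out.Params = []string{in.Param}
+	}
+	return nil
+}
+```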
+ +Note that the conversion machinery doesn't generically handle conversion of +values, such as various kinds of field references and API constants. [The client +library](https://github.com/kubernetes/client-go/blob/v4.0.0-beta.0/rest/request.go#L352) +has custom conversion code for field references. You also need to add a call to +`AddFieldLabelConversionFunc` of your scheme with a mapping function that +understands supported translations, like this +[line](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/api/v1/conversion.go#L165). + +## Changing the internal structures + +Now it is time to change the internal structs so your versioned changes can be +used. + +### Edit types.go + +Similar to the versioned APIs, the definitions for the internal structs are in +`pkg/apis/<group>/types.go`. Edit those files to reflect the change you want to +make. Keep in mind that the internal structs must be able to express *all* of +the versioned APIs. + +Similar to the versioned APIs, you need to add the `+k8s:deepcopy-gen` tag to +types that need generated DeepCopyObject methods. + +## Edit validation.go + +Most changes made to the internal structs need some form of input validation. +Validation is currently done on internal objects in +`pkg/apis/<group>/validation/validation.go`. This validation is the one of the +first opportunities we have to make a great user experience - good error +messages and thorough validation help ensure that users are giving you what you +expect and, when they don't, that they know why and how to fix it. Think hard +about the contents of `string` fields, the bounds of `int` fields and the +optionality of fields. + +Of course, code needs tests - `pkg/apis/<group>/validation/validation_test.go`. + +## Edit version conversions + +At this point you have both the versioned API changes and the internal +structure changes done. If there are any notable differences - field names, +types, structural change in particular - you must add some logic to convert +versioned APIs to and from the internal representation. If you see errors from +the `serialization_test`, it may indicate the need for explicit conversions. + +Performance of conversions very heavily influence performance of apiserver. +Thus, we are auto-generating conversion functions that are much more efficient +than the generic ones (which are based on reflections and thus are highly +inefficient). + +The conversion code resides with each versioned API. There are two files: + + - `pkg/apis/<group>/<version>/conversion.go` containing manually written + conversion functions + - `pkg/apis/<group>/<version>/zz_generated.conversion.go` containing + auto-generated conversion functions + +Since auto-generated conversion functions are using manually written ones, +those manually written should be named with a defined convention, i.e. a +function converting type `X` in pkg `a` to type `Y` in pkg `b`, should be named: +`convert_a_X_To_b_Y`. + +Also note that you can (and for efficiency reasons should) use auto-generated +conversion functions when writing your conversion functions. + +Adding manually written conversion also requires you to add tests to +`pkg/apis/<group>/<version>/conversion_test.go`. + +Once all the necessary manually written conversions are added, you need to +regenerate auto-generated ones. To regenerate them run: + +```sh +make clean && make generated_files +``` + +`make clean` is important, otherwise the generated files might be stale, because +the build system uses custom cache. 
+ +`make all` will invoke `make generated_files` as well. + +The `make generated_files` will also regenerate the `zz_generated.deepcopy.go`, +`zz_generated.defaults.go`, and `api/openapi-spec/swagger.json`. + +If regeneration is somehow not possible due to compile errors, the easiest +workaround is to remove the files causing errors and rerun the command. + +## Generate Code + +Apart from the `defaulter-gen`, `deepcopy-gen`, `conversion-gen` and +`openapi-gen`, there are a few other generators: + - `go-to-protobuf` + - `client-gen` + - `lister-gen` + - `informer-gen` + - `codecgen` (for fast json serialization with ugorji codec) + +Many of the generators are based on +[`gengo`](https://github.com/kubernetes/gengo) and share common +flags. The `--verify-only` flag will check the existing files on disk +and fail if they are not what would have been generated. + +The generators that create go code have a `--go-header-file` flag +which should be a file that contains the header that should be +included. This header is the copyright that should be present at the +top of the generated file and should be checked with the +[`repo-infra/verify/verify-boilerplane.sh`](https://git.k8s.io/repo-infra/verify/verify-boilerplate.sh) +script at a later stage of the build. + +To invoke these generators, you can run `make update`, which runs a bunch of +[scripts](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/hack/update-all.sh#L63-L78). +Please continue to read the next a few sections, because some generators have +prerequisites, also because they introduce how to invoke the generators +individually if you find `make update` takes too long to run. + +### Generate protobuf objects + +For any core API object, we also need to generate the Protobuf IDL and marshallers. +That generation is invoked with + +```sh +hack/update-generated-protobuf.sh +``` + +The vast majority of objects will not need any consideration when converting +to protobuf, but be aware that if you depend on a Golang type in the standard +library there may be additional work required, although in practice we typically +use our own equivalents for JSON serialization. The `pkg/api/serialization_test.go` +will verify that your protobuf serialization preserves all fields - be sure to +run it several times to ensure there are no incompletely calculated fields. + +### Generate Clientset + +`client-gen` is a tool to generate clientsets for top-level API objects. + +`client-gen` requires the `// +genclient` annotation on each +exported type in both the internal `pkg/apis/<group>/types.go` as well as each +specifically versioned `staging/src/k8s.io/api/<group>/<version>/types.go`. + +If the apiserver hosts your API under a different group name than the `<group>` +in the filesystem, (usually this is because the `<group>` in the filesystem +omits the "k8s.io" suffix, e.g., admission vs. admission.k8s.io), you can +instruct the `client-gen` to use the correct group name by adding the `// ++groupName=` annotation in the `doc.go` in both the internal +`pkg/apis/<group>/doc.go` as well as in each specifically versioned +`staging/src/k8s.io/api/<group>/<version>/types.go`. + +Once you added the annotations, generate the client with + +```sh +hack/update-codegen.sh +``` + +Note that you can use the optional `// +groupGoName=` to specify a CamelCase +custom Golang identifier to de-conflict e.g. `policy.authorization.k8s.io` and +`policy.k8s.io`. These two would both map to `Policy()` in clientsets. + +client-gen is flexible. 
See [this document](generating-clientset.md) if you need +client-gen for non-kubernetes API. + +### Generate Listers + +`lister-gen` is a tool to generate listers for a client. It reuses the +`//+genclient` and the `// +groupName=` annotations, so you do not need to +specify extra annotations. + +Your previous run of `hack/update-codegen.sh` has invoked `lister-gen`. + +### Generate Informers + +`informer-gen` generates the very useful Informers which watch API +resources for changes. It reuses the `//+genclient` and the +`//+groupName=` annotations, so you do not need to specify extra annotations. + +Your previous run of `hack/update-codegen.sh` has invoked `informer-gen`. + +### Edit json (un)marshaling code + +We are auto-generating code for marshaling and unmarshaling json representation +of api objects - this is to improve the overall system performance. + +The auto-generated code resides with each versioned API: + + - `staging/src/k8s.io/api/<group>/<version>/generated.proto` + - `staging/src/k8s.io/api/<group>/<version>/generated.pb.go` + +To regenerate them run: + +```sh +hack/update-generated-protobuf.sh +``` + +## Making a new API Version + +This section is under construction, as we make the tooling completely generic. + +If you are adding a new API version to an existing group, you can copy the +structure of the existing `pkg/apis/<group>/<existing-version>` and +`staging/src/k8s.io/api/<group>/<existing-version>` directories. + +Due to the fast changing nature of the project, the following content is probably out-dated: +* You can control if the version is enabled by default by update +[pkg/master/master.go](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/master/master.go#L381). +* You must add the new version to + [pkg/apis/group_name/install/install.go](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/apis/apps/install/install.go). +* You must add the new version to + [hack/lib/init.sh#KUBE_AVAILABLE_GROUP_VERSIONS](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/hack/lib/init.sh#L53). +* You must add the new version to + [hack/update-generated-protobuf-dockerized.sh](https://github.com/kubernetes/kubernetes/blob/v1.8.2/hack/update-generated-protobuf-dockerized.sh#L44) + to generate protobuf IDL and marshallers. +* You must add the new version to + [cmd/kube-apiserver/app#apiVersionPriorities](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/cmd/kube-apiserver/app/aggregator.go#L172) +* You must setup storage for the new version in + [pkg/registry/group_name/rest](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/registry/authentication/rest/storage_authentication.go) + +You need to regenerate the generated code as instructed in the sections above. + +## Making a new API Group + +You'll have to make a new directory under `pkg/apis/` and +`staging/src/k8s.io/api`; copy the directory structure of an existing API group, +e.g. 
`pkg/apis/authentication` and `staging/src/k8s.io/api/authentication`; +replace "authentication" with your group name and replace versions with your +versions; replace the API kinds in +[versioned](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/staging/src/k8s.io/api/authentication/v1/register.go#L47) +and +[internal](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/apis/authentication/register.go#L47) +register.go, and +[install.go](https://github.com/kubernetes/kubernetes/blob/v1.8.0-alpha.2/pkg/apis/authentication/install/install.go#L43) +with your kinds. + +You'll have to add your API group/version to a few places in the code base, as +noted in [Making a new API Version](#making-a-new-api-version) section. + +You need to regenerate the generated code as instructed in the sections above. + +## Update the fuzzer + +Part of our testing regimen for APIs is to "fuzz" (fill with random values) API +objects and then convert them to and from the different API versions. This is +a great way of exposing places where you lost information or made bad +assumptions. If you have added any fields which need very careful formatting +(the test does not run validation) or if you have made assumptions such as +"this slice will always have at least 1 element", you may get an error or even +a panic from the `serialization_test`. If so, look at the diff it produces (or +the backtrace in case of a panic) and figure out what you forgot. Encode that +into the fuzzer's custom fuzz functions. Hint: if you added defaults for a +field, that field will need to have a custom fuzz function that ensures that the +field is fuzzed to a non-empty value. + +The fuzzer can be found in `pkg/api/testing/fuzzer.go`. + +## Update the semantic comparisons + +VERY VERY rarely is this needed, but when it hits, it hurts. In some rare cases +we end up with objects (e.g. resource quantities) that have morally equivalent +values with different bitwise representations (e.g. value 10 with a base-2 +formatter is the same as value 0 with a base-10 formatter). The only way Go +knows how to do deep-equality is through field-by-field bitwise comparisons. +This is a problem for us. + +The first thing you should do is try not to do that. If you really can't avoid +this, I'd like to introduce you to our `apiequality.Semantic.DeepEqual` routine. +It supports custom overrides for specific types - you can find that in +`pkg/api/helper/helpers.go`. + +There's one other time when you might have to touch this: `unexported fields`. +You see, while Go's `reflect` package is allowed to touch `unexported fields`, +us mere mortals are not - this includes `apiequality.Semantic.DeepEqual`. +Fortunately, most of our API objects are "dumb structs" all the way down - all +fields are exported (start with a capital letter) and there are no unexported +fields. But sometimes you want to include an object in our API that does have +unexported fields somewhere in it (for example, `time.Time` has unexported fields). +If this hits you, you may have to touch the `apiequality.Semantic.DeepEqual` +customization functions. + +## Implement your change + +Now you have the API all changed - go implement whatever it is that you're +doing! + +## Write end-to-end tests + +Check out the [E2E docs](e2e-tests.md) for detailed information about how to +write end-to-end tests for your feature. + +## Examples and docs + +At last, your change is done, all unit tests pass, e2e passes, you're done, +right? Actually, no. You just changed the API. 
If you are touching an existing
+facet of the API, you have to try *really* hard to make sure that *all* the
+examples and docs are updated. There's no easy way to do this, due in part to
+JSON and YAML silently dropping unknown fields. You're clever - you'll figure it
+out. Put `grep` or `ack` to good use.
+
+If you added functionality, you should consider documenting it and/or writing
+an example to illustrate your change.
+
+Make sure you update the swagger and OpenAPI spec by running:
+
+```sh
+hack/update-swagger-spec.sh
+hack/update-openapi-spec.sh
+```
+
+The API spec changes should be in a commit separate from your other changes.
+
+## Alpha, Beta, and Stable Versions
+
+New feature development proceeds through a series of stages of increasing
+maturity:
+
+- Development level
+  - Object Versioning: no convention
+  - Availability: not committed to main kubernetes repo, and thus not available
+in official releases
+  - Audience: other developers closely collaborating on a feature or
+proof-of-concept
+  - Upgradeability, Reliability, Completeness, and Support: no requirements or
+guarantees
+- Alpha level
+  - Object Versioning: API version name contains `alpha` (e.g. `v1alpha1`)
+  - Availability: committed to main kubernetes repo; appears in an official
+release; feature is disabled by default, but may be enabled by flag
+  - Audience: developers and expert users interested in giving early feedback on
+features
+  - Completeness: some API operations, CLI commands, or UI support may not be
+implemented; the API need not have had an *API review* (an intensive and
+targeted review of the API, on top of a normal code review)
+  - Upgradeability: the object schema and semantics may change in a later
+software release, without any provision for preserving objects in an existing
+cluster; removing the upgradability concern allows developers to make rapid
+progress; in particular, API versions can increment faster than the minor
+release cadence and the developer need not maintain multiple versions;
+developers should still increment the API version when object schema or
+semantics change in an [incompatible way](#on-compatibility)
+  - Cluster Reliability: because the feature is relatively new, and may lack
+complete end-to-end tests, enabling the feature via a flag might expose bugs
+which destabilize the cluster (e.g. a bug in a control loop might rapidly create
+excessive numbers of objects, exhausting API storage).
+  - Support: there is *no commitment* from the project to complete the feature;
+the feature may be dropped entirely in a later software release
+  - Recommended Use Cases: only in short-lived testing clusters, due to
+complexity of upgradeability and lack of long-term support
+- Beta level:
+  - Object Versioning: API version name contains `beta` (e.g. `v2beta3`)
+  - Availability: in official Kubernetes releases, and enabled by default
+  - Audience: users interested in providing feedback on features
+  - Completeness: all API operations, CLI commands, and UI support should be
+implemented; end-to-end tests complete; the API has had a thorough API review
+and is thought to be complete, though use during beta may frequently turn up API
+issues not thought of during review
+  - Upgradeability: the object schema and semantics may change in a later
+software release; when this happens, an upgrade path will be documented; in some
+cases, objects will be automatically converted to the new version; in other
+cases, a manual upgrade may be necessary; a manual upgrade may require downtime
+for anything relying on the new feature, and may require manual conversion of
+objects to the new version; when manual conversion is necessary, the project
+will provide documentation on the process
+  - Cluster Reliability: since the feature has e2e tests, enabling the feature
+via a flag should not create new bugs in unrelated features; because the feature
+is new, it may have minor bugs
+  - Support: the project commits to complete the feature, in some form, in a
+subsequent Stable version; typically this will happen within 3 months, but
+sometimes longer; releases should simultaneously support two consecutive
+versions (e.g. `v1beta1` and `v1beta2`; or `v1beta2` and `v1`) for at least one
+minor release cycle (typically 3 months) so that users have enough time to
+upgrade and migrate objects
+  - Recommended Use Cases: in short-lived testing clusters; in production
+clusters as part of a short-lived evaluation of the feature in order to provide
+feedback
+- Stable level:
+  - Object Versioning: API version `vX` where `X` is an integer (e.g. `v1`)
+  - Availability: in official Kubernetes releases, and enabled by default
+  - Audience: all users
+  - Completeness: must have conformance tests, approved by SIG Architecture,
+in the appropriate conformance profile (e.g., non-portable and/or optional
+features may not be in the default profile)
+  - Upgradeability: only [strictly compatible](#on-compatibility) changes
+allowed in subsequent software releases
+  - Cluster Reliability: high
+  - Support: API version will continue to be present for many subsequent
+software releases
+  - Recommended Use Cases: any
+
+### Adding Unstable Features to Stable Versions
+
+When adding a feature to an object which is already Stable, the new fields and
+new behaviors need to meet the Stable level requirements. If these cannot be
+met, then the new field cannot be added to the object.
+
+For example, consider the following object:
+
+```go
+// API v6.
+type Frobber struct {
+  // height ...
+  Height *int32 `json:"height"`
+  // param ...
+  Param  string `json:"param"`
+}
+```
+
+A developer is considering adding a new `Width` parameter, like this:
+
+```go
+// API v6.
+type Frobber struct {
+  // height ...
+  Height *int32 `json:"height"`
+  // param ...
+  Param  string `json:"param"`
+  // width ...
+  Width  *int32 `json:"width,omitempty"`
+}
+```
+
+However, the new feature is not stable enough to be used in a stable version
+(`v6`). Some reasons for this might include:
+
+- the final representation is undecided (e.g. should it be called `Width` or `Breadth`?)
+- the implementation is not stable enough for general use (e.g. the `Area()` routine sometimes overflows.)
+
+The developer cannot add the new field unconditionally until stability is met. 
However, +sometimes stability cannot be met until some users try the new feature, and some +users are only able or willing to accept a released version of Kubernetes. In +that case, the developer has a few options, both of which require staging work +over several releases. + +#### Alpha field in existing API version + +Previously, annotations were used for experimental alpha features, but are no longer recommended for several reasons: + +* They expose the cluster to "time-bomb" data added as unstructured annotations against an earlier API server (https://issue.k8s.io/30819) +* They cannot be migrated to first-class fields in the same API version (see the issues with representing a single value in multiple places in [backward compatibility gotchas](#backward-compatibility-gotchas)) + +The preferred approach adds an alpha field to the existing object, and ensures it is disabled by default: + +1. Add a feature gate to the API server to control enablement of the new field (and associated function): + + In [staging/src/k8s.io/apiserver/pkg/features/kube_features.go](https://git.k8s.io/kubernetes/staging/src/k8s.io/apiserver/pkg/features/kube_features.go): + + ```go + // owner: @you + // alpha: v1.11 + // + // Add multiple dimensions to frobbers. + Frobber2D utilfeature.Feature = "Frobber2D" + + var defaultKubernetesFeatureGates = map[utilfeature.Feature]utilfeature.FeatureSpec{ + ... + Frobber2D: {Default: false, PreRelease: utilfeature.Alpha}, + } + ``` + +2. Add the field to the API type: + + * ensure the field is [optional](api-conventions.md#optional-vs-required) + * add the `omitempty` struct tag + * add the `// +optional` comment tag + * ensure the field is entirely absent from API responses when empty (optional fields should be pointers, anyway) + * include details about the alpha-level in the field description + + ```go + // API v6. + type Frobber struct { + // height ... + Height int32 `json:"height"` + // param ... + Param string `json:"param"` + // width indicates how wide the object is. + // This field is alpha-level and is only honored by servers that enable the Frobber2D feature. + // +optional + Width *int32 `json:"width,omitempty"` + } + ``` + +3. Before persisting the object to storage, clear disabled alpha fields on create, +and on update if the existing object does not already have a value in the field. +This prevents new usage of the feature while it is disabled, while ensuring existing data is preserved. +The recommended place to do this is in the REST storage strategy's PrepareForCreate/PrepareForUpdate methods: + + ```go + func (frobberStrategy) PrepareForCreate(ctx genericapirequest.Context, obj runtime.Object) { + frobber := obj.(*api.Frobber) + + if !utilfeature.DefaultFeatureGate.Enabled(features.Frobber2D) { + frobber.Width = nil + } + } + + func (frobberStrategy) PrepareForUpdate(ctx genericapirequest.Context, obj, old runtime.Object) { + newFrobber := obj.(*api.Frobber) + oldFrobber := old.(*api.Frobber) + + if !utilfeature.DefaultFeatureGate.Enabled(features.Frobber2D) && oldFrobber.Width == nil { + newFrobber.Width = nil + } + } + ``` + +4. In validation, validate the field if present: + + ```go + func ValidateFrobber(f *api.Frobber, fldPath *field.Path) field.ErrorList { + ... + if f.Width != nil { + ... validation of width field ... + } + ... + } + ``` + +In future Kubernetes versions: + +* if the feature progresses to beta or stable status, the feature gate can be removed or be enabled by default. 
+
+* if the schema of the alpha field must change in an incompatible way, a new field name must be used.
+* if the feature is abandoned, or the field name is changed, the field should be removed from the go struct, with a tombstone comment ensuring the field name and protobuf tag are not reused:
+
+  ```go
+  // API v6.
+  type Frobber struct {
+    // height ...
+    Height int32 `json:"height" protobuf:"varint,1,opt,name=height"`
+    // param ...
+    Param string `json:"param" protobuf:"bytes,2,opt,name=param"`
+
+    // +k8s:deprecated=width,protobuf=3
+  }
+  ```
+
+#### New alpha API version
+
+Another option is to introduce a new type with a new `alpha` or `beta` version
+designator, like this:
+
+```go
+// API v7alpha1
+type Frobber struct {
+  // height ...
+  Height *int32 `json:"height"`
+  // param ...
+  Param string `json:"param"`
+  // width ...
+  Width *int32 `json:"width,omitempty"`
+}
+```
+
+The latter requires that all objects in the same API group as `Frobber` be
+replicated in the new version, `v7alpha1`. This also requires users to switch to
+a new client which uses the new version. Therefore, this is not a preferred option.
+
+A related issue is how a cluster manager can roll back from a new version
+with a new feature, that is already being used by users. See
+https://github.com/kubernetes/kubernetes/issues/4855.
diff --git a/contributors/devel/sig-architecture/component-config-conventions.md b/contributors/devel/sig-architecture/component-config-conventions.md
new file mode 100644
index 00000000..d3fe1000
--- /dev/null
+++ b/contributors/devel/sig-architecture/component-config-conventions.md
@@ -0,0 +1,221 @@
+# Component Configuration Conventions
+
+# Objective
+
+This document concerns the configuration of Kubernetes system components (as
+opposed to the configuration of user workloads running on Kubernetes).
+Component configuration is a major operational burden for operators of
+Kubernetes clusters. To date, much literature has been written on and much
+effort expended to improve component configuration. Despite this, the state of
+component configuration remains dissonant. This document attempts to aggregate
+that literature and propose a set of guidelines that component owners can
+follow to improve consistency across the project.
+
+# Background
+
+Currently, component configuration is primarily driven through command line
+flags. Command line driven configuration poses certain problems which are
+discussed below. Attempts to improve component configuration as a whole have
+been slow to make progress and have petered out (ref componentconfig api group,
+configmap driven config issues). Some component owners have made use case
+specific improvements on a per-need basis. Various comments in issues recommend
+subsets of best design practice but no coherent, complete story exists.
+
+## Pain Points of Current Configuration
+
+Flag based configuration has poor qualities such as:
+
+1. Flags exist in a flat namespace, hampering the ability to organize them and expose them in helpful documentation. --help becomes useless as a reference as the number of knobs grows. It's impossible to distinguish useful knobs from cruft.
+1. Flags can't easily have different values for different instances of a class. To adjust the resync period in the informers of O(n) controllers requires O(n) different flags in a global namespace.
+1. Changing a process's command line necessitates a binary restart. This negatively impacts availability.
+1. Flags are unsuitable for passing confidential configuration. 
The command line of a process is available to unprivileged process running in the host pid namespace. +1. Flags are a public API but are unversioned and unversionable. +1. Many arguments against using global variables apply to flags. + +Configuration in general has poor qualities such as: + +1. Configuration changes have the same forward/backward compatibility requirements as releases but rollout/rollback of configuration largely untested. Examples of configuration changes that might break a cluster: kubelet CNI plugin, etcd storage version. +1. Configuration options often exist only to test a specific feature where the default is reasonable for all real use cases. Examples: many sync periods. +1. Configuration options often exist to defer a "hard" design decision and to pay forward the "TODO(someone-else): think critically". +1. Configuration options are often used to workaround deficiencies of the API. For example `--register-with-labels` and `--register-with-taints` could be solved with a node initializer, if initializers existed. +1. Configuration options often exist to take testing shortcuts. There is a mentality that because a feature is opt-in, it can be released as a flag without robust testing. +1. Configuration accumulates new knobs, knobs accumulate new behaviors, knobs are forgotten and bitrot reducing code quality over time. +1. Number of configuration options is inversely proportional to test coverage. The size of the configuration state space grows >O(2^n) with the number of configuration bits. A handful of states in that space are ever tested. +1. Configuration options hamper troubleshooting efforts. On github, users frequently file tickets from environments that are neither consistent nor reproducible. + +## Types Of Configuration + +Configuration can only come from three sources: + +1. Command line flags. +1. API types serialized and stored on disk. +1. API types serialized and stored in the kubernetes API. + +Configuration options can be partitioned along certain lines. To name a few +important partitions: + +1. Bootstrap: This is configuration that is required before the component can contact the API. Examples include the kubeconfig and the filepath to the kubeconfig. +1. Dynamic vs Static: Dynamic config is config that is expected to change as part of normal operations such as a scheduler configuration or a node entering maintenance mode. Static config is config that is unlikely to change over subsequent deployments and even releases of a component. +1. Shared vs Per-Instance: Per-Instance configuration is configuration whose value is unique to the instance that the node runs on (e.g. Kubelet's `--hostname-override`). +1. Feature Gates: Feature gates are configuration options that enable a feature that has been deemed unsafe to enable by default. +1. Request context dependent: Request context dependent config is config that should probably be scoped to an attribute of the request (such as the user). We do a pretty good job of keeping these out of config and in policy objects (e.g. Quota, RBAC) but we could do more (e.g. rate limits). +1. Environment information: This is configuration that is available through downwards and OS APIs, e.g. node name, pod name, number of cpus, IP address. + +# Requirements + +Desired qualities of a configuration solution: + +1. Secure: We need to control who can change configuration. We need to control who can read sensitive configuration. +1. 
Manageable: We need to control which instances of a component uses which configuration, especially when those instances differ in version. +1. Reliable: Configuration pushes should just work. If they fail, they should fail early in the rollout, rollback config if possible, and alert noisily. +1. Recoverable: We need to be able to update (e.g. rollback) configuration when a component is down. +1. Monitorable: Both humans and computers need to monitor configuration; humans through json interfaces like /configz, computers through interfaces like prometheus /streamz. Confidential configuration needs to be accounted for, but can also be useful to monitor in an unredacted or partially redacted (i.e. hashed) form. +1. Verifiable: We need to be able to verify that a configuration is good. We need to verify the integrity of the received configuration and we need to validate that the encoded configuration state is sensible. +1. Auditable: We need to be able to trace the origin of a configuration change. +1. Accountable: We need to correlate a configuration push with its impact to the system. We need to be able to do this at the time of the push and later when analyzing logs. +1. Available: We should avoid high frequency configuration updates that require service disruption. We need to take into account system component SLA. +1. Scalable: We need to support distributing configuration to O(10,000) components at our current supported scalability limits. +1. Consistent: There should exist conventions that hold across components. +1. Composable: We should favor composition of configuration sources over layering/templating/inheritance. +1. Normalized: Redundant specification of configuration data should be avoided. +1. Testable: We need to be able to test the system under many different configurations. We also need to test configuration changes, both dynamic changes and those that require process restarts. +1. Maintainable: We need to push back on ever increasing cyclomatic complexity in our codebase. Each if statement and function argument added to support a configuration option negatively impacts the maintainability of our code. +1. Evolvable: We need to be able to extend our configuration API like we extend our other user facing APIs. We need to hold our configuration API to the same SLA and deprecation policy of public facing APIs. (e.g. [dynamic admission control](https://github.com/kubernetes/community/pull/611) and [hooks](https://github.com/kubernetes/kubernetes/issues/3585)) + +These don't need to be implemented immediately but are good to keep in mind. At +some point these should be ranked by priority and implemented. + +# Two Part Solution: + +## Part 1: Don't Make It Configuration + +The most effective way to reduce the operational burden of configuration is to +minimize the amount of configuration. When adding a configuration option, ask +whether alternatives might be a better fit. + +1. Policy objects: Create first class Kubernetes objects to encompass how the system should behave. These are especially useful for request context dependent configuration. We do this already in places such as RBAC and ResourceQuota but we could do more such as rate limiting. We should never hardcode groups or usermaps in configuration. +1. API features: Use (or implement) functionality of the API (e.g. think through and implement initializers instead of --register-with-label). Allowing for extension in the right places is a better way to give users control. +1. 
Feature discovery: Write components that introspect the existing API to decide whether to enable a feature or not. E.g. controller-manager should start an app controller if the app API is available, kubelet should enable zram if zram is set in the node spec. +1. Downwards API: Use the APIs that the OS and pod environment expose directly before opting to pass in new configuration options. +1. const's: If you don't know whether tweaking a value will be useful, make the value const. Only give it a configuration option once there becomes a need to tweak the value at runtime. +1. Autotuning: Build systems that incorporate feedback and do the best thing under the given circumstances. This makes the system more robust. (e.g. prefer congestion control, load shedding, backoff rather than explicit limiting). +1. Avoid feature flags: Turn on features when they are tested and ready for production. Don't use feature flags as a fallback for poorly tested code. +1. Configuration profiles: Instead of allowing individual configuration options to be modified, try to encompass a broader desire as a configuration profile. For example: instead of enabling individual alpha features, have an EnableAlpha option that enables all. Instead of allowing individual controller knobs to be modified, have a TestMode option that sets a broad number of parameters to be suitable for tests. + +## Part 2: Component Configuration + +### Versioning Configuration + +We create configuration API groups per component that live in the source tree of +the component. Each component has its own API group for configuration. +Components will use the same API machinery that we use for other API groups. +Configuration API serialization doesn't have the same performance requirements +as other APIs so much of the codegen can be avoided (e.g. ugorji, generated +conversions) and we can instead fallback to the reflection based implementations +where they exist. + +Configuration API groups for component config should be named according to the +scheme `<component>.config.k8s.io`. The `.config.k8s.io` suffix serves to +disambiguate types of config API groups from served APIs. + +### Retrieving Configuration + +The primary mechanism for retrieving static configuration should be +deserialization from files. For the majority of components (with the possible +exception of the kubelet, see +[here](https://github.com/kubernetes/kubernetes/pull/29459)), these files will +be source from the configmap API and managed by the kubelet. Reliability of +this mechanism is predicated on kubelet checkpointing of pod dependencies. + + +### Structuring Configuration + +Group related options into distinct objects and subobjects. Instead of writing: + + +```yaml +kind: KubeProxyConfiguration +apiVersion: kubeproxy.config.k8s.io/v1beta3 +ipTablesSyncPeriod: 2 +ipTablesConntrackHashSize: 2 +ipTablesConntrackTableSize: 2 +``` + +Write: + +```yaml +kind: KubeProxyConfiguration +apiVersion: kubeproxy.config.k8s.io/v1beta3 +ipTables: + syncPeriod: 2 + conntrack: + hashSize: 2 + tableSize: 2 +``` + +We should avoid passing around full configuration options to deeply constructed +modules. For example, instead of calling NewSomethingController in the +controller-manager with the full controller-manager config, group relevant +config into a subobject and only pass the subobject. We should expose the +smallest possible necessary configuration to the SomethingController. 
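+
+As a minimal sketch of that pattern - with entirely hypothetical names, since
+the exact sub-structs depend on the component - the controller constructor
+receives only the sub-struct it needs rather than the whole component
+configuration:
+
+```go
+package garbagecollector
+
+import "time"
+
+// GarbageCollectorConfiguration is the subobject carved out of the larger
+// controller-manager configuration; it is all this controller ever sees.
+type GarbageCollectorConfiguration struct {
+  ConcurrentGCSyncs int32
+  ResyncPeriod      time.Duration
+}
+
+// GarbageCollector holds only runtime state derived from its configuration.
+type GarbageCollector struct {
+  workers int
+  resync  time.Duration
+}
+
+// NewGarbageCollector takes the narrow configuration subobject, not the full
+// controller-manager config, keeping the exposed surface as small as possible.
+func NewGarbageCollector(cfg GarbageCollectorConfiguration) *GarbageCollector {
+  return &GarbageCollector{
+    workers: int(cfg.ConcurrentGCSyncs),
+    resync:  cfg.ResyncPeriod,
+  }
+}
+```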
+ + +### Handling Different Types Of Configuration + +Above in "Type Of Configuration" we introduce a few ways to partition +configuration options. Environment information, request context depending +configuration, feature gates, and static configuration should be avoided if at +all possible using a configuration alternative. We should maintain separate +objects along these partitions and consider retrieving these configurations +from separate source (i.e. files). For example: kubeconfig (which falls into +the bootstrapping category) should not be part of the main config option (nor +should the filepath to the kubeconfig), per-instance config should be stored +separately from shared config. This allows for composition and obviates the +need for layering/templating solutions. + + +### In-Process Representation Of Configuration + +We should separate structs for flags, serializable config, and runtime config. + +1. Structs for flags should have enough information for the process startup to retrieve its full configuration. Examples include: path the kubeconfig, path to configuration file, namespace and name of configmap to use for configuration. +1. Structs for serializable configuration: This struct contains the full set of options in a serializable form (e.g. to represent an ip address instead of `net.IP`, use `string`). This is the struct that is versioned and serialized to disk using API machinery. +1. Structs for runtime: This struct holds data in the most appropriate format for execution. This field can hold non-serializable types (e.g. have a `kubeClient` field instead of a `kubeConfig` field, store ip addresses as `net.IP`). + +The flag struct is transformed into the configuration struct which is +transformed into the runtime struct. + + +### Migrating Away From Flags + +Migrating to component configuration can happen incrementally (per component). +By versioning each component's API group separately, we can allow each API +group to advance to beta and GA independently. APIs should be approved by +component owners and reviewers familiar with the component configuration +conventions. We can incentivize operators to migrate away from flags by making +new configuration options only available through the component configuration +APIs. + +# Caveats + +Proposed are not laws but guidelines and as such we've favored completeness +over consistency. There will thus be need for exceptions. + +1. Components (especially those that are not self hosted such as the kubelet) will require custom rollout strategies of new config. +1. Pod checkpointing by kubelet would allow this strategy to be simpler to make reliable. + + +# Miscellaneous Consideration + +1. **This document takes intentionally a very zealous stance against configuration.** Often configuration alternatives are not possible in Kubernetes as they are in proprietary software because Kubernetes has to run in diverse environments, with diverse users, managed by diverse operators. +1. More frequent releases of kubernetes would make "skipping the config knob" more enticing because fixing a bad guess at a const wouldn't take O(4 months) best case to rollout. Factoring in our support for old versions, it takes closer to a year. +1. Self-hosting resolves much of the distribution issue (except for maybe the Kubelet) but reliability is predicated on to-be-implemented features such as kubelet checkpointing of pod dependencies and sound operational practices such as incremental rollout of new configuration using Deployments/DaemonSets. +1. 
Validating config is hard. Fatal logs lead to crash loops and error logs are ignored. Both options are suboptimal.
+1. Configuration needs to be updatable when components are down.
+1. Naming style guide:
+    1. No negatives, e.g. prefer --enable-foo over --disable-foo
+    1. Use the active voice
+1. We should actually enforce deprecation. Can we have a test that fails when a comment exists beyond its deadline to be removed? See [#44248](https://github.com/kubernetes/kubernetes/issues/44248)
+1. Use different implementations of the same interface rather than if statements to toggle features. This makes deprecation and deletion easy, improving maintainability.
+1. How does the proposed solution meet the requirements? Which desired qualities are missed?
+1. Configuration changes should trigger predictable and reproducible actions. From a given system state and a given component configuration, we should be able to simulate the actions that the system will take.
diff --git a/contributors/devel/sig-architecture/conformance-tests.md b/contributors/devel/sig-architecture/conformance-tests.md
new file mode 100644
index 00000000..46ca318d
--- /dev/null
+++ b/contributors/devel/sig-architecture/conformance-tests.md
@@ -0,0 +1,216 @@
+# Conformance Testing in Kubernetes
+
+The Kubernetes Conformance test suite is a subset of e2e tests that SIG
+Architecture has approved to define the core set of interoperable features that
+all conformant Kubernetes clusters must support. The tests verify that the
+expected behavior works as a user might encounter it in the wild.
+
+The process to add new conformance tests is intended to decouple the development
+of useful tests from their promotion to conformance:
+- Contributors write and submit e2e tests, to be approved by owning SIGs
+- Tests are proven to meet the [conformance test requirements] by review
+  and by accumulation of data on flakiness and reliability
+- A follow up PR is submitted to [promote the test to conformance](#promoting-tests-to-conformance)
+
+NB: This should be viewed as a living document in a few key areas:
+- The desired set of conformant behaviors is not adequately expressed by the
+  current set of e2e tests, as such this document is currently intended to
+  guide us in the addition of new e2e tests that can fill this gap
+- This document currently focuses solely on the requirements for GA,
+  non-optional features or APIs. The list of requirements will be refined over
+  time to the point where it is as concrete and complete as possible.
+- There are currently conformance tests that violate some of the requirements
+  (e.g., require privileged access); we will be categorizing these tests and
+  deciding what to do once we have a better understanding of the situation
+- Once we resolve the above issues, we plan on identifying the appropriate areas
+  to relax requirements to allow for the concept of conformance Profiles that
+  cover optional or additional behaviors
+
+## Conformance Test Requirements
+
+Conformance tests currently test only GA, non-optional features or APIs. 
More +specifically, a test is eligible for promotion to conformance if: + +- it tests only GA, non-optional features or APIs (e.g., no alpha or beta + endpoints, no feature flags required, no deprecated features) +- it works for all providers (e.g., no `SkipIfProviderIs`/`SkipUnlessProviderIs` + calls) +- it is non-privileged (e.g., does not require root on nodes, access to raw + network interfaces, or cluster admin permissions) +- it works without access to the public internet (short of whatever is required + to pre-pull images for conformance tests) +- it works without non-standard filesystem permissions granted to pods +- it does not rely on any binaries that would not be required for the linux + kernel or kubelet to run (e.g., can't rely on git) +- any container images used within the test support all architectures for which + kubernetes releases are built +- it passes against the appropriate versions of kubernetes as spelled out in + the [conformance test version skew policy] +- it is stable and runs consistently (e.g., no flakes) + +Examples of features which are not currently eligible for conformance tests: + +- node/platform-reliant features, eg: multiple disk mounts, GPUs, high density, + etc. +- optional features, eg: policy enforcement +- cloud-provider-specific features, eg: GCE monitoring, S3 Bucketing, etc. +- anything that requires a non-default admission plugin + +Examples of tests which are not eligible for promotion to conformance: +- anything that checks specific Events are generated, as we make no guarantees + about the contents of events, nor their delivery +- anything that checks optional Condition fields, such as Reason or Message, as + these may change over time (however it is reasonable to verify these fields + exist or are non-empty) + +Examples of areas we may want to relax these requirements once we have a +sufficient corpus of tests that define out of the box functionality in all +reasonable production worthy environments: +- tests may need to create or set objects or fields that are alpha or beta that + bypass policies that are not yet GA, but which may reasonably be enabled on a + conformant cluster (e.g., pod security policy, non-GA scheduler annotations) + +## Conformance Test Version Skew Policy + +As each new release of Kubernetes provides new functionality, the subset of +tests necessary to demonstrate conformance grows with each release. Conformance +is thus considered versioned, with the same backwards compatibility guarantees +as laid out in the [kubernetes versioning policy] + +To quote: + +> For example, a v1.3 master should work with v1.1, v1.2, and v1.3 nodes, and +> should work with v1.2, v1.3, and v1.4 clients. + +Conformance tests for a given version should be run off of the release branch +that corresponds to that version. Thus `v1.2` conformance tests would be run +from the head of the `release-1.2` branch. + +For example, suppose we're in the midst of developing kubernetes v1.3. Clusters +with the following versions must pass conformance tests built from the +following branches: + +| cluster version | master | release-1.3 | release-1.2 | release-1.1 | +| --------------- | ----- | ----------- | ----------- | ----------- | +| v1.3.0-alpha | yes | yes | yes | no | +| v1.2.x | no | no | yes | yes | +| v1.1.x | no | no | no | yes | + +## Running Conformance Tests + +Conformance tests are designed to be run even when there is no cloud provider +configured. 
Conformance tests must be able to be run against clusters that have +not been created with `hack/e2e.go`, just provide a kubeconfig with the +appropriate endpoint and credentials. + +These commands are intended to be run within a kubernetes directory, either +cloned from source, or extracted from release artifacts such as +`kubernetes.tar.gz`. They assume you have a valid golang installation. + +```sh +# ensure kubetest is installed +go get -u k8s.io/test-infra/kubetest + +# build test binaries, ginkgo, and kubectl first: +make WHAT="test/e2e/e2e.test vendor/github.com/onsi/ginkgo/ginkgo cmd/kubectl" + +# setup for conformance tests +export KUBECONFIG=/path/to/kubeconfig +export KUBERNETES_CONFORMANCE_TEST=y + +# Option A: run all conformance tests serially +kubetest --provider=skeleton --test --test_args="--ginkgo.focus=\[Conformance\]" + +# Option B: run parallel conformance tests first, then serial conformance tests serially +GINKGO_PARALLEL=y kubetest --provider=skeleton --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]" +kubetest --provider=skeleton --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]" +``` + +## Kubernetes Conformance Document + +For each Kubernetes release, a Conformance Document will be generated that lists +all of the tests that comprise the conformance test suite, along with the formal +specification of each test. For an example, see the [v1.9 conformance doc]. +This document will help people understand what features are being tested without +having to look through the testcase's code directly. + + +## Promoting Tests to Conformance + +To promote a test to the conformance test suite, open a PR as follows: +- is titled "Promote xxx e2e test to Conformance" +- includes information and metadata in the description as follows: + - "/area conformance" on a newline + - "@kubernetes/sig-architecture-pr-reviews @kubernetes/sig-foo-pr-reviews + @kubernetes/cncf-conformance-wg" on a new line, where sig-foo is whichever + sig owns this test + - any necessary information in the description to verify that the test meets + [conformance test requirements], such as links to reports or dashboards that + prove lack of flakiness +- contains no other modifications to test source code other than the following: + - modifies the testcase to use the `framework.ConformanceIt()` function rather + than the `framework.It()` function + - adds a comment immediately before the `ConformanceIt()` call that includes + all of the required [conformance test comment metadata] +- add the PR to SIG Architecture's [Conformance Test Review board] + + +### Conformance Test Comment Metadata + +Each conformance test must include the following piece of metadata +within its associated comment: + +- `Release`: indicates the Kubernetes release that the test was added to the + conformance test suite. If the test was modified in subsequent releases + then those releases should be included as well (comma separated) +- `Testname`: a human readable short name of the test +- `Description`: a detailed description of the test. This field must describe + the required behaviour of the Kubernetes components being tested using + [RFC2119](https://tools.ietf.org/html/rfc2119) keywords. This field + is meant to be a "specification" of the tested Kubernetes features, as + such, it must be detailed enough so that readers can fully understand + the aspects of Kubernetes that are being tested without having to read + the test's code directly. 
Additionally, this test should provide a clear
+  distinction between the parts of the test that are there for the purpose
+  of validating Kubernetes rather than simply infrastructure logic that
+  is necessary to setup, or clean up, the test.
+
+### Sample Conformance Test
+
+The following snippet of code shows a sample conformance test's metadata:
+
+```
+/*
+  Release : v1.9
+  Testname: Kubelet: log output
+  Description: By default the stdout and stderr from the process being
+  executed in a pod MUST be sent to the pod's logs.
+*/
+framework.ConformanceIt("it should print the output to logs", func() {
+  ...
+})
+```
+
+The corresponding portion of the Kubernetes Conformance Document for this test
+would then look like this:
+
+> ## [Kubelet: log output](https://github.com/kubernetes/kubernetes/tree/release-1.9/test/e2e_node/kubelet_test.go#L47)
+>
+> Release : v1.9
+>
+> By default the stdout and stderr from the process being executed in a pod MUST be sent to the pod's logs.
+
+### Reporting Conformance Test Results
+
+Conformance test results, by provider and releases, can be viewed in the
+[testgrid conformance dashboard]. If you wish to contribute test results
+for your provider, please see the [testgrid conformance README].
+
+[kubernetes versioning policy]: /contributors/design-proposals/release/versioning.md#supported-releases-and-component-skew
+[Conformance Test Review board]: https://github.com/kubernetes-sigs/architecture-tracking/projects/1
+[conformance test requirements]: #conformance-test-requirements
+[conformance test comment metadata]: #conformance-test-comment-metadata
+[conformance test version skew policy]: #conformance-test-version-skew-policy
+[testgrid conformance dashboard]: https://testgrid.k8s.io/conformance-all
+[testgrid conformance README]: https://github.com/kubernetes/test-infra/blob/master/testgrid/conformance/README.md
+[v1.9 conformance doc]: https://github.com/cncf/k8s-conformance/blob/master/docs/KubeConformance-1.9.md
diff --git a/contributors/devel/sig-architecture/godep.md b/contributors/devel/sig-architecture/godep.md
new file mode 100644
index 00000000..fc748b10
--- /dev/null
+++ b/contributors/devel/sig-architecture/godep.md
@@ -0,0 +1,261 @@
+# Using godep to manage dependencies
+
+This document is intended to show a way for managing `vendor/` tree dependencies
+in Kubernetes. If you do not need to manage vendored dependencies, you probably
+do not need to read this.
+
+## Background
+
+As a tool, `godep` leaves much to be desired. It builds on `go get`, and adds
+the ability to pin dependencies to an exact git version. The `go get` tool itself
+doesn't have any concept of versions, and tends to blow up if it finds a git
+repo synced to anything but `master`, but that is exactly the state that
+`godep` leaves repos in. This is a recipe for frustration when people try to use
+the tools.
+
+This doc will focus on predictability and reproducibility.
+
+## Justifications for an update
+
+Before you update a dependency, take a moment to consider why it should be
+updated. Valid reasons include:
+ 1. We need new functionality that is in a later version.
+ 2. New or improved APIs in the dependency significantly improve Kubernetes code.
+ 3. Bugs were fixed that impact Kubernetes.
+ 4. Security issues were fixed even if they don't impact Kubernetes yet.
+ 5. Performance, scale, or efficiency was meaningfully improved.
+ 6. We need dependency A and there is a transitive dependency B.
+ 7. 
Kubernetes has an older level of a dependency that is precluding being able +to work with other projects in the ecosystem. + +## Theory of operation + +The `go` toolchain assumes a global workspace that hosts all of your Go code. + +The `godep` tool operates by first "restoring" dependencies into your `$GOPATH`. +This reads the `Godeps.json` file, downloads all of the dependencies from the +internet, and syncs them to the specified revisions. You can then make +changes - sync to different revisions or edit Kubernetes code to use new +dependencies (and satisfy them with `go get`). When ready, you tell `godep` to +"save" everything, which it does by walking the Kubernetes code, finding all +required dependencies, copying them from `$GOPATH` into the `vendor/` directory, +and rewriting `Godeps.json`. + +This does not work well, when combined with a global Go workspace. Instead, we +will set up a private workspace for this process. + +The Kubernetes build process uses this same technique, and offers a tool called +`run-in-gopath.sh` which sets up and switches to a local, private workspace, +including setting up `$GOPATH` and `$PATH`. If you wrap commands with this +tool, they will use the private workspace, which will not conflict with other +projects and is easily cleaned up and recreated. + +To see this in action, you can run an interactive shell in this environment: + +```sh +# Run a shell, but don't run your own shell initializations. +hack/run-in-gopath.sh bash --norc --noprofile +``` + +## Restoring deps + +To extract and download dependencies into `$GOPATH` we provide a script: +`hack/godep-restore.sh`. If you run this tool, it will restore into your own +`$GOPATH`. If you wrap it in `run-in-gopath.sh` it will restore into your +`_output/` directory. + +```sh +hack/run-in-gopath.sh hack/godep-restore.sh +``` + +This script will try to optimize what it needs to download, and if it seems the +dependencies are all present already, it will return very quickly. + +If there's every any doubt about the correctness of your dependencies, you can +simply `make clean` or `rm -rf _output`, and run it again. + +Now you should have a clean copy of all of the Kubernetes dependencies. + +Downloading dependencies might take a while, so if you want to see progress +information use the `-v` flag: + +```sh +hack/run-in-gopath.sh hack/godep-restore.sh -v +``` + +## Making changes + +The most common things people need to do with deps are add and update them. +These are similar but different. + +### Adding a dep + +For the sake of examples, consider that we have discovered a wonderful Go +library at `example.com/go/frob`. The first thing you need to do is get that +code into your workspace: + +```sh +hack/run-in-gopath.sh go get -d example.com/go/frob +``` + +This will fetch, but not compile (omit the `-d` if you want to compile it now), +the library into your private `$GOPATH`. It will pull whatever the default +revision of that library is, typically the `master` branch for git repositories. +If this is not the revision you need, you can change it, for example to +`v1.0.0`: + +```sh +hack/run-in-gopath.sh bash -c 'git -C $GOPATH/src/example.com/go/frob checkout v1.0.0' +``` + +Now that the code is present, you can start to use it in Kubernetes code. +Because it is in your private workspace's `$GOPATH`, it might not be part of +your own `$GOPATH`, so tools like `goimports` might not find it. This is an +unfortunate side-effect of this process. 
You can either add the whole private
+workspace to your own `$GOPATH` or you can `go get` the library into your own
+`$GOPATH` until it is properly vendored into Kubernetes.
+
+Another possible complication is a dep that uses `godep` itself. In that case,
+you need to restore its dependencies, too:
+
+```sh
+hack/run-in-gopath.sh bash -c 'cd $GOPATH/src/example.com/go/frob && godep restore'
+```
+
+If the transitive deps collide with Kubernetes deps, you may have to manually
+resolve things. This is where the ability to run a shell in this environment
+comes in handy:
+
+```sh
+hack/run-in-gopath.sh bash --norc --noprofile
+```
+
+### Updating a dep
+
+Sometimes we already have a dep, but the version of it is wrong. Because of the
+way that `godep` and `go get` interact (badly) it's generally easiest to hit it
+with a big hammer:
+
+```sh
+hack/run-in-gopath.sh bash -c 'rm -rf $GOPATH/src/example.com/go/frob'
+hack/run-in-gopath.sh go get -d example.com/go/frob
+hack/run-in-gopath.sh bash -c 'git -C $GOPATH/src/example.com/go/frob checkout v2.0.0'
+```
+
+This will remove the code, re-fetch it, and sync to your desired version.
+
+### Removing a dep
+
+This happens almost for free. If you edit Kubernetes code and remove the last
+use of a given dependency, you only need to restore and save the deps, and the
+`godep` tool will figure out that you don't need that dep any more.
+
+## Saving deps
+
+Now that you have made your changes - adding, updating, or removing the use of a
+dep - you need to rebuild the dependency database and make changes to the
+`vendor/` directory.
+
+```sh
+hack/run-in-gopath.sh hack/godep-save.sh
+```
+
+This will run through all of the primary targets for the Kubernetes project,
+calculate which deps are needed, and rebuild the database. It will also
+regenerate other metadata files which the project needs, such as BUILD files and
+the LICENSE database.
+
+Commit the changes before updating deps in staging repos.
+
+## Saving deps in staging repos
+
+Kubernetes stores some code in a directory called `staging` which is handled
+specially, and is not covered by the above. If you modified any code under
+staging, or if you changed a dependency of code under staging (even
+transitively), you'll also need to update deps there:
+
+```sh
+./hack/update-staging-godeps.sh
+```
+
+Then commit the changes generated by the above script.
+
+## Commit messages
+
+Terse messages like "Update foo.org/bar to 0.42" are problematic
+for maintainability. Please include in your commit message the
+detailed reason why the dependencies were modified.
+
+Too commonly dependency changes have a ripple effect where something
+else breaks unexpectedly. The first instinct during issue triage
+is to revert a change. If the change was made to fix some other
+issue and that issue was not documented, then a revert simply
+continues the ripple by fixing one issue and reintroducing another
+which then needs to be refixed. This can needlessly span multiple days
+as CI results bubble in and subsequent patches fix and refix and
+rerefix issues. This may be avoided if the original modifications
+recorded artifacts of the change rationale.
+
+## Sanity checking
+
+After all of this is done, `git status` should show you what files have been
+modified and added/removed. Make sure to sanity-check them with `git diff`, and
+to `git add` and `git rm` them, as needed. 
It is commonly advised to make one +`git commit` which includes just the dependencies and Godeps files, and +another `git commit` that includes changes to Kubernetes code to use (or stop +using) the new/updated/removed dependency. These commits can go into a single +pull request. + +Before sending your PR, it's a good idea to sanity check that your +Godeps.json file and the contents of `vendor/ `are ok: + +```sh +hack/run-in-gopath.sh hack/verify-godeps.sh +``` + +All this script will do is a restore, followed by a save, and then look for +changes. If you followed the above instructions, it should be clean. If it is +not, you get to figure out why. + +## Manual updates + +It is sometimes expedient to manually fix the `Godeps.json` file to +minimize the changes. However, without great care this can lead to failures +with the verifier scripts. The kubernetes codebase does "interesting things" +with symlinks between `vendor/` and `staging/` to allow multiple Go import +paths to coexist in the same git repo. + +The verifiers, including `hack/verify-godeps.sh` *must* pass for every pull +request. + +## Reviewing and approving dependency changes + +Particular attention to detail should be exercised when reviewing and approving +PRs that add/remove/update dependencies. Importing a new dependency should bring +a certain degree of value as there is a maintenance overhead for maintaining +dependencies into the future. + +When importing a new dependency, be sure to keep an eye out for the following: +- Is the dependency maintained? +- Does the dependency bring value to the project? Could this be done without + adding a new dependency? +- Is the target dependency the original source, or a fork? +- Is there already a dependency in the project that does something similar? +- Does the dependency have a license that is compatible with the Kubernetes + project? + +Additionally: +- Look at the godeps file. Check that the only changes are what the PR claims + them to be. +- Check if there is a tagged release we can vendor instead of a random hash +- Scan the imported code for things like init() functions +- Look at the Kubernetes code changes and make sure they are appropriate + (e.g. renaming imports or similar). You do not need to do feature code review. +- If this is all good, approve, but don't LGTM, unless you also do code review + or unless it is trivial (e.g. moving from k/k/pkg/utils -> k/utils). + +All new dependency licenses should be reviewed by either Tim Hockin (@thockin) +or the Steering Committee (@kubernetes/steering-committee) to ensure that they +are compatible with the Kubernetes project license. It is also important to note +and flag if a license has changed when updating a dependency, so that these can +also be reviewed. diff --git a/contributors/devel/sig-architecture/staging.md b/contributors/devel/sig-architecture/staging.md new file mode 100644 index 00000000..79ae762f --- /dev/null +++ b/contributors/devel/sig-architecture/staging.md @@ -0,0 +1,34 @@ +# Staging Directory and Publishing + +The [staging/ directory](https://git.k8s.io/kubernetes/staging) of Kubernetes contains a number of pseudo repositories ("staging repos"). They are symlinked into Kubernetes' [vendor/ directory](https://git.k8s.io/kubernetes/vendor/k8s.io) for Golang to pick them up. + +We publish the staging repos using the [publishing bot](https://git.k8s.io/publishing-bot). 
It uses `git filter-branch` essentially to [cut the staging directories into separate git trees](https://de.slideshare.net/sttts/cutting-the-kubernetes-monorepo-in-pieces-never-learnt-more-about-git) and pushing the new commits to the corresponding real repositories in the [kubernetes organization on Github](https://github.com/kubernetes). + +The list of staging repositories and their published branches are listed in [publisher.go inside of the bot](https://git.k8s.io/publishing-bot/cmd/publishing-bot/publisher.go). Though it is planned to move this out into the k8s.io/kubernetes repository. + +At the time of this writing, this includes the branches + +- master, +- release-1.8 / release-5.0, +- and release-1.9 / release-6.0 + +of the following staging repos in the k8s.io org: + +- api +- apiextensions-apiserver +- apimachinery +- apiserver +- client-go +- code-generator +- kube-aggregator +- metrics +- sample-apiserver +- sample-controller + +Kubernetes tags (e.g., v1.9.1-beta1) are also applied automatically to the published repositories, prefixed with kubernetes- (e.g., kubernetes-1.9.1-beta1). The client-go semver tags (on client-go only!) including release-notes are still done manually. + +The semver tags are still the (well tested) official releases. The kubernetes-1.x.y tags have limited test coverage (we have some automatic tests in place in the bot), but can be used by early adopters of client-go and the other libraries. Moreover, they help to vendor the correct version of k8s.io/api and k8s.io/apimachinery. + +If further repos under staging are need, adding them to the bot is easy. Contact one of the [owners of the bot](https://git.k8s.io/publishing-bot/OWNERS). + +Currently, the bot is hosted on the CI cluster of Redhat's OpenShift (ready to be moved out to a public CNCF cluster if we have that in the future). diff --git a/contributors/devel/sig-cli/kubectl-conventions.md b/contributors/devel/sig-cli/kubectl-conventions.md new file mode 100644 index 00000000..5b009657 --- /dev/null +++ b/contributors/devel/sig-cli/kubectl-conventions.md @@ -0,0 +1,458 @@ +# Kubectl Conventions + +Updated: 3/23/2017 + +**Table of Contents** + +- [Kubectl Conventions](#kubectl-conventions) + - [Principles](#principles) + - [Command conventions](#command-conventions) + - [Create commands](#create-commands) + - [Rules for extending special resource alias - "all"](#rules-for-extending-special-resource-alias---all) + - [Flag conventions](#flag-conventions) + - [Output conventions](#output-conventions) + - [Documentation conventions](#documentation-conventions) + - [kubectl Factory conventions](#kubectl-Factory-conventions) + - [Command implementation conventions](#command-implementation-conventions) + - [Exit code conventions](#exit-code-conventions) + - [Generators](#generators) + + +## Principles + +* Strive for consistency across commands + +* Explicit should always override implicit + + * Environment variables should override default values + + * Command-line flags should override default values and environment variables + + * `--namespace` should also override the value specified in a specified +resource + +* Most kubectl commands should be able to operate in bulk on resources, of mixed types. + +* Kubectl should not make any decisions based on its nor the server's release version string. Instead, API + discovery and/or OpenAPI should be used to determine available features. 
+ +* We currently only guarantee one release of version skew is supported, but we strive to make old releases of kubectl + continue to work with newer servers in compliance with our API compatibility guarantees. This means, for instance, that + kubectl should not fully parse objects returned by the server into full Go types and then re-encode them, since that + would drop newly added fields. ([#3955](https://github.com/kubernetes/kubernetes/issues/3955)) + +* General-purpose kubectl commands (e.g., get, delete, create -f, replace, patch, apply) should work for all resource types, + even those not present when that release of kubectl was built, such as APIs added in newer releases, aggregated APIs, + and third-party resources. + +* While functionality may be added to kubectl out of expedience, commonly needed functionality should be provided by + the server to make it easily accessible to all API clients. ([#12143](https://github.com/kubernetes/kubernetes/issues/12143)) + +* Remaining non-trivial functionality remaining in kubectl should be made available to other clients via libraries + ([#7311](https://github.com/kubernetes/kubernetes/issues/7311)) + +## Command conventions + +* Command names are all lowercase, and hyphenated if multiple words. + +* kubectl VERB NOUNs for commands that apply to multiple resource types. + +* Command itself should not have built-in aliases. + +* NOUNs may be specified as `TYPE name1 name2` or `TYPE/name1 TYPE/name2` or +`TYPE1,TYPE2,TYPE3/name1`; TYPE is omitted when only a single type is expected. + +* Resource types are all lowercase, with no hyphens; both singular and plural +forms are accepted. + +* NOUNs may also be specified by one or more file arguments: `-f file1 -f file2 +...` + +* Resource types may have 2- or 3-letter aliases. + +* Business logic should be decoupled from the command framework, so that it can +be reused independently of kubectl, cobra, etc. + * Ideally, commonly needed functionality would be implemented server-side in +order to avoid problems typical of "fat" clients and to make it readily +available to non-Go clients. + +* Commands that generate resources, such as `run` or `expose`, should obey +specific conventions, see [generators](#generators). + +* A command group (e.g., `kubectl config`) may be used to group related +non-standard commands, such as custom generators, mutations, and computations. + + +### Create commands + +`kubectl create <resource>` commands fill the gap between "I want to try +Kubernetes, but I don't know or care what gets created" (`kubectl run`) and "I +want to create exactly this" (author yaml and run `kubectl create -f`). They +provide an easy way to create a valid object without having to know the vagaries +of particular kinds, nested fields, and object key typos that are ignored by the +yaml/json parser. Because editing an already created object is easier than +authoring one from scratch, these commands only need to have enough parameters +to create a valid object and set common immutable fields. It should default as +much as is reasonably possible. Once that valid object is created, it can be +further manipulated using `kubectl edit` or the eventual `kubectl set` commands. + +`kubectl create <resource> <special-case>` commands help in cases where you need +to perform non-trivial configuration generation/transformation tailored for a +common use case. 
`kubectl create secret` is a good example, there's a `generic` +flavor with keys mapping to files, then there's a `docker-registry` flavor that +is tailored for creating an image pull secret, and there's a `tls` flavor for +creating tls secrets. You create these as separate commands to get distinct +flags and separate help that is tailored for the particular usage. + + +### Rules for extending special resource alias - "all" + +Here are the rules to add a new resource to the `kubectl get all` output. + +* No cluster scoped resources + +* No namespace admin level resources (limits, quota, policy, authorization +rules) + +* No resources that are potentially unrecoverable (secrets and pvc) + +* Resources that are considered "similar" to #3 should be grouped +the same (configmaps) + + +## Flag conventions + +* Flags are all lowercase, with words separated by hyphens + +* Flag names and single-character aliases should have the same meaning across +all commands + +* Flag descriptions should start with an uppercase letter and not have a +period at the end of a sentence + +* Command-line flags corresponding to API fields should accept API enums +exactly (e.g., `--restart=Always`) + +* Do not reuse flags for different semantic purposes, and do not use different +flag names for the same semantic purpose -- grep for `"Flags()"` before adding a +new flag + +* Use short flags sparingly, only for the most frequently used options, prefer +lowercase over uppercase for the most common cases, try to stick to well known +conventions for UNIX commands and/or Docker, where they exist, and update this +list when adding new short flags + + * `-f`: Resource file + * also used for `--follow` in `logs`, but should be deprecated in favor of `-F` + * `-n`: Namespace scope + * `-l`: Label selector + * also used for `--labels` in `expose`, but should be deprecated + * `-L`: Label columns + * `-c`: Container + * also used for `--client` in `version`, but should be deprecated + * `-i`: Attach stdin + * `-t`: Allocate TTY + * `-w`: Watch (currently also used for `--www` in `proxy`, but should be deprecated) + * `-p`: Previous + * also used for `--pod` in `exec`, but deprecated + * also used for `--patch` in `patch`, but should be deprecated + * also used for `--port` in `proxy`, but should be deprecated + * `-P`: Static file prefix in `proxy`, but should be deprecated + * `-r`: Replicas + * `-u`: Unix socket + * `-v`: Verbose logging level + + +* `--dry-run`: Don't modify the live state; simulate the mutation and display +the output. All mutations should support it. + +* `--local`: Don't contact the server; just do local read, transformation, +generation, etc., and display the output + +* `--output-version=...`: Convert the output to a different API group/version + +* `--short`: Output a compact summary of normal output; the format is subject +to change and is optimized for reading not parsing. + +* `--validate`: Validate the resource schema + +## Output conventions + +* By default, output is intended for humans rather than programs + * However, affordances are made for simple parsing of `get` output + +* Only errors should be directed to stderr + +* `get` commands should output one row per resource, and one resource per row + + * Column titles and values should not contain spaces in order to facilitate +commands that break lines into fields: cut, awk, etc. Instead, use `-` as the +word separator. 
+ + * By default, `get` output should fit within about 80 columns + + * Eventually we could perhaps auto-detect width + * `-o wide` may be used to display additional columns + + + * The first column should be the resource name, titled `NAME` (may change this +to an abbreviation of resource type) + + * NAMESPACE should be displayed as the first column when --all-namespaces is +specified + + * The last default column should be time since creation, titled `AGE` + + * `-Lkey` should append a column containing the value of label with key `key`, +with `<none>` if not present + + * json, yaml, Go template, and jsonpath template formats should be supported +and encouraged for subsequent processing + + * Users should use --api-version or --output-version to ensure the output +uses the version they expect + + +* `describe` commands may output on multiple lines and may include information +from related resources, such as events. Describe should add additional +information from related resources that a normal user may need to know - if a +user would always run "describe resource1" and the immediately want to run a +"get type2" or "describe resource2", consider including that info. Examples, +persistent volume claims for pods that reference claims, events for most +resources, nodes and the pods scheduled on them. When fetching related +resources, a targeted field selector should be used in favor of client side +filtering of related resources. + +* For fields that can be explicitly unset (booleans, integers, structs), the +output should say `<unset>`. Likewise, for arrays `<none>` should be used; for +external IP, `<nodes>` should be used; for load balancer, `<pending>` should be +used. Lastly `<unknown>` should be used where unrecognized field type was +specified. + +* Mutations should output TYPE/name verbed by default, where TYPE is singular; +`-o name` may be used to just display TYPE/name, which may be used to specify +resources in other commands + +## Documentation conventions + +* Commands are documented using Cobra; docs are then auto-generated by +`hack/update-generated-docs.sh`. + + * Use should contain a short usage string for the most common use case(s), not +an exhaustive specification + + * Short should contain a one-line explanation of what the command does + * Short descriptions should start with an uppercase case letter and not + have a period at the end of a sentence + * Short descriptions should (if possible) start with a first person + (singular present tense) verb + + * Long may contain multiple lines, including additional information about +input, output, commonly used flags, etc. + * Long descriptions should use proper grammar, start with an uppercase + letter and have a period at the end of a sentence + + + * Example should contain examples + * A comment should precede each example command. Comment should start with + an uppercase letter + * Command examples should not include a `$` prefix + +* Use "FILENAME" for filenames + +* Use "TYPE" for the particular flavor of resource type accepted by kubectl, +rather than "RESOURCE" or "KIND" + +* Use "NAME" for resource names + +## kubectl Factory conventions + +The kubectl `Factory` is a large interface which is used to provide access to clients, +polymorphic inspection, and polymorphic mutation. The `Factory` is layered in +"rings" in which one ring may reference inner rings, but not peers or outer rings. +This is done to allow composition by extenders. 
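+
+As a rough sketch of the idea (the interface and method names below are
+invented for illustration and are not the actual kubectl `Factory` API), an
+outer ring may embed and call an inner ring, while the inner ring knows
+nothing about the outer:
+
+```go
+package factory
+
+// Ring0 is the innermost ring: basic client/config access.
+type Ring0 interface {
+	ClientConfig() (string, error) // illustrative signature
+}
+
+// Ring1 layers polymorphic helpers on top of Ring0. It may call Ring0
+// methods, but Ring0 never references Ring1 or its peers.
+type Ring1 interface {
+	Ring0
+	ObjectMapper() (string, error) // illustrative signature
+}
+
+// NewRing1 composes a Ring1 around any Ring0, so an extender can swap in
+// a custom inner ring while reusing the default outer behavior.
+func NewRing1(inner Ring0) Ring1 { return &ring1{Ring0: inner} }
+
+type ring1 struct{ Ring0 }
+
+func (r *ring1) ObjectMapper() (string, error) { return "default mapper", nil }
+```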
+
+In order for composers to be able to provide alternative factory implementations
+they need to provide low-level pieces of *certain* functions so that when the factory
+calls back into itself it uses the custom version of the function. Rather than try
+to enumerate everything that someone would want to override, we split the factory into
+rings, where each ring can depend on methods in an earlier ring, but cannot depend upon
+peer methods in its own ring.
+
+
+## Command implementation conventions
+
+For every command there should be a `NewCmd<CommandName>` function that creates
+the command and returns a pointer to a `cobra.Command`, which can later be added
+to other parent commands to compose the structure tree. There should also be a
+`<CommandName>Options` struct with a variable for every flag and argument
+declared by the command (and any other variable required for the command to
+run). This makes tests and mocking easier. The struct ideally exposes three
+methods:
+
+* `Complete`: Completes the struct fields with values that may or may not be
+directly provided by the user, for example, by flags pointers, by the `args`
+slice, by using the Factory, etc.
+
+* `Validate`: Performs validation on the struct fields and returns appropriate
+errors.
+
+* `Run<CommandName>`: Runs the actual logic of the command, assuming that the
+struct is complete with all required values to run and that they are valid.
+
+Sample command skeleton:
+
+```go
+// MineRecommendedName is the recommended command name for kubectl mine.
+const MineRecommendedName = "mine"
+
+// Long command description and examples.
+var (
+	mineLong = templates.LongDesc(`
+		mine which is described here
+		with lots of details.`)
+
+	// %[1]s is replaced with the full command path (e.g. "kubectl mine")
+	// when the examples are rendered below.
+	mineExample = templates.Examples(`
+		# Run my command's first action
+		%[1]s first_action
+
+		# Run my command's second action on latest stuff
+		%[1]s second_action --flag`)
+)
+
+// MineOptions contains all the options for running the mine cli command.
+type MineOptions struct {
+	mineLatest bool
+}
+
+// NewCmdMine implements the kubectl mine command.
+func NewCmdMine(parent, name string, f *cmdutil.Factory, out io.Writer) *cobra.Command {
+	opts := &MineOptions{}
+
+	cmd := &cobra.Command{
+		Use:     fmt.Sprintf("%s [--latest]", name),
+		Short:   "Run my command",
+		Long:    mineLong,
+		Example: fmt.Sprintf(mineExample, parent+" "+name),
+		Run: func(cmd *cobra.Command, args []string) {
+			if err := opts.Complete(f, cmd, args, out); err != nil {
+				cmdutil.CheckErr(err)
+			}
+			if err := opts.Validate(); err != nil {
+				cmdutil.CheckErr(cmdutil.UsageError(cmd, err.Error()))
+			}
+			if err := opts.RunMine(); err != nil {
+				cmdutil.CheckErr(err)
+			}
+		},
+	}
+
+	// Bind the command's flags to the options struct.
+	cmd.Flags().BoolVar(&opts.mineLatest, "latest", false, "Use latest stuff")
+	return cmd
+}
+
+// Complete completes all the required options for mine.
+func (o *MineOptions) Complete(f *cmdutil.Factory, cmd *cobra.Command, args []string, out io.Writer) error {
+	return nil
+}
+
+// Validate validates all the required options for mine.
+func (o MineOptions) Validate() error {
+	return nil
+}
+
+// RunMine implements all the necessary functionality for mine.
+func (o MineOptions) RunMine() error {
+	return nil
+}
+```
+
+The `Run<CommandName>` method should contain the business logic of the command
+and as noted in [command conventions](#command-conventions), ideally that logic
+should exist server-side so any client could take advantage of it.
Notice that +this is not a mandatory structure and not every command is implemented this way, +but this is a nice convention so try to be compliant with it. As an example, +have a look at how [kubectl logs](https://git.k8s.io/kubernetes/pkg/kubectl/cmd/logs.go) is implemented. + +## Exit code conventions + +Generally, for all the command exit code, result of `zero` means success and `non-zero` means errors. + +For idempotent ("make-it-so") commands, we should return `zero` when success even if no changes were provided, user can request treating "make-it-so" as "already-so" via flag `--error-unchanged` to make it return `non-zero` exit code. + +For non-idempotent ("already-so") commands, we should return `non-zero` by default, user can request treating "already-so" as "make-it-so" via flag `--ignore-unchanged` to make it return `zero` exit code. + + +| Exit Code Number | Meaning | Enable | +| :--- | :--- | :--- | +| 0 | Command exited success | By default, By flag `--ignore-unchanged` | +| 1 | Command exited for general errors | By default | +| 3 | Command was successful, but the user requested a distinct exit code when no change was made | By flag `--error-unchanged`| + +## Generators + +Generators are kubectl commands that generate resources based on a set of inputs +(other resources, flags, or a combination of both). + +The point of generators is: + +* to enable users using kubectl in a scripted fashion to pin to a particular +behavior which may change in the future. Explicit use of a generator will always +guarantee that the expected behavior stays the same. + +* to enable potential expansion of the generated resources for scenarios other +than just creation, similar to how -f is supported for most general-purpose +commands. + +Generator commands should obey the following conventions: + +* A `--generator` flag should be defined. Users then can choose between +different generators, if the command supports them (for example, `kubectl run` +currently supports generators for pods, jobs, replication controllers, and +deployments), or between different versions of a generator so that users +depending on a specific behavior may pin to that version (for example, `kubectl +expose` currently supports two different versions of a service generator). + +* Generation should be decoupled from creation. A generator should implement the +`kubectl.StructuredGenerator` interface and have no dependencies on cobra or the +Factory. See, for example, how the first version of the namespace generator is +defined: + +```go +// NamespaceGeneratorV1 supports stable generation of a namespace +type NamespaceGeneratorV1 struct { + // Name of namespace + Name string +} + +// Ensure it supports the generator pattern that uses parameters specified during construction +var _ StructuredGenerator = &NamespaceGeneratorV1{} + +// StructuredGenerate outputs a namespace object using the configured fields +func (g *NamespaceGeneratorV1) StructuredGenerate() (runtime.Object, error) { + if err := g.validate(); err != nil { + return nil, err + } + namespace := &api.Namespace{} + namespace.Name = g.Name + return namespace, nil +} + +// validate validates required fields are set to support structured generation +func (g *NamespaceGeneratorV1) validate() error { + if len(g.Name) == 0 { + return fmt.Errorf("name must be specified") + } + return nil +} +``` + +The generator struct (`NamespaceGeneratorV1`) holds the necessary fields for +namespace generation. 
It also satisfies the `kubectl.StructuredGenerator` +interface by implementing the `StructuredGenerate() (runtime.Object, error)` +method which configures the generated namespace that callers of the generator +(`kubectl create namespace` in our case) need to create. + +* `--dry-run` should output the resource that would be created, without +creating it. + diff --git a/contributors/devel/sig-instrumentation/OWNERS b/contributors/devel/sig-instrumentation/OWNERS new file mode 100644 index 00000000..3e1efb0c --- /dev/null +++ b/contributors/devel/sig-instrumentation/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-instrumentation-leads +approvers: + - sig-instrumentation-leads +labels: + - sig/instrumentation diff --git a/contributors/devel/sig-instrumentation/event-style-guide.md b/contributors/devel/sig-instrumentation/event-style-guide.md new file mode 100644 index 00000000..bc4ba22b --- /dev/null +++ b/contributors/devel/sig-instrumentation/event-style-guide.md @@ -0,0 +1,51 @@ +# Event style guide + +Status: During Review + +Author: Marek Grabowski (gmarek@) + +## Why the guide? + +The Event API change proposal is the first step towards having useful Events in the system. Another step is to formalize the Event style guide, i.e. set of properties that developers need to ensure when adding new Events to the system. This is necessary to ensure that we have a system in which all components emit consistently structured Events. + +## When to emit an Event? + +Events are expected to provide important insights for the application developer/operator on the state of their application. Events relevant to cluster administrators are acceptable, as well, though they usually also have the option of looking at component logs. Events are much more expensive than logs, thus they're not expected to provide in-depth system debugging information. Instead concentrate on things that are important from the application developer's perspective. Events need to be either actionable, or be useful to understand past or future system's behavior. Events are not intended to drive automation. Watching resource status should be sufficient for controllers. + +Following are the guidelines for adding Events to the system. Those are not hard-and-fast rules, but should be considered by all contributors adding new Events and members doing reviews. +1. Emit events only when state of the system changes/attempts to change. Events "it's still running" are not interesting. Also, changes that do not add information beyond what is observable by watching the altered resources should not be duplicated as events. Note that adding a reason for some action that can't be inferred from the state change is considered additional information. +1. Limit Events to no more than one per change/attempt. There's no need for Events on "About to do X" AND "Did X"/"Failed to do X". Result is more interesting and implies an attempt. + 1. It may give impression that this gets tricky with scale events, e.g. Deployment scales ReplicaSet which creates/deletes Pods. For us those are 3 (or more) separate Events (3 different objects are affected) so it's fine to emit multiple Events. +1. When an error occurs that prevents a user application from starting or from enacting other normal system behavior, such as object creation, an Event should be emitted (e.g. invalid image). + 1. Note that Events are garbage collected so every user-actionable error needs to be surfaced via resource status as well. + 1. 
It's usually OK to emit failure Events for each failure. The dedup mechanism will deal with that. The exception is failures that are frequent but typically ephemeral and automatically repairable/recoverable, such as broken socket connections, in which case they should only be reported if persistent and unrepairable, in order to mitigate event spam.
+1. When a user application stops running for any reason, an Event should be emitted (e.g. Pod evicted because Node is under memory pressure)
+1. If it's a system-wide change of state that may impact currently running applications or may have a severe impact on future workload schedulability, an Event should be emitted (e.g. Node became unreachable, failed to create route for Node).
+1. If it doesn't fit any of the above scenarios, you should consider not emitting an Event.
+
+## How to structure an Event?
+The new Event API tries to use more descriptive field names to influence how Events are structured. An Event has the following fields:
+* Regarding
+* Related
+* ReportingController
+* ReportingInstance
+* Action
+* Reason
+* Type
+* Note
+
+The Event should be structured in a way that the following sentence "makes sense":
+"Regarding <Event.Regarding>: <Event.Action> <Event.Related> - <Event.Reason>", e.g.
+* Regarding Node X: BecameNotReady - NodeUnreachable
+* Regarding Pod X: ScheduledOnNode Node Y - <nil>
+* Regarding PVC X: BoundToNode Node Y - <nil>
+* Regarding Pod X: KilledContainer Container Y - NodeMemoryPressure
+
+1. ReportingController is the type of the controller reporting an Event, e.g. k8s.io/node-controller, k8s.io/kubelet. There will be a standard list for controller names for Kubernetes components. Third-party components must namespace themselves in the same manner as label keys. Validation ensures it's a proper qualified name. This shouldn’t be needed in order for users to understand the event, but is provided in case the controller’s logs need to be accessed for further debugging.
+1. ReportingInstance is an identifier of the instance of the ReportingController which needs to uniquely identify it. I.e. the host name can be used only for controllers that are guaranteed to be unique on the host. This requirement isn't met e.g. for the scheduler, so it may need a secondary index. For singleton controllers, use the Node name (or hostname if the controller is not running on the Node). Can have at most 128 alpha-numeric characters.
+1. Regarding and Related are ObjectReferences. Regarding should represent the object that's implemented by the ReportingController; Related can contain additional information about another object that takes part in or is affected by the Action (see examples).
+1. Action is a low-cardinality (meaning that there's a restricted, predefined set of values allowed) CamelCase string field (i.e. its value has to be determined at compile time) that explains what happened with Regarding/what action the ReportingController took in Regarding's name. The tuple of {ReportingController, Action, Reason} must be unique, such that a user could look up documentation. Can have at most 128 characters.
+1. Reason is a low-cardinality CamelCase string field (i.e. its value has to be determined at compile time) that explains why ReportingController took Action. Can have at most 128 characters.
+1. Type can be either "Normal" or "Warning".
"Warning" types are reserved for Events that represent a situation that's not expected in a healthy cluster and/or healthy workload: something unexpected and/or undesirable, at least if it occurs frequently enough and/or for a long enough duration. +1. Note can contain an arbitrary, high-cardinality, user readable summary of the Event. This field can lose data if deduplication is triggered. Can have at most 1024 characters. + diff --git a/contributors/devel/sig-instrumentation/instrumentation.md b/contributors/devel/sig-instrumentation/instrumentation.md new file mode 100644 index 00000000..b0a11193 --- /dev/null +++ b/contributors/devel/sig-instrumentation/instrumentation.md @@ -0,0 +1,215 @@ +## Instrumenting Kubernetes + +The following references and outlines general guidelines for metric instrumentation +in Kubernetes components. Components are instrumented using the +[Prometheus Go client library](https://github.com/prometheus/client_golang). For non-Go +components. [Libraries in other languages](https://prometheus.io/docs/instrumenting/clientlibs/) +are available. + +The metrics are exposed via HTTP in the +[Prometheus metric format](https://prometheus.io/docs/instrumenting/exposition_formats/), +which is open and well-understood by a wide range of third party applications and vendors +outside of the Prometheus eco-system. + +The [general instrumentation advice](https://prometheus.io/docs/practices/instrumentation/) +from the Prometheus documentation applies. This document reiterates common pitfalls and some +Kubernetes specific considerations. + +Prometheus metrics are cheap as they have minimal internal memory state. Set and increment +operations are thread safe and take 10-25 nanoseconds (Go & Java). +Thus, instrumentation can and should cover all operationally relevant aspects of an application, +internal and external. + +## Quick Start + +The following describes the basic steps required to add a new metric (in Go). + +1. Import "github.com/prometheus/client_golang/prometheus". + +2. Create a top-level var to define the metric. For this, you have to: + + 1. Pick the type of metric. Use a Gauge for things you want to set to a +particular value, a Counter for things you want to increment, or a Histogram or +Summary for histograms/distributions of values (typically for latency). +Histograms are better if you're going to aggregate the values across jobs, while +summaries are better if you just want the job to give you a useful summary of +the values. + 2. Give the metric a name and description. + 3. Pick whether you want to distinguish different categories of things using +labels on the metric. If so, add "Vec" to the name of the type of metric you +want and add a slice of the label names to the definition. + + [Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53) + ```go + requestCounter = prometheus.NewCounterVec( + prometheus.CounterOpts{ + Name: "apiserver_request_count", + Help: "Counter of apiserver requests broken out for each verb, API resource, client, and HTTP response code.", + }, + []string{"verb", "resource", "client", "code"}, + ) + ``` + +3. Register the metric so that prometheus will know to export it. 
+ + [Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78) + ```go + func init() { + prometheus.MustRegister(requestCounter) + prometheus.MustRegister(requestLatencies) + prometheus.MustRegister(requestLatenciesSummary) + } + ``` + +4. Use the metric by calling the appropriate method for your metric type (Set, +Inc/Add, or Observe, respectively for Gauge, Counter, or Histogram/Summary), +first calling WithLabelValues if your metric has any labels + + [Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87) + ```go + requestCounter.WithLabelValues(*verb, *resource, client, strconv.Itoa(*httpCode)).Inc() + ``` + + +## Instrumentation types + +Components have metrics capturing events and states that are inherent to their +application logic. Examples are request and error counters, request latency +histograms, or internal garbage collection cycles. Those metrics are instrumented +directly in the application code. + +Secondly, there are business logic metrics. Those are not about observed application +behavior but abstract system state, such as desired replicas for a deployment. +They are not directly instrumented but collected from otherwise exposed data. + +In Kubernetes they are generally captured in the [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) +component, which reads them from the API server. +For this types of metric exposition, the +[exporter guidelines](https://prometheus.io/docs/instrumenting/writing_exporters/) +apply additionally. + +## Naming + +Metrics added directly by application or package code should have a unique name. +This avoids collisions of metrics added via dependencies. They also clearly +distinguish metrics collected with different semantics. This is solved through +prefixes: + +``` +<component_name>_<metric> +``` + +For example, suppose the kubelet instrumented its HTTP requests but also uses +an HTTP router providing its own implementation. Both expose metrics on total +http requests. They should be distinguishable as in: + +``` +kubelet_http_requests_total{path=”/some/path”,status=”200”} +routerpkg_http_requests_total{path=”/some/path”,status=”200”,method=”GET”} +``` + +As we can see they expose different labels and thus a naming collision would +not have been possible to resolve even if both metrics counted the exact same +requests. + +Resource objects that occur in names should inherit the spelling that is used +in kubectl, i.e. daemon sets are `daemonset` rather than `daemon_set`. + +## Dimensionality & Cardinality + +Metrics can often replace more expensive logging as they are time-aggregated +over a sampling interval. The [multidimensional data model](https://prometheus.io/docs/concepts/data_model/) +enables deep insights and all metrics should use those label dimensions +where appropriate. + +A common error that often causes performance issues in the ingesting metric +system is considering dimensions that inhibit or eliminate time aggregation +by being too specific. Typically those are user IDs or error messages. +More generally: one should know a comprehensive list of all possible values +for a label at instrumentation time. + +Notable exceptions are exporters like kube-state-metrics, which expose per-pod +or per-deployment metrics, which are theoretically unbound over time as one could +constantly create new ones, with new names. 
However, they have +a reasonable upper bound for a given size of infrastructure they refer to and +its typical frequency of changes. + +In general, “external” labels like pod or node name do not belong in the +instrumentation itself. They are to be attached to metrics by the collecting +system that has the external knowledge ([blog post](https://www.robustperception.io/target-labels-are-for-life-not-just-for-christmas/)). + +## Normalization + +Metrics should be normalized with respect to their dimensions. They should +expose the minimal set of labels, each of which provides additional information. +Labels that are composed from values of different labels are not desirable. +For example: + +``` +example_metric{pod=”abc”,container=”proxy”,container_long=”abc/proxy”} +``` + +It often seems feasible to add additional meta information about an object +to all metrics about that object, e.g.: + +``` +kube_pod_container_restarts{namespace=...,pod=...,container=...} +``` + +A common use case is wanting to look at such metrics w.r.t to the node the +pod is scheduled on. So it seems convenient to add a “node” label. + +``` +kube_pod_container_restarts{namespace=...,pod=...,container=...,node=...} +``` + +This however only caters to one specific query use case. There are many more +pieces of metadata that could be added, effectively blowing up the instrumentation. +They are also not guaranteed to be stable over time. What if pods at some +point can be live migrated? +Those pieces of information should be normalized into an info-level metric +([blog post](https://www.robustperception.io/exposing-the-software-version-to-prometheus/)), +which is always set to 1. For example: + +``` +kube_pod_info{pod=...,namespace=...,pod_ip=...,host_ip=..,node=..., ...} +``` + +The metric system can later denormalize those along the identifying labels +“pod” and “namespace” labels. This leads to... + +## Resource Referencing + +It is often desirable to correlate different metrics about a common object, +such as a pod. Label dimensions can be used to match up different metrics. +This is most easy if label names and values are following a common pattern. +For metrics exposed by the same application, that often happens naturally. + +For a system composed of several independent, and also pluggable components, +it makes sense to set cross-component standards to allow easy querying in +metric systems without extensive post-processing of data. +In Kubernetes, those are the resource objects such as deployments, +pods, or services and the namespace they belong to. + +The following should be consistently used: + +``` +example_metric_ccc{pod=”example-app-5378923”, namespace=”default”} +``` + +An object is referenced by its unique name in a label named after the resource +itself (i.e. `pod`/`deployment`/... and not `pod_name`/`deployment_name`) +and the namespace it belongs to in the `namespace` label. + +Note: namespace/name combinations are only unique at a certain point in time. +For time series this is given by the timestamp associated with any data point. +UUIDs are truly unique but not convenient to use in user-facing time series +queries. +They can still be incorporated using an info level metric as described above for +`kube_pod_info`. 
A query to a metric system selecting by UUID via a the info level +metric could look as follows: + +``` +kube_pod_restarts and on(namespace, pod) kube_pod_info{uuid=”ABC”} +``` + diff --git a/contributors/devel/sig-instrumentation/logging.md b/contributors/devel/sig-instrumentation/logging.md new file mode 100644 index 00000000..c4da6829 --- /dev/null +++ b/contributors/devel/sig-instrumentation/logging.md @@ -0,0 +1,34 @@ +## Logging Conventions + +The following conventions for the klog levels to use. +[klog](http://godoc.org/github.com/kubernetes/klog) is globally preferred to +[log](http://golang.org/pkg/log/) for better runtime control. + +* klog.Errorf() - Always an error + +* klog.Warningf() - Something unexpected, but probably not an error + +* klog.Infof() has multiple levels: + * klog.V(0) - Generally useful for this to ALWAYS be visible to an operator + * Programmer errors + * Logging extra info about a panic + * CLI argument handling + * klog.V(1) - A reasonable default log level if you don't want verbosity. + * Information about config (listening on X, watching Y) + * Errors that repeat frequently that relate to conditions that can be corrected (pod detected as unhealthy) + * klog.V(2) - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems. + * Logging HTTP requests and their exit code + * System state changing (killing pod) + * Controller state change events (starting pods) + * Scheduler log messages + * klog.V(3) - Extended information about changes + * More info about system state changes + * klog.V(4) - Debug level verbosity + * Logging in particularly thorny parts of code where you may want to come back later and check it + * klog.V(5) - Trace level verbosity + * Context to understand the steps leading up to errors and warnings + * More information for troubleshooting reported issues + +As per the comments, the practical default level is V(2). Developers and QE +environments may wish to run at V(3) or V(4). If you wish to change the log +level, you can pass in `-v=X` where X is the desired maximum level to log. diff --git a/contributors/devel/sig-node/OWNERS b/contributors/devel/sig-node/OWNERS new file mode 100644 index 00000000..810bc689 --- /dev/null +++ b/contributors/devel/sig-node/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-node-leads +approvers: + - sig-node-leads +labels: + - sig/node diff --git a/contributors/devel/sig-node/container-runtime-interface.md b/contributors/devel/sig-node/container-runtime-interface.md new file mode 100644 index 00000000..4d9757d9 --- /dev/null +++ b/contributors/devel/sig-node/container-runtime-interface.md @@ -0,0 +1,136 @@ +# CRI: the Container Runtime Interface + +## What is CRI? + +CRI (_Container Runtime Interface_) consists of a +[protobuf API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto), +specifications/requirements (to-be-added), +and [libraries](https://git.k8s.io/kubernetes/pkg/kubelet/server/streaming) +for container runtimes to integrate with kubelet on a node. CRI is currently in Alpha. + +In the future, we plan to add more developer tools such as the CRI validation +tests. + +## Why develop CRI? + +Prior to the existence of CRI, container runtimes (e.g., `docker`, `rkt`) were +integrated with kubelet through implementing an internal, high-level interface +in kubelet. 
The entrance barrier for runtimes was high because the integration +required understanding the internals of kubelet and contributing to the main +Kubernetes repository. More importantly, this would not scale because every new +addition incurs a significant maintenance overhead in the main Kubernetes +repository. + +Kubernetes aims to be extensible. CRI is one small, yet important step to enable +pluggable container runtimes and build a healthier ecosystem. + +## How to use CRI? + +For Kubernetes 1.6+: + +1. Start the image and runtime services on your node. You can have a single + service acting as both image and runtime services. +2. Set the kubelet flags + - Pass the unix socket(s) to which your services listen to kubelet: + `--container-runtime-endpoint` and `--image-service-endpoint`. + - Use the "remote" runtime by `--container-runtime=remote`. + +CRI is still young and we are actively incorporating feedback from developers +to improve the API. Although we strive to maintain backward compatibility, +developers should expect occasional API breaking changes. + +*For Kubernetes 1.5, additional flags are required:* + - Set apiserver flag `--feature-gates=StreamingProxyRedirects=true`. + - Set kubelet flag `--experimental-cri=true`. + +## Does Kubelet use CRI today? + +Yes, Kubelet always uses CRI except for using the rktnetes integration. + +The old, pre-CRI Docker integration was removed in 1.7. + +## Specifications, design documents and proposals + +The Kubernetes 1.5 [blog post on CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/) +serves as a general introduction. + + +Below is a mixed list of CRI specifications/requirements, design docs and +proposals. We are working on adding more documentation for the API. + + - [Original proposal](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/container-runtime-interface-v1.md) + - [Networking](kubelet-cri-networking.md) + - [Container metrics](cri-container-stats.md) + - [Exec/attach/port-forward streaming requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit?usp=sharing) + - [Container stdout/stderr logs](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/kubelet-cri-logging.md) + +## Work-In-Progress CRI runtimes + + - [cri-o](https://github.com/kubernetes-incubator/cri-o) + - [rktlet](https://github.com/kubernetes-incubator/rktlet) + - [frakti](https://github.com/kubernetes/frakti) + - [cri-containerd](https://github.com/kubernetes-incubator/cri-containerd) + +## [Status update](#status-update) +### Kubernetes v1.7 release (Docker-CRI integration GA, container metrics API) + - The Docker CRI integration has been promoted to GA. + - The legacy, non-CRI Docker integration has been completely removed from + Kubelet. The deprecated `--enable-cri` flag has been removed. + - CRI has been extended to support collecting container metrics from the + runtime. + +### Kubernetes v1.6 release (Docker-CRI integration Beta) + **The Docker CRI integration has been promoted to Beta, and been enabled by +default in Kubelet**. + - **Upgrade**: It is recommended to drain your node before upgrading the + Kubelet. If you choose to perform in-place upgrade, the Kubelet will + restart all Kubernetes-managed containers on the node. + - **Resource usage and performance**: There is no performance regression + in our measurement. The memory usage of Kubelet increases slightly + (~0.27MB per pod) due to the additional gRPC serialization for CRI. 
+ - **Disable**: To disable the Docker CRI integration and fall back to the + old implementation, set `--enable-cri=false`. Note that the old + implementation has been *deprecated* and is scheduled to be removed in + the next release. You are encouraged to migrate to CRI as early as + possible. + - **Others**: The Docker container naming/labeling scheme has changed + significantly in 1.6. This is perceived as implementation detail and + should not be relied upon by any external tools or scripts. + +### Kubernetes v1.5 release (CRI v1alpha1) + + - [v1alpha1 version](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto) of CRI is released. + +#### [CRI known issues](#cri-1.5-known-issues): + + - [#27097](https://github.com/kubernetes/kubernetes/issues/27097): Container + metrics are not yet defined in CRI. + - [#36401](https://github.com/kubernetes/kubernetes/issues/36401): The new + container log path/format is not yet supported by the logging pipeline + (e.g., fluentd, GCL). + - CRI may not be compatible with other experimental features (e.g., Seccomp). + - Streaming server needs to be hardened. + - [#36666](https://github.com/kubernetes/kubernetes/issues/36666): + Authentication. + - [#36187](https://github.com/kubernetes/kubernetes/issues/36187): Avoid + including user data in the redirect URL. + +#### [Docker CRI integration known issues](#docker-cri-1.5-known-issues) + + - Docker compatibility: Support only Docker v1.11 and v1.12. + - Network: + - [#35457](https://github.com/kubernetes/kubernetes/issues/35457): Does + not support host ports. + - [#37315](https://github.com/kubernetes/kubernetes/issues/37315): Does + not support bandwidth shaping. + - Exec/attach/port-forward (streaming requests): + - [#35747](https://github.com/kubernetes/kubernetes/issues/35747): Does + not support `nsenter` as the exec handler (`--exec-handler=nsenter`). + - Also see [CRI 1.5 known issues](#cri-1.5-known-issues) for limitations + on CRI streaming. + +## Contacts + + - Email: sig-node (kubernetes-sig-node@googlegroups.com) + - Slack: https://kubernetes.slack.com/messages/sig-node + diff --git a/contributors/devel/sig-node/cri-container-stats.md b/contributors/devel/sig-node/cri-container-stats.md new file mode 100644 index 00000000..c1176f05 --- /dev/null +++ b/contributors/devel/sig-node/cri-container-stats.md @@ -0,0 +1,121 @@ +# Container Runtime Interface: Container Metrics + +[Container runtime interface +(CRI)](/contributors/devel/container-runtime-interface.md) +provides an abstraction for container runtimes to integrate with Kubernetes. +CRI expects the runtime to provide resource usage statistics for the +containers. + +## Background + +Historically Kubelet relied on the [cAdvisor](https://github.com/google/cadvisor) +library, an open-source project hosted in a separate repository, to retrieve +container metrics such as CPU and memory usage. These metrics are then aggregated +and exposed through Kubelet's [Summary +API](https://git.k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1/types.go) +for the monitoring pipeline (and other components) to consume. Any container +runtime (e.g., Docker and Rkt) integrated with Kubernetes needed to add a +corresponding package in cAdvisor to support tracking container and image file +system metrics. + +With CRI being the new abstraction for integration, it was a natural +progression to augment CRI to serve container metrics to eliminate a separate +integration point. 
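+
+The Summary API mentioned above is still how the aggregated node-, pod-, and
+container-level view is exposed to consumers. As a quick illustration (not part
+of CRI itself), you can inspect it directly on a node; the port and the
+authentication required depend on how the kubelet is configured:
+
+```sh
+# Read-only port (10255), if it is enabled on this node:
+curl http://localhost:10255/stats/summary
+# Secure port (10250) with appropriate credentials, e.g.:
+# curl -k -H "Authorization: Bearer $TOKEN" https://localhost:10250/stats/summary
+```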
+ +*See the [core metrics design +proposal](/contributors/design-proposals/instrumentation/core-metrics-pipeline.md) +for more information on metrics exposed by Kubelet, and [monitoring +architecture](/contributors/design-proposals/instrumentation/monitoring_architecture.md) +for the evolving monitoring pipeline in Kubernetes.* + +# Container Metrics + +Kubelet is responsible for creating pod-level cgroups based on the Quality of +Service class to which the pod belongs, and passes this as a parent cgroup to the +runtime so that it can ensure all resources used by the pod (e.g., pod sandbox, +containers) will be charged to the cgroup. Therefore, Kubelet has the ability +to track resource usage at the pod level (using the built-in cAdvisor), and the +API enhancement focuses on the container-level metrics. + + +We include the only a set of metrics that are necessary to fulfill the needs of +Kubelet. As the requirements evolve over time, we may extend the API to support +more metrics. Below is the API with the metrics supported today. + +```go +// ContainerStats returns stats of the container. If the container does not +// exist, the call returns an error. +rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {} +// ListContainerStats returns stats of all running containers. +rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {} +``` + +```go +// ContainerStats provides the resource usage statistics for a container. +message ContainerStats { + // Information of the container. + ContainerAttributes attributes = 1; + // CPU usage gathered from the container. + CpuUsage cpu = 2; + // Memory usage gathered from the container. + MemoryUsage memory = 3; + // Usage of the writable layer. + FilesystemUsage writable_layer = 4; +} + +// CpuUsage provides the CPU usage information. +message CpuUsage { + // Timestamp in nanoseconds at which the information were collected. Must be > 0. + int64 timestamp = 1; + // Cumulative CPU usage (sum across all cores) since object creation. + UInt64Value usage_core_nano_seconds = 2; +} + +// MemoryUsage provides the memory usage information. +message MemoryUsage { + // Timestamp in nanoseconds at which the information were collected. Must be > 0. + int64 timestamp = 1; + // The amount of working set memory in bytes. + UInt64Value working_set_bytes = 2; +} + +// FilesystemUsage provides the filesystem usage information. +message FilesystemUsage { + // Timestamp in nanoseconds at which the information were collected. Must be > 0. + int64 timestamp = 1; + // The underlying storage of the filesystem. + StorageIdentifier storage_id = 2; + // UsedBytes represents the bytes used for images on the filesystem. + // This may differ from the total bytes used on the filesystem and may not + // equal CapacityBytes - AvailableBytes. + UInt64Value used_bytes = 3; + // InodesUsed represents the inodes used by the images. + // This may not equal InodesCapacity - InodesAvailable because the underlying + // filesystem may also be used for purposes other than storing images. + UInt64Value inodes_used = 4; +} +``` + +There are three categories or resources: CPU, memory, and filesystem. Each of +the resource usage message includes a timestamp to indicate when the usage +statistics is collected. This is necessary because some resource usage (e.g., +filesystem) are inherently more expensive to collect and may be updated less +frequently than others. 
Having the timestamp allows the consumer to know how +stale/fresh the data is, while giving the runtime flexibility to adjust. + +Although CRI does not dictate the frequency of the stats update, Kubelet needs +a minimum guarantee of freshness of the stats for certain resources so that it +can reclaim them timely when under pressure. We will formulate the requirements +for any of such resources and include them in CRI in the near future. + + +*For more details on why we request cached stats with timestamps as opposed to +requesting stats on-demand, here is the [rationale](https://github.com/kubernetes/kubernetes/pull/45614#issuecomment-302258090) +behind it.* + +## Status + +The container metrics calls are added to CRI in Kubernetes 1.7, but Kubelet does not +yet use it to gather metrics from the runtime. We plan to enable Kubelet to +optionally consume the container metrics from the API in 1.8. + diff --git a/contributors/devel/sig-node/cri-testing-policy.md b/contributors/devel/sig-node/cri-testing-policy.md new file mode 100644 index 00000000..d0371677 --- /dev/null +++ b/contributors/devel/sig-node/cri-testing-policy.md @@ -0,0 +1,118 @@ +# Container Runtime Interface: Testing Policy + +**Owner: SIG-Node** + +This document describes testing policy and process for runtimes implementing the +[Container Runtime Interface (CRI)](/contributors/devel/container-runtime-interface.md) +to publish test results in a federated dashboard. The objective is to provide +the Kubernetes community an easy way to track the conformance, stability, and +supported features of a CRI runtime. + +This document focuses on Kubernetes node/cluster end-to-end (E2E) testing +because many features require integration of runtime, OS, or even the cloud +provider. A higher-level integration tests provider better signals on vertical +stack compatibility to the Kubernetes community. On the other hand, runtime +developers are strongly encouraged to run low-level +[CRI validation test suite](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/validation.md) +for validation as part of their development process. + +## Required and optional tests + +Runtime maintainers are **required** to submit the tests listed below. + 1. Node conformance test suite + 2. Node feature test suite + +Node E2E tests qualify an OS image with a pre-installed CRI runtime. The +runtime maintainers are free to choose any OS distribution, packaging, and +deployment mechanism. Please see the +[tutorial](e2e-node-tests.md) +to know more about the Node E2E test framework and tests for validating a +compatible OS image. + +The conformance suite is a set of platform-agnostic (e.g., OS, runtime, and +cloud provider) tests that validate the conformance of the OS image. The feature +suite allows the runtime to demonstrate what features are supported with the OS +distribution. + +In addition to the required tests, the runtime maintainers are *strongly +recommended to run and submit results from the Kubernetes conformance test +suite*. This cluster-level E2E test suite provides extra test signal for areas +such as Networking, which cannot be covered by CRI, or Node-level +tests. Because networking requires deep integration between the runtime, the +cloud provider, and/or other cluster components, runtime maintainers are +recommended to reach out to other relevant SIGs (e.g., SIG-GCP or SIG-AWS) for +guidance and/or sponsorship. 
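+
+As an illustrative starting point, both required suites are normally driven
+through the node E2E framework described in the tutorial linked above; an
+invocation could look roughly like the following, where the image name is a
+placeholder and the exact focus/skip expressions should be taken from the
+tutorial:
+
+```sh
+# Run the node conformance suite remotely against a candidate OS image
+# that has your CRI runtime pre-installed (image name is a placeholder).
+make test-e2e-node REMOTE=true \
+  IMAGES="my-os-with-my-cri-runtime" \
+  FOCUS="\[NodeConformance\]"
+```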
+
+## Process for publishing test results
+
+To publish test results, please submit a proposal in the
+[Kubernetes community repository](https://github.com/kubernetes/community)
+briefly explaining your runtime, providing at least two maintainers, and
+assigning the proposal to the leads of SIG-Node.
+
+These test results should be published under the `sig-node` tab, organized
+as follows.
+
+```
+sig-node -> sig-node-cri-{Kubernetes-version} -> [page containing the required jobs]
+```
+
+Only the three most recent Kubernetes versions and the master branch are
+kept at any time. This is consistent with the Kubernetes release schedule and
+policy.
+
+## Test job maintenance
+
+Tests are required to run at least nightly.
+
+The runtime maintainers are responsible for keeping the tests healthy. If the
+tests are deemed not actively maintained, SIG-Node may remove the tests from
+the test grid at their discretion.
+
+## Process for adding pre-submit testing
+
+If the tests are in good standing (i.e., consistently passing for more than 2
+weeks), the runtime maintainers may request that the tests be included in the
+pre-submit Pull Request (PR) tests. Please note that the pre-submit tests
+require significantly higher testing capacity, and are held at a higher standard
+since they directly affect the development velocity.
+
+If the tests are flaky or failing, and the maintainers are unable to respond and
+fix the issues in a timely manner, the SIG leads may remove the runtime from
+the presubmit tests until the issues are resolved.
+
+As of now, SIG-Node only accepts promotion of Node conformance tests to
+pre-submit because Kubernetes conformance tests involve a wider scope and may
+need co-sponsorships from other SIGs.
+
+## FAQ
+
+ *1. Can runtime maintainers publish results from other E2E tests?*
+
+Yes, runtime maintainers can publish additional Node E2E test results. These
+test jobs will be displayed in the `sig-node-{runtime-name}` page. The same
+policy for test maintenance applies.
+
+As for additional Cluster E2E tests, SIG-Node may agree to host the
+results. However, runtime maintainers are strongly encouraged to seek a more
+appropriate SIG to sponsor or host the results.
+
+ *2. Can these runtime-specific test jobs be considered release blocking?*
+
+This is beyond the authority of SIG-Node, and requires agreement and consensus
+across multiple SIGs (e.g., Release, the relevant cloud provider SIG, etc.).
+
+ *3. How to run the aforementioned tests?*
+
+It is hard to keep instructions, or even links to them, up-to-date in one
+document. Please contact the relevant SIGs for assistance.
+
+ *4. How can I change the test-grid to publish the test results?*
+
+Please contact SIG-Node for the detailed instructions.
+
+ *5. How does this policy apply to Windows containers?*
+
+Windows containers are still in the early development phase and the features
+they support change rapidly. Therefore, it is suggested to treat them as a
+feature with select, whitelisted tests to run.
diff --git a/contributors/devel/sig-node/cri-validation.md b/contributors/devel/sig-node/cri-validation.md
new file mode 100644
index 00000000..84842c9b
--- /dev/null
+++ b/contributors/devel/sig-node/cri-validation.md
@@ -0,0 +1,53 @@
+# Container Runtime Interface (CRI) Validation Testing
+
+CRI validation testing provides a test framework and a suite of tests to validate that the Container Runtime Interface (CRI) server implementation meets all the requirements.
This allows the CRI runtime developers to verify that their runtime conforms to CRI, without needing to set up Kubernetes components or run Kubernetes end-to-end tests. + +CRI validation testing is GA since v1.11.0 and is hosted at the [cri-tools](https://github.com/kubernetes-sigs/cri-tools) repository. We encourage the CRI developers to report bugs or help extend the test coverage by adding more tests. + +## Install + +The test suites can be downloaded from cri-tools [release page](https://github.com/kubernetes-sigs/cri-tools/releases): + +```sh +VERSION="v1.11.0" +wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/critest-$VERSION-linux-amd64.tar.gz +sudo tar zxvf critest-$VERSION-linux-amd64.tar.gz -C /usr/local/bin +rm -f critest-$VERSION-linux-amd64.tar.gz +``` + +critest requires [ginkgo](https://github.com/onsi/ginkgo) to run parallel tests. It could be installed by + +```sh +go get -u github.com/onsi/ginkgo/ginkgo +``` + +*Note: ensure GO is installed and GOPATH is set before installing ginkgo.* + +## Running tests + +### Prerequisite + +Before running the test, you need to _ensure that the CRI server under test is running and listening on a Unix socket_. Because the validation tests are designed to request changes (e.g., create/delete) to the containers and verify that correct status is reported, it expects to be the only user of the CRI server. Please make sure that 1) there are no existing CRI-managed containers running on the node, and 2) no other processes (e.g., Kubelet) will interfere with the tests. + +### Run + +```sh +critest +``` + +This will + +- Connect to the shim of CRI container runtime +- Run the tests using `ginkgo` +- Output the test results to STDOUT + +critest connects to `unix:///var/run/dockershim.sock` by default. For other runtimes, the endpoint can be set by flags `-runtime-endpoint` and `-image-endpoint`. + +## Additional options + +- `-ginkgo.focus`: Only run the tests that match the regular expression. +- `-image-endpoint`: Set the endpoint of image service. Same with runtime-endpoint if not specified. +- `-runtime-endpoint`: Set the endpoint of runtime service. Default to `unix:///var/run/dockershim.sock`. +- `-ginkgo.skip`: Skip the tests that match the regular expression. +- `-parallel`: The number of parallel test nodes to run (default 1). ginkgo must be installed to run parallel tests. +- `-h`: Show help and all supported options. diff --git a/contributors/devel/sig-node/e2e-node-tests.md b/contributors/devel/sig-node/e2e-node-tests.md new file mode 100644 index 00000000..4f3327cb --- /dev/null +++ b/contributors/devel/sig-node/e2e-node-tests.md @@ -0,0 +1,229 @@ +# Node End-To-End tests + +Node e2e tests are component tests meant for testing the Kubelet code on a custom host environment. + +Tests can be run either locally or against a host running on GCE. + +Node e2e tests are run as both pre- and post- submit tests by the Kubernetes project. + +*Note: Linux only. Mac and Windows unsupported.* + +*Note: There is no scheduler running. The e2e tests have to do manual scheduling, e.g. by using `framework.PodClient`.* + +# Running tests + +## Locally + +Why run tests *Locally*? Much faster than running tests Remotely. 
+ +Prerequisites: +- [Install etcd](https://github.com/coreos/etcd/releases) on your PATH + - Verify etcd is installed correctly by running `which etcd` + - Or make etcd binary available and executable at `/tmp/etcd` +- [Install ginkgo](https://github.com/onsi/ginkgo) on your PATH + - Verify ginkgo is installed correctly by running `which ginkgo` + +From the Kubernetes base directory, run: + +```sh +make test-e2e-node +``` + +This will: run the *ginkgo* binary against the subdirectory *test/e2e_node*, which will in turn: +- Ask for sudo access (needed for running some of the processes) +- Build the Kubernetes source code +- Pre-pull docker images used by the tests +- Start a local instance of *etcd* +- Start a local instance of *kube-apiserver* +- Start a local instance of *kubelet* +- Run the test using the locally started processes +- Output the test results to STDOUT +- Stop *kubelet*, *kube-apiserver*, and *etcd* + +## Remotely + +Why Run tests *Remotely*? Tests will be run in a customized pristine environment. Closely mimics what will be done +as pre- and post- submit testing performed by the project. + +Prerequisites: +- [join the googlegroup](https://groups.google.com/forum/#!forum/kubernetes-dev) +`kubernetes-dev@googlegroups.com` + - *This provides read access to the node test images.* +- Setup a [Google Cloud Platform](https://cloud.google.com/) account and project with Google Compute Engine enabled +- Install and setup the [gcloud sdk](https://cloud.google.com/sdk/downloads) + - Verify the sdk is setup correctly by running `gcloud compute instances list` and `gcloud compute images list --project kubernetes-node-e2e-images` + +Run: + +```sh +make test-e2e-node REMOTE=true +``` + +This will: +- Build the Kubernetes source code +- Create a new GCE instance using the default test image + - Instance will be called **test-e2e-node-containervm-v20160321-image** +- Lookup the instance public ip address +- Copy a compressed archive file to the host containing the following binaries: + - ginkgo + - kubelet + - kube-apiserver + - e2e_node.test (this binary contains the actual tests to be run) +- Unzip the archive to a directory under **/tmp/gcloud** +- Run the tests using the `ginkgo` command + - Starts etcd, kube-apiserver, kubelet + - The ginkgo command is used because this supports more features than running the test binary directly +- Output the remote test results to STDOUT +- `scp` the log files back to the local host under /tmp/_artifacts/e2e-node-containervm-v20160321-image +- Stop the processes on the remote host +- **Leave the GCE instance running** + +**Note: Subsequent tests run using the same image will *reuse the existing host* instead of deleting it and +provisioning a new one. To delete the GCE instance after each test see +*[DELETE_INSTANCE](#delete-instance-after-tests-run)*.** + + +# Additional Remote Options + +## Run tests using different images + +This is useful if you want to run tests against a host using a different OS distro or container runtime than +provided by the default image. + +List the available test images using gcloud. + +```sh +make test-e2e-node LIST_IMAGES=true +``` + +This will output a list of the available images for the default image project. + +Then run: + +```sh +make test-e2e-node REMOTE=true IMAGES="<comma-separated-list-images>" +``` + +## Run tests against a running GCE instance (not an image) + +This is useful if you have an host instance running already and want to run the tests there instead of on a new instance. 
+ +```sh +make test-e2e-node REMOTE=true HOSTS="<comma-separated-list-of-hostnames>" +``` + +## Delete instance after tests run + +This is useful if you want recreate the instance for each test run to trigger flakes related to starting the instance. + +```sh +make test-e2e-node REMOTE=true DELETE_INSTANCES=true +``` + +## Keep instance, test binaries, and *processes* around after tests run + +This is useful if you want to manually inspect or debug the kubelet process run as part of the tests. + +```sh +make test-e2e-node REMOTE=true CLEANUP=false +``` + +## Run tests using an image in another project + +This is useful if you want to create your own host image in another project and use it for testing. + +```sh +make test-e2e-node REMOTE=true IMAGE_PROJECT="<name-of-project-with-images>" IMAGES="<image-name>" +``` + +Setting up your own host image may require additional steps such as installing etcd or docker. See +[setup_host.sh](https://git.k8s.io/kubernetes/test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests. + +## Create instances using a different instance name prefix + +This is useful if you want to create instances using a different name so that you can run multiple copies of the +test in parallel against different instances of the same image. + +```sh +make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix" +``` + +# Additional Test Options for both Remote and Local execution + +## Only run a subset of the tests + +To run tests matching a regex: + +```sh +make test-e2e-node REMOTE=true FOCUS="<regex-to-match>" +``` + +To run tests NOT matching a regex: + +```sh +make test-e2e-node REMOTE=true SKIP="<regex-to-match>" +``` + +## Run tests continually until they fail + +This is useful if you are trying to debug a flaky test failure. This will cause ginkgo to continually +run the tests until they fail. **Note: this will only perform test setup once (e.g. creating the instance) and is +less useful for catching flakes related creating the instance from an image.** + +```sh +make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true +``` + +## Run tests in parallel + +Running test in parallel can usually shorten the test duration. By default node +e2e test runs with`--nodes=8` (see ginkgo flag +[--nodes](https://onsi.github.io/ginkgo/#parallel-specs)). You can use the +`PARALLELISM` option to change the parallelism. + +```sh +make test-e2e-node PARALLELISM=4 # run test with 4 parallel nodes +make test-e2e-node PARALLELISM=1 # run test sequentially +``` + +## Run tests with kubenet network plugin + +[kubenet](http://kubernetes.io/docs/admin/network-plugins/#kubenet) is +the default network plugin used by kubelet since Kubernetes 1.3. The +plugin requires [CNI](https://github.com/containernetworking/cni) and +[nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html). + +Currently, kubenet is enabled by default for Remote execution `REMOTE=true`, +but disabled for Local execution. **Note: kubenet is not supported for +local execution currently. This may cause network related test result to be +different for Local and Remote execution. 
So if you want to run network +related test, Remote execution is recommended.** + +To enable/disable kubenet: + +```sh +# enable kubenet +make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin=kubenet --network-plugin-dir=/opt/cni/bin"' +# disable kubenet +make test-e2e-node TEST_ARGS='--kubelet-flags="--network-plugin= --network-plugin-dir="' +``` + +## Additional QoS Cgroups Hierarchy level testing + +For testing with the QoS Cgroup Hierarchy enabled, you can pass --cgroups-per-qos flag as an argument into Ginkgo using TEST_ARGS + +```sh +make test_e2e_node TEST_ARGS="--cgroups-per-qos=true" +``` + +# Notes on tests run by the Kubernetes project during pre-, post- submit. + +The node e2e tests are run by the PR builder for each Pull Request and the results published at +the bottom of the comments section. To re-run just the node e2e tests from the PR builder add the comment +`@k8s-bot node e2e test this issue: #<Flake-Issue-Number or IGNORE>` and **include a link to the test +failure logs if caused by a flake.** + +The PR builder runs tests against the images listed in [jenkins-pull.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-pull.properties) + +The post submit tests run against the images listed in [jenkins-ci.properties](https://git.k8s.io/kubernetes/test/e2e_node/jenkins/jenkins-ci.properties) + diff --git a/contributors/devel/sig-node/kubelet-cri-networking.md b/contributors/devel/sig-node/kubelet-cri-networking.md new file mode 100644 index 00000000..1446e6d3 --- /dev/null +++ b/contributors/devel/sig-node/kubelet-cri-networking.md @@ -0,0 +1,56 @@ +# Container Runtime Interface (CRI) Networking Specifications + +## Introduction +[Container Runtime Interface (CRI)](container-runtime-interface.md) is +an ongoing project to allow container +runtimes to integrate with kubernetes via a newly-defined API. This document +specifies the network requirements for container runtime +interface (CRI). CRI networking requirements expand upon kubernetes pod +networking requirements. This document does not specify requirements +from upper layers of kubernetes network stack, such as `Service`. More +background on k8s networking could be found +[here](http://kubernetes.io/docs/admin/networking/) + +## Requirements +1. Kubelet expects the runtime shim to manage pod's network life cycle. Pod +networking should be handled accordingly along with pod sandbox operations. + * `RunPodSandbox` must set up pod's network. This includes, but is not limited +to allocating a pod IP, configuring the pod's network interfaces and default +network route. Kubelet expects the pod sandbox to have an IP which is +routable within the k8s cluster, if `RunPodSandbox` returns successfully. +`RunPodSandbox` must return an error if it fails to set up the pod's network. +If the pod's network has already been set up, `RunPodSandbox` must skip +network setup and proceed. + * `StopPodSandbox` must tear down the pod's network. The runtime shim +must return error on network tear down failure. If pod's network has +already been torn down, `StopPodSandbox` must skip network tear down and proceed. + * `RemovePodSandbox` may tear down pod's network, if the networking has +not been torn down already. `RemovePodSandbox` must return error on +network tear down failure. + * Response from `PodSandboxStatus` must include pod sandbox network status. +The runtime shim must return an empty network status if it failed +to construct a network status. + +2. 
User supplied pod networking configurations, which are NOT directly +exposed by the kubernetes API, should be handled directly by runtime +shims. For instance, `hairpin-mode`, `cni-bin-dir`, `cni-conf-dir`, `network-plugin`, +`network-plugin-mtu` and `non-masquerade-cidr`. Kubelet will no longer handle +these configurations after the transition to CRI is complete. +3. Network configurations that are exposed through the kubernetes API +are communicated to the runtime shim through `UpdateRuntimeConfig` +interface, e.g. `podCIDR`. For each runtime and network implementation, +some configs may not be applicable. The runtime shim may handle or ignore +network configuration updates from `UpdateRuntimeConfig` interface. + +## Extensibility +* Kubelet is oblivious to how the runtime shim manages networking, i.e +runtime shim is free to use [CNI](https://github.com/containernetworking/cni), +[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md) or +any other implementation as long as the CRI networking requirements and +k8s networking requirements are satisfied. +* Runtime shims have full visibility into pod networking configurations. +* As more network feature arrives, CRI will evolve. + +## Related Issues +* Kubelet network plugin for client/server container runtimes [#28667](https://github.com/kubernetes/kubernetes/issues/28667) +* CRI networking umbrella issue [#37316](https://github.com/kubernetes/kubernetes/issues/37316) diff --git a/contributors/devel/sig-node/node-performance-testing.md b/contributors/devel/sig-node/node-performance-testing.md new file mode 100644 index 00000000..d43737a8 --- /dev/null +++ b/contributors/devel/sig-node/node-performance-testing.md @@ -0,0 +1,121 @@ +# Measuring Node Performance + +This document outlines the issues and pitfalls of measuring Node performance, as +well as the tools available. + +## Cluster Set-up + +There are lots of factors which can affect node performance numbers, so care +must be taken in setting up the cluster to make the intended measurements. In +addition to taking the following steps into consideration, it is important to +document precisely which setup was used. For example, performance can vary +wildly from commit-to-commit, so it is very important to **document which commit +or version** of Kubernetes was used, which Docker version was used, etc. + +### Addon pods + +Be aware of which addon pods are running on which nodes. By default Kubernetes +runs 8 addon pods, plus another 2 per node (`fluentd-elasticsearch` and +`kube-proxy`) in the `kube-system` namespace. The addon pods can be disabled for +more consistent results, but doing so can also have performance implications. + +For example, Heapster polls each node regularly to collect stats data. Disabling +Heapster will hide the performance cost of serving those stats in the Kubelet. + +#### Disabling Add-ons + +Disabling addons is simple. Just ssh into the Kubernetes master and move the +addon from `/etc/kubernetes/addons/` to a backup location. More details +[here](https://git.k8s.io/kubernetes/cluster/addons/). + +### Which / how many pods? + +Performance will vary a lot between a node with 0 pods and a node with 100 pods. +In many cases you'll want to make measurements with several different amounts of +pods. On a single node cluster scaling a replication controller makes this easy, +just make sure the system reaches a steady-state before starting the +measurement. E.g. 
`kubectl scale replicationcontroller pause --replicas=100` + +In most cases pause pods will yield the most consistent measurements since the +system will not be affected by pod load. However, in some special cases +Kubernetes has been tuned to optimize pods that are not doing anything, such as +the cAdvisor housekeeping (stats gathering). In these cases, performing a very +light task (such as a simple network ping) can make a difference. + +Finally, you should also consider which features yours pods should be using. For +example, if you want to measure performance with probing, you should obviously +use pods with liveness or readiness probes configured. Likewise for volumes, +number of containers, etc. + +### Other Tips + +**Number of nodes** - On the one hand, it can be easier to manage logs, pods, +environment etc. with a single node to worry about. On the other hand, having +multiple nodes will let you gather more data in parallel for more robust +sampling. + +## E2E Performance Test + +There is an end-to-end test for collecting overall resource usage of node +components: [kubelet_perf.go](https://git.k8s.io/kubernetes/test/e2e/node/kubelet_perf.go). To +run the test, simply make sure you have an e2e cluster running (`go run +hack/e2e.go -- -up`) and [set up](#cluster-set-up) correctly. + +Run the test with `go run hack/e2e.go -- -v -test +--test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to +customise the number of pods or other parameters of the test (remember to rerun +`make WHAT=test/e2e/e2e.test` after you do). + +## Profiling + +Kubelet installs the [go pprof handlers](https://golang.org/pkg/net/http/pprof/), which can be queried for CPU profiles: + +```console +$ kubectl proxy & +Starting to serve on 127.0.0.1:8001 +$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT +$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet +$ go tool pprof -web $KUBELET_BIN $OUTPUT +``` + +`pprof` can also provide heap usage, from the `/debug/pprof/heap` endpoint +(e.g. `http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap`). + +More information on go profiling can be found +[here](http://blog.golang.org/profiling-go-programs). + +## Benchmarks + +Before jumping through all the hoops to measure a live Kubernetes node in a real +cluster, it is worth considering whether the data you need can be gathered +through a Benchmark test. Go provides a really simple benchmarking mechanism, +just add a unit test of the form: + +```go +// In foo_test.go +func BenchmarkFoo(b *testing.B) { + b.StopTimer() + setupFoo() // Perform any global setup + b.StartTimer() + for i := 0; i < b.N; i++ { + foo() // Functionality to measure + } +} +``` + +Then: + +```console +$ go test -bench=. -benchtime=${SECONDS}s foo_test.go +``` + +More details on benchmarking [here](https://golang.org/pkg/testing/). + +## TODO + +- (taotao) Measuring docker performance +- Expand cluster set-up section +- (vishh) Measuring disk usage +- (yujuhong) Measuring memory usage +- Add section on monitoring kubelet metrics (e.g. 
with prometheus) + diff --git a/contributors/devel/sig-release/OWNERS b/contributors/devel/sig-release/OWNERS new file mode 100644 index 00000000..c414be94 --- /dev/null +++ b/contributors/devel/sig-release/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-release-leads +approvers: + - sig-release-leads +labels: + - sig/release diff --git a/contributors/devel/sig-release/cherry-picks.md b/contributors/devel/sig-release/cherry-picks.md new file mode 100644 index 00000000..7769f970 --- /dev/null +++ b/contributors/devel/sig-release/cherry-picks.md @@ -0,0 +1,73 @@ +# Overview + +This document explains how cherry-picks are managed on release branches within +the kubernetes/kubernetes repository. +A common use case for this task is backporting PRs from master to release +branches. + +## Prerequisites + * [Contributor License Agreement](http://git.k8s.io/community/CLA.md) is + considered implicit for all code within cherry-pick pull requests, + **unless there is a large conflict**. + * A pull request merged against the master branch. + * [Release branch](https://git.k8s.io/release/docs/branching.md) exists. + * The normal git and GitHub configured shell environment for pushing to your + kubernetes `origin` fork on GitHub and making a pull request against a + configured remote `upstream` that tracks + "https://github.com/kubernetes/kubernetes.git", including `GITHUB_USER`. + * Have `hub` installed, which is most easily installed via `go get + github.com/github/hub` assuming you have a standard golang development + environment. + +## Initiate a Cherry-pick + * Run the [cherry-pick + script](https://git.k8s.io/kubernetes/hack/cherry_pick_pull.sh). + This example applies a master branch PR #98765 to the remote branch + `upstream/release-3.14`: `hack/cherry_pick_pull.sh upstream/release-3.14 + 98765` + * Be aware the cherry-pick script assumes you have a git remote called + `upstream` that points at the Kubernetes github org. + Please see our [recommended Git workflow](https://git.k8s.io/community/contributors/guide/github-workflow.md#workflow). + * You will need to run the cherry-pick script separately for each patch release you want to cherry-pick to. + + * Your cherry-pick PR will immediately get the `do-not-merge/cherry-pick-not-approved` label. + The [Branch Manager](https://git.k8s.io/sig-release/release-team/role-handbooks/branch-manager) + will triage PRs targeted to the next .0 minor release branch up until the + release, while the [Patch Release Team](https://git.k8s.io/sig-release/release-team/role-handbooks/patch-release-manager) + will handle all cherry-picks to patch releases. + Normal rules apply for code merge. + * Reviewers `/lgtm` and owners `/approve` as they deem appropriate. + * Milestones on cherry-pick PRs should be the milestone for the target + release branch (for example, milestone 1.11 for a cherry-pick onto + release-1.11). + * You can find the current release team members in the + [appropriate release folder](https://git.k8s.io/sig-release/releases) for the target release. + You may cc them with `<@githubusername>` on your cherry-pick PR. + +## Cherry-pick Review + +Cherry-pick pull requests have an additional requirement compared to normal pull +requests. +They must be approved specifically for cherry-pick by Approvers. 
+The [Branch Manager](https://git.k8s.io/sig-release/release-team/role-handbooks/branch-manager) +or the [Patch Release Team](https://git.k8s.io/sig-release/release-team/role-handbooks/patch-release-manager) +are the final authority on removing the `do-not-merge/cherry-pick-not-approved` +label and triggering a merge into the target branch. + +## Searching for Cherry-picks + +- [A sample search on kubernetes/kubernetes pull requests that are labeled as `cherry-pick-approved`](https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen+is%3Apr+label%3Acherry-pick-approved) + +- [A sample search on kubernetes/kubernetes pull requests that are labeled as `do-not-merge/cherry-pick-not-approved`](https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen+is%3Apr+label%3Ado-not-merge%2Fcherry-pick-not-approved) + + +## Troubleshooting Cherry-picks + +Contributors may encounter some of the following difficulties when initiating a cherry-pick. + +- A cherry-pick PR does not apply cleanly against an old release branch. +In that case, you will need to manually fix conflicts. + +- The cherry-pick PR includes code that does not pass CI tests. +In such a case you will have to fetch the auto-generated branch from your fork, amend the problematic commit and force push to the auto-generated branch. +Alternatively, you can create a new PR, which is noisier. diff --git a/contributors/devel/sig-release/getting-builds.md b/contributors/devel/sig-release/getting-builds.md new file mode 100644 index 00000000..0ae7031b --- /dev/null +++ b/contributors/devel/sig-release/getting-builds.md @@ -0,0 +1,48 @@ +# Getting Kubernetes Builds + +You can use [hack/get-build.sh](http://releases.k8s.io/HEAD/hack/get-build.sh) +to get a build or to use as a reference on how to get the most recent builds +with curl. With `get-build.sh` you can grab the most recent stable build, the +most recent release candidate, or the most recent build to pass our ci and gce +e2e tests (essentially a nightly build). + +Run `./hack/get-build.sh -h` for its usage. + +To get a build at a specific version (v1.1.1) use: + +```console +./hack/get-build.sh v1.1.1 +``` + +To get the latest stable release: + +```console +./hack/get-build.sh release/stable +``` + +Use the "-v" option to print the version number of a build without retrieving +it. For example, the following prints the version number for the latest ci +build: + +```console +./hack/get-build.sh -v ci/latest +``` + +You can also use the gsutil tool to explore the Google Cloud Storage release +buckets. 
Here are some examples: + +```sh +gsutil cat gs://kubernetes-release-dev/ci/latest.txt # output the latest ci version number +gsutil cat gs://kubernetes-release-dev/ci/latest-green.txt # output the latest ci version number that passed gce e2e +gsutil ls gs://kubernetes-release-dev/ci/v0.20.0-29-g29a55cc/ # list the contents of a ci release +gsutil ls gs://kubernetes-release/release # list all official releases and rcs +``` + +## Install `gsutil` + +Example installation: + +```console +$ curl -sSL https://storage.googleapis.com/pub/gsutil.tar.gz | sudo tar -xz -C /usr/local/src +$ sudo ln -s /usr/local/src/gsutil/gsutil /usr/bin/gsutil +``` diff --git a/contributors/devel/release-cycle.png b/contributors/devel/sig-release/release-cycle.png Binary files differindex f3aa460a..f3aa460a 100644 --- a/contributors/devel/release-cycle.png +++ b/contributors/devel/sig-release/release-cycle.png diff --git a/contributors/devel/release-lifecycle.png b/contributors/devel/sig-release/release-lifecycle.png Binary files differindex 090dabab..090dabab 100644 --- a/contributors/devel/release-lifecycle.png +++ b/contributors/devel/sig-release/release-lifecycle.png diff --git a/contributors/devel/sig-release/release.md b/contributors/devel/sig-release/release.md new file mode 100644 index 00000000..b4e9224e --- /dev/null +++ b/contributors/devel/sig-release/release.md @@ -0,0 +1,307 @@ +# Targeting Features, Issues and PRs to Release Milestones + +This document is focused on Kubernetes developers and contributors +who need to create a feature, issue, or pull request which targets a specific +release milestone. + +- [TL;DR](#tldr) +- [Definitions](#definitions) +- [The Release Cycle](#the-release-cycle) +- [Removal Of Items From The Milestone](#removal-of-items-from-the-milestone) +- [Adding An Item To The Milestone](#adding-an-item-to-the-milestone) + - [Milestone Maintainers](#milestone-maintainers) + - [Feature additions](#feature-additions) + - [Issue additions](#issue-additions) + - [PR Additions](#pr-additions) +- [Other Required Labels](#other-required-labels) + - [SIG Owner Label](#sig-owner-label) + - [Priority Label](#priority-label) + - [Issue Kind Label](#issue-kind-label) + +The process for shepherding features, issues, and pull requests +into a Kubernetes release spans multiple stakeholders: +* the feature, issue, or pull request owner +* SIG leadership +* the release team + +Information on workflows and interactions are described below. + +As the owner of a feature, issue, or pull request (PR), it is your +responsibility to ensure release milestone requirements are met. +Automation and the release team will be in contact with you if +updates are required, but inaction can result in your work being +removed from the milestone. Additional requirements exist when the +target milestone is a prior release (see [cherry pick +process](cherry-picks.md) for more information). 
+ +## TL;DR + +If you want your PR to get merged, it needs the following required labels and milestones, represented here by the Prow /commands it would take to add them: +<table> +<tr> +<td></td> +<td>Normal Dev</td> +<td>Code Freeze</td> +<td>Post-Release</td> +</tr> +<tr> +<td></td> +<td>Weeks 1-8</td> +<td>Weeks 9-11</td> +<td>Weeks 11+</td> +</tr> +<tr> +<td>Required Labels</td> +<td> +<ul> +<!--Weeks 1-8--> +<li>/sig {name}</li> +<li>/kind {type}</li> +<li>/lgtm</li> +<li>/approved</li> +</ul> +</td> +<td> +<ul> +<!--Weeks 9-11--> +<li>/milestone {v1.y}</li> +<li>/sig {name}</li> +<li>/kind {bug, failing-test}</li> +<li>/priority critical-urgent</li> +<li>/lgtm</li> +<li>/approved</li> +</ul> +</td> +<td> +<!--Weeks 11+--> +Return to 'Normal Dev' phase requirements: +<ul> +<li>/sig {name}</li> +<li>/kind {type}</li> +<li>/lgtm</li> +<li>/approved</li> +</ul> + +Merges into the 1.y branch are now [via cherrypicks](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md), approved by release branch manager. +</td> +<td> +<ul> +</td> +</tr> +</table> + +In the past there was a requirement for a milestone targeted pull +request to have an associated GitHub issue opened, but this is no +longer the case. Features are effectively GitHub issues or +[KEPs](https://git.k8s.io/community/keps) +which lead to subsequent PRs. The general labeling process should +be consistent across artifact types. + +--- + +## Definitions + +- *issue owners*: Creator, assignees, and user who moved the issue into a release milestone. +- *release team*: Each Kubernetes release has a team doing project + management tasks described + [here](https://git.k8s.io/sig-release/release-team/README.md). The + contact info for the team associated with any given release can be + found [here](https://git.k8s.io/sig-release/releases/). +- *Y days*: Refers to business days (using the location local to the release-manager M-F). +- *feature*: see "[Is My Thing a Feature?](http://git.k8s.io/features/README.md#is-my-thing-a-feature) +- *release milestone*: semantic version string or [GitHub milestone](https://help.github.com/articles/associating-milestones-with-issues-and-pull-requests/) referring to a release MAJOR.MINOR vX.Y version. See also [release versioning](http://git.k8s.io/community/contributors/design-proposals/release/versioning.md) +- *release branch*: Git branch "release-X.Y" created for the vX.Y milestone. Created at the time of the vX.Y-beta.0 release and maintained after the release for approximately 9 months with vX.Y.Z patch releases. + +## The Release Cycle + + + +Kubernetes releases currently happen four times per year. The release +process can be thought of as having three main phases: +* Feature Definition +* Implementation +* Stabilization + +But in reality this is an open source and agile project, with feature +planning and implementation happening at all times. Given the +project scale and globally distributed developer base, it is critical +to project velocity to not rely on a trailing stabilization phase and +rather have continuous integration testing which ensures the +project is always stable so that individual commits can be +flagged as having broken something. + +With ongoing feature definition through the year, some set of items +will bubble up as targeting a given release. The **enhancement freeze** +starts ~4 weeks into release cycle. 
By this point all intended +feature work for the given release has been defined in suitable +planning artifacts in conjunction with the Release Team's [enhancements +lead](https://git.k8s.io/sig-release/release-team/role-handbooks/enhancements/README.md). + +Implementation and bugfixing is ongoing across the cycle, but +culminates in a code freeze period: +* The **code freeze** starts in week ~10 and continues for ~2 weeks. + Only critical bug fixes are accepted into the release codebase. + +There are approximately two weeks following code freeze, and preceding +release, during which all remaining critical issues must be resolved +before release. This also gives time for documentation finalization. + +When the code base is sufficiently stable, the master branch re-opens +for general development and work begins there for the next release +milestone. Any remaining modifications for the current release are cherry +picked from master back to the release branch. The release is built from +the release branch. + +Following release, the [Release Branch +Manager](https://git.k8s.io/sig-release/release-team/role-handbooks/branch-manager/README.md) +cherry picks additional critical fixes from the master branch for +a period of around 9 months, leaving an overlap of three release +versions forward support. Thus, each release is part of a broader +Kubernetes lifecycle: + + + +## Removal Of Items From The Milestone + +Before getting too far into the process for adding an item to the +milestone, please note: + +Members of the Release Team may remove Issues from the milestone +if they or the responsible SIG determine that the issue is not +actually blocking the release and is unlikely to be resolved in a +timely fashion. + +Members of the Release Team may remove PRs from the milestone for +any of the following, or similar, reasons: + +* PR is potentially de-stabilizing and is not needed to resolve a blocking issue; +* PR is a new, late feature PR and has not gone through the features process or the exception process; +* There is no responsible SIG willing to take ownership of the PR and resolve any follow-up issues with it; +* PR is not correctly labelled; +* Work has visibly halted on the PR and delivery dates are uncertain or late. + +While members of the Release Team will help with labelling and +contacting SIG(s), it is the responsibility of the submitter to +categorize PRs, and to secure support from the relevant SIG to +guarantee that any breakage caused by the PR will be rapidly resolved. + +Where additional action is required, an attempt at human to human +escalation will be made by the release team through the following +channels: + +- Comment in GitHub mentioning the SIG team and SIG members as appropriate for the issue type +- Emailing the SIG mailing list + - bootstrapped with group email addresses from the [community sig list](/sig-list.md) + - optionally also directly addressing SIG leadership or other SIG members +- Messaging the SIG's Slack channel + - bootstrapped with the slackchannel and SIG leadership from the [community sig list](/sig-list.md) + - optionally directly "@" mentioning SIG leadership or others by handle + +## Adding An Item To The Milestone + +### Milestone Maintainers + +The members of the GitHub [“kubernetes-milestone-maintainers” +team](https://github.com/orgs/kubernetes/teams/kubernetes-milestone-maintainers/members) +are entrusted with the responsibility of specifying the release milestone on +GitHub artifacts. 
This group is [maintained by +SIG-Release](https://git.k8s.io/sig-release/release-team/README.md#milestone-maintainers) +and has representation from the various SIGs' leadership. + +### Feature additions + +Feature planning and definition takes many forms today, but a typical +example might be a large piece of work described in a +[KEP](https://git.k8s.io/community/keps), with associated +task issues in GitHub. When the plan has reached an implementable state and +work is underway, the feature or parts thereof are targeted for an upcoming +milestone by creating GitHub issues and marking them with the Prow "/milestone" +command. + +For the first ~4 weeks into the release cycle, the release team's +Enhancements Lead will interact with SIGs and feature owners via GitHub, +Slack, and SIG meetings to capture all required planning artifacts. + +If you have a feature to target for an upcoming release milestone, begin a +conversation with your SIG leadership and with that release's Enhancements +Lead. + +### Issue additions + +Issues are marked as targeting a milestone via the Prow +"/milestone" command. + +The release team's [Bug Triage +Lead](https://git.k8s.io/sig-release/release-team/role-handbooks/bug-triage/README.md) and overall community watch +incoming issues and triage them, as described in the contributor +guide section on [issue triage](/contributors/guide/issue-triage.md). + +Marking issues with the milestone provides the community better +visibility regarding when an issue was observed and by when the community +feels it must be resolved. During code freeze, to merge a PR it is required +that a release milestone is set. + +An open issue is no longer required for a PR, but open issues and +associated PRs should have synchronized labels. For example a high +priority bug issue might not have its associated PR merged if the PR is +only marked as lower priority. + +### PR Additions + +PRs are marked as targeting a milestone via the Prow +"/milestone" command. + +This is a blocking requirement during code freeze as described above. + +## Other Required Labels + +*Note* [Here is the list of labels and their use and purpose.](https://git.k8s.io/test-infra/label_sync/labels.md#labels-that-apply-to-all-repos-for-both-issues-and-prs) + +### SIG Owner Label + +The SIG owner label defines the SIG to which we escalate if a +milestone issue is languishing or needs additional attention. If +there are no updates after escalation, the issue may be automatically +removed from the milestone. + +These are added with the Prow "/sig" command. For example to add +the label indicating SIG Storage is responsible, comment with `/sig +storage`. + +### Priority Label + +Priority labels are used to determine an escalation path before +moving issues out of the release milestone. They are also used to +determine whether or not a release should be blocked on the resolution +of the issue. + +- `priority/critical-urgent`: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels. + - considered a release blocking issue + - code freeze: issue owner update frequency: daily + - would require a patch release if left undiscovered until after the minor release. +- `priority/important-soon`: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts. 
+ - not considered a release blocking issue + - would not require a patch release + - will automatically be moved out of the release milestone at code freeze after a 4 day grace period +- `priority/important-longterm`: Escalate to the issue owners; move out of the milestone after 1 attempt. + - even less urgent / critical than `priority/important-soon` + - moved out of milestone more aggressively than `priority/important-soon` + +### Issue/PR Kind Label + +The issue kind is used to help identify the types of changes going +into the release over time. This may allow the release team to +develop a better understanding of what sorts of issues we would +miss with a faster release cadence. + +For release targeted issues, including pull requests, one of the following +issue kind labels must be set: + +- `kind/api-change`: Adds, removes, or changes an API +- `kind/bug`: Fixes a newly discovered bug. +- `kind/cleanup`: Adding tests, refactoring, fixing old bugs. +- `kind/design`: Related to design +- `kind/documentation`: Adds documentation +- `kind/failing-test`: CI test case is failing consistently. +- `kind/feature`: New functionality. +- `kind/flake`: CI test case is showing intermittent failures. diff --git a/contributors/devel/sig-scalability/OWNERS b/contributors/devel/sig-scalability/OWNERS new file mode 100644 index 00000000..6b57aa45 --- /dev/null +++ b/contributors/devel/sig-scalability/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-scalability-leads +approvers: + - sig-scalability-leads +labels: + - sig/scalability diff --git a/contributors/devel/sig-scalability/kubemark-guide.md b/contributors/devel/sig-scalability/kubemark-guide.md new file mode 100644 index 00000000..ce5727e8 --- /dev/null +++ b/contributors/devel/sig-scalability/kubemark-guide.md @@ -0,0 +1,256 @@ +# Kubemark User Guide + +## Introduction + +Kubemark is a performance testing tool which allows users to run experiments on +simulated clusters. The primary use case is scalability testing, as simulated +clusters can be much bigger than the real ones. The objective is to expose +problems with the master components (API server, controller manager or +scheduler) that appear only on bigger clusters (e.g. small memory leaks). + +This document serves as a primer to understand what Kubemark is, what it is not, +and how to use it. + +## Architecture + +On a very high level, Kubemark cluster consists of two parts: a real master +and a set of “Hollow” Nodes. The prefix “Hollow” to any component means an +implementation/instantiation of the actual component with all “moving” +parts mocked out. The best example is HollowKubelet, which pretends to be an +ordinary Kubelet, but does not start anything, nor mount any volumes - it just +lies it does. More detailed design and implementation details are at the end +of this document. + +Currently, master components run on a dedicated machine as pods that are +created/managed by kubelet, which itself runs as either a systemd or a supervisord +service on the master VM depending on the VM distro (though currently it is +only systemd as we use a GCI image). Having a dedicated machine for the master +has a slight advantage over running the master components on an external cluster, +which is being able to completely isolate master resources from everything else. +The HollowNodes on the other hand are run on an ‘external’ Kubernetes cluster +as pods in an isolated namespace (named kubemark). 
This idea of using pods on a +real cluster behave (or act) as nodes on the kubemark cluster lies at the heart of +kubemark's design. + +## Requirements + +To run Kubemark you need a Kubernetes cluster (called `external cluster`) +for running all your HollowNodes and a dedicated machine for a master. +Master machine has to be directly routable from HollowNodes. You also need +access to a Docker repository (which is gcr.io in the case of GCE) that has the +container images for etcd, hollow-node and node-problem-detector. + +Currently, scripts are written to be easily usable by GCE, but it should be +relatively straightforward to port them to different providers or bare metal. +There is an ongoing effort to refactor kubemark code into provider-specific (gce) +and provider-independent code, which should make it relatively simple to run +kubemark clusters on other cloud providers as well. + +## Common use cases and helper scripts + +Common workflow for Kubemark is: +- starting a Kubemark cluster (on GCE) +- running e2e tests on Kubemark cluster +- monitoring test execution and debugging problems +- turning down Kubemark cluster + +(For now) Included in descriptions there will be comments helpful for anyone who’ll +want to port Kubemark to different providers. +(Later) When the refactoring mentioned in the above section finishes, we would replace +these comments with a clean API that would allow kubemark to run on top of any provider. + +### Starting a Kubemark cluster + +To start a Kubemark cluster on GCE you need to create an external kubernetes +cluster (it can be GCE, GKE or anything else) by yourself, make sure that kubeconfig +points to it by default, build a kubernetes release (e.g. by running +`make quick-release`) and run `test/kubemark/start-kubemark.sh` script. +This script will create a VM for master (along with mounted PD and firewall rules set), +then start kubelet and run the pods for the master components. Following this, it +sets up the HollowNodes as Pods on the external cluster and do all the setup necessary +to let them talk to the kubemark apiserver. It will use the configuration stored in +`cluster/kubemark/config-default.sh` - you can tweak it however you want, but note that +some features may not be implemented yet, as implementation of Hollow components/mocks +will probably be lagging behind ‘real’ one. For performance tests interesting variables +are `NUM_NODES` and `KUBEMARK_MASTER_SIZE`. After start-kubemark script is finished, +you’ll have a ready Kubemark cluster, and a kubeconfig file for talking to the Kubemark +cluster is stored in `test/kubemark/resources/kubeconfig.kubemark`. + +Currently we're running HollowNode with a limit of 0.09 CPU core/pod and 220MB of memory. +However, if we also take into account the resources absorbed by default cluster addons +and fluentD running on the 'external' cluster, this limit becomes ~0.1 CPU core/pod, +thus allowing ~10 HollowNodes to run per core (on an "n1-standard-8" VM node). + +#### Behind the scene details: + +start-kubemark.sh script does quite a lot of things: + +- Prepare a master machine named MASTER_NAME (this variable's value should be set by this point): + (*the steps below use gcloud, and should be easy to do outside of GCE*) + 1. Creates a Persistent Disk for use by the master (one more for etcd-events, if flagged) + 2. Creates a static IP address for the master in the cluster and assign it to variable MASTER_IP + 3. Creates a VM instance for the master, configured with the PD and IP created above. + 4. 
Set firewall rule in the master to open port 443\* for all TCP traffic by default. + +<sub>\* Port 443 is a secured port on the master machine which is used for all +external communication with the API server. In the last sentence *external* +means all traffic coming from other machines, including all the Nodes, not only +from outside of the cluster. Currently local components, i.e. ControllerManager +and Scheduler talk with API server using insecure port 8080.</sub> + +- [Optional to read] Establish necessary certs/keys required for setting up the PKI for kubemark cluster: + (*the steps below are independent of GCE and work for all providers*) + 1. Generate a randomly named temporary directory for storing PKI certs/keys which is delete-trapped on EXIT. + 2. Create a bearer token for 'admin' in master. + 3. Generate certificate for CA and (certificate + private-key) pair for each of master, kubelet and kubecfg. + 4. Generate kubelet and kubeproxy tokens for master. + 5. Write a kubeconfig locally to `test/kubemark/resources/kubeconfig.kubemark` for enabling local kubectl use. + +- Set up environment and start master components (through `start-kubemark-master.sh` script): + (*the steps below use gcloud for SSH and SCP to master, and should be easy to do outside of GCE*) + 1. SSH to the master machine and create a new directory (`/etc/srv/kubernetes`) and write all the + certs/keys/tokens/passwords to it. + 2. SCP all the master pod manifests, shell scripts (`start-kubemark-master.sh`, `configure-kubectl.sh`, etc), + config files for passing env variables (`kubemark-master-env.sh`) from the local machine to the master. + 3. SSH to the master machine and run the startup script `start-kubemark-master.sh` (and possibly others). + + Note: The directory structure and the functions performed by the startup script(s) can vary based on master distro. + We currently support the GCI image `gci-dev-56-8977-0-0` in GCE. + +- Set up and start HollowNodes (as pods) on the external cluster: + (*the steps below (except 2nd and 3rd) are independent of GCE and work for all providers*) + 1. Identify the right kubemark binary from the current kubernetes repo for the platform linux/amd64. + 2. Create a Docker image for HollowNode using this binary and upload it to a remote Docker repository. + (We use gcr.io/ as our remote docker repository in GCE, should be different for other providers) + 3. [One-off] Create and upload a Docker image for NodeProblemDetector (see kubernetes/node-problem-detector repo), + which is one of the containers in the HollowNode pod, besides HollowKubelet and HollowProxy. However we + use it with a hollow config that essentially has an empty set of rules and conditions to be detected. + This step is required only for other cloud providers, as the docker image for GCE already exists on GCR. + 4. Create secret which stores kubeconfig for use by HollowKubelet/HollowProxy, addons, and configMaps + for the HollowNode and the HollowNodeProblemDetector. + 5. Create a ReplicationController for HollowNodes that starts them up, after replacing all variables in + the hollow-node_template.json resource. + 6. Wait until all HollowNodes are in the Running phase. + +### Running e2e tests on Kubemark cluster + +To run standard e2e test on your Kubemark cluster created in the previous step +you execute `test/kubemark/run-e2e-tests.sh` script. It will configure ginkgo to +use Kubemark cluster instead of something else and start an e2e test. 
This
+script should not need any changes to work on other cloud providers.
+
+By default (if nothing is passed to it) the script will run a Density '30 pods per node'
+test. If you want to run a different e2e test you just need to provide the flags you want to be
+passed to the `hack/ginkgo-e2e.sh` script, e.g. `--ginkgo.focus="Load"` to run the
+Load test.
+
+By default, at the end of each test, it will delete namespaces and everything
+under them (e.g. events, replication controllers) on the Kubemark master, which takes
+a lot of time. Such work isn't needed in most cases: for example, if you delete your
+Kubemark cluster after running `run-e2e-tests.sh`, or if you don't care about
+namespace deletion performance (specifically related to etcd). There is a
+flag that enables you to avoid namespace deletion: `--delete-namespace=false`.
+Adding the flag should let you see in the logs: `Found DeleteNamespace=false,
+skipping namespace deletion!`
+
+### Monitoring test execution and debugging problems
+
+Run-e2e-tests prints the same output on Kubemark as on an ordinary e2e cluster, but
+if you need to dig deeper you need to learn how to debug HollowNodes and how
+the master machine (currently) differs from an ordinary one.
+
+If you need to debug the master machine you can do similar things as you would on an
+ordinary master. The difference between the Kubemark setup and an ordinary setup is
+that in Kubemark etcd is run as a plain docker container, and all master
+components are run as normal processes. There's no Kubelet overseeing them. Logs
+are stored in exactly the same place, i.e. the `/var/logs/` directory. Because
+the binaries are not supervised by anything, they won't be restarted in the case of a
+crash.
+
+To help you with debugging from inside the cluster, the startup script puts a
+`~/configure-kubectl.sh` script on the master. It downloads the `gcloud` and
+`kubectl` tools and configures kubectl to work on the unsecured master port (useful
+if there are problems with security). After the script is run you can use
+the kubectl command from the master machine to play with the cluster.
+
+Debugging HollowNodes is a bit trickier: if you experience a problem on
+one of them you need to learn which hollow-node pod corresponds to a given
+HollowNode known by the Master. During self-registration HollowNodes provide
+their cluster IPs as Names, which means that if you need to find a HollowNode
+named `10.2.4.5` you just need to find a Pod in the external cluster with this
+cluster IP. There's a helper script
+`test/kubemark/get-real-pod-for-hollow-node.sh` that does this for you.
+
+When you have a Pod name you can use `kubectl logs` on the external cluster to get
+logs, or use a `kubectl describe pod` call to find the external Node on which
+this particular HollowNode is running so you can ssh to it.
+
+E.g. suppose you want to see the logs of the HollowKubelet on which pod `my-pod` is running.
+To do so you can execute:
+
+```
+$ kubectl --kubeconfig=kubernetes/test/kubemark/resources/kubeconfig.kubemark describe pod my-pod
+```
+
+This outputs the pod description, which includes a line such as:
+
+```
+Node: 1.2.3.4/1.2.3.4
+```
+
+To find the `hollow-node` pod corresponding to node `1.2.3.4` you use the
+aforementioned script:
+
+```
+$ kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh 1.2.3.4
+```
+
+which will output the line:
+
+```
+hollow-node-1234
+```
+
+Now you just use an ordinary kubectl command to get the logs:
+
+```
+kubectl --namespace=kubemark logs hollow-node-1234
+```
+
+All those things should work exactly the same on all cloud providers.
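If you do this lookup often, the two steps above can be combined into a short shell snippet. This is only a sketch: the node IP is an example, and the paths assume you run it from the directory containing your kubernetes checkout.

```sh
# Sketch: fetch the logs of the HollowNode registered as 1.2.3.4 in one step.
NODE_IP="1.2.3.4"
POD=$(kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh "$NODE_IP")
kubectl --namespace=kubemark logs "$POD"
```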
+ +### Turning down Kubemark cluster + +On GCE you just need to execute `test/kubemark/stop-kubemark.sh` script, which +will delete HollowNode ReplicationController and all the resources for you. On +other providers you’ll need to delete all this stuff by yourself. As part of +the effort mentioned above to refactor kubemark into provider-independent and +provider-specific parts, the resource deletion logic specific to the provider +would move out into a clean API. + +## Some current implementation details and future roadmap + +Kubemark master uses exactly the same binaries as ordinary Kubernetes does. This +means that it will never be out of date. On the other hand HollowNodes use +existing fake for Kubelet (called SimpleKubelet), which mocks its runtime +manager with `pkg/kubelet/dockertools/fake_manager.go`, where most logic sits. +Because there's no easy way of mocking other managers (e.g. VolumeManager), they +are not supported in Kubemark (e.g. we can't schedule Pods with volumes in them +yet). + +We currently plan to extend kubemark along the following directions: +- As you would have noticed at places above, we aim to make kubemark more structured + and easy to run across various providers without having to tweak the setup scripts, + using a well-defined kubemark-provider API. +- Allow kubemark to run on various distros (GCI, debian, redhat, etc) for any + given provider. +- Make Kubemark performance on ci-tests mimic real cluster ci-tests on metrics such as + CPU, memory and network bandwidth usage and realizing this goal through measurable + objectives (like the kubemark metric should vary no more than X% with the real + cluster metric). We could also use metrics reported by Prometheus. +- Improve logging of CI-test metrics (such as aggregated API call latencies, scheduling + call latencies, %ile for CPU/mem usage of different master components in density/load + tests) by packing them into well-structured artifacts instead of the (current) dumping + to logs. +- Create a Dashboard that lets easy viewing and comparison of these metrics across tests. + diff --git a/contributors/devel/sig-scalability/profiling.md b/contributors/devel/sig-scalability/profiling.md new file mode 100644 index 00000000..f7c8b2e5 --- /dev/null +++ b/contributors/devel/sig-scalability/profiling.md @@ -0,0 +1,76 @@ +# Profiling Kubernetes + +This document explain how to plug in profiler and how to profile Kubernetes services. To get familiar with the tools mentioned below, it is strongly recommended to read [Profiling Go Programs](https://blog.golang.org/profiling-go-programs). + +## Profiling library + +Go comes with inbuilt 'net/http/pprof' profiling library and profiling web service. The way service works is binding debug/pprof/ subtree on a running webserver to the profiler. Reading from subpages of debug/pprof returns pprof-formatted profiles of the running binary. The output can be processed offline by the tool of choice, or used as an input to handy 'go tool pprof', which can graphically represent the result. + +## Adding profiling to services to APIserver. + +TL;DR: Add lines: + +```go +m.mux.HandleFunc("/debug/pprof/", pprof.Index) +m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile) +m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) +``` + +to the `init(c *Config)` method in 'pkg/master/master.go' and import 'net/http/pprof' package. + +In most use cases to use profiler service it's enough to do 'import _ net/http/pprof', which automatically registers a handler in the default http.Server. 
A slight inconvenience is that the API server uses the default server for intra-cluster communication, so plugging the profiler into it is not really useful. In 'pkg/kubelet/server/server.go' more servers are created and started as separate goroutines. The one that is usually serving external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in the Handler variable. It is created from an HTTP multiplexer, so the only thing that needs to be done is adding the profiler handler functions to this multiplexer. This is exactly what the lines after the TL;DR do.
+
+## Connecting to the profiler
+
+Even with the profiler running, I found it not really straightforward to use 'go tool pprof' with it. The problem is that, at least for dev purposes, the certificates generated for the API server are not signed by anyone trusted, and because secureServer serves only secure traffic it isn't straightforward to connect to the service. The best workaround I found is to create an ssh tunnel from the open unsecured port on kubernetes_master to some external server, and use this server as a proxy. To save everyone from looking up the correct ssh flags, it is done by running:
+
+```sh
+ssh kubernetes_master -L<local_port>:localhost:8080
+```
+
+or an analogous one for your cloud provider. Afterwards you can e.g. run
+
+```sh
+go tool pprof http://localhost:<local_port>/debug/pprof/profile
+```
+
+to get a 30-second CPU profile.
+
+## Contention profiling
+
+To enable contention profiling you need to add the line `rt.SetBlockProfileRate(1)` in addition to the `m.mux.HandleFunc(...)` lines added before (`rt` stands for `runtime` in `master.go`). This enables the 'debug/pprof/block' subpage, which can be used as an input to `go tool pprof`.
+
+## Profiling in tests
+
+To gather a profile from a test, the HTTP interface is probably not suitable. Instead, you can add the `-cpuprofile` flag to your KUBE_TEST_ARGS, e.g.
+
+```sh
+make test-integration WHAT="./test/integration/scheduler" KUBE_TEST_ARGS="-cpuprofile cpu.out"
+go tool pprof cpu.out
+```
+
+See the ['go test' flags](https://golang.org/cmd/go/#hdr-Description_of_testing_flags) for how to capture other types of profiles.
+
+## Profiling in a benchmark test
+
+Gathering a profile from a benchmark test works in the same way as for regular tests, but sometimes there is expensive setup that you want excluded from the profile (i.e. any time you would use `b.ResetTimer()`).
+
+To solve this problem, you can explicitly start the profile in your test code like so.
+
+```go
+// In a *_test.go file; requires the "log", "os", "runtime/pprof", and "testing" imports.
+func BenchmarkMyFeature(b *testing.B) {
+	// Expensive test setup...
+	b.ResetTimer()
+	f, err := os.Create("bench_profile.out")
+	if err != nil {
+		log.Fatal("could not create profile file: ", err)
+	}
+	if err := pprof.StartCPUProfile(f); err != nil {
+		log.Fatal("could not start CPU profile: ", err)
+	}
+	defer pprof.StopCPUProfile()
+	// Rest of the test...
+}
+```
+
+> Note: Code added to a test to gather CPU profiles should not be merged. It is meant to be temporary while you create and analyze profiles.
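For example, assuming the hypothetical `BenchmarkMyFeature` above lives in a package under `./pkg/mypkg/` (a placeholder path), you might run only that benchmark and then inspect the profile it wrote:

```sh
# Run only the benchmark (skip regular tests) and analyze the CPU profile it wrote.
go test -run='^$' -bench=BenchmarkMyFeature ./pkg/mypkg/
go tool pprof bench_profile.out
```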
diff --git a/contributors/devel/sig-scheduling/OWNERS b/contributors/devel/sig-scheduling/OWNERS new file mode 100644 index 00000000..f6155ab6 --- /dev/null +++ b/contributors/devel/sig-scheduling/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-scheduling-leads +approvers: + - sig-scheduling-leads +labels: + - sig/scheduling diff --git a/contributors/devel/sig-scheduling/scheduler.md b/contributors/devel/sig-scheduling/scheduler.md new file mode 100644 index 00000000..486b04a9 --- /dev/null +++ b/contributors/devel/sig-scheduling/scheduler.md @@ -0,0 +1,90 @@ +# The Kubernetes Scheduler + +The Kubernetes scheduler runs as a process alongside the other master components such as the API server. +Its interface to the API server is to watch for Pods with an empty PodSpec.NodeName, +and for each Pod, it posts a binding indicating where the Pod should be scheduled. + +## Exploring the code + +We are dividing scheduler into three layers from high level: +- [cmd/kube-scheduler/scheduler.go](http://releases.k8s.io/HEAD/cmd/kube-scheduler/scheduler.go): + This is the main() entry that does initialization before calling the scheduler framework. +- [pkg/scheduler/scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/scheduler.go): + This is the scheduler framework that handles stuff (e.g. binding) beyond the scheduling algorithm. +- [pkg/scheduler/core/generic_scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/core/generic_scheduler.go): + The scheduling algorithm that assigns nodes for pods. + +## The scheduling algorithm + +``` +For given pod: + + +---------------------------------------------+ + | Schedulable nodes: | + | | + | +--------+ +--------+ +--------+ | + | | node 1 | | node 2 | | node 3 | | + | +--------+ +--------+ +--------+ | + | | + +-------------------+-------------------------+ + | + | + v + +-------------------+-------------------------+ + + Pred. filters: node 3 doesn't have enough resource + + +-------------------+-------------------------+ + | + | + v + +-------------------+-------------------------+ + | remaining nodes: | + | +--------+ +--------+ | + | | node 1 | | node 2 | | + | +--------+ +--------+ | + | | + +-------------------+-------------------------+ + | + | + v + +-------------------+-------------------------+ + + Priority function: node 1: p=2 + node 2: p=5 + + +-------------------+-------------------------+ + | + | + v + select max{node priority} = node 2 +``` + +The Scheduler tries to find a node for each Pod, one at a time. +- First it applies a set of "predicates" to filter out inappropriate nodes. For example, if the PodSpec specifies resource requests, then the scheduler will filter out nodes that don't have at least that much resources available (computed as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node). +- Second, it applies a set of "priority functions" +that rank the nodes that weren't filtered out by the predicate check. For example, it tries to spread Pods across nodes and zones while at the same time favoring the least (theoretically) loaded nodes (where "load" - in theory - is measured as the sum of the resource requests of the containers running on the node, divided by the node's capacity). +- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). 
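+
+Reduced to a minimal sketch, the same filter-then-score-then-select flow looks roughly like this (the `Pod`, `Node`, `Predicate`, and `Priority` types below are simplified, hypothetical stand-ins, not the scheduler's real data structures):
+
+```go
+package schedulersketch
+
+import "fmt"
+
+// Hypothetical, simplified types; the real scheduler uses much richer structures.
+type Pod struct{ Name string }
+type Node struct{ Name string }
+
+// A Predicate filters out nodes that cannot run the pod.
+type Predicate func(Pod, Node) bool
+
+// A Priority scores a feasible node from 0 (least preferred) to 10 (most preferred).
+type Priority struct {
+	Weight int
+	Score  func(Pod, Node) int
+}
+
+// schedule filters nodes with predicates, ranks the survivors with weighted
+// priority functions, and returns the highest-scoring node.
+func schedule(pod Pod, nodes []Node, predicates []Predicate, priorities []Priority) (Node, error) {
+	// 1. Filter: keep only nodes that pass every predicate.
+	var feasible []Node
+	for _, node := range nodes {
+		fits := true
+		for _, pred := range predicates {
+			if !pred(pod, node) {
+				fits = false
+				break
+			}
+		}
+		if fits {
+			feasible = append(feasible, node)
+		}
+	}
+	if len(feasible) == 0 {
+		return Node{}, fmt.Errorf("no nodes fit pod %q", pod.Name)
+	}
+
+	// 2. Score and select: sum the weighted priority scores and keep the best node.
+	best, bestScore := feasible[0], -1
+	for _, node := range feasible {
+		score := 0
+		for _, prio := range priorities {
+			score += prio.Weight * prio.Score(pod, node)
+		}
+		if score > bestScore { // ties broken by first match here; the real scheduler picks randomly
+			best, bestScore = node, score
+		}
+	}
+	return best, nil
+}
+```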
The code for this main scheduling loop is in the function `Schedule()` in [pkg/scheduler/core/generic_scheduler.go](http://releases.k8s.io/HEAD/pkg/scheduler/core/generic_scheduler.go) + +### Predicates and priorities policies + +Predicates are a set of policies applied one by one to filter out inappropriate nodes. +Priorities are a set of policies applied one by one to rank nodes (that made it through the filter of the predicates). +By default, Kubernetes provides built-in predicates and priorities policies documented in [scheduler_algorithm.md](scheduler_algorithm.md). +The predicates and priorities code are defined in [pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/predicates/predicates.go) and [pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/priorities/) , respectively. + + +## Scheduler extensibility + +The scheduler is extensible: the cluster administrator can choose which of the pre-defined +scheduling policies to apply, and can add new ones. + +### Modifying policies + +The policies that are applied when scheduling can be chosen in one of two ways. +The default policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in +[pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithmprovider/defaults/defaults.go). +However, the choice of policies can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON file specifying which scheduling policies to use. See [examples/scheduler-policy-config.json](https://git.k8s.io/examples/staging/scheduler-policy-config.json) for an example +config file. (Note that the config file format is versioned; the API is defined in [pkg/scheduler/api](http://releases.k8s.io/HEAD/pkg/scheduler/api/)). +Thus to add a new scheduling policy, you should modify [pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/predicates/predicates.go) or add to the directory [pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/priorities/), and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. + diff --git a/contributors/devel/sig-scheduling/scheduler_algorithm.md b/contributors/devel/sig-scheduling/scheduler_algorithm.md new file mode 100644 index 00000000..e6596b47 --- /dev/null +++ b/contributors/devel/sig-scheduling/scheduler_algorithm.md @@ -0,0 +1,40 @@ +# Scheduler Algorithm in Kubernetes + +For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. + +## Filtering the nodes + +The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource requests of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. 
Currently, there are several "predicates" implementing different filtering policies, including:
+
+- `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. Currently supported volumes are: AWS EBS, GCE PD, ISCSI and Ceph RBD. Only Persistent Volume Claims for those supported types are checked. Persistent Volumes added directly to pods are not evaluated and are not constrained by this policy.
+- `NoVolumeZoneConflict`: Evaluate if the volumes a pod requests are available on the node, given the zone restrictions.
+- `PodFitsResources`: Check if the free resources (CPU and memory) meet the requirements of the Pod. The free resources are measured by the capacity minus the sum of the requests of all Pods on the node. To learn more about resource QoS in Kubernetes, please check the [QoS proposal](../design-proposals/node/resource-qos.md).
+- `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node.
+- `HostName`: Filter out all nodes except the one specified in the PodSpec's NodeName field.
+- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `nodeAffinity` if present. See [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details on both.
+- `MaxEBSVolumeCount`: Ensure that the number of attached ElasticBlockStore volumes does not exceed a maximum value (by default, 39, since Amazon recommends a maximum of 40 with one of those 40 reserved for the root volume -- see [Amazon's documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html#linux-specific-volume-limits)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable.
+- `MaxGCEPDVolumeCount`: Ensure that the number of attached GCE PersistentDisk volumes does not exceed a maximum value (by default, 16, which is the maximum GCE allows -- see [GCE's documentation](https://cloud.google.com/compute/docs/disks/persistent-disks#limits_for_predefined_machine_types)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable.
+- `CheckNodeMemoryPressure`: Check if a pod can be scheduled on a node reporting a memory pressure condition. Currently, no `BestEffort` pods should be placed on a node under memory pressure, as they would be automatically evicted by the kubelet.
+- `CheckNodeDiskPressure`: Check if a pod can be scheduled on a node reporting a disk pressure condition. Currently, no pods should be placed on a node under disk pressure, as they would be automatically evicted by the kubelet.
+
+The details of the above predicates can be found in [pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. You can see which ones are used by default in [pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithmprovider/defaults/defaults.go).
+
+## Ranking the nodes
+
+The filtered nodes are considered suitable to host the Pod, and often more than one node remains. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions.
For each remaining node, a priority function gives a score on a scale from 0 to 10, with 10 representing "most preferred" and 0 "least preferred". Each priority function is weighted by a positive number, and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2`, with weighting factors `weight1` and `weight2` respectively; then the final score of some NodeA is:
+
+    finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2)
+
+After the scores of all nodes are calculated, the node with the highest score is chosen as the host of the Pod. If more than one node ties for the highest score, a random one among them is chosen.
+
+Currently, the Kubernetes scheduler provides some practical priority functions, including:
+
+- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of requests of all Pods already on the node - request of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption.
+- `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and memory utilization rate is balanced after the Pod is deployed.
+- `SelectorSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service, replication controller, or replica set on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes.
+- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
+- `ImageLocalityPriority`: Nodes are prioritized based on the locality of the images requested by a pod. Nodes that already have a larger total size of the images required by the pod are preferred over nodes that have none of those images, or a smaller total size of them.
+- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](https://kubernetes.io/docs/user-guide/node-selection/) for more details.
+
+The details of the above priority functions can be found in [pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar to predicates, you can combine the above priority functions and assign weight factors (positive numbers) to them as you want (check [scheduler.md](scheduler.md) for how to customize).
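+
+As a purely illustrative example of the weighted scoring above: suppose `LeastRequestedPriority` is given weight 1 and `BalancedResourceAllocation` weight 2, and they score NodeA as 5 and 8, and NodeB as 10 and 2, respectively. Then:
+
+    finalScoreNodeA = (1 * 5) + (2 * 8) = 21
+    finalScoreNodeB = (1 * 10) + (2 * 2) = 14
+
+so NodeA would be chosen.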
+ diff --git a/contributors/devel/sig-storage/OWNERS b/contributors/devel/sig-storage/OWNERS new file mode 100644 index 00000000..6dd5158f --- /dev/null +++ b/contributors/devel/sig-storage/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-storage-leads +approvers: + - sig-storage-leads +labels: + - sig/storage diff --git a/contributors/devel/sig-storage/flexvolume.md b/contributors/devel/sig-storage/flexvolume.md new file mode 100644 index 00000000..12c46382 --- /dev/null +++ b/contributors/devel/sig-storage/flexvolume.md @@ -0,0 +1,155 @@ +# Flexvolume + +Flexvolume enables users to write their own drivers and add support for their volumes in Kubernetes. Vendor drivers should be installed in the volume plugin path on every node, and on master if the driver requires attach capability (unless `--enable-controller-attach-detach` Kubelet option is set to false, but this is highly discouraged because it is a legacy mode of operation). + +Flexvolume is a GA feature from Kubernetes 1.8 release onwards. + +## Prerequisites + +Install the vendor driver on all nodes (also on master nodes if "--enable-controller-attach-detach" Kubelet option is enabled) in the plugin path. Path for installing the plugin: `<plugindir>/<vendor~driver>/<driver>`. The default plugin directory is `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`. It can be changed in kubelet via the `--volume-plugin-dir` flag, and in controller manager via the `--flex-volume-plugin-dir` flag. + +For example to add a `cifs` driver, by vendor `foo` install the driver at: `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/foo~cifs/cifs` + +The vendor and driver names must match flexVolume.driver in the volume spec, with '~' replaced with '/'. For example, if `flexVolume.driver` is set to `foo/cifs`, then the vendor is `foo`, and driver is `cifs`. + +## Dynamic Plugin Discovery +Beginning in v1.8, Flexvolume supports the ability to detect drivers on the fly. Instead of requiring drivers to exist at system initialization time or having to restart kubelet or controller manager, drivers can be installed, upgraded/downgraded, and uninstalled while the system is running. +For more information, please refer to the [design document](/contributors/design-proposals/storage/flexvolume-deployment.md). + +## Automated Plugin Installation/Upgrade +One possible way to install and upgrade your Flexvolume drivers is by using a DaemonSet. See [Recommended Driver Deployment Method](/contributors/design-proposals/storage/flexvolume-deployment.md#recommended-driver-deployment-method) for details, and see [here](https://git.k8s.io/examples/staging/volumes/flexvolume/deploy/) for an example. + +## Plugin details +The plugin expects the following call-outs are implemented for the backend drivers. Some call-outs are optional. Call-outs are invoked from Kubelet and Controller Manager. + +### Driver invocation model: + +#### Init: +Initializes the driver. Called during Kubelet & Controller manager initialization. On success, the function returns a capabilities map showing whether each Flexvolume capability is supported by the driver. +Current capabilities: +* `attach` - a boolean field indicating whether the driver requires attach and detach operations. This field is *required*, although for backward-compatibility the default value is set to `true`, i.e. requires attach and detach. +See [Driver output](#driver-output) for the capabilities map format. 
+``` +<driver executable> init +``` + +#### Attach: +Attach the volume specified by the given spec on the given node. On success, returns the device path where the device is attached on the node. Called from Controller Manager. + +This call-out does not pass "secrets" specified in Flexvolume spec. If your driver requires secrets, do not implement this call-out and instead use "mount" call-out and implement attach and mount in that call-out. + +``` +<driver executable> attach <json options> <node name> +``` + +#### Detach: +Detach the volume from the node. Called from Controller Manager. +``` +<driver executable> detach <mount device> <node name> +``` + +#### Wait for attach: +Wait for the volume to be attached on the remote node. On success, the path to the device is returned. Called from Controller Manager. The timeout should be 10m (based on https://git.k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go#L88 ) + +``` +<driver executable> waitforattach <mount device> <json options> +``` + +#### Volume is Attached: +Check the volume is attached on the node. Called from Controller Manager. + +``` +<driver executable> isattached <json options> <node name> +``` + +#### Mount device: +Mount device mounts the device to a global path which individual pods can then bind mount. Called only from Kubelet. + +This call-out does not pass "secrets" specified in Flexvolume spec. If your driver requires secrets, do not implement this call-out and instead use "mount" call-out and implement attach and mount in that call-out. + +``` +<driver executable> mountdevice <mount dir> <mount device> <json options> +``` + +#### Unmount device: +Unmounts the global mount for the device. This is called once all bind mounts have been unmounted. Called only from Kubelet. + +``` +<driver executable> unmountdevice <mount device> +``` +In addition to the user-specified options and [default JSON options](#default-json-options), the following options capturing information about the pod are passed through and generated automatically. + +``` +kubernetes.io/pod.name +kubernetes.io/pod.namespace +kubernetes.io/pod.uid +kubernetes.io/serviceAccount.name +``` + +#### Mount: +Mount the volume at the mount dir. This call-out defaults to bind mount for drivers which implement attach & mount-device call-outs. Called only from Kubelet. + +``` +<driver executable> mount <mount dir> <json options> +``` + +#### Unmount: +Unmount the volume. This call-out defaults to bind mount for drivers which implement attach & mount-device call-outs. Called only from Kubelet. + +``` +<driver executable> unmount <mount dir> +``` + +See [lvm] & [nfs] for a quick example on how to write a simple flexvolume driver. + +### Driver output: + +Flexvolume expects the driver to reply with the status of the operation in the +following format. + +``` +{ + "status": "<Success/Failure/Not supported>", + "message": "<Reason for success/failure>", + "device": "<Path to the device attached. This field is valid only for attach & waitforattach call-outs>" + "volumeName": "<Cluster wide unique name of the volume. Valid only for getvolumename call-out>" + "attached": <True/False (Return true if volume is attached on the node. 
Valid only for isattached call-out)> + "capabilities": <Only included as part of the Init response> + { + "attach": <True/False (Return true if the driver implements attach and detach)> + } +} +``` + +### Default Json options + +In addition to the flags specified by the user in the Options field of the FlexVolumeSource, the following flags (set through their corresponding FlexVolumeSource fields) are also passed to the executable. +Note: Secrets are passed only to "mount/unmount" call-outs. + +``` +"kubernetes.io/fsType":"<FS type>", +"kubernetes.io/readwrite":"<rw>", +"kubernetes.io/fsGroup":"<FS group>", +"kubernetes.io/mountsDir":"<string>", +"kubernetes.io/pvOrVolumeName":"<Volume name if the volume is in-line in the pod spec; PV name if the volume is a PV>" + +"kubernetes.io/pod.name":"<string>", +"kubernetes.io/pod.namespace":"<string>", +"kubernetes.io/pod.uid":"<string>", +"kubernetes.io/serviceAccount.name":"<string>", + +"kubernetes.io/secret/key1":"<secret1>" +... +"kubernetes.io/secret/keyN":"<secretN>" +``` + +### Example of Flexvolume + +Please refer to the [Flexvolume example directory]. See [nginx-lvm.yaml] & [nginx-nfs.yaml] for a quick example on how to use Flexvolume in a pod. + + +[lvm]: https://git.k8s.io/examples/staging/volumes/flexvolume/lvm +[nfs]: https://git.k8s.io/examples/staging/volumes/flexvolume/nfs +[nginx-lvm.yaml]: https://git.k8s.io/examples/staging/volumes/flexvolume/nginx-lvm.yaml +[nginx-nfs.yaml]: https://git.k8s.io/examples/staging/volumes/flexvolume/nginx-nfs.yaml +[Flexvolume example directory]: https://git.k8s.io/examples/staging/volumes/flexvolume/ diff --git a/contributors/devel/sig-testing/OWNERS b/contributors/devel/sig-testing/OWNERS new file mode 100644 index 00000000..541bac08 --- /dev/null +++ b/contributors/devel/sig-testing/OWNERS @@ -0,0 +1,8 @@ +# See the OWNERS docs at https://go.k8s.io/owners + +reviewers: + - sig-testing-leads +approvers: + - sig-testing-leads +labels: + - sig/testing diff --git a/contributors/devel/sig-testing/bazel.md b/contributors/devel/sig-testing/bazel.md new file mode 100644 index 00000000..6916c0be --- /dev/null +++ b/contributors/devel/sig-testing/bazel.md @@ -0,0 +1,188 @@ +# Build and test with Bazel + +Building and testing Kubernetes with Bazel is supported but not yet default. + +Bazel is used to run all Kubernetes PRs on [Prow](https://prow.k8s.io), +as remote caching enables significantly reduced build and test times. + +Some repositories (such as kubernetes/test-infra) have switched to using Bazel +exclusively for all build, test, and release workflows. + +Go rules are managed by the [`gazelle`](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle) +tool, with some additional rules managed by the [`kazel`](https://git.k8s.io/repo-infra/kazel) tool. +These tools are called via the `hack/update-bazel.sh` script. + +Instructions for installing Bazel +can be found [here](https://www.bazel.io/versions/master/docs/install.html). +Please note that until [this Bazel +issue](https://github.com/bazelbuild/rules_docker/issues/454) is fixed, +`/usr/bin/env python` must be python2 in order for all the Bazel commands listed +below to succeed. + +Several convenience `make` rules have been created for common operations: + +* `make bazel-build`: builds all binaries in tree (`bazel build -- //... + -//vendor/...`) +* `make bazel-test`: runs all unit tests (`bazel test --config=unit -- //... + //hack:verify-all -//build/... 
-//vendor/...`) +* `make bazel-test-integration`: runs all integration tests (`bazel test + --config integration //test/integration/...`) +* `make bazel-release`: builds release tarballs, Docker images (for server + components), and Debian images (`bazel build //build/release-tars`) + +You can also interact with Bazel directly; for example, to run all `kubectl` unit +tests, run + +```console +$ bazel test //pkg/kubectl/... +``` + +## Planter +If you don't want to install Bazel, you can instead try using the unofficial +[Planter](https://git.k8s.io/test-infra/planter) tool, +which runs Bazel inside a Docker container. + +For example, you can run +```console +$ ../test-infra/planter/planter.sh make bazel-test +$ ../test-infra/planter/planter.sh bazel build //cmd/kubectl +``` + +## Continuous Integration + +There are several bazel CI jobs: +* [ci-kubernetes-bazel-build](http://k8s-testgrid.appspot.com/google-unit#bazel-build): builds everything + with Bazel +* [ci-kubernetes-bazel-test](http://k8s-testgrid.appspot.com/google-unit#bazel-test): runs unit tests in + with Bazel + +Similar jobs are run on all PRs; additionally, several of the e2e jobs use +Bazel-built binaries when launching and testing Kubernetes clusters. + +## Updating `BUILD` files + +To update `BUILD` files, run: + +```console +$ ./hack/update-bazel.sh +``` + +To prevent Go rules from being updated, consult the [gazelle +documentation](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle). + +Note that much like Go files and `gofmt`, `BUILD` files have standardized, +opinionated style rules, and running `hack/update-bazel.sh` will format them for you. + +If you want to auto-format `BUILD` files in your editor, use of +[Buildifier](https://github.com/bazelbuild/buildtools/blob/master/buildifier/README.md) +is recommended. + +Updating the `BUILD` file for a package will be required when: +* Files are added to or removed from a package +* Import dependencies change for a package +* A `BUILD` file has been updated and needs to be reformatted +* A new `BUILD` file has been added (parent `BUILD` files will be updated) + +## Known issues and limitations + +### [Cross-compilation of cgo is not currently natively supported](https://github.com/bazelbuild/rules_go/issues/1020) +All binaries are currently built for the host OS and architecture running Bazel. +(For example, you can't currently target linux/amd64 from macOS or linux/s390x +from an amd64 machine.) + +The Go rules support cross-compilation of pure Go code using the `--platforms` +flag, and this is being used successfully in the kubernetes/test-infra repo. + +It may already be possible to cross-compile cgo code if a custom CC toolchain is +set up, possibly reusing the kube-cross Docker image, but this area needs +further exploration. + +### The CC toolchain is not fully hermetic +Bazel requires several tools and development packages to be installed in the system, including `gcc`, `g++`, `glibc and libstdc++ development headers` and `glibc static development libraries`. Please check your distribution for exact names of the packages. 
Examples for some commonly used distributions are below: + +| Dependency | Debian/Ubuntu | CentOS | OpenSuSE | +|:---------------------:|-------------------------------|--------------------------------|-----------------------------------------| +| Build essentials | `apt install build-essential` | `yum groupinstall development` | `zypper install -t pattern devel_C_C++` | +| GCC C++ | `apt install g++` | `yum install gcc-c++` | `zypper install gcc-c++` | +| GNU Libc static files | `apt install libc6-dev` | `yum install glibc-static` | `zypper install glibc-devel-static` | + +If any of these packages change, they may also cause spurious build failures +as described in [this issue](https://github.com/bazelbuild/bazel/issues/4907). + +An example error might look something like +``` +ERROR: undeclared inclusion(s) in rule '//vendor/golang.org/x/text/cases:go_default_library.cgo_c_lib': +this rule is missing dependency declarations for the following files included by 'vendor/golang.org/x/text/cases/linux_amd64_stripped/go_default_library.cgo_codegen~/_cgo_export.c': + '/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h' +``` + +The only way to recover from this error is to force Bazel to regenerate its +automatically-generated CC toolchain configuration by running `bazel clean +--expunge`. + +Improving cgo cross-compilation may help with all of this. + +### Changes to Go imports requires updating BUILD files +The Go rules in `BUILD` and `BUILD.bazel` files must be updated any time files +are added or removed or Go imports are changed. These rules are automatically +maintained by `gazelle`, which is run via `hack/update-bazel.sh`, but this is +still a source of friction. + +[Autogazelle](https://github.com/bazelbuild/bazel-gazelle/tree/master/cmd/autogazelle) +is a new experimental tool which may reduce or remove the need for developers +to run `hack/update-bazel.sh`, but no work has yet been done to support it in +kubernetes/kubernetes. + +### Code coverage support is incomplete for Go +Bazel and the Go rules have limited support for code coverage. Running something +like `bazel coverage -- //... -//vendor/...` will run tests in coverage mode, +but no report summary is currently generated. It may be possible to combine +`bazel coverage` with +[Gopherage](https://github.com/kubernetes/test-infra/tree/master/gopherage), +however. + +### Kubernetes code generators are not fully supported +The make-based build system in kubernetes/kubernetes runs several code +generators at build time: +* [conversion-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/conversion-gen) +* [deepcopy-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/deepcopy-gen) +* [defaulter-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/defaulter-gen) +* [openapi-gen](https://github.com/kubernetes/kube-openapi/tree/master/cmd/openapi-gen) +* [go-bindata](https://github.com/jteeuwen/go-bindata/tree/master/go-bindata) + +Of these, only `openapi-gen` and `go-bindata` are currently supported when +building Kubernetes with Bazel. + +The `go-bindata` generated code is produced by hand-written genrules. + +The other code generators use special build tags of the form `// ++k8s:generator-name=arg`; for example, input files to the openapi-gen tool are +specified with `// +k8s:openapi-gen=true`. + +`kazel` is used to find all packages that require OpenAPI generation, and then a +handwritten genrule consumes this list of packages to run `openapi-gen`. 
+ +For `openapi-gen`, a single output file is produced in a single Go package, which +makes this fairly compatible with Bazel. +All other Kubernetes code generators generally produce one output file per input +package, which is less compatible with the Bazel workflow. + +The make-based build system batches up all input packages into one call to the +code generator binary, but this is inefficient for Bazel's incrementality, as a +change in one package may result in unnecessarily recompiling many other +packages. +On the other hand, calling the code generator binary multiple times is less +efficient than calling it once, since many of the generators parse the tree for +Go type information and other metadata. + +One additional challenge is that many of the code generators add additional +Go imports which `gazelle` (and `autogazelle`) cannot infer, and so they must be +explicitly added as dependencies in the `BUILD` files. + +Kubernetes has even more code generators than this limited list, but the rest +are generally run as `hack/update-*.sh` scripts and checked into the repository, +and so are not immediately needed for Bazel parity. + +## Contacts +For help or discussion, join the [#bazel](https://kubernetes.slack.com/messages/bazel) +channel on Kubernetes Slack. diff --git a/contributors/devel/sig-testing/e2e-tests.md b/contributors/devel/sig-testing/e2e-tests.md new file mode 100644 index 00000000..848662fc --- /dev/null +++ b/contributors/devel/sig-testing/e2e-tests.md @@ -0,0 +1,764 @@ +# End-to-End Testing in Kubernetes + +**Table of Contents** + +- [End-to-End Testing in Kubernetes](#end-to-end-testing-in-kubernetes) + - [Overview](#overview) + - [Building Kubernetes and Running the Tests](#building-kubernetes-and-running-the-tests) + - [Cleaning up](#cleaning-up) + - [Advanced testing](#advanced-testing) + - [Extracting a specific version of kubernetes](#extracting-a-specific-version-of-kubernetes) + - [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing) + - [Federation e2e tests](#federation-e2e-tests) + - [Configuring federation e2e tests](#configuring-federation-e2e-tests) + - [Image Push Repository](#image-push-repository) + - [Build](#build) + - [Deploy federation control plane](#deploy-federation-control-plane) + - [Run the Tests](#run-the-tests) + - [Teardown](#teardown) + - [Shortcuts for test developers](#shortcuts-for-test-developers) + - [Debugging clusters](#debugging-clusters) + - [Local clusters](#local-clusters) + - [Testing against local clusters](#testing-against-local-clusters) + - [Version-skewed and upgrade testing](#version-skewed-and-upgrade-testing) + - [Test jobs naming convention](#test-jobs-naming-convention) + - [Kinds of tests](#kinds-of-tests) + - [Viper configuration and hierarchichal test parameters.](#viper-configuration-and-hierarchichal-test-parameters) + - [Conformance tests](#conformance-tests) + - [Continuous Integration](#continuous-integration) + - [What is CI?](#what-is-ci) + - [What runs in CI?](#what-runs-in-ci) + - [Non-default tests](#non-default-tests) + - [The PR-builder](#the-pr-builder) + - [Adding a test to CI](#adding-a-test-to-ci) + - [Moving a test out of CI](#moving-a-test-out-of-ci) + - [Performance Evaluation](#performance-evaluation) + - [One More Thing](#one-more-thing) + + +## Overview + +End-to-end (e2e) tests for Kubernetes provide a mechanism to test end-to-end +behavior of the system, and is the last signal to ensure end user operations +match developer specifications. 
Although unit and integration tests provide a +good signal, in a distributed system like Kubernetes it is not uncommon that a +minor change may pass all unit and integration tests, but cause unforeseen +changes at the system level. + +The primary objectives of the e2e tests are to ensure a consistent and reliable +behavior of the kubernetes code base, and to catch hard-to-test bugs before +users do, when unit and integration tests are insufficient. + +The e2e tests in kubernetes are built atop of +[Ginkgo](http://onsi.github.io/ginkgo/) and +[Gomega](http://onsi.github.io/gomega/). There are a host of features that this +Behavior-Driven Development (BDD) testing framework provides, and it is +recommended that the developer read the documentation prior to diving into the + tests. + +The purpose of *this* document is to serve as a primer for developers who are +looking to execute or add tests using a local development environment. + +Before writing new tests or making substantive changes to existing tests, you +should also read [Writing Good e2e Tests](writing-good-e2e-tests.md) + +## Building Kubernetes and Running the Tests + +There are a variety of ways to run e2e tests, but we aim to decrease the number +of ways to run e2e tests to a canonical way: `kubetest`. + +You can install `kubetest` as follows: +```sh +go get -u k8s.io/test-infra/kubetest +``` + +You can run an end-to-end test which will bring up a master and nodes, perform +some tests, and then tear everything down. Make sure you have followed the +getting started steps for your chosen cloud platform (which might involve +changing the --provider flag value to something other than "gce"). + +You can quickly recompile the e2e testing framework via `go install ./test/e2e`. +This will not do anything besides allow you to verify that the go code compiles. +If you want to run your e2e testing framework without re-provisioning the e2e setup, +you can do so via `make WHAT=test/e2e/e2e.test`, and then re-running the ginkgo tests. + +To build Kubernetes, up a cluster, run tests, and tear everything down, use: + +```sh +kubetest --build --up --test --down +``` + +If you'd like to just perform one of these steps, here are some examples: + +```sh +# Build binaries for testing +kubetest --build + +# Create a fresh cluster. Deletes a cluster first, if it exists +kubetest --up + +# Run all tests +kubetest --test + +# Run tests matching the regex "\[Feature:Performance\]" against a local cluster +# Specify "--provider=local" flag when running the tests locally +kubetest --test --test_args="--ginkgo.focus=\[Feature:Performance\]" --provider=local + +# Conversely, exclude tests that match the regex "Pods.*env" +kubetest --test --test_args="--ginkgo.skip=Pods.*env" + +# Run tests in parallel, skip any that must be run serially +GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\]" + +# Run tests in parallel, skip any that must be run serially and keep the test namespace if test failed +GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\] --delete-namespace-on-failure=false" + +# Flags can be combined, and their actions will take place in this order: +# --build, --up, --test, --down +# +# You can also specify an alternative provider, such as 'aws' +# +# e.g.: +kubetest --provider=aws --build --up --test --down + +# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for +# cleaning up after a failed test or viewing logs. 
+# kubectl output is default on, you can use --verbose-commands=false to suppress output. +kubetest -ctl='get events' +kubetest -ctl='delete pod foobar' +``` + +The tests are built into a single binary which can be used to deploy a +Kubernetes system or run tests against an already-deployed Kubernetes system. +See `kubetest --help` (or the flag definitions in `hack/e2e.go`) for +more options, such as reusing an existing cluster. + +### Cleaning up + +During a run, pressing `control-C` should result in an orderly shutdown, but if +something goes wrong and you still have some VMs running you can force a cleanup +with this command: + +```sh +kubetest --down +``` + +## Advanced testing + +### Extracting a specific version of kubernetes + +The `kubetest` binary can download and extract a specific version of kubernetes, +both the server, client and test binaries. The `--extract=E` flag enables this +functionality. + +There are a variety of values to pass this flag: + +```sh +# Official builds: <ci|release>/<latest|stable>[-N.N] +kubetest --extract=ci/latest --up # Deploy the latest ci build. +kubetest --extract=ci/latest-1.5 --up # Deploy the latest 1.5 CI build. +kubetest --extract=release/latest --up # Deploy the latest RC. +kubetest --extract=release/stable-1.5 --up # Deploy the 1.5 release. + +# A specific version: +kubetest --extract=v1.5.1 --up # Deploy 1.5.1 +kubetest --extract=v1.5.2-beta.0 --up # Deploy 1.5.2-beta.0 +kubetest --extract=gs://foo/bar --up # --stage=gs://foo/bar + +# Whatever GKE is using (gke, gke-staging, gke-test): +kubetest --extract=gke --up # Deploy whatever GKE prod uses + +# Using a GCI version: +kubetest --extract=gci/gci-canary --up # Deploy the version for next gci release +kubetest --extract=gci/gci-57 # Deploy the version bound to gci m57 +kubetest --extract=gci/gci-57/ci/latest # Deploy the latest CI build using gci m57 for the VM image + +# Reuse whatever is already built +kubetest --up # Most common. Note, no extract flag +kubetest --build --up # Most common. Note, no extract flag +kubetest --build --stage=gs://foo/bar --extract=local --up # Extract the staged version +``` + +### Bringing up a cluster for testing + +If you want, you may bring up a cluster in some other manner and run tests +against it. To do so, or to do other non-standard test things, you can pass +arguments into Ginkgo using `--test_args` (e.g. see above). For the purposes of +brevity, we will look at a subset of the options, which are listed below: + +``` +--ginkgo.dryRun=false: If set, ginkgo will walk the test hierarchy without +actually running anything. + +--ginkgo.failFast=false: If set, ginkgo will stop running a test suite after a +failure occurs. + +--ginkgo.failOnPending=false: If set, ginkgo will mark the test suite as failed +if any specs are pending. + +--ginkgo.focus="": If set, ginkgo will only run specs that match this regular +expression. + +--ginkgo.noColor="n": If set to "y", ginkgo will not use color in the output + +--ginkgo.skip="": If set, ginkgo will only run specs that do not match this +regular expression. + +--ginkgo.trace=false: If set, default reporter prints out the full stack trace +when a failure occurs + +--ginkgo.v=false: If set, default reporter print out all specs as they begin. + +--host="": The host, or api-server, to connect to + +--kubeconfig="": Path to kubeconfig containing embedded authinfo. + +--provider="": The name of the Kubernetes provider (gce, gke, local, vagrant, +etc.) 
+ +--repo-root="../../": Root directory of kubernetes repository, for finding test +files. +``` + +Prior to running the tests, you may want to first create a simple auth file in +your home directory, e.g. `$HOME/.kube/config`, with the following: + +``` +{ + "User": "root", + "Password": "" +} +``` + +As mentioned earlier there are a host of other options that are available, but +they are left to the developer. + +**NOTE:** If you are running tests on a local cluster repeatedly, you may need +to periodically perform some manual cleanup: + + - `rm -rf /var/run/kubernetes`, clear kube generated credentials, sometimes +stale permissions can cause problems. + + - `sudo iptables -F`, clear ip tables rules left by the kube-proxy. + +### Reproducing failures in flaky tests +You can run a test repeatedly until it fails. This is useful when debugging +flaky tests. In order to do so, you need to set the following environment +variable: +```sh +$ export GINKGO_UNTIL_IT_FAILS=true +``` + +After setting the environment variable, you can run the tests as before. The e2e +script adds `--untilItFails=true` to ginkgo args if the environment variable is +set. The flags asks ginkgo to run the test repeatedly until it fails. + +### Federation e2e tests + +By default, `e2e.go` provisions a single Kubernetes cluster, and any `Feature:Federation` ginkgo tests will be skipped. + +Federation e2e testing involve bringing up multiple "underlying" Kubernetes clusters, +and deploying the federation control plane as a Kubernetes application on the underlying clusters. + +The federation e2e tests are still managed via `e2e.go`, but require some extra configuration items. + +#### Configuring federation e2e tests + +The following environment variables will enable federation e2e building, provisioning and testing. + +```sh +$ export FEDERATION=true +$ export E2E_ZONES="us-central1-a us-central1-b us-central1-f" +``` + +A Kubernetes cluster will be provisioned in each zone listed in `E2E_ZONES`. A zone can only appear once in the `E2E_ZONES` list. + +#### Image Push Repository + +Next, specify the docker repository where your ci images will be pushed. + +* **If `--provider=gce` or `--provider=gke`**: + + If you use the same GCP project where you to run the e2e tests as the container image repository, + FEDERATION_PUSH_REPO_BASE environment variable will be defaulted to "gcr.io/${DEFAULT_GCP_PROJECT_NAME}". + You can skip ahead to the **Build** section. + + You can simply set your push repo base based on your project name, and the necessary repositories will be + auto-created when you first push your container images. + + ```sh + $ export FEDERATION_PUSH_REPO_BASE="gcr.io/${GCE_PROJECT_NAME}" + ``` + + Skip ahead to the **Build** section. + +* **For all other providers**: + + You'll be responsible for creating and managing access to the repositories manually. + + ```sh + $ export FEDERATION_PUSH_REPO_BASE="quay.io/colin_hom" + ``` + + Given this example, the `federation-apiserver` container image will be pushed to the repository + `quay.io/colin_hom/federation-apiserver`. + + The docker client on the machine running `e2e.go` must have push access for the following pre-existing repositories: + + * `${FEDERATION_PUSH_REPO_BASE}/federation-apiserver` + * `${FEDERATION_PUSH_REPO_BASE}/federation-controller-manager` + + These repositories must allow public read access, as the e2e node docker daemons will not have any credentials. If you're using + GCE/GKE as your provider, the repositories will have read-access by default. 
+ +#### Build + +* Compile the binaries and build container images: + + ```sh + $ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true kubetest -build + ``` + +* Push the federation container images + + ```sh + $ federation/develop/push-federation-images.sh + ``` + +#### Deploy federation control plane + +The following command will create the underlying Kubernetes clusters in each of `E2E_ZONES`, and then provision the +federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list. + +```sh +$ kubetest --up +``` + +#### Run the Tests + +This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite. + +```sh +$ kubetest --test --test_args="--ginkgo.focus=\[Feature:Federation\]" +``` + +#### Teardown + +```sh +$ kubetest --down +``` + +#### Shortcuts for test developers + +* To speed up `--up`, provision a single-node kubernetes cluster in a single e2e zone: + + `NUM_NODES=1 E2E_ZONES="us-central1-f"` + + Keep in mind that some tests may require multiple underlying clusters and/or minimum compute resource availability. + +* If you're hacking around with the federation control plane deployment itself, + you can quickly re-deploy the federation control plane Kubernetes manifests without tearing any resources down. + To re-deploy the federation control plane after running `--up` for the first time: + + ```sh + $ federation/cluster/federation-up.sh + ``` + +### Debugging clusters + +If a cluster fails to initialize, or you'd like to better understand cluster +state to debug a failed e2e test, you can use the `cluster/log-dump.sh` script +to gather logs. + +This script requires that the cluster provider supports ssh. Assuming it does, +running: + +```sh +$ federation/cluster/log-dump.sh <directory> +``` + +will ssh to the master and all nodes and download a variety of useful logs to +the provided directory (which should already exist). + +The Google-run Jenkins builds automatically collected these logs for every +build, saving them in the `artifacts` directory uploaded to GCS. + +### Local clusters + +It can be much faster to iterate on a local cluster instead of a cloud-based +one. To start a local cluster, you can run: + +```sh +# The PATH construction is needed because PATH is one of the special-cased +# environment variables not passed by sudo -E +sudo PATH=$PATH hack/local-up-cluster.sh +``` + +This will start a single-node Kubernetes cluster than runs pods using the local +docker daemon. Press Control-C to stop the cluster. + +You can generate a valid kubeconfig file by following instructions printed at the +end of aforementioned script. + +#### Testing against local clusters + +In order to run an E2E test against a locally running cluster, first make sure +to have a local build of the tests: + +```sh +kubetest --build +``` + +Then point the tests at a custom host directly: + +```sh +export KUBECONFIG=/path/to/kubeconfig +kubetest --provider=local --test +``` + +To control the tests that are run: + +```sh +kubetest --provider=local --test --test_args="--ginkgo.focus=Secrets" +``` + +You will also likely need to specify `minStartupPods` to match the number of +nodes in your cluster. 
If you're testing against a cluster set up by +`local-up-cluster.sh`, you will need to do the following: + +```sh +kubetest --provider=local --test --test_args="--minStartupPods=1 --ginkgo.focus=Secrets" +``` + +### Version-skewed and upgrade testing + +We run version-skewed tests to check that newer versions of Kubernetes work +similarly enough to older versions. The general strategy is to cover the following cases: + +1. One version of `kubectl` with another version of the cluster and tests (e.g. + that v1.2 and v1.4 `kubectl` doesn't break v1.3 tests running against a v1.3 + cluster). +1. A newer version of the Kubernetes master with older nodes and tests (e.g. + that upgrading a master to v1.3 with nodes at v1.2 still passes v1.2 tests). +1. A newer version of the whole cluster with older tests (e.g. that a cluster + upgraded---master and nodes---to v1.3 still passes v1.2 tests). +1. That an upgraded cluster functions the same as a brand-new cluster of the + same version (e.g. a cluster upgraded to v1.3 passes the same v1.3 tests as + a newly-created v1.3 cluster). + +[kubetest](https://git.k8s.io/test-infra/kubetest) is +the authoritative source on how to run version-skewed tests, but below is a +quick-and-dirty tutorial. + +```sh +# Assume you have two copies of the Kubernetes repository checked out, at +# ./kubernetes and ./kubernetes_old + +# If using GKE: +export CLUSTER_API_VERSION=${OLD_VERSION} + +# Deploy a cluster at the old version; see above for more details +cd ./kubernetes_old +kubetest --up + +# Upgrade the cluster to the new version +# +# If using GKE, add --upgrade-target=${NEW_VERSION} +# +# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade +cd ../kubernetes +kubetest --provider=gke --test --check-version-skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]" + +# Run old tests with new kubectl +cd ../kubernetes_old +kubetest --provider=gke --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" +``` + +If you are just testing version-skew, you may want to just deploy at one +version and then test at another version, instead of going through the whole +upgrade process: + +```sh +# With the same setup as above + +# Deploy a cluster at the new version +cd ./kubernetes +kubetest --up + +# Run new tests with old kubectl +kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh" + +# Run old tests with new kubectl +cd ../kubernetes_old +kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" +``` + +#### Test jobs naming convention + +**Version skew tests** are named as +`<cloud-provider>-<master&node-version>-<kubectl-version>-<image-name>-kubectl-skew` +e.g: `gke-1.5-1.6-cvm-kubectl-skew` means cloud provider is GKE; +master and nodes are built from `release-1.5` branch; +`kubectl` is built from `release-1.6` branch; +image name is cvm (container_vm). +The test suite is always the older one in version skew tests. e.g. from release-1.5 in this case. + +**Upgrade tests**: + +If a test job name ends with `upgrade-cluster`, it means we first upgrade +the cluster (i.e. master and nodes) and then run the old test suite with new kubectl. + +If a test job name ends with `upgrade-cluster-new`, it means we first upgrade +the cluster (i.e. master and nodes) and then run the new test suite with new kubectl. + +If a test job name ends with `upgrade-master`, it means we first upgrade +the master and keep the nodes in old version and then run the old test suite with new kubectl. 
+ +There are some examples in the table, +where `->` means upgrading; container_vm (cvm) and gci are image names. + +| test name | test suite | master version (image) | node version (image) | kubectl +| --------- | :--------: | :----: | :---:| :---: +| gce-1.5-1.6-upgrade-cluster | 1.5 | 1.5->1.6 | 1.5->1.6 | 1.6 +| gce-1.5-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 | 1.5->1.6 | 1.6 +| gce-1.5-1.6-upgrade-master | 1.5 | 1.5->1.6 | 1.5 | 1.6 +| gke-container_vm-1.5-container_vm-1.6-upgrade-cluster | 1.5 | 1.5->1.6 (cvm) | 1.5->1.6 (cvm) | 1.6 +| gke-gci-1.5-container_vm-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 (gci) | 1.5->1.6 (cvm) | 1.6 +| gke-gci-1.5-container_vm-1.6-upgrade-master | 1.5 | 1.5->1.6 (gci) | 1.5 (cvm) | 1.6 + +## Kinds of tests + +We are working on implementing clearer partitioning of our e2e tests to make +running a known set of tests easier (#10548). Tests can be labeled with any of +the following labels, in order of increasing precedence (that is, each label +listed below supersedes the previous ones): + + - If a test has no labels, it is expected to run fast (under five minutes), be +able to be run in parallel, and be consistent. + + - `[Slow]`: If a test takes more than five minutes to run (by itself or in +parallel with many other tests), it is labeled `[Slow]`. This partition allows +us to run almost all of our tests quickly in parallel, without waiting for the +stragglers to finish. + + - `[Serial]`: If a test cannot be run in parallel with other tests (e.g. it +takes too many resources or restarts nodes), it is labeled `[Serial]`, and +should be run in serial as part of a separate suite. + + - `[Disruptive]`: If a test restarts components that might cause other tests +to fail or break the cluster completely, it is labeled `[Disruptive]`. Any +`[Disruptive]` test is also assumed to qualify for the `[Serial]` label, but +need not be labeled as both. These tests are not run against soak clusters to +avoid restarting components. + + - `[Flaky]`: If a test is found to be flaky and we have decided that it's too +hard to fix in the short term (e.g. it's going to take a full engineer-week), it +receives the `[Flaky]` label until it is fixed. The `[Flaky]` label should be +used very sparingly, and should be accompanied with a reference to the issue for +de-flaking the test, because while a test remains labeled `[Flaky]`, it is not +monitored closely in CI. `[Flaky]` tests are by default not run, unless a +`focus` or `skip` argument is explicitly given. + + - `[Feature:.+]`: If a test has non-default requirements to run or targets +some non-core functionality, and thus should not be run as part of the standard +suite, it receives a `[Feature:.+]` label, e.g. `[Feature:Performance]` or +`[Feature:Ingress]`. `[Feature:.+]` tests are not run in our core suites, +instead running in custom suites. If a feature is experimental or alpha and is +not enabled by default due to being incomplete or potentially subject to +breaking changes, it does *not* block PR merges, and thus should run in +some separate test suites owned by the feature owner(s) +(see [Continuous Integration](#continuous-integration) below). + + - `[Conformance]`: Designate that this test is included in the Conformance +test suite for [Conformance Testing](../sig-architecture/conformance-tests.md). This test must +meet a number of [requirements](../sig-architecture/conformance-tests.md#conformance-test-requirements) +to be eligible for this tag. This tag does not supersed any other labels. 
+ + - `[LinuxOnly]`: If a test is known to be using Linux-specific features +(e.g.: seLinuxOptions) or is unable to run on Windows nodes, it is labeled +`[LinuxOnly]`. When using Windows nodes, this tag should be added to the +`skip` argument. + + - The following tags are not considered to be exhaustively applied, but are +intended to further categorize existing `[Conformance]` tests, or tests that are +being considered as candidate for promotion to `[Conformance]` as we work to +refine requirements: + - `[Privileged]`: This is a test that requires privileged access + - `[Internet]`: This is a test that assumes access to the public internet + - `[Deprecated]`: This is a test that exercises a deprecated feature + - `[Alpha]`: This is a test that exercises an alpha feature + - `[Beta]`: This is a test that exercises a beta feature + +Every test should be owned by a [SIG](/sig-list.md), +and have a corresponding `[sig-<name>]` label. + +### Viper configuration and hierarchichal test parameters. + +The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags. + +Flags in general fall apart once tests become sufficiently complicated. So, even if we could use another flag library, it wouldn't be ideal. + +To use viper, rather than flags, to configure your tests: + +- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`. + +Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](https://git.k8s.io/kubernetes/test/e2e/framework/test_context.go). + +In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes. + +### Conformance tests + +For more information on Conformance tests please see the [Conformance Testing](../sig-architecture/conformance-tests.md) + +## Continuous Integration + +A quick overview of how we run e2e CI on Kubernetes. + +### What is CI? + +We run a battery of [release-blocking jobs](https://k8s-testgrid.appspot.com/sig-release-master-blocking) +against `HEAD` of the master branch on a continuous basis, and block merges +via [Tide](https://git.k8s.io/test-infra/prow/cmd/tide) on a subset of those +tests if they fail. + +CI results can be found at [ci-test.k8s.io](http://ci-test.k8s.io), e.g. +[ci-test.k8s.io/kubernetes-e2e-gce/10594](http://ci-test.k8s.io/kubernetes-e2e-gce/10594). + +### What runs in CI? + +We run all default tests (those that aren't marked `[Flaky]` or `[Feature:.+]`) +against GCE and GKE. To minimize the time from regression-to-green-run, we +partition tests across different jobs: + + - `kubernetes-e2e-<provider>` runs all non-`[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. + + - `kubernetes-e2e-<provider>-slow` runs all `[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. + + - `kubernetes-e2e-<provider>-serial` runs all `[Serial]` and `[Disruptive]`, +non-`[Flaky]`, non-`[Feature:.+]` tests in serial. + +We also run non-default tests if the tests exercise general-availability ("GA") +features that require a special environment to run in, e.g. +`kubernetes-e2e-gce-scalability` and `kubernetes-kubemark-gce`, which test for +Kubernetes performance. + +#### Non-default tests + +Many `[Feature:.+]` tests we don't run in CI. 
These tests are for features that
+are experimental (often in the `experimental` API), and aren't enabled by
+default.
+
+### The PR-builder
+
+We also run a battery of tests against every PR before we merge it. These tests
+are equivalent to `kubernetes-gce`: it runs all non-`[Slow]`, non-`[Serial]`,
+non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. These
+tests are considered "smoke tests" to give a decent signal that the PR doesn't
+break most functionality. Results for your PR can be found at
+[pr-test.k8s.io](http://pr-test.k8s.io), e.g.
+[pr-test.k8s.io/20354](http://pr-test.k8s.io/20354) for #20354.
+
+### Adding a test to CI
+
+As mentioned above, prior to adding a new test, it is a good idea to perform a
+`-ginkgo.dryRun=true` on the system, in order to see if a behavior is already
+being tested, or to determine if it may be possible to augment an existing set
+of tests for a specific use case.
+
+If a behavior does not currently have coverage and a developer wishes to add a
+new e2e test, navigate to the ./test/e2e directory and create a new test using
+the existing suite as a guide.
+
+**NOTE:** To build/run with tests in a new directory within ./test/e2e, add the
+directory to the import list in ./test/e2e/e2e_test.go.
+
+TODO(#20357): Create a self-documented example which has been disabled, but can
+be copied to create new tests and outlines the capabilities and libraries used.
+
+When writing a test, consult the [kinds of tests](#kinds-of-tests) section above
+to determine how your test should be marked (e.g. `[Slow]`, `[Serial]`;
+remember, by default we assume a test can run in parallel with other tests!).
+
+When first adding a test it should *not* go straight into CI, because failures
+block ordinary development. A test should only be added to CI after it has been
+running in some non-CI suite long enough to establish a track record showing
+that the test does not fail when run against *working* software. Note also that
+tests running in CI are generally running on a well-loaded cluster, so they must
+contend for resources; see above about [kinds of tests](#kinds-of-tests).
+
+Generally, a feature starts as `experimental`, and will be run in some suite
+owned by the team developing the feature. If a feature is in beta or GA, it
+*should* block PR merges and releases. In moving from experimental to beta or GA, tests
+that are expected to pass by default should simply remove the `[Feature:.+]`
+label, and will be incorporated into our core suites. If tests are not expected
+to pass by default (e.g. they require a special environment such as added
+quota), they should remain with the `[Feature:.+]` label.
+
+Occasionally, we'll want to add tests to better exercise features that are
+already GA. These tests also shouldn't go straight to CI. They should begin by
+being marked as `[Flaky]` to be run outside of CI, and once a track record for
+them is established, they may be promoted out of `[Flaky]`.
+
+### Moving a test out of CI
+
+If we have determined that a test is known-flaky and cannot be fixed in the
+short term, we may move it out of CI indefinitely. This move should be used
+sparingly, as it effectively means that we have no coverage of that test. When a
+test is demoted, it should be marked `[Flaky]` with a comment accompanying the
+label with a reference to an issue opened to fix the test.
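+
+To make the labeling conventions described above concrete, the sketch below
+shows where such labels typically live: they are just substrings of the Ginkgo
+`Describe`/`It` description text, which the `--ginkgo.focus` and
+`--ginkgo.skip` regular expressions match against. The SIG name, feature name,
+and issue reference here are placeholders, not real examples from the tree.
+
+```go
+package example
+
+import (
+    . "github.com/onsi/ginkgo"
+    . "github.com/onsi/gomega"
+)
+
+// The "[sig-example]" prefix and the bracketed suffixes are plain text in the
+// spec descriptions; the e2e runner selects or skips tests by matching them.
+var _ = Describe("[sig-example] Widgets", func() {
+    // Unlabeled test: expected to be fast, parallel-safe, and reliable.
+    It("should create a widget", func() {
+        Expect(1 + 1).To(Equal(2)) // placeholder assertion
+    })
+
+    // A slow test that also exercises a non-default feature.
+    It("should survive creating many widgets [Slow] [Feature:ExampleFeature]", func() {
+        // ...
+    })
+
+    // A demoted test, kept out of CI until the (placeholder) de-flaking issue
+    // tracked for it is resolved, as described in "Moving a test out of CI".
+    It("should tolerate node restarts [Disruptive] [Flaky]", func() {
+        // ...
+    })
+})
+```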
+ +## Performance Evaluation + +Another benefit of the e2e tests is the ability to create reproducible loads on +the system, which can then be used to determine the responsiveness, or analyze +other characteristics of the system. For example, the density tests load the +system to 30,50,100 pods per/node and measures the different characteristics of +the system, such as throughput, api-latency, etc. + +For a good overview of how we analyze performance data, please read the +following [post](https://kubernetes.io/blog/2015/09/kubernetes-performance-measurements-and/) + +For developers who are interested in doing their own performance analysis, we +recommend setting up [prometheus](http://prometheus.io/) for data collection, +and using [grafana](https://prometheus.io/docs/visualization/grafana/) to +visualize the data. There also exists the option of pushing your own metrics in +from the tests using a +[prom-push-gateway](http://prometheus.io/docs/instrumenting/pushing/). +Containers for all of these components can be found +[here](https://hub.docker.com/u/prom/). + +For more accurate measurements, you may wish to set up prometheus external to +kubernetes in an environment where it can access the major system components +(api-server, controller-manager, scheduler). This is especially useful when +attempting to gather metrics in a load-balanced api-server environment, because +all api-servers can be analyzed independently as well as collectively. On +startup, configuration file is passed to prometheus that specifies the endpoints +that prometheus will scrape, as well as the sampling interval. + +``` +#prometheus.conf +job: { + name: "kubernetes" + scrape_interval: "1s" + target_group: { + # apiserver(s) + target: "http://localhost:8080/metrics" + # scheduler + target: "http://localhost:10251/metrics" + # controller-manager + target: "http://localhost:10252/metrics" + } +} +``` + +Once prometheus is scraping the kubernetes endpoints, that data can then be +plotted using promdash, and alerts can be created against the assortment of +metrics that kubernetes provides. + +## One More Thing + +You should also know the [testing conventions](../../guide/coding-conventions.md#testing-conventions). + +**HAPPY TESTING!** diff --git a/contributors/devel/sig-testing/flaky-tests.md b/contributors/devel/sig-testing/flaky-tests.md new file mode 100644 index 00000000..14302592 --- /dev/null +++ b/contributors/devel/sig-testing/flaky-tests.md @@ -0,0 +1,201 @@ +# Flaky tests + +Any test that fails occasionally is "flaky". Since our merges only proceed when +all tests are green, and we have a number of different CI systems running the +tests in various combinations, even a small percentage of flakes results in a +lot of pain for people waiting for their PRs to merge. + +Therefore, it's very important that we write tests defensively. Situations that +"almost never happen" happen with some regularity when run thousands of times in +resource-constrained environments. Since flakes can often be quite hard to +reproduce while still being common enough to block merges occasionally, it's +additionally important that the test logs be useful for narrowing down exactly +what caused the failure. + +Note that flakes can occur in unit tests, integration tests, or end-to-end +tests, but probably occur most commonly in end-to-end tests. 
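+
+A common source of flakes is sleeping for a fixed period and hoping an
+asynchronous operation has completed. A more defensive pattern, sketched below
+in plain Go (the `isReady` condition in the usage comment is hypothetical), is
+to poll with an explicit deadline and return an error that says exactly what
+was being waited for, so the logs are useful when the wait does fail.
+
+```go
+package example
+
+import (
+    "fmt"
+    "time"
+)
+
+// waitFor polls cond every interval until it reports true or the timeout
+// expires. The description is included in any error so a failure log explains
+// what never happened, rather than just saying "timed out".
+func waitFor(description string, timeout, interval time.Duration, cond func() (bool, error)) error {
+    deadline := time.Now().Add(timeout)
+    for {
+        ok, err := cond()
+        if err != nil {
+            return fmt.Errorf("while waiting for %s: %v", description, err)
+        }
+        if ok {
+            return nil
+        }
+        if time.Now().After(deadline) {
+            return fmt.Errorf("timed out after %v waiting for %s", timeout, description)
+        }
+        time.Sleep(interval)
+    }
+}
+
+// Example use in a test, assuming an isReady() (bool, error) check exists:
+//
+//   if err := waitFor("pod example-pod to become Ready", 30*time.Second, time.Second, isReady); err != nil {
+//       t.Fatal(err)
+//   }
+```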
+ +## Hunting Flakes + +You may notice lots of your PRs or ones you watch are having a common +pre-submit failure, but less frequent issues that are still of concern take +more analysis over time. There are metrics recorded and viewable in: +- [TestGrid](https://k8s-testgrid.appspot.com/presubmits-kubernetes-blocking#Summary) +- [Velodrome](http://velodrome.k8s.io/dashboard/db/bigquery-metrics?orgId=1) + +It is worth noting tests are going to fail in presubmit a lot due +to unbuildable code, but that wont happen as much on the same commit unless +there's a true issue in the code or a broader problem like a dep failed to +pull in. + +## Filing issues for flaky tests + +Because flakes may be rare, it's very important that all relevant logs be +discoverable from the issue. + +1. Search for the test name. If you find an open issue and you're 90% sure the + flake is exactly the same, add a comment instead of making a new issue. +2. If you make a new issue, you should title it with the test name, prefixed by + "e2e/unit/integration flake:" (whichever is appropriate) +3. Reference any old issues you found in step one. Also, make a comment in the + old issue referencing your new issue, because people monitoring only their + email do not see the backlinks github adds. Alternatively, tag the person or + people who most recently worked on it. +4. Paste, in block quotes, the entire log of the individual failing test, not + just the failure line. +5. Link to durable storage with the rest of the logs. This means (for all the + tests that Google runs) the GCS link is mandatory! The Jenkins test result + link is nice but strictly optional: not only does it expire more quickly, + it's not accessible to non-Googlers. + +## Finding failed flaky test cases + +Find flaky tests issues on GitHub under the [kind/flake issue label][flake]. +There are significant numbers of flaky tests reported on a regular basis and P2 +flakes are under-investigated. Fixing flakes is a quick way to gain expertise +and community goodwill. + +[flake]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake + +## Expectations when a flaky test is assigned to you + +Note that we won't randomly assign these issues to you unless you've opted in or +you're part of a group that has opted in. We are more than happy to accept help +from anyone in fixing these, but due to the severity of the problem when merges +are blocked, we need reasonably quick turn-around time on test flakes. Therefore +we have the following guidelines: + +1. If a flaky test is assigned to you, it's more important than anything else + you're doing unless you can get a special dispensation (in which case it will + be reassigned). If you have too many flaky tests assigned to you, or you + have such a dispensation, then it's *still* your responsibility to find new + owners (this may just mean giving stuff back to the relevant Team or SIG Lead). +2. You should make a reasonable effort to reproduce it. Somewhere between an + hour and half a day of concentrated effort is "reasonable". It is perfectly + reasonable to ask for help! +3. If you can reproduce it (or it's obvious from the logs what happened), you + should then be able to fix it, or in the case where someone is clearly more + qualified to fix it, reassign it with very clear instructions. +4. Once you have made a change that you believe fixes a flake, it is conservative + to keep the issue for the flake open and see if it manifests again after the + change is merged. +5. 
If you can't reproduce a flake: __don't just close it!__ Every time a flake comes + back, at least 2 hours of merge time is wasted. So we need to make monotonic + progress towards narrowing it down every time a flake occurs. If you can't + figure it out from the logs, add log messages that would have help you figure + it out. If you make changes to make a flake more reproducible, please link + your pull request to the flake you're working on. +6. If a flake has been open, could not be reproduced, and has not manifested in + 3 months, it is reasonable to close the flake issue with a note saying + why. + +# Reproducing unit test flakes + +Try the [stress command](https://godoc.org/golang.org/x/tools/cmd/stress). + +Just + +``` +$ go install golang.org/x/tools/cmd/stress +``` + +Then build your test binary + +``` +$ go test -c -race +``` + +Then run it under stress + +``` +$ stress ./package.test -test.run=FlakyTest +``` + +It runs the command and writes output to `/tmp/gostress-*` files when it fails. +It periodically reports with run counts. Be careful with tests that use the +`net/http/httptest` package; they could exhaust the available ports on your +system! + +# Hunting flaky unit tests in Kubernetes + +Sometimes unit tests are flaky. This means that due to (usually) race +conditions, they will occasionally fail, even though most of the time they pass. + +We have a goal of 99.9% flake free tests. This means that there is only one +flake in one thousand runs of a test. + +Running a test 1000 times on your own machine can be tedious and time consuming. +Fortunately, there is a better way to achieve this using Kubernetes. + +_Note: these instructions are mildly hacky for now, as we get run once semantics +and logging they will get better_ + +There is a testing image `brendanburns/flake` up on the docker hub. We will use +this image to test our fix. + +Create a replication controller with the following config: + +```yaml +apiVersion: v1 +kind: ReplicationController +metadata: + name: flakecontroller +spec: + replicas: 24 + template: + metadata: + labels: + name: flake + spec: + containers: + - name: flake + image: brendanburns/flake + env: + - name: TEST_PACKAGE + value: pkg/tools + - name: REPO_SPEC + value: https://github.com/kubernetes/kubernetes +``` + +Note that we omit the labels and the selector fields of the replication +controller, because they will be populated from the labels field of the pod +template by default. + +```sh +kubectl create -f ./controller.yaml +``` + +This will spin up 24 instances of the test. They will run to completion, then +exit, and the kubelet will restart them, accumulating more and more runs of the +test. + +You can examine the recent runs of the test by calling `docker ps -a` and +looking for tasks that exited with non-zero exit codes. Unfortunately, docker +ps -a only keeps around the exit status of the last 15-20 containers with the +same image, so you have to check them frequently. + +You can use this script to automate checking for failures, assuming your cluster +is running on GCE and has four nodes: + +```sh +echo "" > output.txt +for i in {1..4}; do + echo "Checking kubernetes-node-${i}" + echo "kubernetes-node-${i}:" >> output.txt + gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt +done +grep "Exited ([^0])" output.txt +``` + +Eventually you will have sufficient runs for your purposes. 
At that point you +can delete the replication controller by running: + +```sh +kubectl delete replicationcontroller flakecontroller +``` + +If you do a final check for flakes with `docker ps -a`, ignore tasks that +exited -1, since that's what happens when you stop the replication controller. + +Happy flake hunting! + diff --git a/contributors/devel/gubernator-images/filterpage.png b/contributors/devel/sig-testing/gubernator-images/filterpage.png Binary files differindex 2d08bd8e..2d08bd8e 100644 --- a/contributors/devel/gubernator-images/filterpage.png +++ b/contributors/devel/sig-testing/gubernator-images/filterpage.png diff --git a/contributors/devel/gubernator-images/filterpage1.png b/contributors/devel/sig-testing/gubernator-images/filterpage1.png Binary files differindex 838cb0fa..838cb0fa 100644 --- a/contributors/devel/gubernator-images/filterpage1.png +++ b/contributors/devel/sig-testing/gubernator-images/filterpage1.png diff --git a/contributors/devel/gubernator-images/filterpage2.png b/contributors/devel/sig-testing/gubernator-images/filterpage2.png Binary files differindex 63da782e..63da782e 100644 --- a/contributors/devel/gubernator-images/filterpage2.png +++ b/contributors/devel/sig-testing/gubernator-images/filterpage2.png diff --git a/contributors/devel/gubernator-images/filterpage3.png b/contributors/devel/sig-testing/gubernator-images/filterpage3.png Binary files differindex 33066d78..33066d78 100644 --- a/contributors/devel/gubernator-images/filterpage3.png +++ b/contributors/devel/sig-testing/gubernator-images/filterpage3.png diff --git a/contributors/devel/gubernator-images/skipping1.png b/contributors/devel/sig-testing/gubernator-images/skipping1.png Binary files differindex a5dea440..a5dea440 100644 --- a/contributors/devel/gubernator-images/skipping1.png +++ b/contributors/devel/sig-testing/gubernator-images/skipping1.png diff --git a/contributors/devel/gubernator-images/skipping2.png b/contributors/devel/sig-testing/gubernator-images/skipping2.png Binary files differindex b133347e..b133347e 100644 --- a/contributors/devel/gubernator-images/skipping2.png +++ b/contributors/devel/sig-testing/gubernator-images/skipping2.png diff --git a/contributors/devel/gubernator-images/testfailures.png b/contributors/devel/sig-testing/gubernator-images/testfailures.png Binary files differindex 1b331248..1b331248 100644 --- a/contributors/devel/gubernator-images/testfailures.png +++ b/contributors/devel/sig-testing/gubernator-images/testfailures.png diff --git a/contributors/devel/sig-testing/gubernator.md b/contributors/devel/sig-testing/gubernator.md new file mode 100644 index 00000000..b03d11a1 --- /dev/null +++ b/contributors/devel/sig-testing/gubernator.md @@ -0,0 +1,136 @@ +# Gubernator + +*This document is oriented at developers who want to use Gubernator to debug while developing for Kubernetes.* + + +- [Gubernator](#gubernator) + - [What is Gubernator?](#what-is-gubernator) + - [Gubernator Features](#gubernator-features) + - [Test Failures list](#test-failures-list) + - [Log Filtering](#log-filtering) + - [Gubernator for Local Tests](#gubernator-for-local-tests) + - [Future Work](#future-work) + + +## What is Gubernator? + +[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes +test results. + +Gubernator simplifies the debugging process and makes it easier to track down failures by automating many +steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines. 
+Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the +failed pods and the corresponding pod UID, namespace, and container ID. +It also allows for filtering of the log files to display relevant lines based on selected keywords, and +allows for multiple logs to be woven together by timestamp. + +Gubernator runs on Google App Engine and fetches logs stored on Google Cloud Storage. + +## Gubernator Features + +### Test Failures list + +Comments made by k8s-ci-robot will post a link to a page listing the failed tests. +Each failed test comes with the corresponding error log from a junit file and a link +to filter logs for that test. + +Based on the message logged in the junit file, the pod name may be displayed. + + + +[Test Failures List Example](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721) + +### Log Filtering + +The log filtering page comes with checkboxes and textboxes to aid in filtering. Filtered keywords will be bolded +and lines including keywords will be highlighted. Up to four lines around the line of interest will also be displayed. + + + +If less than 100 lines are skipped, the "... skipping xx lines ..." message can be clicked to expand and show +the hidden lines. + +Before expansion: + +After expansion: + + +If the pod name was displayed in the Test Failures list, it will automatically be included in the filters. +If it is not found in the error message, it can be manually entered into the textbox. Once a pod name +is entered, the Pod UID, Namespace, and ContainerID may be automatically filled in as well. These can be +altered as well. To apply the filter, check off the options corresponding to the filter. + + + +To add a filter, type the term to be filtered into the textbox labeled "Add filter:" and press enter. +Additional filters will be displayed as checkboxes under the textbox. + + + +To choose which logs to view check off the checkboxes corresponding to the logs of interest. If multiple logs are +included, the "Weave by timestamp" option can weave the selected logs together based on the timestamp in each line. + + + +[Log Filtering Example 1](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/5535/nodelog?pod=pod-configmaps-b5b876cb-3e1e-11e6-8956-42010af0001d&junit=junit_03.xml&wrap=on&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkube-apiserver.log&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkubelet.log&UID=on&poduid=b5b8a59e-3e1e-11e6-b358-42010af0001d&ns=e2e-tests-configmap-oi12h&cID=tmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image) + +[Log Filtering Example 2](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721/nodelog?pod=client-containers-a53f813c-503e-11e6-88dd-0242ac110003&junit=junit_19.xml&wrap=on) + + +### Gubernator for Local Tests + +*Currently Gubernator can only be used with remote node e2e tests.* + +**NOTE: Using Gubernator with local tests will publicly upload your test logs to Google Cloud Storage** + +To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true. +A URL link to view the test results will be printed to the console. +Please note that running with the Gubernator tag will bypass the user confirmation for uploading to GCS. 
+ +```console + +$ make test-e2e-node REMOTE=true GUBERNATOR=true +... +================================================================ +Running gubernator.sh + +Gubernator linked below: +k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp +``` + +The gubernator.sh script can be run after running a remote node e2e test for the same effect. + +```console +$ ./test/e2e_node/gubernator.sh +Do you want to run gubernator.sh and upload logs publicly to GCS? [y/n]y +... +Gubernator linked below: +k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp +``` + +## Future Work + +Gubernator provides a framework for debugging failures and introduces useful features. +There is still a lot of room for more features and growth to make the debugging process more efficient. + +How to contribute (see https://git.k8s.io/test-infra/gubernator/README.md) + +* Extend GUBERNATOR flag to all local tests + +* More accurate identification of pod name, container ID, etc. + * Change content of logged strings for failures to include more information + * Better regex in Gubernator + +* Automate discovery of more keywords + * Volume Name + * Disk Name + * Pod IP + +* Clickable API objects in the displayed lines in order to add them as filters + +* Construct story of pod's lifetime + * Have concise view of what a pod went through from when pod was started to failure + +* Improve UI + * Have separate folders of logs in rows instead of in one long column + * Improve interface for adding additional features (maybe instead of textbox and checkbox, have chips) diff --git a/contributors/devel/sig-testing/integration-tests.md b/contributors/devel/sig-testing/integration-tests.md new file mode 100644 index 00000000..ac5aee9f --- /dev/null +++ b/contributors/devel/sig-testing/integration-tests.md @@ -0,0 +1,78 @@ +# Integration Testing in Kubernetes + +**Table of Contents** + +- [Integration testing in Kubernetes](#integration-tests) + - [Install etcd dependency](#install-etcd-dependency) + - [Etcd test data](#etcd-test-data) + - [Run integration tests](#run-integration-tests) + - [Run a specific integration test](#run-a-specific-integration-test) + +This assumes you already read the [testing guide](testing.md). + +## Integration tests + +* Integration tests should only access other resources on the local machine + - Most commonly etcd or a service listening on localhost. +* All significant features require integration tests. + - This includes kubectl commands +* The preferred method of testing multiple scenarios or inputs +is [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) + - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) +* Each test should create its own master, httpserver and config. + - Example: [TestPodUpdateActiveDeadlineSeconds](https://git.k8s.io/kubernetes/test/integration/pods/pods_test.go) +* See [coding conventions](../../guide/coding-conventions.md). + +### Install etcd dependency + +Kubernetes integration tests require your `PATH` to include an +[etcd](https://github.com/coreos/etcd/releases) installation. Kubernetes +includes a script to help install etcd on your machine. 
+ +```sh +# Install etcd and add to PATH + +# Option a) install inside kubernetes root +hack/install-etcd.sh # Installs in ./third_party/etcd +echo export PATH="\$PATH:$(pwd)/third_party/etcd" >> ~/.profile # Add to PATH + +# Option b) install manually +grep -E "image.*etcd" cluster/gce/manifests/etcd.manifest # Find version +# Install that version using yum/apt-get/etc +echo export PATH="\$PATH:<LOCATION>" >> ~/.profile # Add to PATH +``` + +### Etcd test data + +Many tests start an etcd server internally, storing test data in the operating system's temporary directory. + +If you see test failures because the temporary directory does not have sufficient space, +or is on a volume with unpredictable write latency, you can override the test data directory +for those internal etcd instances with the `TEST_ETCD_DIR` environment variable. + +### Run integration tests + +The integration tests are run using `make test-integration`. +The Kubernetes integration tests are written using the normal golang testing +package but expect to have a running etcd instance to connect to. The `test-integration.sh` +script wraps `make test` and sets up an etcd instance for the integration tests to use. + +```sh +make test-integration # Run all integration tests. +``` + +This script runs the golang tests in package +[`test/integration`](https://git.k8s.io/kubernetes/test/integration). + +### Run a specific integration test + +You can also use the `KUBE_TEST_ARGS` environment variable with the `make test-integration` +to run a specific integration test case: + +```sh +# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set. +make test-integration WHAT=./test/integration/pods GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$" +``` + +If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API +version and the watch cache test is skipped. diff --git a/contributors/devel/sig-testing/testing.md b/contributors/devel/sig-testing/testing.md new file mode 100644 index 00000000..fd9d9427 --- /dev/null +++ b/contributors/devel/sig-testing/testing.md @@ -0,0 +1,257 @@ +# Testing guide + +**Table of Contents** + +- [Testing guide](#testing-guide) + - [Unit tests](#unit-tests) + - [Run all unit tests](#run-all-unit-tests) + - [Set go flags during unit tests](#set-go-flags-during-unit-tests) + - [Run unit tests from certain packages](#run-unit-tests-from-certain-packages) + - [Run specific unit test cases in a package](#run-specific-unit-test-cases-in-a-package) + - [Stress running unit tests](#stress-running-unit-tests) + - [Unit test coverage](#unit-test-coverage) + - [Benchmark unit tests](#benchmark-unit-tests) + - [Integration tests](#integration-tests) + - [End-to-End tests](#end-to-end-tests) + + +This assumes you already read the [development guide](../development.md) to +install go, godeps, and configure your git client. All command examples are +relative to the `kubernetes` root directory. + +Before sending pull requests you should at least make sure your changes have +passed both unit and integration tests. + +Kubernetes only merges pull requests when unit, integration, and e2e tests are +passing, so it is often a good idea to make sure the e2e tests work as well. + +## Unit tests + +* Unit tests should be fully hermetic + - Only access resources in the test binary. +* All packages and any significant files require unit tests. 
+* The preferred method of testing multiple scenarios or input is + [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) + - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) +* Unit tests must pass on macOS and Windows platforms. + - Tests using linux-specific features must be skipped or compiled out. + - Skipped is better, compiled out is required when it won't compile. +* Concurrent unit test runs must pass. +* See [coding conventions](../../guide/coding-conventions.md). + +### Run all unit tests + +`make test` is the entrypoint for running the unit tests that ensures that +`GOPATH` is set up correctly. + +```sh +cd kubernetes +make test # Run all unit tests. +``` +If you have `GOPATH` set up correctly, you can +also just use `go test` directly. + +```sh +cd kubernetes +go test ./... # Run all unit tests +``` + +The remainder of this documentation presumes that you use `Make` as an +entry point, but remember that the ability to use `go test` exists should you +desire. + +If any unit test fails with a timeout panic (see [#1594](https://github.com/kubernetes/community/issues/1594)) on the testing package, you can increase the `KUBE_TIMEOUT` value as shown below. + +```sh +make test KUBE_TIMEOUT="-timeout 300s" +``` + +### Set go flags during unit tests + +You can set [go flags](https://golang.org/cmd/go/) by setting the +`GOFLAGS` environment variable. + +### Run unit tests from certain packages + +`make test` accepts packages as arguments; the `k8s.io/kubernetes` prefix is +added automatically to these: + +```sh +make test WHAT=./pkg/kubelet # run tests for pkg/kubelet +``` + +Expressed strictly with `go test`, the above command is equivalent to the following: + +```sh +go test ./pkg/kubelet +``` + +To run tests for a package and all of its subpackages, you need to append `...` +to the package path: + +```sh +make test WHAT=./pkg/api/... # run tests for pkg/api and all its subpackages +``` + +To run multiple targets you need quotes: + +```sh +make test WHAT="./pkg/kubelet ./pkg/scheduler" # run tests for pkg/kubelet and pkg/scheduler +``` + +In a shell, it's often handy to use brace expansion: + +```sh +make test WHAT=./pkg/{kubelet,scheduler} # run tests for pkg/kubelet and +pkg/scheduler +``` + +### Run specific unit test cases in a package + +You can set the test args using the `KUBE_TEST_ARGS` environment variable. +You can use this to pass the `-run` argument to `go test`, which accepts a +regular expression for the name of the test that should be run. + +```sh +# Runs TestValidatePod in pkg/api/validation with the verbose flag set +make test WHAT=./pkg/apis/core/validation GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$' + +# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation +make test WHAT=./pkg/apis/core/validation GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$" +``` + +Or if we are using `go test` as our entry point, we could run: + +```sh +go test ./pkg/apis/core/validation -v -run ^TestValidatePods$ +``` + +For other supported test flags, see the [golang +documentation](https://golang.org/cmd/go/#hdr-Testing_flags). + +### Stress running unit tests + +Running the same tests repeatedly is one way to root out flakes. +You can do this efficiently. + +```sh +# Have 2 workers run all tests 5 times each (10 total iterations). +make test PARALLEL=2 ITERATION=5 +``` + +For more advanced ideas please see [flaky-tests.md](flaky-tests.md). 
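+
+As noted at the top of this guide, table-driven tests are the preferred way to
+cover many scenarios with one test, and they combine well with stress running
+because every case is exercised on each iteration. A minimal, self-contained
+sketch follows; the `clamp` helper exists only to give the table something to
+exercise.
+
+```go
+package example
+
+import "testing"
+
+// clamp limits v to the range [lo, hi].
+func clamp(v, lo, hi int) int {
+    if v < lo {
+        return lo
+    }
+    if v > hi {
+        return hi
+    }
+    return v
+}
+
+func TestClamp(t *testing.T) {
+    cases := []struct {
+        name            string
+        v, lo, hi, want int
+    }{
+        {"below range", -5, 0, 10, 0},
+        {"in range", 5, 0, 10, 5},
+        {"above range", 50, 0, 10, 10},
+    }
+    for _, tc := range cases {
+        // Subtests give each case its own name in failure output and -run filters.
+        t.Run(tc.name, func(t *testing.T) {
+            if got := clamp(tc.v, tc.lo, tc.hi); got != tc.want {
+                t.Errorf("clamp(%d, %d, %d) = %d, want %d", tc.v, tc.lo, tc.hi, got, tc.want)
+            }
+        })
+    }
+}
+```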
+ +### Unit test coverage + +Currently, collecting coverage is only supported for the Go unit tests. + +To run all unit tests and generate an HTML coverage report, run the following: + +```sh +make test KUBE_COVER=y +``` + +At the end of the run, an HTML report will be generated with the path +printed to stdout. + +To run tests and collect coverage in only one package, pass its relative path +under the `kubernetes` directory as an argument, for example: + +```sh +make test WHAT=./pkg/kubectl KUBE_COVER=y +``` + +Multiple arguments can be passed, in which case the coverage results will be +combined for all tests run. + +### Benchmark unit tests + +To run benchmark tests, you'll typically use something like: + +```sh +make test WHAT=./pkg/scheduler/internal/cache KUBE_TEST_ARGS='-benchmem -run=XXX -bench=BenchmarkExpirePods' +``` + +Alternatively, to express in pure Go, you could write the following: + +```sh +go test ./pkg/scheduler/internal/cache -benchmem -run=XXX -bench=Benchmark +``` + +This will do the following: + +1. `-run=XXX` is a regular expression filter on the name of test cases to run. + Go will execute both the tests matching the `-bench` regex and the `-run` + regex. Since we only want to execute benchmark tests, we set the `-run` regex + to XXX, which will not match any tests. +2. `-bench=Benchmark` will run test methods with Benchmark in the name + * See `grep -nr Benchmark .` for examples +3. `-benchmem` enables memory allocation stats + +See `go help test` and `go help testflag` for additional info. + +## Integration tests + +Please refer to [Integration Testing in Kubernetes](integration-tests.md). + +## End-to-End tests + +Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md). + +## Running your contribution in the Kubernetes CI +Once you open a PR, [`prow`][prow-doc] runs pre-submit tests in CI. + +If you are not a [Kubernetes org member][member], another org member will need to run [`/ok-to-test`][ok-to-test] on your PR. + +Find out more about [other commands][prow-cmds] you can use to interact with prow through GitHub comments. + +### Troubleshooting a failure +Click on `Details` and look at the [`gubernator`](gubernator.k8s.io/) output for the test. + +If the failure seems unrelated to the change you're submitting: +- Is it a flake? + - Check if an issue has already been opened for that flake + - If not, open a new one (like [this example][new-issue-example]) and [label it `kind/flake`][kind/flake] + - Run [`/retest`][retest] on your PR to re-trigger the tests + +- Is it a failure that shouldn't be happening (in other words, is the test now wrong) + - Get in touch with the SIG + - preferably as a comment on your PR, by tagging the [team][k-teams] (for example a [reviewers team for the SIG][k-teams-review]) + - if you don't get a response in 24h, engage with the SIG on their channel on the Kubernetes slack and/or attend one of the [SIG meetings][sig-meetings] to ask for input. 
+ +[prow-doc]: https://kubernetes.io/blog/2018/08/29/the-machines-can-do-the-work-a-story-of-kubernetes-testing-ci-and-automating-the-contributor-experience/#enter-prow +[member]: https://github.com/kubernetes/community/blob/master/community-membership.md#member +[k-teams]: https://github.com/orgs/kubernetes/teams +[k-teams-review]: https://github.com/orgs/kubernetes/teams?utf8=%E2%9C%93&query=review +[ok-to-test]: https://prow.k8s.io/command-help#ok_to_test +[prow-cmds]: https://prow.k8s.io/command-help +[retest]: https://prow.k8s.io/command-help#retest +[new-issue-example]: https://github.com/kubernetes/kubernetes/issues/71430 +[kind/flake]: https://prow.k8s.io/command-help#kind +[sig-meetings]: https://github.com/kubernetes/community/blob/master/sig-list.md + +#### Escalating failures to a SIG +- Figure out corresponding SIG from test name/description +- Mention the SIG's GitHub handle on the issue, optionally `cc` the SIG's chair(s) (locate them under kubernetes/community/sig-<name\>) +- Optionally (or if you haven't heard back on the issue after 24h) reach out to the SIG on slack + +### Testgrid +[`testgrid`](https://testgrid.k8s.io/) is a visualization of the Kubernetes CI. + +It is useful as a way to: +- see the run history of a test you are debugging (access it starting from a gubernator report for that test) +- get an overview of the project's general health + +`testgrid` is organised in: +- tests + - collection of assertions in a test file + - each test is typically owned by a single SIG + - each test is represented as a row on the grid +- jobs + - collection of tests + - each job is typically owned by a single SIG + - each job is represented as a tab +- dashboards + - collection of jobs + - each dashboard is represented as a button + - some dashboards collect jobs/tests in the domain of a specific SIG (named after and owned by those SIGs), and dashboards to monitor project wide health (owned by SIG-release) diff --git a/contributors/devel/sig-testing/writing-good-e2e-tests.md b/contributors/devel/sig-testing/writing-good-e2e-tests.md new file mode 100644 index 00000000..836479c2 --- /dev/null +++ b/contributors/devel/sig-testing/writing-good-e2e-tests.md @@ -0,0 +1,231 @@ +# Writing good e2e tests for Kubernetes # + +## Patterns and Anti-Patterns ## + +### Goals of e2e tests ### + +Beyond the obvious goal of providing end-to-end system test coverage, +there are a few less obvious goals that you should bear in mind when +designing, writing and debugging your end-to-end tests. In +particular, "flaky" tests, which pass most of the time but fail +intermittently for difficult-to-diagnose reasons are extremely costly +in terms of blurring our regression signals and slowing down our +automated merge velocity. Up-front time and effort designing your test +to be reliable is very well spent. Bear in mind that we have hundreds +of tests, each running in dozens of different environments, and if any +test in any test environment fails, we have to assume that we +potentially have some sort of regression. So if a significant number +of tests fail even only 1% of the time, basic statistics dictates that +we will almost never have a "green" regression indicator. Stated +another way, writing a test that is only 99% reliable is just about +useless in the harsh reality of a CI environment. In fact it's worse +than useless, because not only does it not provide a reliable +regression indicator, but it also costs a lot of subsequent debugging +time, and delayed merges. 
+ +#### Debuggability #### + +If your test fails, it should provide as detailed as possible reasons +for the failure in its output. "Timeout" is not a useful error +message. "Timed out after 60 seconds waiting for pod xxx to enter +running state, still in pending state" is much more useful to someone +trying to figure out why your test failed and what to do about it. +Specifically, +[assertion](https://onsi.github.io/gomega/#making-assertions) code +like the following generates rather useless errors: + +``` +Expect(err).NotTo(HaveOccurred()) +``` + +Rather +[annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this: + +``` +Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated) +``` + +On the other hand, overly verbose logging, particularly of non-error conditions, can make +it unnecessarily difficult to figure out whether a test failed and if +so why? So don't log lots of irrelevant stuff either. + +#### Ability to run in non-dedicated test clusters #### + +To reduce end-to-end delay and improve resource utilization when +running e2e tests, we try, where possible, to run large numbers of +tests in parallel against the same test cluster. This means that: + +1. you should avoid making any assumption (implicit or explicit) that +your test is the only thing running against the cluster. For example, +making the assumption that your test can run a pod on every node in a +cluster is not a safe assumption, as some other tests, running at the +same time as yours, might have saturated one or more nodes in the +cluster. Similarly, running a pod in the system namespace, and +assuming that will increase the count of pods in the system +namespace by one is not safe, as some other test might be creating or +deleting pods in the system namespace at the same time as your test. +If you do legitimately need to write a test like that, make sure to +label it ["\[Serial\]"](e2e-tests.md#kinds-of-tests) so that it's easy +to identify, and not run in parallel with any other tests. +1. You should avoid doing things to the cluster that make it difficult +for other tests to reliably do what they're trying to do, at the same +time. For example, rebooting nodes, disconnecting network interfaces, +or upgrading cluster software as part of your test is likely to +violate the assumptions that other tests might have made about a +reasonably stable cluster environment. If you need to write such +tests, please label them as +["\[Disruptive\]"](e2e-tests.md#kinds-of-tests) so that it's easy to +identify them, and not run them in parallel with other tests. +1. You should avoid making assumptions about the Kubernetes API that +are not part of the API specification, as your tests will break as +soon as these assumptions become invalid. For example, relying on +specific Events, Event reasons or Event messages will make your tests +very brittle. + +#### Speed of execution #### + +We have hundreds of e2e tests, some of which we run in serial, one +after the other, in some cases. If each test takes just a few minutes +to run, that very quickly adds up to many, many hours of total +execution time. We try to keep such total execution time down to a +few tens of minutes at most. Therefore, try (very hard) to keep the +execution time of your individual tests below 2 minutes, ideally +shorter than that. Concretely, adding inappropriately long 'sleep' +statements or other gratuitous waits to tests is a killer. 
If under +normal circumstances your pod enters the running state within 10 +seconds, and 99.9% of the time within 30 seconds, it would be +gratuitous to wait 5 minutes for this to happen. Rather just fail +after 30 seconds, with a clear error message as to why your test +failed ("e.g. Pod x failed to become ready after 30 seconds, it +usually takes 10 seconds"). If you do have a truly legitimate reason +for waiting longer than that, or writing a test which takes longer +than 2 minutes to run, comment very clearly in the code why this is +necessary, and label the test as +["\[Slow\]"](e2e-tests.md#kinds-of-tests), so that it's easy to +identify and avoid in test runs that are required to complete +timeously (for example those that are run against every code +submission before it is allowed to be merged). +Note that completing within, say, 2 minutes only when the test +passes is not generally good enough. Your test should also fail in a +reasonable time. We have seen tests that, for example, wait up to 10 +minutes for each of several pods to become ready. Under good +conditions these tests might pass within a few seconds, but if the +pods never become ready (e.g. due to a system regression) they take a +very long time to fail and typically cause the entire test run to time +out, so that no results are produced. Again, this is a lot less +useful than a test that fails reliably within a minute or two when the +system is not working correctly. + +#### Resilience to relatively rare, temporary infrastructure glitches or delays #### + +Remember that your test will be run many thousands of +times, at different times of day and night, probably on different +cloud providers, under different load conditions. And often the +underlying state of these systems is stored in eventually consistent +data stores. So, for example, if a resource creation request is +theoretically asynchronous, even if you observe it to be practically +synchronous most of the time, write your test to assume that it's +asynchronous (e.g. make the "create" call, and poll or watch the +resource until it's in the correct state before proceeding). +Similarly, don't assume that API endpoints are 100% available. +They're not. Under high load conditions, API calls might temporarily +fail or time-out. In such cases it's appropriate to back off and retry +a few times before failing your test completely (in which case make +the error message very clear about what happened, e.g. "Retried +http://... 3 times - all failed with xxx". Use the standard +retry mechanisms provided in the libraries detailed below. + +### Some concrete tools at your disposal ### + +Obviously most of the above goals apply to many tests, not just yours. +So we've developed a set of reusable test infrastructure, libraries +and best practices to help you to do the right thing, or at least do +the same thing as other tests, so that if that turns out to be the +wrong thing, it can be fixed in one place, not hundreds, to be the +right thing. + +Here are a few pointers: + ++ [E2e Framework](https://git.k8s.io/kubernetes/test/e2e/framework/framework.go): + Familiarise yourself with this test framework and how to use it. + Amongst others, it automatically creates uniquely named namespaces + within which your tests can run to avoid name clashes, and reliably + automates cleaning up the mess after your test has completed (it + just deletes everything in the namespace). This helps to ensure + that tests do not leak resources. 
Note that deleting a namespace + (and by implication everything in it) is currently an expensive + operation. So the fewer resources you create, the less cleaning up + the framework needs to do, and the faster your test (and other + tests running concurrently with yours) will complete. Your tests + should always use this framework. Trying other home-grown + approaches to avoiding name clashes and resource leaks has proven + to be a very bad idea. ++ [E2e utils library](https://git.k8s.io/kubernetes/test/e2e/framework/util.go): + This handy library provides tons of reusable code for a host of + commonly needed test functionality, including waiting for resources + to enter specified states, safely and consistently retrying failed + operations, usefully reporting errors, and much more. Make sure + that you're familiar with what's available there, and use it. + Likewise, if you come across a generally useful mechanism that's + not yet implemented there, add it so that others can benefit from + your brilliance. In particular pay attention to the variety of + timeout and retry related constants at the top of that file. Always + try to reuse these constants rather than try to dream up your own + values. Even if the values there are not precisely what you would + like to use (timeout periods, retry counts etc), the benefit of + having them be consistent and centrally configurable across our + entire test suite typically outweighs your personal preferences. ++ **Follow the examples of stable, well-written tests:** Some of our + existing end-to-end tests are better written and more reliable than + others. A few examples of well-written tests include: + [Replication Controllers](https://git.k8s.io/kubernetes/test/e2e/apps/rc.go), + [Services](https://git.k8s.io/kubernetes/test/e2e/network/service.go), + [Reboot](https://git.k8s.io/kubernetes/test/e2e/lifecycle/reboot.go). ++ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the + test library and runner upon which our e2e tests are built. Before + you write or refactor a test, read the docs and make sure that you + understand how it works. In particular be aware that every test is + uniquely identified and described (e.g. in test reports) by the + concatenation of its `Describe` clause and nested `It` clauses. + So for example `Describe("Pods",...).... It(""should be scheduled + with cpu and memory limits")` produces a sane test identifier and + descriptor `Pods should be scheduled with cpu and memory limits`, + which makes it clear what's being tested, and hence what's not + working if it fails. Other good examples include: + +``` + CAdvisor should be healthy on every node +``` + +and + +``` + Daemon set should run and stop complex daemon +``` + + On the contrary +(these are real examples), the following are less good test +descriptors: + +``` + KubeProxy should test kube-proxy +``` + +and + +``` +Nodes [Disruptive] Network when a node becomes unreachable +[replication controller] recreates pods scheduled on the +unreachable node AND allows scheduling of pods on a node after +it rejoins the cluster +``` + +An improvement might be + +``` +Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive] +``` + +Note that opening issues for specific better tooling is welcome, and +code implementing that tooling is even more welcome :-). 
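+
+Putting several of the recommendations above together (bounded waits, standard
+polling helpers, and assertions that explain themselves), a single test step
+might look roughly like the sketch below. The timeout, the pod name, and the
+`podIsRunning` condition are illustrative placeholders rather than values
+mandated by the framework.
+
+```go
+package example
+
+import (
+    "fmt"
+    "time"
+
+    "k8s.io/apimachinery/pkg/util/wait"
+)
+
+// waitForPodRunning polls a caller-supplied check instead of sleeping for a
+// fixed period, and fails with a descriptive error if the pod never becomes
+// ready within the bounded timeout.
+func waitForPodRunning(podName string, podIsRunning func() (bool, error)) error {
+    const (
+        interval = 2 * time.Second  // how often to re-check
+        timeout  = 30 * time.Second // fail quickly and loudly rather than hanging
+    )
+    if err := wait.PollImmediate(interval, timeout, podIsRunning); err != nil {
+        return fmt.Errorf("pod %s did not reach Running within %v: %v", podName, timeout, err)
+    }
+    return nil
+}
+
+// In a spec body, annotate the gomega assertion so the failure message says
+// what went wrong, not merely that an error occurred:
+//
+//   err := waitForPodRunning("test-pod", podIsRunning)
+//   Expect(err).NotTo(HaveOccurred(), "pod test-pod never became Running")
+```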
+ diff --git a/contributors/devel/staging.md b/contributors/devel/staging.md index 79ae762f..81bf52d8 100644 --- a/contributors/devel/staging.md +++ b/contributors/devel/staging.md @@ -1,34 +1,3 @@ -# Staging Directory and Publishing +This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/staging.md. -The [staging/ directory](https://git.k8s.io/kubernetes/staging) of Kubernetes contains a number of pseudo repositories ("staging repos"). They are symlinked into Kubernetes' [vendor/ directory](https://git.k8s.io/kubernetes/vendor/k8s.io) for Golang to pick them up. - -We publish the staging repos using the [publishing bot](https://git.k8s.io/publishing-bot). It uses `git filter-branch` essentially to [cut the staging directories into separate git trees](https://de.slideshare.net/sttts/cutting-the-kubernetes-monorepo-in-pieces-never-learnt-more-about-git) and pushing the new commits to the corresponding real repositories in the [kubernetes organization on Github](https://github.com/kubernetes). - -The list of staging repositories and their published branches are listed in [publisher.go inside of the bot](https://git.k8s.io/publishing-bot/cmd/publishing-bot/publisher.go). Though it is planned to move this out into the k8s.io/kubernetes repository. - -At the time of this writing, this includes the branches - -- master, -- release-1.8 / release-5.0, -- and release-1.9 / release-6.0 - -of the following staging repos in the k8s.io org: - -- api -- apiextensions-apiserver -- apimachinery -- apiserver -- client-go -- code-generator -- kube-aggregator -- metrics -- sample-apiserver -- sample-controller - -Kubernetes tags (e.g., v1.9.1-beta1) are also applied automatically to the published repositories, prefixed with kubernetes- (e.g., kubernetes-1.9.1-beta1). The client-go semver tags (on client-go only!) including release-notes are still done manually. - -The semver tags are still the (well tested) official releases. The kubernetes-1.x.y tags have limited test coverage (we have some automatic tests in place in the bot), but can be used by early adopters of client-go and the other libraries. Moreover, they help to vendor the correct version of k8s.io/api and k8s.io/apimachinery. - -If further repos under staging are need, adding them to the bot is easy. Contact one of the [owners of the bot](https://git.k8s.io/publishing-bot/OWNERS). - -Currently, the bot is hosted on the CI cluster of Redhat's OpenShift (ready to be moved out to a public CNCF cluster if we have that in the future). +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/strategic-merge-patch.md b/contributors/devel/strategic-merge-patch.md index 4f45ef8e..a0240159 100644 --- a/contributors/devel/strategic-merge-patch.md +++ b/contributors/devel/strategic-merge-patch.md @@ -1,449 +1,3 @@ -Strategic Merge Patch -===================== - -# Background - -Kubernetes supports a customized version of JSON merge patch called strategic merge patch. This -patch format is used by `kubectl apply`, `kubectl edit` and `kubectl patch`, and contains -specialized directives to control how specific fields are merged. - -In the standard JSON merge patch, JSON objects are always merged but lists are -always replaced. Often that isn't what we want. Let's say we start with the -following Pod: - -```yaml -spec: - containers: - - name: nginx - image: nginx-1.0 -``` - -and we POST that to the server (as JSON). 
Then let's say we want to *add* a -container to this Pod. - -```yaml -PATCH /api/v1/namespaces/default/pods/pod-name -spec: - containers: - - name: log-tailer - image: log-tailer-1.0 -``` - -If we were to use standard Merge Patch, the entire container list would be -replaced with the single log-tailer container. However, our intent is for the -container lists to merge together based on the `name` field. - -To solve this problem, Strategic Merge Patch uses the go struct tag of the API -objects to determine what lists should be merged and which ones should not. -The metadata is available as struct tags on the API objects -themselves and also available to clients as [OpenAPI annotations](https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/README.md#x-kubernetes-patch-strategy-and-x-kubernetes-patch-merge-key). -In the above example, the `patchStrategy` metadata for the `containers` -field would be `merge` and the `patchMergeKey` would be `name`. - - -# Basic Patch Format - -Strategic Merge Patch supports special operations through directives. - -There are multiple directives: - -- replace -- merge -- delete -- delete from primitive list - -`replace`, `merge` and `delete` are mutual exclusive. - -## `replace` Directive - -### Purpose - -`replace` directive indicates that the element that contains it should be replaced instead of being merged. - -### Syntax - -`replace` directive is used in both patch with directive marker and go struct tags. - -Example usage in the patch: - -``` -$patch: replace -``` - -### Example - -`replace` directive can be used on both map and list. - -#### Map - -To indicate that a map should not be merged and instead should be taken literally: - -```yaml -$patch: replace # recursive and applies to all fields of the map it's in -containers: -- name: nginx - image: nginx-1.0 -``` - -#### List of Maps - -To override the container list to be strictly replaced, regardless of the default: - -```yaml -containers: - - name: nginx - image: nginx-1.0 - - $patch: replace # any further $patch operations nested in this list will be ignored -``` - - -## `delete` Directive - -### Purpose - -`delete` directive indicates that the element that contains it should be deleted. - -### Syntax - -`delete` directive is used only in the patch with directive marker. -It can be used on both map and list of maps. -``` -$patch: delete -``` - -### Example - -#### List of Maps - -To delete an element of a list that should be merged: - -```yaml -containers: - - name: nginx - image: nginx-1.0 - - $patch: delete - name: log-tailer # merge key and value goes here -``` - -Note: Delete operation will delete all entries in the list that match the merge key. - -#### Maps - -One way to delete a map is using `delete` directive. -Applying this patch will delete the rollingUpdate map. -```yaml -rollingUpdate: - $patch: delete -``` - -An equivalent way to delete this map is -```yaml -rollingUpdate: null -``` - -## `merge` Directive - -### Purpose - -`merge` directive indicates that the element that contains it should be merged instead of being replaced. - -### Syntax - -`merge` directive is used only in the go struct tags. - - -## `deleteFromPrimitiveList` Directive - -### Purpose - -We have two patch strategies for lists of primitives: replace and merge. -Replace is the default patch strategy for list, which will replace the whole list on update and it will preserve the order; -while merge strategy works as an unordered set. We call a primitive list with merge strategy an unordered set. 
-The patch strategy is defined in the go struct tag of the API objects. - -`deleteFromPrimitiveList` directive indicates that the elements in this list should be deleted from the original primitive list. - -### Syntax - -It is used only as the prefix of the key in the patch. -``` -$deleteFromPrimitiveList/<keyOfPrimitiveList>: [a primitive list] -``` - -### Example - -##### List of Primitives (Unordered Set) - -`finalizers` uses `merge` as patch strategy. -```go -Finalizers []string `json:"finalizers,omitempty" patchStrategy:"merge" protobuf:"bytes,14,rep,name=finalizers"` -``` - -Suppose we have defined a `finalizers` and we call it the original finalizers: - -```yaml -finalizers: - - a - - b - - c -``` - -To delete items "b" and "c" from the original finalizers, the patch will be: - -```yaml -# The directive includes the prefix $deleteFromPrimitiveList and -# followed by a '/' and the name of the list. -# The values in this list will be deleted after applying the patch. -$deleteFromPrimitiveList/finalizers: - - b - - c -``` - -After applying the patch on the original finalizers, it will become: - -```yaml -finalizers: - - a -``` - -Note: When merging two set, the primitives are first deduplicated and then merged. -In an erroneous case, the set may be created with duplicates. Deleting an -item that has duplicates will delete all matching items. - -## `setElementOrder` Directive - -### Purpose - -`setElementOrder` directive provides a way to specify the order of a list. -The relative order specified in this directive will be retained. -Please refer to [proposal](/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md) for more information. - -### Syntax - -It is used only as the prefix of the key in the patch. -``` -$setElementOrder/<keyOfList>: [a list] -``` - -### Example - -#### List of Primitives - -Suppose we have a list of `finalizers`: -```yaml -finalizers: - - a - - b - - c -``` - -To reorder the elements order in the list, we can send a patch: -```yaml -# The directive includes the prefix $setElementOrder and -# followed by a '/' and the name of the list. -$setElementOrder/finalizers: - - b - - c - - a -``` - -After applying the patch, it will be: -```yaml -finalizers: - - b - - c - - a -``` - -#### List of Maps - -Suppose we have a list of `containers` whose `mergeKey` is `name`: -```yaml -containers: - - name: a - ... - - name: b - ... - - name: c - ... -``` - -To reorder the elements order in the list, we can send a patch: -```yaml -# each map in the list should only include the mergeKey -$setElementOrder/containers: - - name: b - - name: c - - name: a -``` - -After applying the patch, it will be: -```yaml -containers: - - name: b - ... - - name: c - ... - - name: a - ... -``` - - -## `retainKeys` Directive - -### Purpose - -`retainKeys` directive provides a mechanism for union types to clear mutual exclusive fields. -When this directive is present in the patch, all the fields not in this directive will be cleared. -Please refer to [proposal](/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md) for more information. - -### Syntax - -``` -$retainKeys: [a list of field keys] -``` - -### Example - -#### Map - -Suppose we have a union type: -``` -union: - foo: a - other: b -``` - -And we have a patch: -``` -union: - retainKeys: - - another - - bar - another: d - bar: c -``` - -After applying this patch, we get: -``` -union: - # Field foo and other have been cleared w/o explicitly set them to null. 
- another: d - bar: c -``` - -# Changing patch format - -As issues and limitations have been discovered with the strategic merge -patch implementation, it has been necessary to change the patch format -to support additional semantics - such as merging lists of -primitives and defining order when merging lists. - -## Requirements for any changes to the patch format - -**Note:** Changes to the strategic merge patch must be backwards compatible such -that patch requests valid in previous versions continue to be valid. -That is, old patch formats sent by old clients to new servers with -must continue to function correctly. - -Previously valid patch requests do not need to keep the exact same -behavior, but do need to behave correctly. - -**Example:** if a patch request previously randomized the order of elements -in a list and we want to provide a deterministic order, we must continue -to support old patch format but we can make the ordering deterministic -for the old format. - -### Client version skew - -Because the server does not publish which patch versions it supports, -and it silently ignores patch directives that it does not recognize, -new patches should behave correctly when sent to old servers that -may not support all of the patch directives. - -While the patch API must be backwards compatible, it must also -be forward compatible for 1 version. This is needed because `kubectl` must -support talking to older and newer server versions without knowing what -parts of patch are supported on each, and generate patches that work correctly on both. - -## Strategies for introducing new patch behavior - -#### 1. Add optional semantic meaning to the existing patch format. - -**Note:** Must not require new data or elements to be present that was not required before. Meaning must not break old interpretation of old patches. - -**Good Example:** - -Old format - - ordering of elements in patch had no meaning and the final ordering was arbitrary - -New format - - ordering of elements in patch has meaning and the final ordering is deterministic based on the ordering in the patch - -**Bad Example:** - -Old format - - fields not present in a patch for Kind foo are ignored - - unmodified fields for Kind foo are optional in patch request - -New format - - fields not present in a patch for Kind foo are cleared - - unmodified fields for Kind foo are required in patch request - -This example won't work, because old patch formats will contain data that is now -considered required. To support this, introduce a new directive to guard the -new patch format. - -#### 2. Add support for new directives in the patch format - -- Optional directives may be introduced to change how the patch is applied by the server - **backwards compatible** (old patch against newer server). - - May control how the patch is applied - - May contain patch information - such as elements to delete from a list - - Must NOT impose new requirements on the old patch format - -- New patch requests should be a superset of old patch requests - **forwards compatible** (newer patch against older server) - - *Old servers will ignore directives they do not recognize* - - Must include the full patch that would have been sent before the new directives were added. 
- - Must NOT rely on the directive being supported by the server - -**Good Example:** - -Old format - - fields not present in a patch for Kind foo are ignored - - unmodified fields for Kind foo are optional in patch request - -New format *without* directive - - Same as old - -New format *with* directive - - fields not present in a patch for Kind foo are cleared - - unmodified fields for Kind foo are required in patch request - -In this example, the behavior was unchanged when the directive was missing, -retaining the old behavior for old patch requests. - -**Bad Example:** - -Old format - - fields not present in a patch for Kind foo are ignored - - unmodified fields for Kind foo are optional in patch request - -New format *with* directive - - Same as old - -New format *without* directive - - fields not present in a patch for Kind foo are cleared - - unmodified fields for Kind foo are required in patch request - -In this example, the behavior was changed when the directive was missing, -breaking compatibility. - -## Alternatives - -The previous strategy is necessary because there is no notion of -patch versions. Having the client negotiate the patch version -with the server would allow changing the patch format, but at -the cost of supporting multiple patch formats in the server and client. -Using client provided directives to evolve how a patch is merged -provides some limited support for multiple versions. +This file has moved to https://git.k8s.io/community/contributors/devel/sig-api-machinery/strategic-merge-patch.md. +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. diff --git a/contributors/devel/testing.md b/contributors/devel/testing.md index 4d41fda9..d904278d 100644 --- a/contributors/devel/testing.md +++ b/contributors/devel/testing.md @@ -1,285 +1,3 @@ -# Testing guide +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/testing.md. -**Table of Contents** - -- [Testing guide](#testing-guide) - - [Unit tests](#unit-tests) - - [Run all unit tests](#run-all-unit-tests) - - [Set go flags during unit tests](#set-go-flags-during-unit-tests) - - [Run unit tests from certain packages](#run-unit-tests-from-certain-packages) - - [Run specific unit test cases in a package](#run-specific-unit-test-cases-in-a-package) - - [Stress running unit tests](#stress-running-unit-tests) - - [Unit test coverage](#unit-test-coverage) - - [Benchmark unit tests](#benchmark-unit-tests) - - [Integration tests](#integration-tests) - - [Install etcd dependency](#install-etcd-dependency) - - [Etcd test data](#etcd-test-data) - - [Run integration tests](#run-integration-tests) - - [Run a specific integration test](#run-a-specific-integration-test) - - [End-to-End tests](#end-to-end-tests) - - -This assumes you already read the [development guide](development.md) to -install go, godeps, and configure your git client. All command examples are -relative to the `kubernetes` root directory. - -Before sending pull requests you should at least make sure your changes have -passed both unit and integration tests. - -Kubernetes only merges pull requests when unit, integration, and e2e tests are -passing, so it is often a good idea to make sure the e2e tests work as well. - -## Unit tests - -* Unit tests should be fully hermetic - - Only access resources in the test binary. -* All packages and any significant files require unit tests. 
-* The preferred method of testing multiple scenarios or input is - [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) - - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) -* Unit tests must pass on macOS and Windows platforms. - - Tests using linux-specific features must be skipped or compiled out. - - Skipped is better, compiled out is required when it won't compile. -* Concurrent unit test runs must pass. -* See [coding conventions](../guide/coding-conventions.md). - -### Run all unit tests - -`make test` is the entrypoint for running the unit tests that ensures that -`GOPATH` is set up correctly. If you have `GOPATH` set up correctly, you can -also just use `go test` directly. - -```sh -cd kubernetes -make test # Run all unit tests. -``` - -If any unit test fails with a timeout panic (see [#1594](https://github.com/kubernetes/community/issues/1594)) on the testing package, you can increase the `KUBE_TIMEOUT` value as shown below. - -```sh -make test KUBE_TIMEOUT="-timeout 300s" -``` - -### Set go flags during unit tests - -You can set [go flags](https://golang.org/cmd/go/) by setting the -`GOFLAGS` environment variable. - -### Run unit tests from certain packages - -`make test` accepts packages as arguments; the `k8s.io/kubernetes` prefix is -added automatically to these: - -```sh -make test WHAT=./pkg/api # run tests for pkg/api -``` - -To run multiple targets you need quotes: - -```sh -make test WHAT="./pkg/api ./pkg/kubelet" # run tests for pkg/api and pkg/kubelet -``` - -In a shell, it's often handy to use brace expansion: - -```sh -make test WHAT=./pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet -``` - -### Run specific unit test cases in a package - -You can set the test args using the `KUBE_TEST_ARGS` environment variable. -You can use this to pass the `-run` argument to `go test`, which accepts a -regular expression for the name of the test that should be run. - -```sh -# Runs TestValidatePod in pkg/api/validation with the verbose flag set -make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$' - -# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation -make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$" -``` - -For other supported test flags, see the [golang -documentation](https://golang.org/cmd/go/#hdr-Testing_flags). - -### Stress running unit tests - -Running the same tests repeatedly is one way to root out flakes. -You can do this efficiently. - -```sh -# Have 2 workers run all tests 5 times each (10 total iterations). -make test PARALLEL=2 ITERATION=5 -``` - -For more advanced ideas please see [flaky-tests.md](flaky-tests.md). - -### Unit test coverage - -Currently, collecting coverage is only supported for the Go unit tests. - -To run all unit tests and generate an HTML coverage report, run the following: - -```sh -make test KUBE_COVER=y -``` - -At the end of the run, an HTML report will be generated with the path -printed to stdout. - -To run tests and collect coverage in only one package, pass its relative path -under the `kubernetes` directory as an argument, for example: - -```sh -make test WHAT=./pkg/kubectl KUBE_COVER=y -``` - -Multiple arguments can be passed, in which case the coverage results will be -combined for all tests run. 
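For example, to collect a single combined report for two packages (the package paths here are illustrative; substitute the packages you are working on):

```sh
# Runs unit tests for both packages and prints the path of one combined HTML coverage report.
make test WHAT="./pkg/kubectl ./pkg/api/validation" KUBE_COVER=y
```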
- -### Benchmark unit tests - -To run benchmark tests, you'll typically use something like: - -```sh -go test ./pkg/apiserver -benchmem -run=XXX -bench=BenchmarkWatch -``` - -This will do the following: - -1. `-run=XXX` is a regular expression filter on the name of test cases to run -2. `-bench=BenchmarkWatch` will run test methods with BenchmarkWatch in the name - * See `grep -nr BenchmarkWatch .` for examples -3. `-benchmem` enables memory allocation stats - -See `go help test` and `go help testflag` for additional info. - -## Integration tests - -* Integration tests should only access other resources on the local machine - - Most commonly etcd or a service listening on localhost. -* All significant features require integration tests. - - This includes kubectl commands -* The preferred method of testing multiple scenarios or inputs -is [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) - - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) -* Each test should create its own master, httpserver and config. - - Example: [TestPodUpdateActiveDeadlineSeconds](https://git.k8s.io/kubernetes/test/integration/pods/pods_test.go) -* See [coding conventions](coding-conventions.md). - -### Install etcd dependency - -Kubernetes integration tests require your `PATH` to include an -[etcd](https://github.com/coreos/etcd/releases) installation. Kubernetes -includes a script to help install etcd on your machine. - -```sh -# Install etcd and add to PATH - -# Option a) install inside kubernetes root -hack/install-etcd.sh # Installs in ./third_party/etcd -echo export PATH="\$PATH:$(pwd)/third_party/etcd" >> ~/.profile # Add to PATH - -# Option b) install manually -grep -E "image.*etcd" cluster/gce/manifests/etcd.manifest # Find version -# Install that version using yum/apt-get/etc -echo export PATH="\$PATH:<LOCATION>" >> ~/.profile # Add to PATH -``` - -### Etcd test data - -Many tests start an etcd server internally, storing test data in the operating system's temporary directory. - -If you see test failures because the temporary directory does not have sufficient space, -or is on a volume with unpredictable write latency, you can override the test data directory -for those internal etcd instances with the `TEST_ETCD_DIR` environment variable. - -### Run integration tests - -The integration tests are run using `make test-integration`. -The Kubernetes integration tests are written using the normal golang testing -package but expect to have a running etcd instance to connect to. The `test-integration.sh` -script wraps `make test` and sets up an etcd instance for the integration tests to use. - -```sh -make test-integration # Run all integration tests. -``` - -This script runs the golang tests in package -[`test/integration`](https://git.k8s.io/kubernetes/test/integration). - -### Run a specific integration test - -You can also use the `KUBE_TEST_ARGS` environment variable with the `make test-integration` -to run a specific integration test case: - -```sh -# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set. -make test-integration WHAT=./test/integration/pods GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$" -``` - -If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API -version and the watch cache test is skipped. - -## End-to-End tests - -Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md). 
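For quick reference, a full e2e run can be started as shown below; see the guide above for provider-specific flags and for focusing on individual tests.

```sh
# Build test binaries, bring up a test cluster, run all e2e tests, and tear the cluster down.
# Note: running the full suite takes a long time.
go run hack/e2e.go -- -v --build --up --test --down
```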
- -## Running your contribution in the Kubernetes CI -Once you open a PR, [`prow`][prow-doc] runs pre-submit tests in CI. - -If you are not a [Kubernetes org member][member], another org member will need to run [`/ok-to-test`][ok-to-test] on your PR. - -Find out more about [other commands][prow-cmds] you can use to interact with prow through GitHub comments. - -### Troubleshooting a failure -Click on `Details` and look at the [`gubernator`](gubernator.k8s.io/) output for the test. - -If the failure seems unrelated to the change you're submitting: -- Is it a flake? - - Check if an issue has already been opened for that flake - - If not, open a new one (like [this example][new-issue-example]) and [label it `kind/flake`][kind/flake] - - Run [`/retest`][retest] on your PR to re-trigger the tests - -- Is it a failure that shouldn't be happening (in other words, is the test now wrong) - - Get in touch with the SIG - - preferably as a comment on your PR, by tagging the [team][k-teams] (for example a [reviewers team for the SIG][k-teams-review]) - - if you don't get a response in 24h, engage with the SIG on their channel on the Kubernetes slack and/or attend one of the [SIG meetings][sig-meetings] to ask for input. - -[prow-doc]: https://kubernetes.io/blog/2018/08/29/the-machines-can-do-the-work-a-story-of-kubernetes-testing-ci-and-automating-the-contributor-experience/#enter-prow -[member]: https://github.com/kubernetes/community/blob/master/community-membership.md#member -[k-teams]: https://github.com/orgs/kubernetes/teams -[k-teams-review]: https://github.com/orgs/kubernetes/teams?utf8=%E2%9C%93&query=review -[ok-to-test]: https://prow.k8s.io/command-help#ok_to_test -[prow-cmds]: https://prow.k8s.io/command-help -[retest]: https://prow.k8s.io/command-help#retest -[new-issue-example]: https://github.com/kubernetes/kubernetes/issues/71430 -[kind/flake]: https://prow.k8s.io/command-help#kind -[sig-meetings]: https://github.com/kubernetes/community/blob/master/sig-list.md - -#### Escalating failures to a SIG -- Figure out corresponding SIG from test name/description -- Mention the SIG's GitHub handle on the issue, optionally `cc` the SIG's chair(s) (locate them under kubernetes/community/sig-<name\>) -- Optionally (or if you haven't heard back on the issue after 24h) reach out to the SIG on slack - -### Testgrid -[`testgrid`](https://testgrid.k8s.io/) is a visualization of the Kubernetes CI. - -It is useful as a way to: -- see the run history of a test you are debugging (access it starting from a gubernator report for that test) -- get an overview of the project's general health - -`testgrid` is organised in: -- tests - - collection of assertions in a test file - - each test is typically owned by a single SIG - - each test is represented as a row on the grid -- jobs - - collection of tests - - each job is typically owned by a single SIG - - each job is represented as a tab -- dashboards - - collection of jobs - - each dashboard is represented as a button - - some dashboards collect jobs/tests in the domain of a specific SIG (named after and owned by those SIGs), and dashboards to monitor project wide health (owned by SIG-release) +This file is a placeholder to preserve links. Please remove after 2019-07-01 or the release of kubernetes 1.15, whichever comes first. 
diff --git a/contributors/devel/writing-good-e2e-tests.md b/contributors/devel/writing-good-e2e-tests.md index 836479c2..b39208eb 100644 --- a/contributors/devel/writing-good-e2e-tests.md +++ b/contributors/devel/writing-good-e2e-tests.md @@ -1,231 +1,3 @@ -# Writing good e2e tests for Kubernetes # - -## Patterns and Anti-Patterns ## - -### Goals of e2e tests ### - -Beyond the obvious goal of providing end-to-end system test coverage, -there are a few less obvious goals that you should bear in mind when -designing, writing and debugging your end-to-end tests. In -particular, "flaky" tests, which pass most of the time but fail -intermittently for difficult-to-diagnose reasons are extremely costly -in terms of blurring our regression signals and slowing down our -automated merge velocity. Up-front time and effort designing your test -to be reliable is very well spent. Bear in mind that we have hundreds -of tests, each running in dozens of different environments, and if any -test in any test environment fails, we have to assume that we -potentially have some sort of regression. So if a significant number -of tests fail even only 1% of the time, basic statistics dictates that -we will almost never have a "green" regression indicator. Stated -another way, writing a test that is only 99% reliable is just about -useless in the harsh reality of a CI environment. In fact it's worse -than useless, because not only does it not provide a reliable -regression indicator, but it also costs a lot of subsequent debugging -time, and delayed merges. - -#### Debuggability #### - -If your test fails, it should provide as detailed as possible reasons -for the failure in its output. "Timeout" is not a useful error -message. "Timed out after 60 seconds waiting for pod xxx to enter -running state, still in pending state" is much more useful to someone -trying to figure out why your test failed and what to do about it. -Specifically, -[assertion](https://onsi.github.io/gomega/#making-assertions) code -like the following generates rather useless errors: - -``` -Expect(err).NotTo(HaveOccurred()) -``` - -Rather -[annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this: - -``` -Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated) -``` - -On the other hand, overly verbose logging, particularly of non-error conditions, can make -it unnecessarily difficult to figure out whether a test failed and if -so why? So don't log lots of irrelevant stuff either. - -#### Ability to run in non-dedicated test clusters #### - -To reduce end-to-end delay and improve resource utilization when -running e2e tests, we try, where possible, to run large numbers of -tests in parallel against the same test cluster. This means that: - -1. you should avoid making any assumption (implicit or explicit) that -your test is the only thing running against the cluster. For example, -making the assumption that your test can run a pod on every node in a -cluster is not a safe assumption, as some other tests, running at the -same time as yours, might have saturated one or more nodes in the -cluster. Similarly, running a pod in the system namespace, and -assuming that will increase the count of pods in the system -namespace by one is not safe, as some other test might be creating or -deleting pods in the system namespace at the same time as your test. 
-If you do legitimately need to write a test like that, make sure to -label it ["\[Serial\]"](e2e-tests.md#kinds-of-tests) so that it's easy -to identify, and not run in parallel with any other tests. -1. You should avoid doing things to the cluster that make it difficult -for other tests to reliably do what they're trying to do, at the same -time. For example, rebooting nodes, disconnecting network interfaces, -or upgrading cluster software as part of your test is likely to -violate the assumptions that other tests might have made about a -reasonably stable cluster environment. If you need to write such -tests, please label them as -["\[Disruptive\]"](e2e-tests.md#kinds-of-tests) so that it's easy to -identify them, and not run them in parallel with other tests. -1. You should avoid making assumptions about the Kubernetes API that -are not part of the API specification, as your tests will break as -soon as these assumptions become invalid. For example, relying on -specific Events, Event reasons or Event messages will make your tests -very brittle. - -#### Speed of execution #### - -We have hundreds of e2e tests, some of which we run in serial, one -after the other, in some cases. If each test takes just a few minutes -to run, that very quickly adds up to many, many hours of total -execution time. We try to keep such total execution time down to a -few tens of minutes at most. Therefore, try (very hard) to keep the -execution time of your individual tests below 2 minutes, ideally -shorter than that. Concretely, adding inappropriately long 'sleep' -statements or other gratuitous waits to tests is a killer. If under -normal circumstances your pod enters the running state within 10 -seconds, and 99.9% of the time within 30 seconds, it would be -gratuitous to wait 5 minutes for this to happen. Rather just fail -after 30 seconds, with a clear error message as to why your test -failed ("e.g. Pod x failed to become ready after 30 seconds, it -usually takes 10 seconds"). If you do have a truly legitimate reason -for waiting longer than that, or writing a test which takes longer -than 2 minutes to run, comment very clearly in the code why this is -necessary, and label the test as -["\[Slow\]"](e2e-tests.md#kinds-of-tests), so that it's easy to -identify and avoid in test runs that are required to complete -timeously (for example those that are run against every code -submission before it is allowed to be merged). -Note that completing within, say, 2 minutes only when the test -passes is not generally good enough. Your test should also fail in a -reasonable time. We have seen tests that, for example, wait up to 10 -minutes for each of several pods to become ready. Under good -conditions these tests might pass within a few seconds, but if the -pods never become ready (e.g. due to a system regression) they take a -very long time to fail and typically cause the entire test run to time -out, so that no results are produced. Again, this is a lot less -useful than a test that fails reliably within a minute or two when the -system is not working correctly. - -#### Resilience to relatively rare, temporary infrastructure glitches or delays #### - -Remember that your test will be run many thousands of -times, at different times of day and night, probably on different -cloud providers, under different load conditions. And often the -underlying state of these systems is stored in eventually consistent -data stores. 
So, for example, if a resource creation request is -theoretically asynchronous, even if you observe it to be practically -synchronous most of the time, write your test to assume that it's -asynchronous (e.g. make the "create" call, and poll or watch the -resource until it's in the correct state before proceeding). -Similarly, don't assume that API endpoints are 100% available. -They're not. Under high load conditions, API calls might temporarily -fail or time-out. In such cases it's appropriate to back off and retry -a few times before failing your test completely (in which case make -the error message very clear about what happened, e.g. "Retried -http://... 3 times - all failed with xxx". Use the standard -retry mechanisms provided in the libraries detailed below. - -### Some concrete tools at your disposal ### - -Obviously most of the above goals apply to many tests, not just yours. -So we've developed a set of reusable test infrastructure, libraries -and best practices to help you to do the right thing, or at least do -the same thing as other tests, so that if that turns out to be the -wrong thing, it can be fixed in one place, not hundreds, to be the -right thing. - -Here are a few pointers: - -+ [E2e Framework](https://git.k8s.io/kubernetes/test/e2e/framework/framework.go): - Familiarise yourself with this test framework and how to use it. - Amongst others, it automatically creates uniquely named namespaces - within which your tests can run to avoid name clashes, and reliably - automates cleaning up the mess after your test has completed (it - just deletes everything in the namespace). This helps to ensure - that tests do not leak resources. Note that deleting a namespace - (and by implication everything in it) is currently an expensive - operation. So the fewer resources you create, the less cleaning up - the framework needs to do, and the faster your test (and other - tests running concurrently with yours) will complete. Your tests - should always use this framework. Trying other home-grown - approaches to avoiding name clashes and resource leaks has proven - to be a very bad idea. -+ [E2e utils library](https://git.k8s.io/kubernetes/test/e2e/framework/util.go): - This handy library provides tons of reusable code for a host of - commonly needed test functionality, including waiting for resources - to enter specified states, safely and consistently retrying failed - operations, usefully reporting errors, and much more. Make sure - that you're familiar with what's available there, and use it. - Likewise, if you come across a generally useful mechanism that's - not yet implemented there, add it so that others can benefit from - your brilliance. In particular pay attention to the variety of - timeout and retry related constants at the top of that file. Always - try to reuse these constants rather than try to dream up your own - values. Even if the values there are not precisely what you would - like to use (timeout periods, retry counts etc), the benefit of - having them be consistent and centrally configurable across our - entire test suite typically outweighs your personal preferences. -+ **Follow the examples of stable, well-written tests:** Some of our - existing end-to-end tests are better written and more reliable than - others. 
A few examples of well-written tests include: - [Replication Controllers](https://git.k8s.io/kubernetes/test/e2e/apps/rc.go), - [Services](https://git.k8s.io/kubernetes/test/e2e/network/service.go), - [Reboot](https://git.k8s.io/kubernetes/test/e2e/lifecycle/reboot.go). -+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the - test library and runner upon which our e2e tests are built. Before - you write or refactor a test, read the docs and make sure that you - understand how it works. In particular be aware that every test is - uniquely identified and described (e.g. in test reports) by the - concatenation of its `Describe` clause and nested `It` clauses. - So for example `Describe("Pods",...).... It(""should be scheduled - with cpu and memory limits")` produces a sane test identifier and - descriptor `Pods should be scheduled with cpu and memory limits`, - which makes it clear what's being tested, and hence what's not - working if it fails. Other good examples include: - -``` - CAdvisor should be healthy on every node -``` - -and - -``` - Daemon set should run and stop complex daemon -``` - - On the contrary -(these are real examples), the following are less good test -descriptors: - -``` - KubeProxy should test kube-proxy -``` - -and - -``` -Nodes [Disruptive] Network when a node becomes unreachable -[replication controller] recreates pods scheduled on the -unreachable node AND allows scheduling of pods on a node after -it rejoins the cluster -``` - -An improvement might be - -``` -Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive] -``` - -Note that opening issues for specific better tooling is welcome, and -code implementing that tooling is even more welcome :-). +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/writing-good-e2e-tests.md. +This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of kubernetes 1.13, whichever comes first.
\ No newline at end of file diff --git a/contributors/guide/OWNERS b/contributors/guide/OWNERS index a9abb261..745a9be0 100644 --- a/contributors/guide/OWNERS +++ b/contributors/guide/OWNERS @@ -1,3 +1,5 @@ +# See the OWNERS docs at https://go.k8s.io/owners + reviewers: - castrojo - guineveresaenger @@ -8,6 +10,7 @@ reviewers: approvers: - castrojo - parispittman + - guineveresaenger labels: - sig/contributor-experience - area/contributor-guide diff --git a/contributors/guide/README.md b/contributors/guide/README.md index e053b949..82f16556 100644 --- a/contributors/guide/README.md +++ b/contributors/guide/README.md @@ -47,7 +47,7 @@ Before you can contribute, you will need to sign the [Contributor License Agreem ## Code of Conduct -Please make sure to read and observe our [Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md). +Please make sure to read and observe our [Code of Conduct](/code-of-conduct.md). ## Setting up your development environment @@ -80,6 +80,7 @@ There's always a need for more test coverage. You get the idea - if you ever see something you think should be fixed, you should own it. Here is how you get started. If you have no idea what to start on, you can browse the [Contributor Role Board](https://discuss.kubernetes.io/c/contributors/role-board) to see who is looking for help. +Those interested in contributing without writing code may also find ideas in the [Non-Code Contributions Guide](non-code-contributions.md). ### Find a good first topic @@ -207,7 +208,7 @@ To make it easier for your PR to receive reviews, consider the reviewers will ne * break large changes into a logical series of smaller patches which individually make easily understandable changes, and in aggregate solve a broader issue * label PRs with appropriate SIGs and reviewers: to do this read the messages the bot sends you to guide you through the PR process -Reviewers, the people giving the review, are highly encouraged to revisit the [Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md) and must go above and beyond to promote a collaborative, respectful community. +Reviewers, the people giving the review, are highly encouraged to revisit the [Code of Conduct](/code-of-conduct.md) and must go above and beyond to promote a collaborative, respectful community. When reviewing PRs from others [The Gentle Art of Patch Review](http://sage.thesharps.us/2014/09/01/the-gentle-art-of-patch-review/) suggests an iterative series of focuses which is designed to lead new contributors to positive collaboration without inundating them initially with nuances: * Is the idea behind the contribution sound? @@ -217,15 +218,15 @@ When reviewing PRs from others [The Gentle Art of Patch Review](http://sage.thes ## Testing Testing is the responsibility of all contributors and is in part owned by all SIGss, but is also coordinated by [sig-testing](/sig-testing). -Refer to the [Testing Guide](/contributors/devel/testing.md) for more information. +Refer to the [Testing Guide](/contributors/devel/sig-testing/testing.md) for more information. There are multiple types of tests. The location of the test code varies with type, as do the specifics of the environment needed to successfully run the test: * Unit: These confirm that a particular function behaves as intended. Golang includes a native ability for unit testing via the [testing](https://golang.org/pkg/testing/) package. 
Unit test source code can be found adjacent to the corresponding source code within a given package. For example: functions defined in [kubernetes/cmd/kubeadm/app/util/version.go](https://git.k8s.io/kubernetes/cmd/kubeadm/app/util/version.go) will have unit tests in [kubernetes/cmd/kubeadm/app/util/version_test.go](https://git.k8s.io/kubernetes/cmd/kubeadm/app/util/version_test.go). These are easily run locally by any developer on any OS. * Integration: These tests cover interactions of package components or interactions between kubernetes components and some other non-kubernetes system resource (eg: etcd). An example would be testing whether a piece of code can correctly store data to or retrieve data from etcd. Integration tests are stored in [kubernetes/test/integration/](https://git.k8s.io/kubernetes/test/integration). Running these can require the developer set up additional functionality on their development system. -* End-to-end ("e2e"): These are broad tests of overall system behavior and coherence. These are more complicated as they require a functional kubernetes cluster built from the sources to be tested. A separate [document detailing e2e testing](/contributors/devel/e2e-tests.md) and test cases themselves can be found in [kubernetes/test/e2e/](https://git.k8s.io/kubernetes/test/e2e). -* Conformance: These are a set of testcases, currently a subset of the integration/e2e tests, that the Architecture SIG has approved to define the core set of interoperable features that all Kubernetes deployments must support. For more information on Conformance tests please see the [Conformance Testing](/contributors/devel/conformance-tests.md) Document. +* End-to-end ("e2e"): These are broad tests of overall system behavior and coherence. These are more complicated as they require a functional kubernetes cluster built from the sources to be tested. A separate [document detailing e2e testing](/contributors/devel/sig-testing/e2e-tests.md) and test cases themselves can be found in [kubernetes/test/e2e/](https://git.k8s.io/kubernetes/test/e2e). +* Conformance: These are a set of testcases, currently a subset of the integration/e2e tests, that the Architecture SIG has approved to define the core set of interoperable features that all Kubernetes deployments must support. For more information on Conformance tests please see the [Conformance Testing](/contributors/devel/sig-architecture/conformance-tests.md) Document. Continuous integration will run these tests either as pre-submits on PRs, post-submits against master/release branches, or both. The results appear on [testgrid](https://testgrid.k8s.io). diff --git a/contributors/guide/coding-conventions.md b/contributors/guide/coding-conventions.md index 63cc18ce..fe6f376c 100644 --- a/contributors/guide/coding-conventions.md +++ b/contributors/guide/coding-conventions.md @@ -55,13 +55,13 @@ the name of the directory in which the .go file exists. sync.Mutex`). When multiple locks are present, give each lock a distinct name following Go conventions - `stateLock`, `mapLock` etc. 
- - [API changes](/contributors/devel/api_changes.md) + - [API changes](/contributors/devel/sig-architecture/api_changes.md) - [API conventions](/contributors/devel/api-conventions.md) - - [Kubectl conventions](/contributors/devel/kubectl-conventions.md) + - [Kubectl conventions](/contributors/devel/sig-cli/kubectl-conventions.md) - - [Logging conventions](/contributors/devel/logging.md) + - [Logging conventions](/contributors/devel/sig-instrumentation/logging.md) ## Testing conventions @@ -72,7 +72,7 @@ tests example, see [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) - Significant features should come with integration (test/integration) and/or -[end-to-end (test/e2e) tests](/contributors/devel/e2e-tests.md) +[end-to-end (test/e2e) tests](/contributors/devel/sig-testing/e2e-tests.md) - Including new kubectl commands and major features of existing commands - Unit tests must pass on macOS and Windows platforms - if you use Linux @@ -86,7 +86,7 @@ required when your code does not compile on Windows). asynchronous thing to happen (e.g. wait for 1 seconds and expect a Pod to be running). Wait and retry instead. - - See the [testing guide](/contributors/devel/testing.md) for additional testing advice. + - See the [testing guide](/contributors/devel/sig-testing/testing.md) for additional testing advice. ## Directory and file conventions @@ -119,7 +119,7 @@ respectively. Actual application examples belong in /examples. - Go code for normal third-party dependencies is managed using [Godep](https://github.com/tools/godep) and is described in the kubernetes -[godep guide](/contributors/devel/godep.md) +[godep guide](/contributors/devel/sig-architecture/godep.md) - Other third-party code belongs in `/third_party` - forked third party Go code goes in `/third_party/forked` diff --git a/contributors/guide/contributor-cheatsheet.md b/contributors/guide/contributor-cheatsheet.md index 180a368f..320cd980 100644 --- a/contributors/guide/contributor-cheatsheet.md +++ b/contributors/guide/contributor-cheatsheet.md @@ -20,7 +20,7 @@ A list of common resources when contributing to Kubernetes. - [GitHub labels](https://go.k8s.io/github-labels) - [Release Buckets](https://gcsweb.k8s.io/gcs/kubernetes-release/) - Developer Guide - - [Cherry Picking Guide](/contributors/devel/cherry-picks.md) + - [Cherry Picking Guide](/contributors/devel/sig-release/cherry-picks.md) - [Kubernetes Code Search](https://cs.k8s.io/), maintained by [@dims](https://github.com/dims) diff --git a/contributors/guide/github-workflow.md b/contributors/guide/github-workflow.md index 221a7921..cef4e0a3 100644 --- a/contributors/guide/github-workflow.md +++ b/contributors/guide/github-workflow.md @@ -149,17 +149,17 @@ make test make test WHAT=./pkg/api/helper GOFLAGS=-v # Run integration tests, requires etcd -# For more info, visit https://git.k8s.io/community/contributors/devel/testing.md#integration-tests +# For more info, visit https://git.k8s.io/community/contributors/devel/sig-testing/testing.md#integration-tests make test-integration # Run e2e tests by building test binaries, turn up a test cluster, run all tests, and tear the cluster down # Equivalent to: go run hack/e2e.go -- -v --build --up --test --down # Note: running all e2e tests takes a LONG time! 
To run specific e2e tests, visit: -# https://git.k8s.io/community/contributors/devel/e2e-tests.md#building-kubernetes-and-running-the-tests +# https://git.k8s.io/community/contributors/devel/sig-testing/e2e-tests.md#building-kubernetes-and-running-the-tests make test-e2e ``` -See the [testing guide](/contributors/devel/testing.md) and [end-to-end tests](/contributors/devel/e2e-tests.md) +See the [testing guide](/contributors/devel/sig-testing/testing.md) and [end-to-end tests](/contributors/devel/sig-testing/e2e-tests.md) for additional information and scenarios. Run `make help` for additional information on these make targets. @@ -199,7 +199,7 @@ git push -f ${your_remote_name} myfeature ### 7 Create a pull request -1. Visit your fork at https://github.com/$user/kubernetes +1. Visit your fork at `https://github.com/$user/kubernetes` 2. Click the `Compare & Pull Request` button next to your `myfeature` branch. 3. Check out the pull request [process](/contributors/guide/pull-requests.md) for more details and advice. @@ -219,10 +219,6 @@ Commit changes made in response to review comments to the same branch on your fork. Very small PRs are easy to review. Very large PRs are very difficult to review. -At the assigned reviewer's discretion, a PR may be switched to use -[Reviewable](https://reviewable.k8s.io) instead. Once a PR is switched to -Reviewable, please ONLY send or reply to comments through Reviewable. Mixing -code review tools can be very confusing. #### Squash and Merge diff --git a/contributors/guide/issue-triage.md b/contributors/guide/issue-triage.md index ff67ba3e..879648a9 100644 --- a/contributors/guide/issue-triage.md +++ b/contributors/guide/issue-triage.md @@ -206,7 +206,7 @@ block the release on it. A few days before release, we will probably move all that milestone in bulk. More information can be found in the developer guide section for -[targeting issues and PRs to a milestone release](/contributors/devel/release.md). +[targeting issues and PRs to a milestone release](/contributors/devel/sig-release/release.md). 
## Closing issues Issues that are identified as a support request, duplicate, not-reproducible diff --git a/contributors/guide/non-code-contributions.md b/contributors/guide/non-code-contributions.md index 29bce79c..95c328a1 100644 --- a/contributors/guide/non-code-contributions.md +++ b/contributors/guide/non-code-contributions.md @@ -27,10 +27,11 @@ These are roles that either span the project as a whole, or span several areas o - Outward facing community work (might be more CNCF-oriented) - Hosting meetups and general evangelism - Presentation of work to meetups - - Design - - Web Development - - Artistic contributions - - Conference-specific or Project-specific + - Visual Communication + - Diagrams and visual explanations of concepts + - Infographic design + - Icon design + - Various artistic contributions to strengthen kubernetes brand, evangelize the project, and develop community - Non-Documentation writing - Blogging about early experiences - Operational manuals @@ -63,6 +64,7 @@ These are roles that are important to each and every SIG within the Kubernetes p - Updates - Reviewing/logging technical ownership for documentation that might need updating - Translation +- UX/UI Design - Release roles - All roles have shadows for onboarding new members - Project management @@ -74,6 +76,7 @@ These are roles that are important to each and every SIG within the Kubernetes p - Events - Organizing/helping run Face-to-Face meetings for SIGs/WGs/subprojects - Putting together SIG Intros & Deep-dives for KubeCon/CloudNativeCon +- Managing & Uploading Recordings to YouTube #### Non-Code Tasks in Primarily-Code roles These are roles that are not code-based, but require knowledge of either general coding, or specific domain knowledge of the Kubernetes code base. diff --git a/contributors/guide/pull-requests.md b/contributors/guide/pull-requests.md index a24310a6..cdab6ce4 100644 --- a/contributors/guide/pull-requests.md +++ b/contributors/guide/pull-requests.md @@ -115,7 +115,7 @@ The GitHub robots will add and remove the `do-not-merge/hold` label as you use t ## Pull Requests and the Release Cycle -If a pull request has been reviewed, but held or not approved, it might be due to the current phase in the [Release Cycle](/contributors/devel/release.md). Occasionally, a SIG may freeze their own code base when working towards a specific feature or goal that could impact other development. During this time, your pull request could remain unmerged while their release work is completed. +If a pull request has been reviewed, but held or not approved, it might be due to the current phase in the [Release Cycle](/contributors/devel/sig-release/release.md). Occasionally, a SIG may freeze their own code base when working towards a specific feature or goal that could impact other development. During this time, your pull request could remain unmerged while their release work is completed. If you feel your pull request is in this state, contact the appropriate [SIG](https://git.k8s.io/community/sig-list.md) or [SIG-Release](https://git.k8s.io/sig-release) for clarification. @@ -182,7 +182,7 @@ Let's talk about best practices so your pull request gets reviewed quickly. * [Development guide](/contributors/devel/development.md) * [Coding conventions](../guide/coding-conventions.md) * [API conventions](/contributors/devel/api-conventions.md) -* [Kubectl conventions](/contributors/devel/kubectl-conventions.md) +* [Kubectl conventions](/contributors/devel/sig-cli/kubectl-conventions.md) ## 1. Is the feature wanted? 
File a Kubernetes Enhancement Proposal Are you sure Feature-X is something the Kubernetes team wants or will accept? Is it implemented to fit with other changes in flight? Are you willing to bet a few days or weeks of work on it? diff --git a/contributors/guide/release-notes.md b/contributors/guide/release-notes.md index 655dff1c..81dca597 100644 --- a/contributors/guide/release-notes.md +++ b/contributors/guide/release-notes.md @@ -30,4 +30,4 @@ For pull requests that don't need to be mentioned at release time, use the `/rel To see how to format your release notes, view the kubernetes/kubernetes [pull request template](https://git.k8s.io/kubernetes/.github/PULL_REQUEST_TEMPLATE.md) for a brief example. Pull Request titles and body comments can be modified at any time prior to the release to make them friendly for release notes. -Release notes apply to pull requests on the master branch. For cherry-pick pull requests, see the [cherry-pick instructions](/contributors/devel/cherry-picks.md). The only exception to these rules is when a pull request is not a cherry-pick and is targeted directly to the non-master branch. In this case, a `release-note-*` label is required for that non-master pull request. +Release notes apply to pull requests on the master branch. For cherry-pick pull requests, see the [cherry-pick instructions](/contributors/devel/sig-release/cherry-picks.md). The only exception to these rules is when a pull request is not a cherry-pick and is targeted directly to the non-master branch. In this case, a `release-note-*` label is required for that non-master pull request. diff --git a/contributors/guide/style-guide.md b/contributors/guide/style-guide.md new file mode 100644 index 00000000..05ccbb04 --- /dev/null +++ b/contributors/guide/style-guide.md @@ -0,0 +1,678 @@ +--- +title: Documentation Style Guide +--- + +# Documentation Style Guide + +This style guide is for content in the Kubernetes github [community repository]. +It is an extension of the [Kubernetes documentation style-guide]. + +These are **guidelines**, not rules. Use your best judgement. + +- [Cheatsheet](#cheatsheet) +- [Content design, formatting, and language](#content-formatting-and-language) + - [Contact information](#contact-information) + - [Dates and times](#dates-and-times) + - [Diagrams, images and other assets](#diagrams-images-and-other-assets) + - [Document Layout](#document-layout) + - [Formatting text](#formatting-text) + - [Language, grammar, and tone](#language-grammar-and-tone) + - [Moving a document](#moving-a-document) + - [Punctuation](#punctuation) + - [Quotation](#quotation) +- [Markdown formatting](#markdown-and-formatting) + - [Code Blocks](code-blocks) + - [Emphasis](#emphasis) + - [Headings](#headings) + - [Horizontal Lines](#horizontal-lines) + - [Line Length](#line-length) + - [Links](#links) + - [Lists](#lists) + - [Metadata](#metadata) + - [Tables](#tables) +- [Attribution](#attribution) + + +## Cheatsheet + +### Cheatsheet: Content design, formatting, and language + +**[Contact information:](#contact-information)** +- Use official Kubernetes contact information. + +**[Dates and times:](#dates-and-times)** +- Format dates as `month day, year`. (December 13, 2018) +- When conveying a date in numerical form, use [ISO 8601] Format: `yyyy-mm-dd`. +- Use the 24 hour clock when referencing time. +- Times for single events (example: KubeCon) should be expressed in an absolute + time zone such as Pacific Standard Time (PST) or Coordinated Universal Time + (UTC). 
+- Times for reoccurring events should be expressed in a time zone that follows + Daylight Savings Time (DST) such as Pacific Time (PT) or Eastern Time (ET). +- Supply a link to a globally available time zone converter service. + - `http://www.thetimezoneconverter.com/?t=<TIME REFERENCE>&tz=<TZ REFERENCE>` + +**[Diagrams, images and other assets:](#diagrams-images-and-other-assets)** +- Images and other assets should be stored in the same directory as the document + that is referencing it. +- Filenames should be lowercase and descriptive of what they are referencing. +- Avoid excessively large images or include a smaller one while linking to a + higher resolution version of the same image. +- Use the [Kubernetes icon set] for architectural diagrams. + +**[Document Layout:](#document-layout)** +- Documents should follow the general template of: + - Document metadata (if appropriate). + - Title in `H1` (a single `#`). + - A brief description or summary of the document. + - A table of contents. + - The general body of document. +- Do not repeat content. Instead link back to the canonical source. +- Large content or topic shifts should be separated with a horizontal rule. + +**[Formatting text:](#formatting-text)** +- API objects: + - Follow the established [API naming convention] when referring to API Objects. + - Do not split API object names into their components. + - Use `code` style for API objects or object parameters. +- Use **bold text** for user interface elements. +- Use _italics_ to emphasize a new topic or subject for the first time. +- Use angle brackets (`<` and `>`) to enclose a placeholder reference. +- Apply `code` styling to: + - Filenames, directories, and paths. + - Command line examples and flags. + - Object field names. + +**[Language, grammar and tone:](#language)** +- Documentation should be written in English. +- Prefer an active voice and present tense when possible. +- Use simple and direct language. +- Use gender-neutral language. +- Avoid personal pronouns ("I," "we," "us," "our," and "ours"). +- Address the reader as "you" instead of "we". +- Do not use Latin phrases. +- Avoid jargon and idioms. +- If using acronyms, ensure they are clearly defined in the same document. +- If using an abbreviation, spell it out the first time it is used in the + document unless it is commonly known. (example: TCP/IP) + +**[Moving a document:](#moving-a-document)** +- Use `[git-mv]` to move documents. +- Commit moved documents separately from any other changes. +- When a document has moved, leave a tombstone file with a removal date in its + place. + +**[Punctuation:](#punctuation)** +- Do not use punctuation in headings. +- End full sentences with a period. + - **Exception:** When a sentence ends with a URL or if the text would be + unclear if the period is a part of the previous object or word. +- Add a single space after a period when beginning a new sentence. +- Avoid usage of exclamation points unless they are a part of a code example. +- Use an [Oxford comma] when a list contains 3 or more elements. + +**[Quotation:](#quotes)** +- Use double-quotation marks (`" "`) over single-quotation marks (`' '`). + - **Exception:** In code snippets where quotation marks have specific meaning. + - **Exception:** When nesting quotation marks inside another set of quotation + marks. +- Punctuation should be outside of quotation marks following the international + (British) standard. 
+ + +### Cheatsheet: Markdown + +**[Code blocks:](#code-blocks)** +- When possible, reference the language at the beginning of a Code Block. +- When a code block is used to reference a shell, do not include the command + prompt (`$`). + - **Exception:** When a code block is used to display raw shell output. +- Separate commands from output. + +**[Emphasis:](#emphasis)** +- Use two asterisks (`**`) for **Bold** text. +- Use an underscore (`_`) for _Italics_. +- Use two tildes (`~~`) for ~~Strikethrough~~. + +**[Headings:](#headings)** +- Use a single `H1` (`#`) Heading per document. + - **Exception:** `H1` may be used multiple times in the same document when + there is a large content shift or "chapter" change. +- Follow the Header hierarchy of `H2` > `H3` > `H4` > `H5` > `H6`. +- Use sentence-style capitalization in titles (first word and proper nouns). +- Avoid using special characters. +- Leave exactly 1 new line after a heading. +- Avoid using links in headings. + +**[Horizontal rules:](#horizontal-lines)** +- Use three dashes (`---`) to denote a horizontal rule. +- Use a horizontal rule (`---`) to logically separate large sections. + +**[Line length:](#line-length)** +- Prefer an 80 character line limit. + +**[Links:](#links)** +- Prefer using reference style links over inline style links. +- When linking within the same directory, use a relative link. +- When linking to a document outside of the current directory, use the absolute + path from the root of the repository. +- When linking to a file in another Kubernetes github repository, use the + `k8s.io` url shortener. + - git.k8s.io -> github.com/kubernetes + - sigs.k8s.io -> github.com/kubernetes-sigs + +**[Lists:](#lists)** +- Capitalize the first character of each entry unless the item is explicitly + case sensitive. +- End each entry with a period if it is a sentence or phrase. +- Use a colon (`:`) to separate a list item name from the explanatory text. +- Leave a blank line after each list. +- Use `-` for unordered lists. +- For ordered lists repeating `1.` may be used. +- When inserting a code block into an ordered list, indent (space) an additional + two times. + +**[Metadata:](metadata)** +- If the document is intended to be surfaced on the Contributor Site; include a + yaml metadata header at the beginning of the document. +- Metadata must include the `title` attribute. + +**[Tables:](#tables)** +- Use tables for structured information. +- Tables do not need to adhere to the suggested line length. +- Avoid long inline links. +- Do not use excessively wide tables. + +--- + +## Content design, formatting, and language + +### Contact information + +- Use official Kubernetes contact information. + - Use official community contact email addresses. There should be no personal + or work contact information included in public documentation; instead use + addresses like the [SIG Google groups] or managed accounts such as + community@kubernetes.io. + - **Good example:** community@kubernetes.io + - **Bad example:** bob@example.com + + +### Dates and times + +The Kubernetes Contributor Community spans many regions and time zones. +Following a consistent pattern and avoiding shorthand improves the readability +for every member. + +- Format dates as `month day, year`. (December 13, 2018) + - **Good example:** October 24, 2018 + - **Bad example:** 10/24/18 +- When conveying a date in numerical form, use [ISO 8601] Format: `yyyy-mm-dd`. 
+ - **Good example:** 2018-10-24 + - **Bad example:** 10/24/18 +- Use the 24 hour clock when referencing time. + - **Good example:** 15:30 + - **Bad example:** 3:30pm +- Times for single events (example: KubeCon) should be expressed in an absolute + time zone such as Pacific Standard Time (PST) or Coordinated Universal Time + (UTC). + - **Good example:** The Seattle Contributor Summit starts at 9:00 PST + - **Bad example:** The Seattle Contributor Summit starts at 9:00 PT +- Times for reoccurring events should be expressed in a time zone that follows + Daylight Savings Time (DST) such as Pacific Time (PT) or Eastern Time (ET). + - Times that follow DST are used as they adjusts automatically. If UTC or + other non-DST compatible time zones were used, content would have to be + updated multiple times per year to adjust times. + - **Good example:** 13:30 PT + - **Bad example:** 16:30 EST +- Supply a link to a globally available time zone converter service. + - `http://www.thetimezoneconverter.com/?t=<TIME REFERENCE>&tz=<TZ REFERENCE>` + + ``` + The weekly SIG meeting is at [13:30 PT]. + + [13:30 PT]: http://www.thetimezoneconverter.com/?t=13:30&tz=PT%20%28Pacific%20Time%29 + ``` + + +### Diagrams, images and other assets + +- Images and other assets should be stored in the same directory as the document + that is referencing it. +- Filenames should be lowercase and descriptive of what they are referencing. + - **Good example:** `deployment-workflow.jpg` + - **Bad example:** `image1.jpg` +- Avoid excessively large images or include a smaller one while linking to a + higher resolution version of the same image. +- Use the [Kubernetes icon set] for architectural diagrams. + + +### Document Layout + +Adhering to a standard document layout ensures that each page can intuitively +be navigated once a reader is familiar with the standard layout. + +- Documents should follow the general template of: + - Document metadata (if appropriate). + - Title in `H1` (a single `#`). + - A brief description or summary of the document. + - A table of contents. + - The general body of document. +- Do not repeat content. Instead link back to the canonical source. + - It is easy for content to become out of sync if it is maintained in + multiple places. Linking back to the canonical source ensures that the + documentation will be accurate and up to date. +- Large content or topic shifts should be separated with a horizontal rule. + + +### Formatting text + +The formatting guidelines have been selected to mirror or augment the +[Kubernetes documentation style-guide]. Remaining consistent across the +different content sources improves the overall readability and understanding of +the documentation being presented in addition to giving the project a unified +external appearance. + +- API objects: + - Follow the established [API naming convention] when referring to API Objects. + - Do not split API object names into their components. + - **Good example:** A `Pod` contains a `PodTemplateSpec`. + - **Bad example:** A `Pod` contains a `Pod Template Spec`. + - Use `code` style for API objects or object parameters. + - **Good example:** A `Deployment` contains a `DeploymentSpec`. + - **Bad example:** A Deployment contains a DeploymentSpec. +- Use angle brackets (`<` and `>`) to surround a placeholder references. + - **Good example:** `kubectl describe pod <pod-name>` + - **Bad example:** `kubectl describe pod pod-name` +- Use **bold text** for user interface elements. + - **Good example:** Select **Other**. 
+
+
+### Formatting text
+
+The formatting guidelines have been selected to mirror or augment the
+[Kubernetes documentation style-guide]. Remaining consistent across the
+different content sources improves the overall readability and understanding of
+the documentation being presented, in addition to giving the project a unified
+external appearance.
+
+- API objects:
+  - Follow the established [API naming convention] when referring to API objects.
+  - Do not split API object names into their components.
+    - **Good example:** A `Pod` contains a `PodTemplateSpec`.
+    - **Bad example:** A `Pod` contains a `Pod Template Spec`.
+  - Use `code` style for API objects or object parameters.
+    - **Good example:** A `Deployment` contains a `DeploymentSpec`.
+    - **Bad example:** A Deployment contains a DeploymentSpec.
+- Use angle brackets (`<` and `>`) to surround placeholder references.
+  - **Good example:** `kubectl describe pod <pod-name>`
+  - **Bad example:** `kubectl describe pod pod-name`
+- Use **bold text** for user interface elements.
+  - **Good example:** Select **Other**.
+  - **Bad example:** Select "Other".
+- Use _italic text_ to emphasize a new subject for the first time.
+  - **Good example:** A _cluster_ is a set of nodes.
+  - **Bad example:** A "cluster" is a set of nodes.
+- `Code` styling should be applied to:
+  - Filenames, directories, and paths.
+    - **Good example:** The default manifest path is `/etc/kubernetes/manifests`.
+    - **Bad example:** The default manifest path is /etc/kubernetes/manifests.
+  - Command line examples and flags.
+    - **Good example:** The flag `--advertise-address` is used to denote the
+      IP address on which to advertise the apiserver to members of the cluster.
+    - **Bad example:** The flag --advertise-address is used to denote the IP
+      address on which to advertise the apiserver to members of the cluster.
+  - Object field names.
+    - **Good example:** Set the `externalTrafficPolicy` to Local.
+    - **Bad example:** Set the externalTrafficPolicy to Local.
+
+
+### Language, grammar and tone
+
+- Documentation should be written in English.
+- Prefer an active voice and present tense when possible.
+  - Active voice is when the subject of the sentence performs the action,
+    whereas with passive voice the subject receives the action. Writing with an
+    active voice in mind easily conveys to the reader who or what is performing
+    the action.
+  - **Good example:** Updating the Deployment triggers a new ReplicaSet to be
+    created.
+  - **Bad example:** A ReplicaSet is created by updating the Deployment.
+- Use simple and direct language.
+  - Avoid using unnecessary or extra language. Be straightforward and direct.
+  - **Good example:** Wait for the Pod to start.
+  - **Bad example:** Please be patient and wait for the Pod to start.
+- Use gender-neutral language.
+  - Avoid gendered pronouns, preferring the [singular "they"][singular-they]
+    unless referring to a person by their preferred gender. For further
+    information on the subject, see [Microsoft's guide to bias-free communication]
+    and [Wikipedia's entry for the Singular they].
+  - **Good example:** chair or moderator
+  - **Bad example:** chairman
+- Avoid personal pronouns ("I," "we," "us," "our," and "ours").
+  - In most cases personal pronouns should be avoided as they can lead to
+    confusion regarding who they are referring to.
+  - **Good example:** The release-team shepherded the successful release of 1.13.
+  - **Bad example:** We shepherded the successful release of 1.13.
+- Address the reader as "you" instead of "we".
+  - Addressing the reader directly using "you" clearly denotes the target.
+    There is no confusion as there would be with "we" or "us".
+  - **Good example:** You will create a new cluster with kubeadm.
+  - **Bad example:** We will create a new cluster with kubeadm.
+- Do not use Latin phrases.
+  - [Latin phrases] can make it difficult for readers not familiar with them to
+    grasp their meaning.
+  - Some useful alternatives include:
+
+    | Latin Phrase | Alternative |
+    |:------------:|:-----------:|
+    | e.g.         | for example |
+    | et al.       | and others  |
+    | i.e.         | that is     |
+    | via          | using       |
+
+  - **Good example:** For example, Deployments, ReplicaSets...
+  - **Bad example:** e.g. Deployments, ReplicaSets...
+- Avoid jargon and idioms.
+  - Jargon and idioms tend to rely on regional or tribal knowledge. They can be
+    difficult to understand for both newcomers and those whose native language
+    is something other than English. They should be avoided when possible.
+  - **Good example:** Internally, the kube-apiserver...
+  - **Bad example:** Under the hood the kube-apiserver...
+  - **Good example:** We will start the project in early 2019.
+  - **Bad example:** We will kick off the initiative in 2019.
+- If using an abbreviation, spell it out the first time it is used in the
+  document unless it is commonly known. (example: TCP/IP)
+  - Abbreviation in this context applies to acronyms and initialisms as well.
+  - **Good example:** A _CustomResourceDefinition_ (CRD) extends the Kubernetes
+    API.
+  - **Bad example:** A CRD extends the Kubernetes API.
+
+
+### Moving a document
+
+- Use [`git mv`][git-mv] to move documents.
+  - `git mv` will safely move or rename a file, directory, or symlink and
+    automatically update the git index.
+  - **Good example:** `git mv /old/mydoc.md /new/mydoc.md`
+  - **Bad example:** `mv /old/mydoc.md /new/mydoc.md`
+- Commit moved documents separately from any other changes.
+  - A separate commit clearly preserves the history of the relocated documents
+    and makes the change easier to review.
+- When a document has moved, leave a tombstone file with a removal date in its
+  place.
+  - Tombstones function as a pointer and give users time to update their own
+    documentation and bookmarks. Their usefulness is time-bounded, and they
+    should be removed once they no longer serve their purpose.
+    ```markdown
+    This file has moved to https://git.k8s.io/community/contributors/guide/README.md.
+
+    This file is a placeholder to preserve links. Please remove after 2019-03-10 or the release of kubernetes 1.10, whichever comes first.
+    ```
+
+
+### Punctuation
+
+- Do not use punctuation in headings.
+- End full sentences with a period.
+  - **Exception:** When a sentence ends with a URL, or when it would be unclear
+    whether the period is part of the previous object or word.
+- Add a single space after a period when beginning a new sentence.
+- Avoid usage of exclamation points unless they are part of a code example.
+- Use an [Oxford comma] when a list contains 3 or more elements.
+  - **Good example:** Deployments, ReplicaSets, and DaemonSets.
+  - **Bad example:** Deployments, ReplicaSets and DaemonSets.
+
+
+### Quotation
+
+- Use double-quotation marks (`" "`) over single-quotation marks (`' '`).
+  - **Exception:** In code snippets where quotation marks have specific meaning.
+  - **Exception:** When nesting quotation marks inside another set of quotation
+    marks.
+- Punctuation should be outside of quotation marks following the international
+  (British) standard.
+
+
+---
+
+
+## Markdown formatting
+
+### Code blocks
+
+- When possible, reference the language at the beginning of a Code Block.
+  - The two markdown renderers used by the Kubernetes community
+    ([GitHub][gh-code-hl-list] and [Hugo][hugo-code-hl-list]) support code
+    highlighting. This can be enabled by supplying the name of the language
+    after the three back-ticks (`` ``` ``) at the start of a code block.
+  - **Good example:**
+    `````
+    ```go
+    import (
+        "fmt"
+        ...
+    )
+    ```
+    `````
+  - **Bad example:**
+    `````
+    ```
+    import (
+        "fmt"
+        ...
+    )
+    ```
+    `````
+- When a code block is used to reference a shell, do not include the command
+  prompt (`$`).
+  - When a code block is referencing a shell, it is implied that it is a
+    command prompt. The exception to this is when a code block is being used
+    for raw shell output such as debug logs.
+  - **Good example:**
+    ```
+    kubectl get pods -o wide
+    ```
+  - **Bad example:**
+    ```
+    $ kubectl get pods -o wide
+    ```
+- Separate commands from output.
+  - Separating the command from the output makes both the command and output
+    more generally readable.
+  - **Good example:**
+    ```
+    kubectl get pods
+    ```
+    ```
+    NAME    READY   STATUS    RESTARTS   AGE   IP           NODE
+    nginx   1/1     Running   0          13s   10.200.0.4   worker0
+    ```
+  - **Bad example:**
+    ```
+    kubectl get pods
+    NAME    READY   STATUS    RESTARTS   AGE   IP           NODE
+    nginx   1/1     Running   0          13s   10.200.0.4   worker0
+    ```
+
+
+### Emphasis
+
+Markdown has multiple ways of indicating each type of emphasis. Adhering to a
+standard across documentation improves supportability.
+
+- Use two asterisks (`**`) for **Bold** text.
+  - **Good example:** `This is **bold** text.`
+  - **Bad example:** `This should not be used for __bold__.`
+- Use an underscore (`_`) for _Italics_.
+  - **Good example:** `This is _italics_.`
+  - **Bad example:** `This should not be used for *italics*.`
+- Use two tildes (`~~`) for ~~Strikethrough~~.
+  - **Good example:** `This is ~~strikethrough~~.`
+  - **Bad example:** `This should not be used for ~strikethrough~.`
+
+
+### Headings
+
+Adhering to a standard across documentation improves both readability and
+overall supportability across multiple documents.
+
+- Use a single `H1` (`#`) Heading per document.
+  - **Exception:** `H1` may be used multiple times in the same document when
+    there is a large content shift or "chapter" change.
+- Follow the Header hierarchy of `H2` > `H3` > `H4` > `H5` > `H6`.
+- Use sentence-style capitalization in titles (first word and proper nouns).
+- Avoid using special characters.
+- Leave exactly 1 new line after a heading.
+- Avoid using links in headings.
+
+
+### Horizontal rules
+
+Markdown has multiple ways of indicating a horizontal rule. Adhering to a
+standard across documentation improves supportability.
+
+- Use three dashes (`---`) to denote a horizontal rule.
+  - **Good example:** `---`
+  - **Bad example:** `===`
+- Use a horizontal rule (`---`) to logically separate large sections.
+
+
+### Line length
+
+- Prefer an 80 character line limit.
+  - There is no general best practice for Markdown line length. The commonly
+    used 80 character guideline is preferable for general text review and
+    editing.
+
+
+### Links
+
+Markdown provides two primary methods of linking to content: inline style links
+and reference style links. However, how and what they link to can vary widely.
+
+- Prefer using reference style links over inline style links.
+  - Reference links are shorter and easier to read. They have the added benefit
+    of being reusable throughout the entire document.
+  - The link definition itself should be at the bottom of the document. If the
+    document is large or covers many topics, place the definition at the end of
+    the logical chapter or section.
+  - **Example:**
+    ```
+    See the [Code of Conduct] for more information.
+
+    [code of conduct]: https://git.k8s.io/community/code-of-conduct.md
+    ```
+  - **Example:**
+    ```
+    See the [Code of Conduct][coc] for more information.
+
+    [coc]: https://git.k8s.io/community/code-of-conduct.md
+    ```
+- When linking within the same directory, use a relative link.
+  - Links to files within the same directory are short and readable already.
+    They do not warrant expanding the full path.
+  - When the file is referenced multiple times within the same document,
+    consider using a reference link for a quicker shorthand reference.
+  - **Example:**
+    ```
+    See the [Code of Conduct](code-of-conduct.md) for more information.
+    ```
+  - **Example:**
+    ```
+    See the [Code of Conduct][coc] for more information.
+
+    [coc]: code-of-conduct.md
+    ```
+- When linking to a document outside of the current directory, use the absolute
+  path from the root of the repository.
+  - Using the absolute path ensures that if the source document is relocated,
+    the link to the target or destination document will remain intact and not
+    have to be updated.
+  - **Example:**
+    ```
+    See the [Coding Convention] doc for more information.
+
+    [Coding Convention]: /contributors/guide/coding-conventions.md
+    ```
+- When linking to a file in another Kubernetes GitHub repository, use the
+  `k8s.io` URL shortener.
+  - The shorthand version will auto-expand, linking to documents within the
+    master branch, and can be used for multiple purposes.
+
+    | Short URL           | Expanded URL                       |
+    |:-------------------:|:----------------------------------:|
+    | https://git.k8s.io  | https://github.com/kubernetes      |
+    | https://sigs.k8s.io | https://github.com/kubernetes-sigs |
+
+  - **Example:**
+    ```
+    The super cool [prow tool] resides in the test-infra repo under the kubernetes organization.
+
+    [prow tool]: https://git.k8s.io/test-infra/prow/README.md
+    ```
+
+
+### Lists
+
+Adhering to a standard across documentation improves both readability and
+overall supportability across multiple documents.
+
+- Capitalize the first character of each entry unless the item is explicitly
+  case sensitive.
+- End each entry with a period if it is a sentence or phrase.
+- Use a colon (`:`) to separate a list item name from the explanatory text.
+- Leave a blank line after each list.
+- Use `-` for unordered lists.
+- For ordered lists, a repeating `1.` may be used.
+- When inserting a code block into an ordered list, indent (space) an additional
+  two times.
+
+
+### Metadata
+
+- If the document is intended to be surfaced on the Contributor Site, include a
+  yaml metadata header at the beginning of the document.
+  - If the document is to be added to the Contributor Site, adding metadata
+    at the beginning of the document will improve the overall presentation of
+    the information. This metadata is similar to the metadata used in the
+    KEP process and is often referred to as _Frontmatter_ in common static
+    site generators such as [Jekyll] and [Hugo].
+  - The metadata header is a yaml block between two sets of `---`.
+  - **Example:**
+    ```
+    ---
+    title: Super Awesome Doc
+    ---
+    ```
+- Metadata must include the `title` attribute.
+  - `title` will be used as the title of the document when rendered with
+    [Hugo].
+
+
+### Tables
+
+- Use tables for structured information.
+  - **Example:**
+    ```
+    | Column 1       | Column 2       | Column 3       |
+    |:--------------:|:--------------:|:--------------:|
+    | test 1         | test 2         | test 3         |
+    | another test 1 | another test 2 | another test 3 |
+    ```
+- Tables do not need to adhere to the suggested line length.
+  - Markdown tables have an inherently longer line length, and cannot be
+    line wrapped.
+- Avoid long inline links.
+  - Long inline links can make it difficult to work with markdown tables.
+    Prefer to use reference style links instead.
+- Do not use excessively wide tables.
+  - Large wide tables do not render well. Try to break the information down
+    into something more easily presentable.
+
+
+## Attribution
+
+This style guide is heavily influenced by the great work of the content
+management teams from [SIG-Docs], [GitLab], [Google], and [Microsoft].
+Without their previous efforts, this guide would not be nearly as concise as
+it is.
+
+[community repository]: https://git.k8s.io/community
+[Kubernetes documentation style-guide]: https://kubernetes.io/docs/contribute/style/style-guide/
+[SIG Google groups]: /sig-list.md
+[ISO 8601]: https://en.wikipedia.org/wiki/ISO_8601
+[kubernetes icon set]: /icons
+[API naming convention]: /contributors/devel/api-conventions.md#naming-conventions
+[singular-they]: https://en.wikipedia.org/wiki/Singular_they
+[Microsoft's guide to bias-free communication]: https://docs.microsoft.com/en-us/style-guide/bias-free-communication
+[Wikipedia's entry for the Singular they]: https://en.wikipedia.org/wiki/Singular_they
+[Latin phrases]: https://en.wikipedia.org/wiki/List_of_Latin_abbreviations
+[Oxford comma]: https://www.grammarly.com/blog/what-is-the-oxford-comma-and-why-do-people-care-so-much-about-it/
+[gh-code-hl-list]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
+[hugo-code-hl-list]: http://www.rubycoloredglasses.com/2013/04/languages-supported-by-github-flavored-markdown/
+[git-mv]: https://git-scm.com/docs/git-mv
+[jekyll]: https://jekyllrb.com/
+[hugo]: https://gohugo.io/
+[gitlab]: https://docs.gitlab.com/ee/development/documentation/styleguide.html
+[google]: https://developers.google.com/style/
+[microsoft]: https://docs.microsoft.com/en-us/style-guide/welcome/
+[sig-docs]: https://kubernetes.io/docs/contribute/style/style-guide/
