| author | Kubernetes Prow Robot <k8s-ci-robot@users.noreply.github.com> | 2019-01-30 11:50:33 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2019-01-30 11:50:33 -0800 |
| commit | 362bc1c406a604dbe6a56e60146a67fcce56d5cf (patch) | |
| tree | 6592c8baa26a0ec129e25d8eb4fc597825355719 /contributors | |
| parent | 04f8eea1cf75f6bc0323217aabc81b99d4234bd8 (diff) | |
| parent | 03190b4bedac819dbf80e70e1574ed0f43ab2cda (diff) | |
Merge pull request #3173 from eduartua/issue-3064-grouping-by-sig-testing
Grouping /devel files by SIGs - SIG Testing
Diffstat (limited to 'contributors')
| -rw-r--r-- | contributors/devel/README.md | 4 |
| -rw-r--r-- | contributors/devel/api_changes.md | 2 |
| -rw-r--r-- | contributors/devel/bazel.md | 185 |
| -rw-r--r-- | contributors/devel/development.md | 4 |
| -rw-r--r-- | contributors/devel/e2e-tests.md | 765 |
| -rw-r--r-- | contributors/devel/flaky-tests.md | 202 |
| -rw-r--r-- | contributors/devel/gubernator.md | 137 |
| -rw-r--r-- | contributors/devel/sig-testing/bazel.md | 184 |
| -rw-r--r-- | contributors/devel/sig-testing/e2e-tests.md | 764 |
| -rw-r--r-- | contributors/devel/sig-testing/flaky-tests.md | 201 |
| -rw-r--r-- | contributors/devel/sig-testing/gubernator.md | 136 |
| -rw-r--r-- | contributors/devel/sig-testing/testing.md | 227 |
| -rw-r--r-- | contributors/devel/sig-testing/writing-good-e2e-tests.md | 231 |
| -rw-r--r-- | contributors/devel/testing.md | 228 |
| -rw-r--r-- | contributors/devel/writing-good-e2e-tests.md | 232 |
| -rw-r--r-- | contributors/guide/README.md | 4 |
| -rw-r--r-- | contributors/guide/coding-conventions.md | 4 |
| -rw-r--r-- | contributors/guide/github-workflow.md | 6 |
18 files changed, 1767 insertions, 1749 deletions
diff --git a/contributors/devel/README.md b/contributors/devel/README.md index a0685b5e..ab1f9c74 100644 --- a/contributors/devel/README.md +++ b/contributors/devel/README.md @@ -24,12 +24,12 @@ Guide](http://kubernetes.io/docs/admin/). * **Development Guide** ([development.md](development.md)): Setting up your development environment. -* **Testing** ([testing.md](testing.md)): How to run unit, integration, and end-to-end tests in your development sandbox. +* **Testing** ([testing.md](sig-testing/testing.md)): How to run unit, integration, and end-to-end tests in your development sandbox. * **Conformance Testing** ([conformance-tests.md](conformance-tests.md)) What is conformance testing and how to create/manage them. -* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. +* **Hunting flaky tests** ([flaky-tests.md](sig-testing/flaky-tests.md)): We have a goal of 99.9% flake free tests. Here's how to run your tests many times. * **Logging Conventions** ([logging.md](sig-instrumentation/logging.md)): Glog levels. diff --git a/contributors/devel/api_changes.md b/contributors/devel/api_changes.md index 1f46d298..bdbec963 100644 --- a/contributors/devel/api_changes.md +++ b/contributors/devel/api_changes.md @@ -732,7 +732,7 @@ doing! ## Write end-to-end tests -Check out the [E2E docs](e2e-tests.md) for detailed information about how to +Check out the [E2E docs](/contributors/devel/sig-testing/e2e-tests.md) for detailed information about how to write end-to-end tests for your feature. ## Examples and docs diff --git a/contributors/devel/bazel.md b/contributors/devel/bazel.md index 991a0ac2..502d32ee 100644 --- a/contributors/devel/bazel.md +++ b/contributors/devel/bazel.md @@ -1,184 +1,3 @@ -# Build and test with Bazel +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/bazel.md. -Building and testing Kubernetes with Bazel is supported but not yet default. - -Bazel is used to run all Kubernetes PRs on [Prow](https://prow.k8s.io), -as remote caching enables significantly reduced build and test times. - -Some repositories (such as kubernetes/test-infra) have switched to using Bazel -exclusively for all build, test, and release workflows. - -Go rules are managed by the [`gazelle`](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle) -tool, with some additional rules managed by the [`kazel`](https://git.k8s.io/repo-infra/kazel) tool. -These tools are called via the `hack/update-bazel.sh` script. - -Instructions for installing Bazel -can be found [here](https://www.bazel.io/versions/master/docs/install.html). - -Several convenience `make` rules have been created for common operations: - -* `make bazel-build`: builds all binaries in tree (`bazel build -- //... - -//vendor/...`) -* `make bazel-test`: runs all unit tests (`bazel test --config=unit -- //... - //hack:verify-all -//build/... -//vendor/...`) -* `make bazel-test-integration`: runs all integration tests (`bazel test - --config integration //test/integration/...`) -* `make bazel-release`: builds release tarballs, Docker images (for server - components), and Debian images (`bazel build //build/release-tars`) - -You can also interact with Bazel directly; for example, to run all `kubectl` unit -tests, run - -```console -$ bazel test //pkg/kubectl/... -``` - -## Planter -If you don't want to install Bazel, you can instead try using the unofficial -[Planter](https://git.k8s.io/test-infra/planter) tool, -which runs Bazel inside a Docker container. 
- -For example, you can run -```console -$ ../test-infra/planter/planter.sh make bazel-test -$ ../test-infra/planter/planter.sh bazel build //cmd/kubectl -``` - -## Continuous Integration - -There are several bazel CI jobs: -* [ci-kubernetes-bazel-build](http://k8s-testgrid.appspot.com/google-unit#bazel-build): builds everything - with Bazel -* [ci-kubernetes-bazel-test](http://k8s-testgrid.appspot.com/google-unit#bazel-test): runs unit tests in - with Bazel - -Similar jobs are run on all PRs; additionally, several of the e2e jobs use -Bazel-built binaries when launching and testing Kubernetes clusters. - -## Updating `BUILD` files - -To update `BUILD` files, run: - -```console -$ ./hack/update-bazel.sh -``` - -To prevent Go rules from being updated, consult the [gazelle -documentation](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle). - -Note that much like Go files and `gofmt`, `BUILD` files have standardized, -opinionated style rules, and running `hack/update-bazel.sh` will format them for you. - -If you want to auto-format `BUILD` files in your editor, use of -[Buildifier](https://github.com/bazelbuild/buildtools/blob/master/buildifier/README.md) -is recommended. - -Updating the `BUILD` file for a package will be required when: -* Files are added to or removed from a package -* Import dependencies change for a package -* A `BUILD` file has been updated and needs to be reformatted -* A new `BUILD` file has been added (parent `BUILD` files will be updated) - -## Known issues and limitations - -### [Cross-compilation of cgo is not currently natively supported](https://github.com/bazelbuild/rules_go/issues/1020) -All binaries are currently built for the host OS and architecture running Bazel. -(For example, you can't currently target linux/amd64 from macOS or linux/s390x -from an amd64 machine.) - -The Go rules support cross-compilation of pure Go code using the `--platforms` -flag, and this is being used successfully in the kubernetes/test-infra repo. - -It may already be possible to cross-compile cgo code if a custom CC toolchain is -set up, possibly reusing the kube-cross Docker image, but this area needs -further exploration. - -### The CC toolchain is not fully hermetic -Bazel requires several tools and development packages to be installed in the system, including `gcc`, `g++`, `glibc and libstdc++ development headers` and `glibc static development libraries`. Please check your distribution for exact names of the packages. Examples for some commonly used distributions are below: - -| Dependency | Debian/Ubuntu | CentOS | OpenSuSE | -|:---------------------:|-------------------------------|--------------------------------|-----------------------------------------| -| Build essentials | `apt install build-essential` | `yum groupinstall development` | `zypper install -t pattern devel_C_C++` | -| GCC C++ | `apt install g++` | `yum install gcc-c++` | `zypper install gcc-c++` | -| GNU Libc static files | `apt install libc6-dev` | `yum install glibc-static` | `zypper install glibc-devel-static` | - -If any of these packages change, they may also cause spurious build failures -as described in [this issue](https://github.com/bazelbuild/bazel/issues/4907). 
- -An example error might look something like -``` -ERROR: undeclared inclusion(s) in rule '//vendor/golang.org/x/text/cases:go_default_library.cgo_c_lib': -this rule is missing dependency declarations for the following files included by 'vendor/golang.org/x/text/cases/linux_amd64_stripped/go_default_library.cgo_codegen~/_cgo_export.c': - '/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h' -``` - -The only way to recover from this error is to force Bazel to regenerate its -automatically-generated CC toolchain configuration by running `bazel clean ---expunge`. - -Improving cgo cross-compilation may help with all of this. - -### Changes to Go imports requires updating BUILD files -The Go rules in `BUILD` and `BUILD.bazel` files must be updated any time files -are added or removed or Go imports are changed. These rules are automatically -maintained by `gazelle`, which is run via `hack/update-bazel.sh`, but this is -still a source of friction. - -[Autogazelle](https://github.com/bazelbuild/bazel-gazelle/tree/master/cmd/autogazelle) -is a new experimental tool which may reduce or remove the need for developers -to run `hack/update-bazel.sh`, but no work has yet been done to support it in -kubernetes/kubernetes. - -### Code coverage support is incomplete for Go -Bazel and the Go rules have limited support for code coverage. Running something -like `bazel coverage -- //... -//vendor/...` will run tests in coverage mode, -but no report summary is currently generated. It may be possible to combine -`bazel coverage` with -[Gopherage](https://github.com/kubernetes/test-infra/tree/master/gopherage), -however. - -### Kubernetes code generators are not fully supported -The make-based build system in kubernetes/kubernetes runs several code -generators at build time: -* [conversion-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/conversion-gen) -* [deepcopy-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/deepcopy-gen) -* [defaulter-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/defaulter-gen) -* [openapi-gen](https://github.com/kubernetes/kube-openapi/tree/master/cmd/openapi-gen) -* [go-bindata](https://github.com/jteeuwen/go-bindata/tree/master/go-bindata) - -Of these, only `openapi-gen` and `go-bindata` are currently supported when -building Kubernetes with Bazel. - -The `go-bindata` generated code is produced by hand-written genrules. - -The other code generators use special build tags of the form `// -+k8s:generator-name=arg`; for example, input files to the openapi-gen tool are -specified with `// +k8s:openapi-gen=true`. - -`kazel` is used to find all packages that require OpenAPI generation, and then a -handwritten genrule consumes this list of packages to run `openapi-gen`. - -For `openapi-gen`, a single output file is produced in a single Go package, which -makes this fairly compatible with Bazel. -All other Kubernetes code generators generally produce one output file per input -package, which is less compatible with the Bazel workflow. - -The make-based build system batches up all input packages into one call to the -code generator binary, but this is inefficient for Bazel's incrementality, as a -change in one package may result in unnecessarily recompiling many other -packages. -On the other hand, calling the code generator binary multiple times is less -efficient than calling it once, since many of the generators parse the tree for -Go type information and other metadata. 
- -One additional challenge is that many of the code generators add additional -Go imports which `gazelle` (and `autogazelle`) cannot infer, and so they must be -explicitly added as dependencies in the `BUILD` files. - -Kubernetes has even more code generators than this limited list, but the rest -are generally run as `hack/update-*.sh` scripts and checked into the repository, -and so are not immediately needed for Bazel parity. - -## Contacts -For help or discussion, join the [#bazel](https://kubernetes.slack.com/messages/bazel) -channel on Kubernetes Slack. +This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of kubernetes 1.13, whichever comes first.
\ No newline at end of file diff --git a/contributors/devel/development.md b/contributors/devel/development.md index 60bb883c..1d093485 100644 --- a/contributors/devel/development.md +++ b/contributors/devel/development.md @@ -186,7 +186,7 @@ To check out code to work on, please refer to [this guide](/contributors/guide/g [build/common.sh]: https://git.k8s.io/kubernetes/build/common.sh [e2e-image]: https://git.k8s.io/test-infra/jenkins/e2e-image [etcd-latest]: https://coreos.com/etcd/docs/latest -[etcd-install]: testing.md#install-etcd-dependency +[etcd-install]: sig-testing/testing.md#install-etcd-dependency <!-- https://github.com/coreos/etcd/releases --> [go-workspace]: https://golang.org/doc/code.html#Workspaces [issue]: https://github.com/kubernetes/kubernetes/issues @@ -194,4 +194,4 @@ To check out code to work on, please refer to [this guide](/contributors/guide/g [kubernetes.io]: https://kubernetes.io [mercurial]: http://mercurial.selenic.com/wiki/Download [test-image]: https://git.k8s.io/test-infra/jenkins/test-image -[Build with Bazel]: bazel.md +[Build with Bazel]: sig-testing/bazel.md diff --git a/contributors/devel/e2e-tests.md b/contributors/devel/e2e-tests.md index e01a896f..31d589f6 100644 --- a/contributors/devel/e2e-tests.md +++ b/contributors/devel/e2e-tests.md @@ -1,764 +1,3 @@ -# End-to-End Testing in Kubernetes +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/e2e-tests.md. -**Table of Contents** - -- [End-to-End Testing in Kubernetes](#end-to-end-testing-in-kubernetes) - - [Overview](#overview) - - [Building Kubernetes and Running the Tests](#building-kubernetes-and-running-the-tests) - - [Cleaning up](#cleaning-up) - - [Advanced testing](#advanced-testing) - - [Extracting a specific version of kubernetes](#extracting-a-specific-version-of-kubernetes) - - [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing) - - [Federation e2e tests](#federation-e2e-tests) - - [Configuring federation e2e tests](#configuring-federation-e2e-tests) - - [Image Push Repository](#image-push-repository) - - [Build](#build) - - [Deploy federation control plane](#deploy-federation-control-plane) - - [Run the Tests](#run-the-tests) - - [Teardown](#teardown) - - [Shortcuts for test developers](#shortcuts-for-test-developers) - - [Debugging clusters](#debugging-clusters) - - [Local clusters](#local-clusters) - - [Testing against local clusters](#testing-against-local-clusters) - - [Version-skewed and upgrade testing](#version-skewed-and-upgrade-testing) - - [Test jobs naming convention](#test-jobs-naming-convention) - - [Kinds of tests](#kinds-of-tests) - - [Viper configuration and hierarchichal test parameters.](#viper-configuration-and-hierarchichal-test-parameters) - - [Conformance tests](#conformance-tests) - - [Continuous Integration](#continuous-integration) - - [What is CI?](#what-is-ci) - - [What runs in CI?](#what-runs-in-ci) - - [Non-default tests](#non-default-tests) - - [The PR-builder](#the-pr-builder) - - [Adding a test to CI](#adding-a-test-to-ci) - - [Moving a test out of CI](#moving-a-test-out-of-ci) - - [Performance Evaluation](#performance-evaluation) - - [One More Thing](#one-more-thing) - - -## Overview - -End-to-end (e2e) tests for Kubernetes provide a mechanism to test end-to-end -behavior of the system, and is the last signal to ensure end user operations -match developer specifications. 
Although unit and integration tests provide a -good signal, in a distributed system like Kubernetes it is not uncommon that a -minor change may pass all unit and integration tests, but cause unforeseen -changes at the system level. - -The primary objectives of the e2e tests are to ensure a consistent and reliable -behavior of the kubernetes code base, and to catch hard-to-test bugs before -users do, when unit and integration tests are insufficient. - -The e2e tests in kubernetes are built atop of -[Ginkgo](http://onsi.github.io/ginkgo/) and -[Gomega](http://onsi.github.io/gomega/). There are a host of features that this -Behavior-Driven Development (BDD) testing framework provides, and it is -recommended that the developer read the documentation prior to diving into the - tests. - -The purpose of *this* document is to serve as a primer for developers who are -looking to execute or add tests using a local development environment. - -Before writing new tests or making substantive changes to existing tests, you -should also read [Writing Good e2e Tests](writing-good-e2e-tests.md) - -## Building Kubernetes and Running the Tests - -There are a variety of ways to run e2e tests, but we aim to decrease the number -of ways to run e2e tests to a canonical way: `kubetest`. - -You can install `kubetest` as follows: -```sh -go get -u k8s.io/test-infra/kubetest -``` - -You can run an end-to-end test which will bring up a master and nodes, perform -some tests, and then tear everything down. Make sure you have followed the -getting started steps for your chosen cloud platform (which might involve -changing the --provider flag value to something other than "gce"). - -You can quickly recompile the e2e testing framework via `go install ./test/e2e`. -This will not do anything besides allow you to verify that the go code compiles. -If you want to run your e2e testing framework without re-provisioning the e2e setup, -you can do so via `make WHAT=test/e2e/e2e.test`, and then re-running the ginkgo tests. - -To build Kubernetes, up a cluster, run tests, and tear everything down, use: - -```sh -kubetest --build --up --test --down -``` - -If you'd like to just perform one of these steps, here are some examples: - -```sh -# Build binaries for testing -kubetest --build - -# Create a fresh cluster. Deletes a cluster first, if it exists -kubetest --up - -# Run all tests -kubetest --test - -# Run tests matching the regex "\[Feature:Performance\]" against a local cluster -# Specify "--provider=local" flag when running the tests locally -kubetest --test --test_args="--ginkgo.focus=\[Feature:Performance\]" --provider=local - -# Conversely, exclude tests that match the regex "Pods.*env" -kubetest --test --test_args="--ginkgo.skip=Pods.*env" - -# Run tests in parallel, skip any that must be run serially -GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\]" - -# Run tests in parallel, skip any that must be run serially and keep the test namespace if test failed -GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\] --delete-namespace-on-failure=false" - -# Flags can be combined, and their actions will take place in this order: -# --build, --up, --test, --down -# -# You can also specify an alternative provider, such as 'aws' -# -# e.g.: -kubetest --provider=aws --build --up --test --down - -# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for -# cleaning up after a failed test or viewing logs. 
-# kubectl output is default on, you can use --verbose-commands=false to suppress output. -kubetest -ctl='get events' -kubetest -ctl='delete pod foobar' -``` - -The tests are built into a single binary which can be used to deploy a -Kubernetes system or run tests against an already-deployed Kubernetes system. -See `kubetest --help` (or the flag definitions in `hack/e2e.go`) for -more options, such as reusing an existing cluster. - -### Cleaning up - -During a run, pressing `control-C` should result in an orderly shutdown, but if -something goes wrong and you still have some VMs running you can force a cleanup -with this command: - -```sh -kubetest --down -``` - -## Advanced testing - -### Extracting a specific version of kubernetes - -The `kubetest` binary can download and extract a specific version of kubernetes, -both the server, client and test binaries. The `--extract=E` flag enables this -functionality. - -There are a variety of values to pass this flag: - -```sh -# Official builds: <ci|release>/<latest|stable>[-N.N] -kubetest --extract=ci/latest --up # Deploy the latest ci build. -kubetest --extract=ci/latest-1.5 --up # Deploy the latest 1.5 CI build. -kubetest --extract=release/latest --up # Deploy the latest RC. -kubetest --extract=release/stable-1.5 --up # Deploy the 1.5 release. - -# A specific version: -kubetest --extract=v1.5.1 --up # Deploy 1.5.1 -kubetest --extract=v1.5.2-beta.0 --up # Deploy 1.5.2-beta.0 -kubetest --extract=gs://foo/bar --up # --stage=gs://foo/bar - -# Whatever GKE is using (gke, gke-staging, gke-test): -kubetest --extract=gke --up # Deploy whatever GKE prod uses - -# Using a GCI version: -kubetest --extract=gci/gci-canary --up # Deploy the version for next gci release -kubetest --extract=gci/gci-57 # Deploy the version bound to gci m57 -kubetest --extract=gci/gci-57/ci/latest # Deploy the latest CI build using gci m57 for the VM image - -# Reuse whatever is already built -kubetest --up # Most common. Note, no extract flag -kubetest --build --up # Most common. Note, no extract flag -kubetest --build --stage=gs://foo/bar --extract=local --up # Extract the staged version -``` - -### Bringing up a cluster for testing - -If you want, you may bring up a cluster in some other manner and run tests -against it. To do so, or to do other non-standard test things, you can pass -arguments into Ginkgo using `--test_args` (e.g. see above). For the purposes of -brevity, we will look at a subset of the options, which are listed below: - -``` ---ginkgo.dryRun=false: If set, ginkgo will walk the test hierarchy without -actually running anything. - ---ginkgo.failFast=false: If set, ginkgo will stop running a test suite after a -failure occurs. - ---ginkgo.failOnPending=false: If set, ginkgo will mark the test suite as failed -if any specs are pending. - ---ginkgo.focus="": If set, ginkgo will only run specs that match this regular -expression. - ---ginkgo.noColor="n": If set to "y", ginkgo will not use color in the output - ---ginkgo.skip="": If set, ginkgo will only run specs that do not match this -regular expression. - ---ginkgo.trace=false: If set, default reporter prints out the full stack trace -when a failure occurs - ---ginkgo.v=false: If set, default reporter print out all specs as they begin. - ---host="": The host, or api-server, to connect to - ---kubeconfig="": Path to kubeconfig containing embedded authinfo. - ---provider="": The name of the Kubernetes provider (gce, gke, local, vagrant, -etc.) 
- ---repo-root="../../": Root directory of kubernetes repository, for finding test -files. -``` - -Prior to running the tests, you may want to first create a simple auth file in -your home directory, e.g. `$HOME/.kube/config`, with the following: - -``` -{ - "User": "root", - "Password": "" -} -``` - -As mentioned earlier there are a host of other options that are available, but -they are left to the developer. - -**NOTE:** If you are running tests on a local cluster repeatedly, you may need -to periodically perform some manual cleanup: - - - `rm -rf /var/run/kubernetes`, clear kube generated credentials, sometimes -stale permissions can cause problems. - - - `sudo iptables -F`, clear ip tables rules left by the kube-proxy. - -### Reproducing failures in flaky tests -You can run a test repeatedly until it fails. This is useful when debugging -flaky tests. In order to do so, you need to set the following environment -variable: -```sh -$ export GINKGO_UNTIL_IT_FAILS=true -``` - -After setting the environment variable, you can run the tests as before. The e2e -script adds `--untilItFails=true` to ginkgo args if the environment variable is -set. The flags asks ginkgo to run the test repeatedly until it fails. - -### Federation e2e tests - -By default, `e2e.go` provisions a single Kubernetes cluster, and any `Feature:Federation` ginkgo tests will be skipped. - -Federation e2e testing involve bringing up multiple "underlying" Kubernetes clusters, -and deploying the federation control plane as a Kubernetes application on the underlying clusters. - -The federation e2e tests are still managed via `e2e.go`, but require some extra configuration items. - -#### Configuring federation e2e tests - -The following environment variables will enable federation e2e building, provisioning and testing. - -```sh -$ export FEDERATION=true -$ export E2E_ZONES="us-central1-a us-central1-b us-central1-f" -``` - -A Kubernetes cluster will be provisioned in each zone listed in `E2E_ZONES`. A zone can only appear once in the `E2E_ZONES` list. - -#### Image Push Repository - -Next, specify the docker repository where your ci images will be pushed. - -* **If `--provider=gce` or `--provider=gke`**: - - If you use the same GCP project where you to run the e2e tests as the container image repository, - FEDERATION_PUSH_REPO_BASE environment variable will be defaulted to "gcr.io/${DEFAULT_GCP_PROJECT_NAME}". - You can skip ahead to the **Build** section. - - You can simply set your push repo base based on your project name, and the necessary repositories will be - auto-created when you first push your container images. - - ```sh - $ export FEDERATION_PUSH_REPO_BASE="gcr.io/${GCE_PROJECT_NAME}" - ``` - - Skip ahead to the **Build** section. - -* **For all other providers**: - - You'll be responsible for creating and managing access to the repositories manually. - - ```sh - $ export FEDERATION_PUSH_REPO_BASE="quay.io/colin_hom" - ``` - - Given this example, the `federation-apiserver` container image will be pushed to the repository - `quay.io/colin_hom/federation-apiserver`. - - The docker client on the machine running `e2e.go` must have push access for the following pre-existing repositories: - - * `${FEDERATION_PUSH_REPO_BASE}/federation-apiserver` - * `${FEDERATION_PUSH_REPO_BASE}/federation-controller-manager` - - These repositories must allow public read access, as the e2e node docker daemons will not have any credentials. If you're using - GCE/GKE as your provider, the repositories will have read-access by default. 
- -#### Build - -* Compile the binaries and build container images: - - ```sh - $ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true kubetest -build - ``` - -* Push the federation container images - - ```sh - $ federation/develop/push-federation-images.sh - ``` - -#### Deploy federation control plane - -The following command will create the underlying Kubernetes clusters in each of `E2E_ZONES`, and then provision the -federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list. - -```sh -$ kubetest --up -``` - -#### Run the Tests - -This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite. - -```sh -$ kubetest --test --test_args="--ginkgo.focus=\[Feature:Federation\]" -``` - -#### Teardown - -```sh -$ kubetest --down -``` - -#### Shortcuts for test developers - -* To speed up `--up`, provision a single-node kubernetes cluster in a single e2e zone: - - `NUM_NODES=1 E2E_ZONES="us-central1-f"` - - Keep in mind that some tests may require multiple underlying clusters and/or minimum compute resource availability. - -* If you're hacking around with the federation control plane deployment itself, - you can quickly re-deploy the federation control plane Kubernetes manifests without tearing any resources down. - To re-deploy the federation control plane after running `--up` for the first time: - - ```sh - $ federation/cluster/federation-up.sh - ``` - -### Debugging clusters - -If a cluster fails to initialize, or you'd like to better understand cluster -state to debug a failed e2e test, you can use the `cluster/log-dump.sh` script -to gather logs. - -This script requires that the cluster provider supports ssh. Assuming it does, -running: - -```sh -$ federation/cluster/log-dump.sh <directory> -``` - -will ssh to the master and all nodes and download a variety of useful logs to -the provided directory (which should already exist). - -The Google-run Jenkins builds automatically collected these logs for every -build, saving them in the `artifacts` directory uploaded to GCS. - -### Local clusters - -It can be much faster to iterate on a local cluster instead of a cloud-based -one. To start a local cluster, you can run: - -```sh -# The PATH construction is needed because PATH is one of the special-cased -# environment variables not passed by sudo -E -sudo PATH=$PATH hack/local-up-cluster.sh -``` - -This will start a single-node Kubernetes cluster than runs pods using the local -docker daemon. Press Control-C to stop the cluster. - -You can generate a valid kubeconfig file by following instructions printed at the -end of aforementioned script. - -#### Testing against local clusters - -In order to run an E2E test against a locally running cluster, first make sure -to have a local build of the tests: - -```sh -kubetest --build -``` - -Then point the tests at a custom host directly: - -```sh -export KUBECONFIG=/path/to/kubeconfig -kubetest --provider=local --test -``` - -To control the tests that are run: - -```sh -kubetest --provider=local --test --test_args="--ginkgo.focus=Secrets" -``` - -You will also likely need to specify `minStartupPods` to match the number of -nodes in your cluster. 
If you're testing against a cluster set up by -`local-up-cluster.sh`, you will need to do the following: - -```sh -kubetest --provider=local --test --test_args="--minStartupPods=1 --ginkgo.focus=Secrets" -``` - -### Version-skewed and upgrade testing - -We run version-skewed tests to check that newer versions of Kubernetes work -similarly enough to older versions. The general strategy is to cover the following cases: - -1. One version of `kubectl` with another version of the cluster and tests (e.g. - that v1.2 and v1.4 `kubectl` doesn't break v1.3 tests running against a v1.3 - cluster). -1. A newer version of the Kubernetes master with older nodes and tests (e.g. - that upgrading a master to v1.3 with nodes at v1.2 still passes v1.2 tests). -1. A newer version of the whole cluster with older tests (e.g. that a cluster - upgraded---master and nodes---to v1.3 still passes v1.2 tests). -1. That an upgraded cluster functions the same as a brand-new cluster of the - same version (e.g. a cluster upgraded to v1.3 passes the same v1.3 tests as - a newly-created v1.3 cluster). - -[kubetest](https://git.k8s.io/test-infra/kubetest) is -the authoritative source on how to run version-skewed tests, but below is a -quick-and-dirty tutorial. - -```sh -# Assume you have two copies of the Kubernetes repository checked out, at -# ./kubernetes and ./kubernetes_old - -# If using GKE: -export CLUSTER_API_VERSION=${OLD_VERSION} - -# Deploy a cluster at the old version; see above for more details -cd ./kubernetes_old -kubetest --up - -# Upgrade the cluster to the new version -# -# If using GKE, add --upgrade-target=${NEW_VERSION} -# -# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade -cd ../kubernetes -kubetest --provider=gke --test --check-version-skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]" - -# Run old tests with new kubectl -cd ../kubernetes_old -kubetest --provider=gke --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" -``` - -If you are just testing version-skew, you may want to just deploy at one -version and then test at another version, instead of going through the whole -upgrade process: - -```sh -# With the same setup as above - -# Deploy a cluster at the new version -cd ./kubernetes -kubetest --up - -# Run new tests with old kubectl -kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh" - -# Run old tests with new kubectl -cd ../kubernetes_old -kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" -``` - -#### Test jobs naming convention - -**Version skew tests** are named as -`<cloud-provider>-<master&node-version>-<kubectl-version>-<image-name>-kubectl-skew` -e.g: `gke-1.5-1.6-cvm-kubectl-skew` means cloud provider is GKE; -master and nodes are built from `release-1.5` branch; -`kubectl` is built from `release-1.6` branch; -image name is cvm (container_vm). -The test suite is always the older one in version skew tests. e.g. from release-1.5 in this case. - -**Upgrade tests**: - -If a test job name ends with `upgrade-cluster`, it means we first upgrade -the cluster (i.e. master and nodes) and then run the old test suite with new kubectl. - -If a test job name ends with `upgrade-cluster-new`, it means we first upgrade -the cluster (i.e. master and nodes) and then run the new test suite with new kubectl. - -If a test job name ends with `upgrade-master`, it means we first upgrade -the master and keep the nodes in old version and then run the old test suite with new kubectl. 
- -There are some examples in the table, -where `->` means upgrading; container_vm (cvm) and gci are image names. - -| test name | test suite | master version (image) | node version (image) | kubectl -| --------- | :--------: | :----: | :---:| :---: -| gce-1.5-1.6-upgrade-cluster | 1.5 | 1.5->1.6 | 1.5->1.6 | 1.6 -| gce-1.5-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 | 1.5->1.6 | 1.6 -| gce-1.5-1.6-upgrade-master | 1.5 | 1.5->1.6 | 1.5 | 1.6 -| gke-container_vm-1.5-container_vm-1.6-upgrade-cluster | 1.5 | 1.5->1.6 (cvm) | 1.5->1.6 (cvm) | 1.6 -| gke-gci-1.5-container_vm-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 (gci) | 1.5->1.6 (cvm) | 1.6 -| gke-gci-1.5-container_vm-1.6-upgrade-master | 1.5 | 1.5->1.6 (gci) | 1.5 (cvm) | 1.6 - -## Kinds of tests - -We are working on implementing clearer partitioning of our e2e tests to make -running a known set of tests easier (#10548). Tests can be labeled with any of -the following labels, in order of increasing precedence (that is, each label -listed below supersedes the previous ones): - - - If a test has no labels, it is expected to run fast (under five minutes), be -able to be run in parallel, and be consistent. - - - `[Slow]`: If a test takes more than five minutes to run (by itself or in -parallel with many other tests), it is labeled `[Slow]`. This partition allows -us to run almost all of our tests quickly in parallel, without waiting for the -stragglers to finish. - - - `[Serial]`: If a test cannot be run in parallel with other tests (e.g. it -takes too many resources or restarts nodes), it is labeled `[Serial]`, and -should be run in serial as part of a separate suite. - - - `[Disruptive]`: If a test restarts components that might cause other tests -to fail or break the cluster completely, it is labeled `[Disruptive]`. Any -`[Disruptive]` test is also assumed to qualify for the `[Serial]` label, but -need not be labeled as both. These tests are not run against soak clusters to -avoid restarting components. - - - `[Flaky]`: If a test is found to be flaky and we have decided that it's too -hard to fix in the short term (e.g. it's going to take a full engineer-week), it -receives the `[Flaky]` label until it is fixed. The `[Flaky]` label should be -used very sparingly, and should be accompanied with a reference to the issue for -de-flaking the test, because while a test remains labeled `[Flaky]`, it is not -monitored closely in CI. `[Flaky]` tests are by default not run, unless a -`focus` or `skip` argument is explicitly given. - - - `[Feature:.+]`: If a test has non-default requirements to run or targets -some non-core functionality, and thus should not be run as part of the standard -suite, it receives a `[Feature:.+]` label, e.g. `[Feature:Performance]` or -`[Feature:Ingress]`. `[Feature:.+]` tests are not run in our core suites, -instead running in custom suites. If a feature is experimental or alpha and is -not enabled by default due to being incomplete or potentially subject to -breaking changes, it does *not* block PR merges, and thus should run in -some separate test suites owned by the feature owner(s) -(see [Continuous Integration](#continuous-integration) below). - - - `[Conformance]`: Designate that this test is included in the Conformance -test suite for [Conformance Testing](conformance-tests.md). This test must -meet a number of [requirements](conformance-tests.md#conformance-test-requirements) -to be eligible for this tag. This tag does not supersed any other labels. 
- - - `[LinuxOnly]`: If a test is known to be using Linux-specific features -(e.g.: seLinuxOptions) or is unable to run on Windows nodes, it is labeled -`[LinuxOnly]`. When using Windows nodes, this tag should be added to the -`skip` argument. - - - The following tags are not considered to be exhaustively applied, but are -intended to further categorize existing `[Conformance]` tests, or tests that are -being considered as candidate for promotion to `[Conformance]` as we work to -refine requirements: - - `[Privileged]`: This is a test that requires privileged access - - `[Internet]`: This is a test that assumes access to the public internet - - `[Deprecated]`: This is a test that exercises a deprecated feature - - `[Alpha]`: This is a test that exercises an alpha feature - - `[Beta]`: This is a test that exercises a beta feature - -Every test should be owned by a [SIG](/sig-list.md), -and have a corresponding `[sig-<name>]` label. - -### Viper configuration and hierarchichal test parameters. - -The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags. - -Flags in general fall apart once tests become sufficiently complicated. So, even if we could use another flag library, it wouldn't be ideal. - -To use viper, rather than flags, to configure your tests: - -- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`. - -Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](https://git.k8s.io/kubernetes/test/e2e/framework/test_context.go). - -In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes. - -### Conformance tests - -For more information on Conformance tests please see the [Conformance Testing](conformance-tests.md) - -## Continuous Integration - -A quick overview of how we run e2e CI on Kubernetes. - -### What is CI? - -We run a battery of [release-blocking jobs](https://k8s-testgrid.appspot.com/sig-release-master-blocking) -against `HEAD` of the master branch on a continuous basis, and block merges -via [Tide](https://git.k8s.io/test-infra/prow/cmd/tide) on a subset of those -tests if they fail. - -CI results can be found at [ci-test.k8s.io](http://ci-test.k8s.io), e.g. -[ci-test.k8s.io/kubernetes-e2e-gce/10594](http://ci-test.k8s.io/kubernetes-e2e-gce/10594). - -### What runs in CI? - -We run all default tests (those that aren't marked `[Flaky]` or `[Feature:.+]`) -against GCE and GKE. To minimize the time from regression-to-green-run, we -partition tests across different jobs: - - - `kubernetes-e2e-<provider>` runs all non-`[Slow]`, non-`[Serial]`, -non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. - - - `kubernetes-e2e-<provider>-slow` runs all `[Slow]`, non-`[Serial]`, -non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. - - - `kubernetes-e2e-<provider>-serial` runs all `[Serial]` and `[Disruptive]`, -non-`[Flaky]`, non-`[Feature:.+]` tests in serial. - -We also run non-default tests if the tests exercise general-availability ("GA") -features that require a special environment to run in, e.g. -`kubernetes-e2e-gce-scalability` and `kubernetes-kubemark-gce`, which test for -Kubernetes performance. - -#### Non-default tests - -Many `[Feature:.+]` tests we don't run in CI. 
These tests are for features that -are experimental (often in the `experimental` API), and aren't enabled by -default. - -### The PR-builder - -We also run a battery of tests against every PR before we merge it. These tests -are equivalent to `kubernetes-gce`: it runs all non-`[Slow]`, non-`[Serial]`, -non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. These -tests are considered "smoke tests" to give a decent signal that the PR doesn't -break most functionality. Results for your PR can be found at -[pr-test.k8s.io](http://pr-test.k8s.io), e.g. -[pr-test.k8s.io/20354](http://pr-test.k8s.io/20354) for #20354. - -### Adding a test to CI - -As mentioned above, prior to adding a new test, it is a good idea to perform a -`-ginkgo.dryRun=true` on the system, in order to see if a behavior is already -being tested, or to determine if it may be possible to augment an existing set -of tests for a specific use case. - -If a behavior does not currently have coverage and a developer wishes to add a -new e2e test, navigate to the ./test/e2e directory and create a new test using -the existing suite as a guide. - -**NOTE:** To build/run with tests in a new directory within ./test/e2e, add the -directory to import list in ./test/e2e/e2e_test.go - -TODO(#20357): Create a self-documented example which has been disabled, but can -be copied to create new tests and outlines the capabilities and libraries used. - -When writing a test, consult #kinds-of-tests above to determine how your test -should be marked, (e.g. `[Slow]`, `[Serial]`; remember, by default we assume a -test can run in parallel with other tests!). - -When first adding a test it should *not* go straight into CI, because failures -block ordinary development. A test should only be added to CI after is has been -running in some non-CI suite long enough to establish a track record showing -that the test does not fail when run against *working* software. Note also that -tests running in CI are generally running on a well-loaded cluster, so must -contend for resources; see above about [kinds of tests](#kinds_of_tests). - -Generally, a feature starts as `experimental`, and will be run in some suite -owned by the team developing the feature. If a feature is in beta or GA, it -*should* block PR merges and releases. In moving from experimental to beta or GA, tests -that are expected to pass by default should simply remove the `[Feature:.+]` -label, and will be incorporated into our core suites. If tests are not expected -to pass by default, (e.g. they require a special environment such as added -quota,) they should remain with the `[Feature:.+]` label. - -Occasionally, we'll want to add tests to better exercise features that are -already GA. These tests also shouldn't go straight to CI. They should begin by -being marked as `[Flaky]` to be run outside of CI, and once a track-record for -them is established, they may be promoted out of `[Flaky]`. - -### Moving a test out of CI - -If we have determined that a test is known-flaky and cannot be fixed in the -short-term, we may move it out of CI indefinitely. This move should be used -sparingly, as it effectively means that we have no coverage of that test. When a -test is demoted, it should be marked `[Flaky]` with a comment accompanying the -label with a reference to an issue opened to fix the test. 
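Since `[Flaky]`-labeled tests are skipped by default, a demoted test can still be exercised outside of CI by focusing on it explicitly. A minimal sketch using the `kubetest` invocation style shown earlier (the focus pattern here is only an example):

```sh
# Run only the tests currently quarantined as [Flaky] against an existing cluster
kubetest --test --test_args="--ginkgo.focus=\[Flaky\]"
```

Running a quarantined test this way in a non-blocking suite is how a track record is established before promoting it back out of `[Flaky]`.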
- -## Performance Evaluation - -Another benefit of the e2e tests is the ability to create reproducible loads on -the system, which can then be used to determine the responsiveness, or analyze -other characteristics of the system. For example, the density tests load the -system to 30,50,100 pods per/node and measures the different characteristics of -the system, such as throughput, api-latency, etc. - -For a good overview of how we analyze performance data, please read the -following [post](https://kubernetes.io/blog/2015/09/kubernetes-performance-measurements-and/) - -For developers who are interested in doing their own performance analysis, we -recommend setting up [prometheus](http://prometheus.io/) for data collection, -and using [grafana](https://prometheus.io/docs/visualization/grafana/) to -visualize the data. There also exists the option of pushing your own metrics in -from the tests using a -[prom-push-gateway](http://prometheus.io/docs/instrumenting/pushing/). -Containers for all of these components can be found -[here](https://hub.docker.com/u/prom/). - -For more accurate measurements, you may wish to set up prometheus external to -kubernetes in an environment where it can access the major system components -(api-server, controller-manager, scheduler). This is especially useful when -attempting to gather metrics in a load-balanced api-server environment, because -all api-servers can be analyzed independently as well as collectively. On -startup, configuration file is passed to prometheus that specifies the endpoints -that prometheus will scrape, as well as the sampling interval. - -``` -#prometheus.conf -job: { - name: "kubernetes" - scrape_interval: "1s" - target_group: { - # apiserver(s) - target: "http://localhost:8080/metrics" - # scheduler - target: "http://localhost:10251/metrics" - # controller-manager - target: "http://localhost:10252/metrics" - } -} -``` - -Once prometheus is scraping the kubernetes endpoints, that data can then be -plotted using promdash, and alerts can be created against the assortment of -metrics that kubernetes provides. - -## One More Thing - -You should also know the [testing conventions](../guide/coding-conventions.md#testing-conventions). - -**HAPPY TESTING!** +This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of kubernetes 1.13, whichever comes first.
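A quick sanity check related to the metrics-scraping setup described above: before pointing prometheus at a cluster, confirm that the component metrics endpoints respond. The ports are those used in the sample configuration and may differ in your environment:

```sh
# Assumes a locally reachable control plane (e.g. one started with hack/local-up-cluster.sh)
curl -s http://localhost:8080/metrics | head    # api-server
curl -s http://localhost:10251/metrics | head   # scheduler
curl -s http://localhost:10252/metrics | head   # controller-manager
```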
\ No newline at end of file diff --git a/contributors/devel/flaky-tests.md b/contributors/devel/flaky-tests.md index 14302592..7f238095 100644 --- a/contributors/devel/flaky-tests.md +++ b/contributors/devel/flaky-tests.md @@ -1,201 +1,3 @@ -# Flaky tests - -Any test that fails occasionally is "flaky". Since our merges only proceed when -all tests are green, and we have a number of different CI systems running the -tests in various combinations, even a small percentage of flakes results in a -lot of pain for people waiting for their PRs to merge. - -Therefore, it's very important that we write tests defensively. Situations that -"almost never happen" happen with some regularity when run thousands of times in -resource-constrained environments. Since flakes can often be quite hard to -reproduce while still being common enough to block merges occasionally, it's -additionally important that the test logs be useful for narrowing down exactly -what caused the failure. - -Note that flakes can occur in unit tests, integration tests, or end-to-end -tests, but probably occur most commonly in end-to-end tests. - -## Hunting Flakes - -You may notice lots of your PRs or ones you watch are having a common -pre-submit failure, but less frequent issues that are still of concern take -more analysis over time. There are metrics recorded and viewable in: -- [TestGrid](https://k8s-testgrid.appspot.com/presubmits-kubernetes-blocking#Summary) -- [Velodrome](http://velodrome.k8s.io/dashboard/db/bigquery-metrics?orgId=1) - -It is worth noting tests are going to fail in presubmit a lot due -to unbuildable code, but that wont happen as much on the same commit unless -there's a true issue in the code or a broader problem like a dep failed to -pull in. - -## Filing issues for flaky tests - -Because flakes may be rare, it's very important that all relevant logs be -discoverable from the issue. - -1. Search for the test name. If you find an open issue and you're 90% sure the - flake is exactly the same, add a comment instead of making a new issue. -2. If you make a new issue, you should title it with the test name, prefixed by - "e2e/unit/integration flake:" (whichever is appropriate) -3. Reference any old issues you found in step one. Also, make a comment in the - old issue referencing your new issue, because people monitoring only their - email do not see the backlinks github adds. Alternatively, tag the person or - people who most recently worked on it. -4. Paste, in block quotes, the entire log of the individual failing test, not - just the failure line. -5. Link to durable storage with the rest of the logs. This means (for all the - tests that Google runs) the GCS link is mandatory! The Jenkins test result - link is nice but strictly optional: not only does it expire more quickly, - it's not accessible to non-Googlers. - -## Finding failed flaky test cases - -Find flaky tests issues on GitHub under the [kind/flake issue label][flake]. -There are significant numbers of flaky tests reported on a regular basis and P2 -flakes are under-investigated. Fixing flakes is a quick way to gain expertise -and community goodwill. - -[flake]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake - -## Expectations when a flaky test is assigned to you - -Note that we won't randomly assign these issues to you unless you've opted in or -you're part of a group that has opted in. 
We are more than happy to accept help -from anyone in fixing these, but due to the severity of the problem when merges -are blocked, we need reasonably quick turn-around time on test flakes. Therefore -we have the following guidelines: - -1. If a flaky test is assigned to you, it's more important than anything else - you're doing unless you can get a special dispensation (in which case it will - be reassigned). If you have too many flaky tests assigned to you, or you - have such a dispensation, then it's *still* your responsibility to find new - owners (this may just mean giving stuff back to the relevant Team or SIG Lead). -2. You should make a reasonable effort to reproduce it. Somewhere between an - hour and half a day of concentrated effort is "reasonable". It is perfectly - reasonable to ask for help! -3. If you can reproduce it (or it's obvious from the logs what happened), you - should then be able to fix it, or in the case where someone is clearly more - qualified to fix it, reassign it with very clear instructions. -4. Once you have made a change that you believe fixes a flake, it is conservative - to keep the issue for the flake open and see if it manifests again after the - change is merged. -5. If you can't reproduce a flake: __don't just close it!__ Every time a flake comes - back, at least 2 hours of merge time is wasted. So we need to make monotonic - progress towards narrowing it down every time a flake occurs. If you can't - figure it out from the logs, add log messages that would have help you figure - it out. If you make changes to make a flake more reproducible, please link - your pull request to the flake you're working on. -6. If a flake has been open, could not be reproduced, and has not manifested in - 3 months, it is reasonable to close the flake issue with a note saying - why. - -# Reproducing unit test flakes - -Try the [stress command](https://godoc.org/golang.org/x/tools/cmd/stress). - -Just - -``` -$ go install golang.org/x/tools/cmd/stress -``` - -Then build your test binary - -``` -$ go test -c -race -``` - -Then run it under stress - -``` -$ stress ./package.test -test.run=FlakyTest -``` - -It runs the command and writes output to `/tmp/gostress-*` files when it fails. -It periodically reports with run counts. Be careful with tests that use the -`net/http/httptest` package; they could exhaust the available ports on your -system! - -# Hunting flaky unit tests in Kubernetes - -Sometimes unit tests are flaky. This means that due to (usually) race -conditions, they will occasionally fail, even though most of the time they pass. - -We have a goal of 99.9% flake free tests. This means that there is only one -flake in one thousand runs of a test. - -Running a test 1000 times on your own machine can be tedious and time consuming. -Fortunately, there is a better way to achieve this using Kubernetes. - -_Note: these instructions are mildly hacky for now, as we get run once semantics -and logging they will get better_ - -There is a testing image `brendanburns/flake` up on the docker hub. We will use -this image to test our fix. 
- -Create a replication controller with the following config: - -```yaml -apiVersion: v1 -kind: ReplicationController -metadata: - name: flakecontroller -spec: - replicas: 24 - template: - metadata: - labels: - name: flake - spec: - containers: - - name: flake - image: brendanburns/flake - env: - - name: TEST_PACKAGE - value: pkg/tools - - name: REPO_SPEC - value: https://github.com/kubernetes/kubernetes -``` - -Note that we omit the labels and the selector fields of the replication -controller, because they will be populated from the labels field of the pod -template by default. - -```sh -kubectl create -f ./controller.yaml -``` - -This will spin up 24 instances of the test. They will run to completion, then -exit, and the kubelet will restart them, accumulating more and more runs of the -test. - -You can examine the recent runs of the test by calling `docker ps -a` and -looking for tasks that exited with non-zero exit codes. Unfortunately, docker -ps -a only keeps around the exit status of the last 15-20 containers with the -same image, so you have to check them frequently. - -You can use this script to automate checking for failures, assuming your cluster -is running on GCE and has four nodes: - -```sh -echo "" > output.txt -for i in {1..4}; do - echo "Checking kubernetes-node-${i}" - echo "kubernetes-node-${i}:" >> output.txt - gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt -done -grep "Exited ([^0])" output.txt -``` - -Eventually you will have sufficient runs for your purposes. At that point you -can delete the replication controller by running: - -```sh -kubectl delete replicationcontroller flakecontroller -``` - -If you do a final check for flakes with `docker ps -a`, ignore tasks that -exited -1, since that's what happens when you stop the replication controller. - -Happy flake hunting! +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/flaky-tests.md. +This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of kubernetes 1.13, whichever comes first.
\ No newline at end of file diff --git a/contributors/devel/gubernator.md b/contributors/devel/gubernator.md index b03d11a1..34cb58fb 100644 --- a/contributors/devel/gubernator.md +++ b/contributors/devel/gubernator.md @@ -1,136 +1,3 @@ -# Gubernator +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/gubernator.md. -*This document is oriented at developers who want to use Gubernator to debug while developing for Kubernetes.* - - -- [Gubernator](#gubernator) - - [What is Gubernator?](#what-is-gubernator) - - [Gubernator Features](#gubernator-features) - - [Test Failures list](#test-failures-list) - - [Log Filtering](#log-filtering) - - [Gubernator for Local Tests](#gubernator-for-local-tests) - - [Future Work](#future-work) - - -## What is Gubernator? - -[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes -test results. - -Gubernator simplifies the debugging process and makes it easier to track down failures by automating many -steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines. -Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the -failed pods and the corresponding pod UID, namespace, and container ID. -It also allows for filtering of the log files to display relevant lines based on selected keywords, and -allows for multiple logs to be woven together by timestamp. - -Gubernator runs on Google App Engine and fetches logs stored on Google Cloud Storage. - -## Gubernator Features - -### Test Failures list - -Comments made by k8s-ci-robot will post a link to a page listing the failed tests. -Each failed test comes with the corresponding error log from a junit file and a link -to filter logs for that test. - -Based on the message logged in the junit file, the pod name may be displayed. - - - -[Test Failures List Example](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721) - -### Log Filtering - -The log filtering page comes with checkboxes and textboxes to aid in filtering. Filtered keywords will be bolded -and lines including keywords will be highlighted. Up to four lines around the line of interest will also be displayed. - - - -If less than 100 lines are skipped, the "... skipping xx lines ..." message can be clicked to expand and show -the hidden lines. - -Before expansion: - -After expansion: - - -If the pod name was displayed in the Test Failures list, it will automatically be included in the filters. -If it is not found in the error message, it can be manually entered into the textbox. Once a pod name -is entered, the Pod UID, Namespace, and ContainerID may be automatically filled in as well. These can be -altered as well. To apply the filter, check off the options corresponding to the filter. - - - -To add a filter, type the term to be filtered into the textbox labeled "Add filter:" and press enter. -Additional filters will be displayed as checkboxes under the textbox. - - - -To choose which logs to view check off the checkboxes corresponding to the logs of interest. If multiple logs are -included, the "Weave by timestamp" option can weave the selected logs together based on the timestamp in each line. 
- - - -[Log Filtering Example 1](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/5535/nodelog?pod=pod-configmaps-b5b876cb-3e1e-11e6-8956-42010af0001d&junit=junit_03.xml&wrap=on&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkube-apiserver.log&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkubelet.log&UID=on&poduid=b5b8a59e-3e1e-11e6-b358-42010af0001d&ns=e2e-tests-configmap-oi12h&cID=tmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image) - -[Log Filtering Example 2](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721/nodelog?pod=client-containers-a53f813c-503e-11e6-88dd-0242ac110003&junit=junit_19.xml&wrap=on) - - -### Gubernator for Local Tests - -*Currently Gubernator can only be used with remote node e2e tests.* - -**NOTE: Using Gubernator with local tests will publicly upload your test logs to Google Cloud Storage** - -To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true. -A URL link to view the test results will be printed to the console. -Please note that running with the Gubernator tag will bypass the user confirmation for uploading to GCS. - -```console - -$ make test-e2e-node REMOTE=true GUBERNATOR=true -... -================================================================ -Running gubernator.sh - -Gubernator linked below: -k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp -``` - -The gubernator.sh script can be run after running a remote node e2e test for the same effect. - -```console -$ ./test/e2e_node/gubernator.sh -Do you want to run gubernator.sh and upload logs publicly to GCS? [y/n]y -... -Gubernator linked below: -k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp -``` - -## Future Work - -Gubernator provides a framework for debugging failures and introduces useful features. -There is still a lot of room for more features and growth to make the debugging process more efficient. - -How to contribute (see https://git.k8s.io/test-infra/gubernator/README.md) - -* Extend GUBERNATOR flag to all local tests - -* More accurate identification of pod name, container ID, etc. - * Change content of logged strings for failures to include more information - * Better regex in Gubernator - -* Automate discovery of more keywords - * Volume Name - * Disk Name - * Pod IP - -* Clickable API objects in the displayed lines in order to add them as filters - -* Construct story of pod's lifetime - * Have concise view of what a pod went through from when pod was started to failure - -* Improve UI - * Have separate folders of logs in rows instead of in one long column - * Improve interface for adding additional features (maybe instead of textbox and checkbox, have chips) +This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of kubernetes 1.13, whichever comes first.
\ No newline at end of file diff --git a/contributors/devel/sig-testing/bazel.md b/contributors/devel/sig-testing/bazel.md new file mode 100644 index 00000000..991a0ac2 --- /dev/null +++ b/contributors/devel/sig-testing/bazel.md @@ -0,0 +1,184 @@ +# Build and test with Bazel + +Building and testing Kubernetes with Bazel is supported but not yet default. + +Bazel is used to run all Kubernetes PRs on [Prow](https://prow.k8s.io), +as remote caching enables significantly reduced build and test times. + +Some repositories (such as kubernetes/test-infra) have switched to using Bazel +exclusively for all build, test, and release workflows. + +Go rules are managed by the [`gazelle`](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle) +tool, with some additional rules managed by the [`kazel`](https://git.k8s.io/repo-infra/kazel) tool. +These tools are called via the `hack/update-bazel.sh` script. + +Instructions for installing Bazel +can be found [here](https://www.bazel.io/versions/master/docs/install.html). + +Several convenience `make` rules have been created for common operations: + +* `make bazel-build`: builds all binaries in tree (`bazel build -- //... + -//vendor/...`) +* `make bazel-test`: runs all unit tests (`bazel test --config=unit -- //... + //hack:verify-all -//build/... -//vendor/...`) +* `make bazel-test-integration`: runs all integration tests (`bazel test + --config integration //test/integration/...`) +* `make bazel-release`: builds release tarballs, Docker images (for server + components), and Debian images (`bazel build //build/release-tars`) + +You can also interact with Bazel directly; for example, to run all `kubectl` unit +tests, run + +```console +$ bazel test //pkg/kubectl/... +``` + +## Planter +If you don't want to install Bazel, you can instead try using the unofficial +[Planter](https://git.k8s.io/test-infra/planter) tool, +which runs Bazel inside a Docker container. + +For example, you can run +```console +$ ../test-infra/planter/planter.sh make bazel-test +$ ../test-infra/planter/planter.sh bazel build //cmd/kubectl +``` + +## Continuous Integration + +There are several bazel CI jobs: +* [ci-kubernetes-bazel-build](http://k8s-testgrid.appspot.com/google-unit#bazel-build): builds everything + with Bazel +* [ci-kubernetes-bazel-test](http://k8s-testgrid.appspot.com/google-unit#bazel-test): runs unit tests in + with Bazel + +Similar jobs are run on all PRs; additionally, several of the e2e jobs use +Bazel-built binaries when launching and testing Kubernetes clusters. + +## Updating `BUILD` files + +To update `BUILD` files, run: + +```console +$ ./hack/update-bazel.sh +``` + +To prevent Go rules from being updated, consult the [gazelle +documentation](https://github.com/bazelbuild/rules_go/tree/master/go/tools/gazelle). + +Note that much like Go files and `gofmt`, `BUILD` files have standardized, +opinionated style rules, and running `hack/update-bazel.sh` will format them for you. + +If you want to auto-format `BUILD` files in your editor, use of +[Buildifier](https://github.com/bazelbuild/buildtools/blob/master/buildifier/README.md) +is recommended. 
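+
+For example, a typical workflow after adding or removing Go files might look
+like the following sketch (the package path is only illustrative):
+
+```sh
+# Regenerate BUILD files across the tree after editing Go sources
+./hack/update-bazel.sh
+
+# Review the regenerated rules before committing
+git diff -- '*BUILD' '*BUILD.bazel'
+
+# Optionally, re-format a single BUILD file in place with Buildifier
+buildifier pkg/kubectl/BUILD
+```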
+ +Updating the `BUILD` file for a package will be required when: +* Files are added to or removed from a package +* Import dependencies change for a package +* A `BUILD` file has been updated and needs to be reformatted +* A new `BUILD` file has been added (parent `BUILD` files will be updated) + +## Known issues and limitations + +### [Cross-compilation of cgo is not currently natively supported](https://github.com/bazelbuild/rules_go/issues/1020) +All binaries are currently built for the host OS and architecture running Bazel. +(For example, you can't currently target linux/amd64 from macOS or linux/s390x +from an amd64 machine.) + +The Go rules support cross-compilation of pure Go code using the `--platforms` +flag, and this is being used successfully in the kubernetes/test-infra repo. + +It may already be possible to cross-compile cgo code if a custom CC toolchain is +set up, possibly reusing the kube-cross Docker image, but this area needs +further exploration. + +### The CC toolchain is not fully hermetic +Bazel requires several tools and development packages to be installed in the system, including `gcc`, `g++`, `glibc and libstdc++ development headers` and `glibc static development libraries`. Please check your distribution for exact names of the packages. Examples for some commonly used distributions are below: + +| Dependency | Debian/Ubuntu | CentOS | OpenSuSE | +|:---------------------:|-------------------------------|--------------------------------|-----------------------------------------| +| Build essentials | `apt install build-essential` | `yum groupinstall development` | `zypper install -t pattern devel_C_C++` | +| GCC C++ | `apt install g++` | `yum install gcc-c++` | `zypper install gcc-c++` | +| GNU Libc static files | `apt install libc6-dev` | `yum install glibc-static` | `zypper install glibc-devel-static` | + +If any of these packages change, they may also cause spurious build failures +as described in [this issue](https://github.com/bazelbuild/bazel/issues/4907). + +An example error might look something like +``` +ERROR: undeclared inclusion(s) in rule '//vendor/golang.org/x/text/cases:go_default_library.cgo_c_lib': +this rule is missing dependency declarations for the following files included by 'vendor/golang.org/x/text/cases/linux_amd64_stripped/go_default_library.cgo_codegen~/_cgo_export.c': + '/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h' +``` + +The only way to recover from this error is to force Bazel to regenerate its +automatically-generated CC toolchain configuration by running `bazel clean +--expunge`. + +Improving cgo cross-compilation may help with all of this. + +### Changes to Go imports requires updating BUILD files +The Go rules in `BUILD` and `BUILD.bazel` files must be updated any time files +are added or removed or Go imports are changed. These rules are automatically +maintained by `gazelle`, which is run via `hack/update-bazel.sh`, but this is +still a source of friction. + +[Autogazelle](https://github.com/bazelbuild/bazel-gazelle/tree/master/cmd/autogazelle) +is a new experimental tool which may reduce or remove the need for developers +to run `hack/update-bazel.sh`, but no work has yet been done to support it in +kubernetes/kubernetes. + +### Code coverage support is incomplete for Go +Bazel and the Go rules have limited support for code coverage. Running something +like `bazel coverage -- //... -//vendor/...` will run tests in coverage mode, +but no report summary is currently generated. 
It may be possible to combine +`bazel coverage` with +[Gopherage](https://github.com/kubernetes/test-infra/tree/master/gopherage), +however. + +### Kubernetes code generators are not fully supported +The make-based build system in kubernetes/kubernetes runs several code +generators at build time: +* [conversion-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/conversion-gen) +* [deepcopy-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/deepcopy-gen) +* [defaulter-gen](https://github.com/kubernetes/code-generator/tree/master/cmd/defaulter-gen) +* [openapi-gen](https://github.com/kubernetes/kube-openapi/tree/master/cmd/openapi-gen) +* [go-bindata](https://github.com/jteeuwen/go-bindata/tree/master/go-bindata) + +Of these, only `openapi-gen` and `go-bindata` are currently supported when +building Kubernetes with Bazel. + +The `go-bindata` generated code is produced by hand-written genrules. + +The other code generators use special build tags of the form `// ++k8s:generator-name=arg`; for example, input files to the openapi-gen tool are +specified with `// +k8s:openapi-gen=true`. + +`kazel` is used to find all packages that require OpenAPI generation, and then a +handwritten genrule consumes this list of packages to run `openapi-gen`. + +For `openapi-gen`, a single output file is produced in a single Go package, which +makes this fairly compatible with Bazel. +All other Kubernetes code generators generally produce one output file per input +package, which is less compatible with the Bazel workflow. + +The make-based build system batches up all input packages into one call to the +code generator binary, but this is inefficient for Bazel's incrementality, as a +change in one package may result in unnecessarily recompiling many other +packages. +On the other hand, calling the code generator binary multiple times is less +efficient than calling it once, since many of the generators parse the tree for +Go type information and other metadata. + +One additional challenge is that many of the code generators add additional +Go imports which `gazelle` (and `autogazelle`) cannot infer, and so they must be +explicitly added as dependencies in the `BUILD` files. + +Kubernetes has even more code generators than this limited list, but the rest +are generally run as `hack/update-*.sh` scripts and checked into the repository, +and so are not immediately needed for Bazel parity. + +## Contacts +For help or discussion, join the [#bazel](https://kubernetes.slack.com/messages/bazel) +channel on Kubernetes Slack. 
diff --git a/contributors/devel/sig-testing/e2e-tests.md b/contributors/devel/sig-testing/e2e-tests.md new file mode 100644 index 00000000..e01a896f --- /dev/null +++ b/contributors/devel/sig-testing/e2e-tests.md @@ -0,0 +1,764 @@ +# End-to-End Testing in Kubernetes + +**Table of Contents** + +- [End-to-End Testing in Kubernetes](#end-to-end-testing-in-kubernetes) + - [Overview](#overview) + - [Building Kubernetes and Running the Tests](#building-kubernetes-and-running-the-tests) + - [Cleaning up](#cleaning-up) + - [Advanced testing](#advanced-testing) + - [Extracting a specific version of kubernetes](#extracting-a-specific-version-of-kubernetes) + - [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing) + - [Federation e2e tests](#federation-e2e-tests) + - [Configuring federation e2e tests](#configuring-federation-e2e-tests) + - [Image Push Repository](#image-push-repository) + - [Build](#build) + - [Deploy federation control plane](#deploy-federation-control-plane) + - [Run the Tests](#run-the-tests) + - [Teardown](#teardown) + - [Shortcuts for test developers](#shortcuts-for-test-developers) + - [Debugging clusters](#debugging-clusters) + - [Local clusters](#local-clusters) + - [Testing against local clusters](#testing-against-local-clusters) + - [Version-skewed and upgrade testing](#version-skewed-and-upgrade-testing) + - [Test jobs naming convention](#test-jobs-naming-convention) + - [Kinds of tests](#kinds-of-tests) + - [Viper configuration and hierarchichal test parameters.](#viper-configuration-and-hierarchichal-test-parameters) + - [Conformance tests](#conformance-tests) + - [Continuous Integration](#continuous-integration) + - [What is CI?](#what-is-ci) + - [What runs in CI?](#what-runs-in-ci) + - [Non-default tests](#non-default-tests) + - [The PR-builder](#the-pr-builder) + - [Adding a test to CI](#adding-a-test-to-ci) + - [Moving a test out of CI](#moving-a-test-out-of-ci) + - [Performance Evaluation](#performance-evaluation) + - [One More Thing](#one-more-thing) + + +## Overview + +End-to-end (e2e) tests for Kubernetes provide a mechanism to test end-to-end +behavior of the system, and is the last signal to ensure end user operations +match developer specifications. Although unit and integration tests provide a +good signal, in a distributed system like Kubernetes it is not uncommon that a +minor change may pass all unit and integration tests, but cause unforeseen +changes at the system level. + +The primary objectives of the e2e tests are to ensure a consistent and reliable +behavior of the kubernetes code base, and to catch hard-to-test bugs before +users do, when unit and integration tests are insufficient. + +The e2e tests in kubernetes are built atop of +[Ginkgo](http://onsi.github.io/ginkgo/) and +[Gomega](http://onsi.github.io/gomega/). There are a host of features that this +Behavior-Driven Development (BDD) testing framework provides, and it is +recommended that the developer read the documentation prior to diving into the + tests. + +The purpose of *this* document is to serve as a primer for developers who are +looking to execute or add tests using a local development environment. + +Before writing new tests or making substantive changes to existing tests, you +should also read [Writing Good e2e Tests](writing-good-e2e-tests.md) + +## Building Kubernetes and Running the Tests + +There are a variety of ways to run e2e tests, but we aim to decrease the number +of ways to run e2e tests to a canonical way: `kubetest`. 
+ +You can install `kubetest` as follows: +```sh +go get -u k8s.io/test-infra/kubetest +``` + +You can run an end-to-end test which will bring up a master and nodes, perform +some tests, and then tear everything down. Make sure you have followed the +getting started steps for your chosen cloud platform (which might involve +changing the --provider flag value to something other than "gce"). + +You can quickly recompile the e2e testing framework via `go install ./test/e2e`. +This will not do anything besides allow you to verify that the go code compiles. +If you want to run your e2e testing framework without re-provisioning the e2e setup, +you can do so via `make WHAT=test/e2e/e2e.test`, and then re-running the ginkgo tests. + +To build Kubernetes, up a cluster, run tests, and tear everything down, use: + +```sh +kubetest --build --up --test --down +``` + +If you'd like to just perform one of these steps, here are some examples: + +```sh +# Build binaries for testing +kubetest --build + +# Create a fresh cluster. Deletes a cluster first, if it exists +kubetest --up + +# Run all tests +kubetest --test + +# Run tests matching the regex "\[Feature:Performance\]" against a local cluster +# Specify "--provider=local" flag when running the tests locally +kubetest --test --test_args="--ginkgo.focus=\[Feature:Performance\]" --provider=local + +# Conversely, exclude tests that match the regex "Pods.*env" +kubetest --test --test_args="--ginkgo.skip=Pods.*env" + +# Run tests in parallel, skip any that must be run serially +GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\]" + +# Run tests in parallel, skip any that must be run serially and keep the test namespace if test failed +GINKGO_PARALLEL=y kubetest --test --test_args="--ginkgo.skip=\[Serial\] --delete-namespace-on-failure=false" + +# Flags can be combined, and their actions will take place in this order: +# --build, --up, --test, --down +# +# You can also specify an alternative provider, such as 'aws' +# +# e.g.: +kubetest --provider=aws --build --up --test --down + +# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for +# cleaning up after a failed test or viewing logs. +# kubectl output is default on, you can use --verbose-commands=false to suppress output. +kubetest -ctl='get events' +kubetest -ctl='delete pod foobar' +``` + +The tests are built into a single binary which can be used to deploy a +Kubernetes system or run tests against an already-deployed Kubernetes system. +See `kubetest --help` (or the flag definitions in `hack/e2e.go`) for +more options, such as reusing an existing cluster. + +### Cleaning up + +During a run, pressing `control-C` should result in an orderly shutdown, but if +something goes wrong and you still have some VMs running you can force a cleanup +with this command: + +```sh +kubetest --down +``` + +## Advanced testing + +### Extracting a specific version of kubernetes + +The `kubetest` binary can download and extract a specific version of kubernetes, +both the server, client and test binaries. The `--extract=E` flag enables this +functionality. + +There are a variety of values to pass this flag: + +```sh +# Official builds: <ci|release>/<latest|stable>[-N.N] +kubetest --extract=ci/latest --up # Deploy the latest ci build. +kubetest --extract=ci/latest-1.5 --up # Deploy the latest 1.5 CI build. +kubetest --extract=release/latest --up # Deploy the latest RC. +kubetest --extract=release/stable-1.5 --up # Deploy the 1.5 release. 
+ +# A specific version: +kubetest --extract=v1.5.1 --up # Deploy 1.5.1 +kubetest --extract=v1.5.2-beta.0 --up # Deploy 1.5.2-beta.0 +kubetest --extract=gs://foo/bar --up # --stage=gs://foo/bar + +# Whatever GKE is using (gke, gke-staging, gke-test): +kubetest --extract=gke --up # Deploy whatever GKE prod uses + +# Using a GCI version: +kubetest --extract=gci/gci-canary --up # Deploy the version for next gci release +kubetest --extract=gci/gci-57 # Deploy the version bound to gci m57 +kubetest --extract=gci/gci-57/ci/latest # Deploy the latest CI build using gci m57 for the VM image + +# Reuse whatever is already built +kubetest --up # Most common. Note, no extract flag +kubetest --build --up # Most common. Note, no extract flag +kubetest --build --stage=gs://foo/bar --extract=local --up # Extract the staged version +``` + +### Bringing up a cluster for testing + +If you want, you may bring up a cluster in some other manner and run tests +against it. To do so, or to do other non-standard test things, you can pass +arguments into Ginkgo using `--test_args` (e.g. see above). For the purposes of +brevity, we will look at a subset of the options, which are listed below: + +``` +--ginkgo.dryRun=false: If set, ginkgo will walk the test hierarchy without +actually running anything. + +--ginkgo.failFast=false: If set, ginkgo will stop running a test suite after a +failure occurs. + +--ginkgo.failOnPending=false: If set, ginkgo will mark the test suite as failed +if any specs are pending. + +--ginkgo.focus="": If set, ginkgo will only run specs that match this regular +expression. + +--ginkgo.noColor="n": If set to "y", ginkgo will not use color in the output + +--ginkgo.skip="": If set, ginkgo will only run specs that do not match this +regular expression. + +--ginkgo.trace=false: If set, default reporter prints out the full stack trace +when a failure occurs + +--ginkgo.v=false: If set, default reporter print out all specs as they begin. + +--host="": The host, or api-server, to connect to + +--kubeconfig="": Path to kubeconfig containing embedded authinfo. + +--provider="": The name of the Kubernetes provider (gce, gke, local, vagrant, +etc.) + +--repo-root="../../": Root directory of kubernetes repository, for finding test +files. +``` + +Prior to running the tests, you may want to first create a simple auth file in +your home directory, e.g. `$HOME/.kube/config`, with the following: + +``` +{ + "User": "root", + "Password": "" +} +``` + +As mentioned earlier there are a host of other options that are available, but +they are left to the developer. + +**NOTE:** If you are running tests on a local cluster repeatedly, you may need +to periodically perform some manual cleanup: + + - `rm -rf /var/run/kubernetes`, clear kube generated credentials, sometimes +stale permissions can cause problems. + + - `sudo iptables -F`, clear ip tables rules left by the kube-proxy. + +### Reproducing failures in flaky tests +You can run a test repeatedly until it fails. This is useful when debugging +flaky tests. In order to do so, you need to set the following environment +variable: +```sh +$ export GINKGO_UNTIL_IT_FAILS=true +``` + +After setting the environment variable, you can run the tests as before. The e2e +script adds `--untilItFails=true` to ginkgo args if the environment variable is +set. The flags asks ginkgo to run the test repeatedly until it fails. + +### Federation e2e tests + +By default, `e2e.go` provisions a single Kubernetes cluster, and any `Feature:Federation` ginkgo tests will be skipped. 
+ +Federation e2e testing involve bringing up multiple "underlying" Kubernetes clusters, +and deploying the federation control plane as a Kubernetes application on the underlying clusters. + +The federation e2e tests are still managed via `e2e.go`, but require some extra configuration items. + +#### Configuring federation e2e tests + +The following environment variables will enable federation e2e building, provisioning and testing. + +```sh +$ export FEDERATION=true +$ export E2E_ZONES="us-central1-a us-central1-b us-central1-f" +``` + +A Kubernetes cluster will be provisioned in each zone listed in `E2E_ZONES`. A zone can only appear once in the `E2E_ZONES` list. + +#### Image Push Repository + +Next, specify the docker repository where your ci images will be pushed. + +* **If `--provider=gce` or `--provider=gke`**: + + If you use the same GCP project where you to run the e2e tests as the container image repository, + FEDERATION_PUSH_REPO_BASE environment variable will be defaulted to "gcr.io/${DEFAULT_GCP_PROJECT_NAME}". + You can skip ahead to the **Build** section. + + You can simply set your push repo base based on your project name, and the necessary repositories will be + auto-created when you first push your container images. + + ```sh + $ export FEDERATION_PUSH_REPO_BASE="gcr.io/${GCE_PROJECT_NAME}" + ``` + + Skip ahead to the **Build** section. + +* **For all other providers**: + + You'll be responsible for creating and managing access to the repositories manually. + + ```sh + $ export FEDERATION_PUSH_REPO_BASE="quay.io/colin_hom" + ``` + + Given this example, the `federation-apiserver` container image will be pushed to the repository + `quay.io/colin_hom/federation-apiserver`. + + The docker client on the machine running `e2e.go` must have push access for the following pre-existing repositories: + + * `${FEDERATION_PUSH_REPO_BASE}/federation-apiserver` + * `${FEDERATION_PUSH_REPO_BASE}/federation-controller-manager` + + These repositories must allow public read access, as the e2e node docker daemons will not have any credentials. If you're using + GCE/GKE as your provider, the repositories will have read-access by default. + +#### Build + +* Compile the binaries and build container images: + + ```sh + $ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true kubetest -build + ``` + +* Push the federation container images + + ```sh + $ federation/develop/push-federation-images.sh + ``` + +#### Deploy federation control plane + +The following command will create the underlying Kubernetes clusters in each of `E2E_ZONES`, and then provision the +federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list. + +```sh +$ kubetest --up +``` + +#### Run the Tests + +This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite. + +```sh +$ kubetest --test --test_args="--ginkgo.focus=\[Feature:Federation\]" +``` + +#### Teardown + +```sh +$ kubetest --down +``` + +#### Shortcuts for test developers + +* To speed up `--up`, provision a single-node kubernetes cluster in a single e2e zone: + + `NUM_NODES=1 E2E_ZONES="us-central1-f"` + + Keep in mind that some tests may require multiple underlying clusters and/or minimum compute resource availability. + +* If you're hacking around with the federation control plane deployment itself, + you can quickly re-deploy the federation control plane Kubernetes manifests without tearing any resources down. 
+ To re-deploy the federation control plane after running `--up` for the first time: + + ```sh + $ federation/cluster/federation-up.sh + ``` + +### Debugging clusters + +If a cluster fails to initialize, or you'd like to better understand cluster +state to debug a failed e2e test, you can use the `cluster/log-dump.sh` script +to gather logs. + +This script requires that the cluster provider supports ssh. Assuming it does, +running: + +```sh +$ federation/cluster/log-dump.sh <directory> +``` + +will ssh to the master and all nodes and download a variety of useful logs to +the provided directory (which should already exist). + +The Google-run Jenkins builds automatically collected these logs for every +build, saving them in the `artifacts` directory uploaded to GCS. + +### Local clusters + +It can be much faster to iterate on a local cluster instead of a cloud-based +one. To start a local cluster, you can run: + +```sh +# The PATH construction is needed because PATH is one of the special-cased +# environment variables not passed by sudo -E +sudo PATH=$PATH hack/local-up-cluster.sh +``` + +This will start a single-node Kubernetes cluster than runs pods using the local +docker daemon. Press Control-C to stop the cluster. + +You can generate a valid kubeconfig file by following instructions printed at the +end of aforementioned script. + +#### Testing against local clusters + +In order to run an E2E test against a locally running cluster, first make sure +to have a local build of the tests: + +```sh +kubetest --build +``` + +Then point the tests at a custom host directly: + +```sh +export KUBECONFIG=/path/to/kubeconfig +kubetest --provider=local --test +``` + +To control the tests that are run: + +```sh +kubetest --provider=local --test --test_args="--ginkgo.focus=Secrets" +``` + +You will also likely need to specify `minStartupPods` to match the number of +nodes in your cluster. If you're testing against a cluster set up by +`local-up-cluster.sh`, you will need to do the following: + +```sh +kubetest --provider=local --test --test_args="--minStartupPods=1 --ginkgo.focus=Secrets" +``` + +### Version-skewed and upgrade testing + +We run version-skewed tests to check that newer versions of Kubernetes work +similarly enough to older versions. The general strategy is to cover the following cases: + +1. One version of `kubectl` with another version of the cluster and tests (e.g. + that v1.2 and v1.4 `kubectl` doesn't break v1.3 tests running against a v1.3 + cluster). +1. A newer version of the Kubernetes master with older nodes and tests (e.g. + that upgrading a master to v1.3 with nodes at v1.2 still passes v1.2 tests). +1. A newer version of the whole cluster with older tests (e.g. that a cluster + upgraded---master and nodes---to v1.3 still passes v1.2 tests). +1. That an upgraded cluster functions the same as a brand-new cluster of the + same version (e.g. a cluster upgraded to v1.3 passes the same v1.3 tests as + a newly-created v1.3 cluster). + +[kubetest](https://git.k8s.io/test-infra/kubetest) is +the authoritative source on how to run version-skewed tests, but below is a +quick-and-dirty tutorial. 
+ +```sh +# Assume you have two copies of the Kubernetes repository checked out, at +# ./kubernetes and ./kubernetes_old + +# If using GKE: +export CLUSTER_API_VERSION=${OLD_VERSION} + +# Deploy a cluster at the old version; see above for more details +cd ./kubernetes_old +kubetest --up + +# Upgrade the cluster to the new version +# +# If using GKE, add --upgrade-target=${NEW_VERSION} +# +# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade +cd ../kubernetes +kubetest --provider=gke --test --check-version-skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]" + +# Run old tests with new kubectl +cd ../kubernetes_old +kubetest --provider=gke --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" +``` + +If you are just testing version-skew, you may want to just deploy at one +version and then test at another version, instead of going through the whole +upgrade process: + +```sh +# With the same setup as above + +# Deploy a cluster at the new version +cd ./kubernetes +kubetest --up + +# Run new tests with old kubectl +kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh" + +# Run old tests with new kubectl +cd ../kubernetes_old +kubetest --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" +``` + +#### Test jobs naming convention + +**Version skew tests** are named as +`<cloud-provider>-<master&node-version>-<kubectl-version>-<image-name>-kubectl-skew` +e.g: `gke-1.5-1.6-cvm-kubectl-skew` means cloud provider is GKE; +master and nodes are built from `release-1.5` branch; +`kubectl` is built from `release-1.6` branch; +image name is cvm (container_vm). +The test suite is always the older one in version skew tests. e.g. from release-1.5 in this case. + +**Upgrade tests**: + +If a test job name ends with `upgrade-cluster`, it means we first upgrade +the cluster (i.e. master and nodes) and then run the old test suite with new kubectl. + +If a test job name ends with `upgrade-cluster-new`, it means we first upgrade +the cluster (i.e. master and nodes) and then run the new test suite with new kubectl. + +If a test job name ends with `upgrade-master`, it means we first upgrade +the master and keep the nodes in old version and then run the old test suite with new kubectl. + +There are some examples in the table, +where `->` means upgrading; container_vm (cvm) and gci are image names. + +| test name | test suite | master version (image) | node version (image) | kubectl +| --------- | :--------: | :----: | :---:| :---: +| gce-1.5-1.6-upgrade-cluster | 1.5 | 1.5->1.6 | 1.5->1.6 | 1.6 +| gce-1.5-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 | 1.5->1.6 | 1.6 +| gce-1.5-1.6-upgrade-master | 1.5 | 1.5->1.6 | 1.5 | 1.6 +| gke-container_vm-1.5-container_vm-1.6-upgrade-cluster | 1.5 | 1.5->1.6 (cvm) | 1.5->1.6 (cvm) | 1.6 +| gke-gci-1.5-container_vm-1.6-upgrade-cluster-new | 1.6 | 1.5->1.6 (gci) | 1.5->1.6 (cvm) | 1.6 +| gke-gci-1.5-container_vm-1.6-upgrade-master | 1.5 | 1.5->1.6 (gci) | 1.5 (cvm) | 1.6 + +## Kinds of tests + +We are working on implementing clearer partitioning of our e2e tests to make +running a known set of tests easier (#10548). Tests can be labeled with any of +the following labels, in order of increasing precedence (that is, each label +listed below supersedes the previous ones): + + - If a test has no labels, it is expected to run fast (under five minutes), be +able to be run in parallel, and be consistent. 
+ + - `[Slow]`: If a test takes more than five minutes to run (by itself or in +parallel with many other tests), it is labeled `[Slow]`. This partition allows +us to run almost all of our tests quickly in parallel, without waiting for the +stragglers to finish. + + - `[Serial]`: If a test cannot be run in parallel with other tests (e.g. it +takes too many resources or restarts nodes), it is labeled `[Serial]`, and +should be run in serial as part of a separate suite. + + - `[Disruptive]`: If a test restarts components that might cause other tests +to fail or break the cluster completely, it is labeled `[Disruptive]`. Any +`[Disruptive]` test is also assumed to qualify for the `[Serial]` label, but +need not be labeled as both. These tests are not run against soak clusters to +avoid restarting components. + + - `[Flaky]`: If a test is found to be flaky and we have decided that it's too +hard to fix in the short term (e.g. it's going to take a full engineer-week), it +receives the `[Flaky]` label until it is fixed. The `[Flaky]` label should be +used very sparingly, and should be accompanied with a reference to the issue for +de-flaking the test, because while a test remains labeled `[Flaky]`, it is not +monitored closely in CI. `[Flaky]` tests are by default not run, unless a +`focus` or `skip` argument is explicitly given. + + - `[Feature:.+]`: If a test has non-default requirements to run or targets +some non-core functionality, and thus should not be run as part of the standard +suite, it receives a `[Feature:.+]` label, e.g. `[Feature:Performance]` or +`[Feature:Ingress]`. `[Feature:.+]` tests are not run in our core suites, +instead running in custom suites. If a feature is experimental or alpha and is +not enabled by default due to being incomplete or potentially subject to +breaking changes, it does *not* block PR merges, and thus should run in +some separate test suites owned by the feature owner(s) +(see [Continuous Integration](#continuous-integration) below). + + - `[Conformance]`: Designate that this test is included in the Conformance +test suite for [Conformance Testing](conformance-tests.md). This test must +meet a number of [requirements](conformance-tests.md#conformance-test-requirements) +to be eligible for this tag. This tag does not supersed any other labels. + + - `[LinuxOnly]`: If a test is known to be using Linux-specific features +(e.g.: seLinuxOptions) or is unable to run on Windows nodes, it is labeled +`[LinuxOnly]`. When using Windows nodes, this tag should be added to the +`skip` argument. + + - The following tags are not considered to be exhaustively applied, but are +intended to further categorize existing `[Conformance]` tests, or tests that are +being considered as candidate for promotion to `[Conformance]` as we work to +refine requirements: + - `[Privileged]`: This is a test that requires privileged access + - `[Internet]`: This is a test that assumes access to the public internet + - `[Deprecated]`: This is a test that exercises a deprecated feature + - `[Alpha]`: This is a test that exercises an alpha feature + - `[Beta]`: This is a test that exercises a beta feature + +Every test should be owned by a [SIG](/sig-list.md), +and have a corresponding `[sig-<name>]` label. + +### Viper configuration and hierarchichal test parameters. + +The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags. + +Flags in general fall apart once tests become sufficiently complicated. 
So, even if we could use another flag library, it wouldn't be ideal. + +To use viper, rather than flags, to configure your tests: + +- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`. + +Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](https://git.k8s.io/kubernetes/test/e2e/framework/test_context.go). + +In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes. + +### Conformance tests + +For more information on Conformance tests please see the [Conformance Testing](conformance-tests.md) + +## Continuous Integration + +A quick overview of how we run e2e CI on Kubernetes. + +### What is CI? + +We run a battery of [release-blocking jobs](https://k8s-testgrid.appspot.com/sig-release-master-blocking) +against `HEAD` of the master branch on a continuous basis, and block merges +via [Tide](https://git.k8s.io/test-infra/prow/cmd/tide) on a subset of those +tests if they fail. + +CI results can be found at [ci-test.k8s.io](http://ci-test.k8s.io), e.g. +[ci-test.k8s.io/kubernetes-e2e-gce/10594](http://ci-test.k8s.io/kubernetes-e2e-gce/10594). + +### What runs in CI? + +We run all default tests (those that aren't marked `[Flaky]` or `[Feature:.+]`) +against GCE and GKE. To minimize the time from regression-to-green-run, we +partition tests across different jobs: + + - `kubernetes-e2e-<provider>` runs all non-`[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. + + - `kubernetes-e2e-<provider>-slow` runs all `[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. + + - `kubernetes-e2e-<provider>-serial` runs all `[Serial]` and `[Disruptive]`, +non-`[Flaky]`, non-`[Feature:.+]` tests in serial. + +We also run non-default tests if the tests exercise general-availability ("GA") +features that require a special environment to run in, e.g. +`kubernetes-e2e-gce-scalability` and `kubernetes-kubemark-gce`, which test for +Kubernetes performance. + +#### Non-default tests + +Many `[Feature:.+]` tests we don't run in CI. These tests are for features that +are experimental (often in the `experimental` API), and aren't enabled by +default. + +### The PR-builder + +We also run a battery of tests against every PR before we merge it. These tests +are equivalent to `kubernetes-gce`: it runs all non-`[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. These +tests are considered "smoke tests" to give a decent signal that the PR doesn't +break most functionality. Results for your PR can be found at +[pr-test.k8s.io](http://pr-test.k8s.io), e.g. +[pr-test.k8s.io/20354](http://pr-test.k8s.io/20354) for #20354. + +### Adding a test to CI + +As mentioned above, prior to adding a new test, it is a good idea to perform a +`-ginkgo.dryRun=true` on the system, in order to see if a behavior is already +being tested, or to determine if it may be possible to augment an existing set +of tests for a specific use case. + +If a behavior does not currently have coverage and a developer wishes to add a +new e2e test, navigate to the ./test/e2e directory and create a new test using +the existing suite as a guide. 
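+
+As suggested above, one lightweight way to check for existing coverage is a
+Ginkgo dry run focused on the area of interest; the focus regex below
+(`Secrets`) is only an example and should be replaced with a pattern matching
+the behavior you care about:
+
+```sh
+# Walk the test hierarchy and list matching specs without running them
+kubetest --test --provider=local --test_args="--ginkgo.dryRun=true --ginkgo.focus=Secrets"
+```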
+ +**NOTE:** To build/run with tests in a new directory within ./test/e2e, add the +directory to import list in ./test/e2e/e2e_test.go + +TODO(#20357): Create a self-documented example which has been disabled, but can +be copied to create new tests and outlines the capabilities and libraries used. + +When writing a test, consult #kinds-of-tests above to determine how your test +should be marked, (e.g. `[Slow]`, `[Serial]`; remember, by default we assume a +test can run in parallel with other tests!). + +When first adding a test it should *not* go straight into CI, because failures +block ordinary development. A test should only be added to CI after is has been +running in some non-CI suite long enough to establish a track record showing +that the test does not fail when run against *working* software. Note also that +tests running in CI are generally running on a well-loaded cluster, so must +contend for resources; see above about [kinds of tests](#kinds_of_tests). + +Generally, a feature starts as `experimental`, and will be run in some suite +owned by the team developing the feature. If a feature is in beta or GA, it +*should* block PR merges and releases. In moving from experimental to beta or GA, tests +that are expected to pass by default should simply remove the `[Feature:.+]` +label, and will be incorporated into our core suites. If tests are not expected +to pass by default, (e.g. they require a special environment such as added +quota,) they should remain with the `[Feature:.+]` label. + +Occasionally, we'll want to add tests to better exercise features that are +already GA. These tests also shouldn't go straight to CI. They should begin by +being marked as `[Flaky]` to be run outside of CI, and once a track-record for +them is established, they may be promoted out of `[Flaky]`. + +### Moving a test out of CI + +If we have determined that a test is known-flaky and cannot be fixed in the +short-term, we may move it out of CI indefinitely. This move should be used +sparingly, as it effectively means that we have no coverage of that test. When a +test is demoted, it should be marked `[Flaky]` with a comment accompanying the +label with a reference to an issue opened to fix the test. + +## Performance Evaluation + +Another benefit of the e2e tests is the ability to create reproducible loads on +the system, which can then be used to determine the responsiveness, or analyze +other characteristics of the system. For example, the density tests load the +system to 30,50,100 pods per/node and measures the different characteristics of +the system, such as throughput, api-latency, etc. + +For a good overview of how we analyze performance data, please read the +following [post](https://kubernetes.io/blog/2015/09/kubernetes-performance-measurements-and/) + +For developers who are interested in doing their own performance analysis, we +recommend setting up [prometheus](http://prometheus.io/) for data collection, +and using [grafana](https://prometheus.io/docs/visualization/grafana/) to +visualize the data. There also exists the option of pushing your own metrics in +from the tests using a +[prom-push-gateway](http://prometheus.io/docs/instrumenting/pushing/). +Containers for all of these components can be found +[here](https://hub.docker.com/u/prom/). + +For more accurate measurements, you may wish to set up prometheus external to +kubernetes in an environment where it can access the major system components +(api-server, controller-manager, scheduler). 
This is especially useful when +attempting to gather metrics in a load-balanced api-server environment, because +all api-servers can be analyzed independently as well as collectively. On +startup, configuration file is passed to prometheus that specifies the endpoints +that prometheus will scrape, as well as the sampling interval. + +``` +#prometheus.conf +job: { + name: "kubernetes" + scrape_interval: "1s" + target_group: { + # apiserver(s) + target: "http://localhost:8080/metrics" + # scheduler + target: "http://localhost:10251/metrics" + # controller-manager + target: "http://localhost:10252/metrics" + } +} +``` + +Once prometheus is scraping the kubernetes endpoints, that data can then be +plotted using promdash, and alerts can be created against the assortment of +metrics that kubernetes provides. + +## One More Thing + +You should also know the [testing conventions](../guide/coding-conventions.md#testing-conventions). + +**HAPPY TESTING!** diff --git a/contributors/devel/sig-testing/flaky-tests.md b/contributors/devel/sig-testing/flaky-tests.md new file mode 100644 index 00000000..14302592 --- /dev/null +++ b/contributors/devel/sig-testing/flaky-tests.md @@ -0,0 +1,201 @@ +# Flaky tests + +Any test that fails occasionally is "flaky". Since our merges only proceed when +all tests are green, and we have a number of different CI systems running the +tests in various combinations, even a small percentage of flakes results in a +lot of pain for people waiting for their PRs to merge. + +Therefore, it's very important that we write tests defensively. Situations that +"almost never happen" happen with some regularity when run thousands of times in +resource-constrained environments. Since flakes can often be quite hard to +reproduce while still being common enough to block merges occasionally, it's +additionally important that the test logs be useful for narrowing down exactly +what caused the failure. + +Note that flakes can occur in unit tests, integration tests, or end-to-end +tests, but probably occur most commonly in end-to-end tests. + +## Hunting Flakes + +You may notice lots of your PRs or ones you watch are having a common +pre-submit failure, but less frequent issues that are still of concern take +more analysis over time. There are metrics recorded and viewable in: +- [TestGrid](https://k8s-testgrid.appspot.com/presubmits-kubernetes-blocking#Summary) +- [Velodrome](http://velodrome.k8s.io/dashboard/db/bigquery-metrics?orgId=1) + +It is worth noting tests are going to fail in presubmit a lot due +to unbuildable code, but that wont happen as much on the same commit unless +there's a true issue in the code or a broader problem like a dep failed to +pull in. + +## Filing issues for flaky tests + +Because flakes may be rare, it's very important that all relevant logs be +discoverable from the issue. + +1. Search for the test name. If you find an open issue and you're 90% sure the + flake is exactly the same, add a comment instead of making a new issue. +2. If you make a new issue, you should title it with the test name, prefixed by + "e2e/unit/integration flake:" (whichever is appropriate) +3. Reference any old issues you found in step one. Also, make a comment in the + old issue referencing your new issue, because people monitoring only their + email do not see the backlinks github adds. Alternatively, tag the person or + people who most recently worked on it. +4. Paste, in block quotes, the entire log of the individual failing test, not + just the failure line. +5. 
Link to durable storage with the rest of the logs. This means (for all the + tests that Google runs) the GCS link is mandatory! The Jenkins test result + link is nice but strictly optional: not only does it expire more quickly, + it's not accessible to non-Googlers. + +## Finding failed flaky test cases + +Find flaky tests issues on GitHub under the [kind/flake issue label][flake]. +There are significant numbers of flaky tests reported on a regular basis and P2 +flakes are under-investigated. Fixing flakes is a quick way to gain expertise +and community goodwill. + +[flake]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake + +## Expectations when a flaky test is assigned to you + +Note that we won't randomly assign these issues to you unless you've opted in or +you're part of a group that has opted in. We are more than happy to accept help +from anyone in fixing these, but due to the severity of the problem when merges +are blocked, we need reasonably quick turn-around time on test flakes. Therefore +we have the following guidelines: + +1. If a flaky test is assigned to you, it's more important than anything else + you're doing unless you can get a special dispensation (in which case it will + be reassigned). If you have too many flaky tests assigned to you, or you + have such a dispensation, then it's *still* your responsibility to find new + owners (this may just mean giving stuff back to the relevant Team or SIG Lead). +2. You should make a reasonable effort to reproduce it. Somewhere between an + hour and half a day of concentrated effort is "reasonable". It is perfectly + reasonable to ask for help! +3. If you can reproduce it (or it's obvious from the logs what happened), you + should then be able to fix it, or in the case where someone is clearly more + qualified to fix it, reassign it with very clear instructions. +4. Once you have made a change that you believe fixes a flake, it is conservative + to keep the issue for the flake open and see if it manifests again after the + change is merged. +5. If you can't reproduce a flake: __don't just close it!__ Every time a flake comes + back, at least 2 hours of merge time is wasted. So we need to make monotonic + progress towards narrowing it down every time a flake occurs. If you can't + figure it out from the logs, add log messages that would have help you figure + it out. If you make changes to make a flake more reproducible, please link + your pull request to the flake you're working on. +6. If a flake has been open, could not be reproduced, and has not manifested in + 3 months, it is reasonable to close the flake issue with a note saying + why. + +# Reproducing unit test flakes + +Try the [stress command](https://godoc.org/golang.org/x/tools/cmd/stress). + +Just + +``` +$ go install golang.org/x/tools/cmd/stress +``` + +Then build your test binary + +``` +$ go test -c -race +``` + +Then run it under stress + +``` +$ stress ./package.test -test.run=FlakyTest +``` + +It runs the command and writes output to `/tmp/gostress-*` files when it fails. +It periodically reports with run counts. Be careful with tests that use the +`net/http/httptest` package; they could exhaust the available ports on your +system! + +# Hunting flaky unit tests in Kubernetes + +Sometimes unit tests are flaky. This means that due to (usually) race +conditions, they will occasionally fail, even though most of the time they pass. + +We have a goal of 99.9% flake free tests. 
This means that there is only one +flake in one thousand runs of a test. + +Running a test 1000 times on your own machine can be tedious and time consuming. +Fortunately, there is a better way to achieve this using Kubernetes. + +_Note: these instructions are mildly hacky for now, as we get run once semantics +and logging they will get better_ + +There is a testing image `brendanburns/flake` up on the docker hub. We will use +this image to test our fix. + +Create a replication controller with the following config: + +```yaml +apiVersion: v1 +kind: ReplicationController +metadata: + name: flakecontroller +spec: + replicas: 24 + template: + metadata: + labels: + name: flake + spec: + containers: + - name: flake + image: brendanburns/flake + env: + - name: TEST_PACKAGE + value: pkg/tools + - name: REPO_SPEC + value: https://github.com/kubernetes/kubernetes +``` + +Note that we omit the labels and the selector fields of the replication +controller, because they will be populated from the labels field of the pod +template by default. + +```sh +kubectl create -f ./controller.yaml +``` + +This will spin up 24 instances of the test. They will run to completion, then +exit, and the kubelet will restart them, accumulating more and more runs of the +test. + +You can examine the recent runs of the test by calling `docker ps -a` and +looking for tasks that exited with non-zero exit codes. Unfortunately, docker +ps -a only keeps around the exit status of the last 15-20 containers with the +same image, so you have to check them frequently. + +You can use this script to automate checking for failures, assuming your cluster +is running on GCE and has four nodes: + +```sh +echo "" > output.txt +for i in {1..4}; do + echo "Checking kubernetes-node-${i}" + echo "kubernetes-node-${i}:" >> output.txt + gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt +done +grep "Exited ([^0])" output.txt +``` + +Eventually you will have sufficient runs for your purposes. At that point you +can delete the replication controller by running: + +```sh +kubectl delete replicationcontroller flakecontroller +``` + +If you do a final check for flakes with `docker ps -a`, ignore tasks that +exited -1, since that's what happens when you stop the replication controller. + +Happy flake hunting! + diff --git a/contributors/devel/sig-testing/gubernator.md b/contributors/devel/sig-testing/gubernator.md new file mode 100644 index 00000000..b03d11a1 --- /dev/null +++ b/contributors/devel/sig-testing/gubernator.md @@ -0,0 +1,136 @@ +# Gubernator + +*This document is oriented at developers who want to use Gubernator to debug while developing for Kubernetes.* + + +- [Gubernator](#gubernator) + - [What is Gubernator?](#what-is-gubernator) + - [Gubernator Features](#gubernator-features) + - [Test Failures list](#test-failures-list) + - [Log Filtering](#log-filtering) + - [Gubernator for Local Tests](#gubernator-for-local-tests) + - [Future Work](#future-work) + + +## What is Gubernator? + +[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes +test results. + +Gubernator simplifies the debugging process and makes it easier to track down failures by automating many +steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines. +Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the +failed pods and the corresponding pod UID, namespace, and container ID. 
+It also allows for filtering of the log files to display relevant lines based on selected keywords, and +allows for multiple logs to be woven together by timestamp. + +Gubernator runs on Google App Engine and fetches logs stored on Google Cloud Storage. + +## Gubernator Features + +### Test Failures list + +Comments made by k8s-ci-robot will post a link to a page listing the failed tests. +Each failed test comes with the corresponding error log from a junit file and a link +to filter logs for that test. + +Based on the message logged in the junit file, the pod name may be displayed. + + + +[Test Failures List Example](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721) + +### Log Filtering + +The log filtering page comes with checkboxes and textboxes to aid in filtering. Filtered keywords will be bolded +and lines including keywords will be highlighted. Up to four lines around the line of interest will also be displayed. + + + +If less than 100 lines are skipped, the "... skipping xx lines ..." message can be clicked to expand and show +the hidden lines. + +Before expansion: + +After expansion: + + +If the pod name was displayed in the Test Failures list, it will automatically be included in the filters. +If it is not found in the error message, it can be manually entered into the textbox. Once a pod name +is entered, the Pod UID, Namespace, and ContainerID may be automatically filled in as well. These can be +altered as well. To apply the filter, check off the options corresponding to the filter. + + + +To add a filter, type the term to be filtered into the textbox labeled "Add filter:" and press enter. +Additional filters will be displayed as checkboxes under the textbox. + + + +To choose which logs to view check off the checkboxes corresponding to the logs of interest. If multiple logs are +included, the "Weave by timestamp" option can weave the selected logs together based on the timestamp in each line. + + + +[Log Filtering Example 1](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/5535/nodelog?pod=pod-configmaps-b5b876cb-3e1e-11e6-8956-42010af0001d&junit=junit_03.xml&wrap=on&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkube-apiserver.log&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkubelet.log&UID=on&poduid=b5b8a59e-3e1e-11e6-b358-42010af0001d&ns=e2e-tests-configmap-oi12h&cID=tmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image) + +[Log Filtering Example 2](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721/nodelog?pod=client-containers-a53f813c-503e-11e6-88dd-0242ac110003&junit=junit_19.xml&wrap=on) + + +### Gubernator for Local Tests + +*Currently Gubernator can only be used with remote node e2e tests.* + +**NOTE: Using Gubernator with local tests will publicly upload your test logs to Google Cloud Storage** + +To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true. +A URL link to view the test results will be printed to the console. +Please note that running with the Gubernator tag will bypass the user confirmation for uploading to GCS. + +```console + +$ make test-e2e-node REMOTE=true GUBERNATOR=true +... 
+================================================================ +Running gubernator.sh + +Gubernator linked below: +k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp +``` + +The gubernator.sh script can be run after running a remote node e2e test for the same effect. + +```console +$ ./test/e2e_node/gubernator.sh +Do you want to run gubernator.sh and upload logs publicly to GCS? [y/n]y +... +Gubernator linked below: +k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp +``` + +## Future Work + +Gubernator provides a framework for debugging failures and introduces useful features. +There is still a lot of room for more features and growth to make the debugging process more efficient. + +How to contribute (see https://git.k8s.io/test-infra/gubernator/README.md) + +* Extend GUBERNATOR flag to all local tests + +* More accurate identification of pod name, container ID, etc. + * Change content of logged strings for failures to include more information + * Better regex in Gubernator + +* Automate discovery of more keywords + * Volume Name + * Disk Name + * Pod IP + +* Clickable API objects in the displayed lines in order to add them as filters + +* Construct story of pod's lifetime + * Have concise view of what a pod went through from when pod was started to failure + +* Improve UI + * Have separate folders of logs in rows instead of in one long column + * Improve interface for adding additional features (maybe instead of textbox and checkbox, have chips) diff --git a/contributors/devel/sig-testing/testing.md b/contributors/devel/sig-testing/testing.md new file mode 100644 index 00000000..60f83b53 --- /dev/null +++ b/contributors/devel/sig-testing/testing.md @@ -0,0 +1,227 @@ +# Testing guide + +**Table of Contents** + +- [Testing guide](#testing-guide) + - [Unit tests](#unit-tests) + - [Run all unit tests](#run-all-unit-tests) + - [Set go flags during unit tests](#set-go-flags-during-unit-tests) + - [Run unit tests from certain packages](#run-unit-tests-from-certain-packages) + - [Run specific unit test cases in a package](#run-specific-unit-test-cases-in-a-package) + - [Stress running unit tests](#stress-running-unit-tests) + - [Unit test coverage](#unit-test-coverage) + - [Benchmark unit tests](#benchmark-unit-tests) + - [Integration tests](#integration-tests) + - [Install etcd dependency](#install-etcd-dependency) + - [Etcd test data](#etcd-test-data) + - [Run integration tests](#run-integration-tests) + - [Run a specific integration test](#run-a-specific-integration-test) + - [End-to-End tests](#end-to-end-tests) + + +This assumes you already read the [development guide](development.md) to +install go, godeps, and configure your git client. All command examples are +relative to the `kubernetes` root directory. + +Before sending pull requests you should at least make sure your changes have +passed both unit and integration tests. + +Kubernetes only merges pull requests when unit, integration, and e2e tests are +passing, so it is often a good idea to make sure the e2e tests work as well. + +## Unit tests + +* Unit tests should be fully hermetic + - Only access resources in the test binary. +* All packages and any significant files require unit tests. 
+* The preferred method of testing multiple scenarios or input is + [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) + - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) +* Unit tests must pass on macOS and Windows platforms. + - Tests using linux-specific features must be skipped or compiled out. + - Skipped is better, compiled out is required when it won't compile. +* Concurrent unit test runs must pass. +* See [coding conventions](../guide/coding-conventions.md). + +### Run all unit tests + +`make test` is the entrypoint for running the unit tests that ensures that +`GOPATH` is set up correctly. If you have `GOPATH` set up correctly, you can +also just use `go test` directly. + +```sh +cd kubernetes +make test # Run all unit tests. +``` + +If any unit test fails with a timeout panic (see [#1594](https://github.com/kubernetes/community/issues/1594)) on the testing package, you can increase the `KUBE_TIMEOUT` value as shown below. + +```sh +make test KUBE_TIMEOUT="-timeout 300s" +``` + +### Set go flags during unit tests + +You can set [go flags](https://golang.org/cmd/go/) by setting the +`GOFLAGS` environment variable. + +### Run unit tests from certain packages + +`make test` accepts packages as arguments; the `k8s.io/kubernetes` prefix is +added automatically to these: + +```sh +make test WHAT=./pkg/api # run tests for pkg/api +``` + +To run multiple targets you need quotes: + +```sh +make test WHAT="./pkg/api ./pkg/kubelet" # run tests for pkg/api and pkg/kubelet +``` + +In a shell, it's often handy to use brace expansion: + +```sh +make test WHAT=./pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet +``` + +### Run specific unit test cases in a package + +You can set the test args using the `KUBE_TEST_ARGS` environment variable. +You can use this to pass the `-run` argument to `go test`, which accepts a +regular expression for the name of the test that should be run. + +```sh +# Runs TestValidatePod in pkg/api/validation with the verbose flag set +make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$' + +# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation +make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$" +``` + +For other supported test flags, see the [golang +documentation](https://golang.org/cmd/go/#hdr-Testing_flags). + +### Stress running unit tests + +Running the same tests repeatedly is one way to root out flakes. +You can do this efficiently. + +```sh +# Have 2 workers run all tests 5 times each (10 total iterations). +make test PARALLEL=2 ITERATION=5 +``` + +For more advanced ideas please see [flaky-tests.md](flaky-tests.md). + +### Unit test coverage + +Currently, collecting coverage is only supported for the Go unit tests. + +To run all unit tests and generate an HTML coverage report, run the following: + +```sh +make test KUBE_COVER=y +``` + +At the end of the run, an HTML report will be generated with the path +printed to stdout. + +To run tests and collect coverage in only one package, pass its relative path +under the `kubernetes` directory as an argument, for example: + +```sh +make test WHAT=./pkg/kubectl KUBE_COVER=y +``` + +Multiple arguments can be passed, in which case the coverage results will be +combined for all tests run. 
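+
+These make variables can generally be combined, so a focused, verbose, coverage-enabled run of a single test case might look like the sketch below. This is only an illustration that reuses the package path and test name from the examples above; it is not a recommended default invocation.
+
+```sh
+# Run a single test case verbosely, with coverage collection enabled, in one package.
+# WHAT, GOFLAGS, KUBE_TEST_ARGS and KUBE_COVER are all described in the sections above.
+make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$' KUBE_COVER=y
+```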
+ +### Benchmark unit tests + +To run benchmark tests, you'll typically use something like: + +```sh +go test ./pkg/apiserver -benchmem -run=XXX -bench=BenchmarkWatch +``` + +This will do the following: + +1. `-run=XXX` is a regular expression filter on the name of test cases to run +2. `-bench=BenchmarkWatch` will run test methods with BenchmarkWatch in the name + * See `grep -nr BenchmarkWatch .` for examples +3. `-benchmem` enables memory allocation stats + +See `go help test` and `go help testflag` for additional info. + +## Integration tests + +* Integration tests should only access other resources on the local machine + - Most commonly etcd or a service listening on localhost. +* All significant features require integration tests. + - This includes kubectl commands +* The preferred method of testing multiple scenarios or inputs +is [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) + - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) +* Each test should create its own master, httpserver and config. + - Example: [TestPodUpdateActiveDeadlineSeconds](https://git.k8s.io/kubernetes/test/integration/pods/pods_test.go) +* See [coding conventions](coding-conventions.md). + +### Install etcd dependency + +Kubernetes integration tests require your `PATH` to include an +[etcd](https://github.com/coreos/etcd/releases) installation. Kubernetes +includes a script to help install etcd on your machine. + +```sh +# Install etcd and add to PATH + +# Option a) install inside kubernetes root +hack/install-etcd.sh # Installs in ./third_party/etcd +echo export PATH="\$PATH:$(pwd)/third_party/etcd" >> ~/.profile # Add to PATH + +# Option b) install manually +grep -E "image.*etcd" cluster/gce/manifests/etcd.manifest # Find version +# Install that version using yum/apt-get/etc +echo export PATH="\$PATH:<LOCATION>" >> ~/.profile # Add to PATH +``` + +### Etcd test data + +Many tests start an etcd server internally, storing test data in the operating system's temporary directory. + +If you see test failures because the temporary directory does not have sufficient space, +or is on a volume with unpredictable write latency, you can override the test data directory +for those internal etcd instances with the `TEST_ETCD_DIR` environment variable. + +### Run integration tests + +The integration tests are run using `make test-integration`. +The Kubernetes integration tests are written using the normal golang testing +package but expect to have a running etcd instance to connect to. The `test-integration.sh` +script wraps `make test` and sets up an etcd instance for the integration tests to use. + +```sh +make test-integration # Run all integration tests. +``` + +This script runs the golang tests in package +[`test/integration`](https://git.k8s.io/kubernetes/test/integration). + +### Run a specific integration test + +You can also use the `KUBE_TEST_ARGS` environment variable with the `make test-integration` +to run a specific integration test case: + +```sh +# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set. +make test-integration WHAT=./test/integration/pods GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$" +``` + +If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API +version and the watch cache test is skipped. + +## End-to-End tests + +Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md). 
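+
+Before reading that document, the sketch below shows the general shape of a focused e2e invocation through `hack/e2e.go`. Treat it purely as an illustration: the `--ginkgo.focus` expression here is made up for this example, and the authoritative flags and workflows are described in [e2e-tests.md](e2e-tests.md).
+
+```sh
+# Build, bring up a test cluster, run only the tests whose names match the
+# focus regular expression, then tear the cluster down again.
+go run hack/e2e.go -- -v --build --up --test --test_args="--ginkgo.focus=Kubectl" --down
+```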
diff --git a/contributors/devel/sig-testing/writing-good-e2e-tests.md b/contributors/devel/sig-testing/writing-good-e2e-tests.md new file mode 100644 index 00000000..836479c2 --- /dev/null +++ b/contributors/devel/sig-testing/writing-good-e2e-tests.md @@ -0,0 +1,231 @@ +# Writing good e2e tests for Kubernetes # + +## Patterns and Anti-Patterns ## + +### Goals of e2e tests ### + +Beyond the obvious goal of providing end-to-end system test coverage, +there are a few less obvious goals that you should bear in mind when +designing, writing and debugging your end-to-end tests. In +particular, "flaky" tests, which pass most of the time but fail +intermittently for difficult-to-diagnose reasons are extremely costly +in terms of blurring our regression signals and slowing down our +automated merge velocity. Up-front time and effort designing your test +to be reliable is very well spent. Bear in mind that we have hundreds +of tests, each running in dozens of different environments, and if any +test in any test environment fails, we have to assume that we +potentially have some sort of regression. So if a significant number +of tests fail even only 1% of the time, basic statistics dictates that +we will almost never have a "green" regression indicator. Stated +another way, writing a test that is only 99% reliable is just about +useless in the harsh reality of a CI environment. In fact it's worse +than useless, because not only does it not provide a reliable +regression indicator, but it also costs a lot of subsequent debugging +time, and delayed merges. + +#### Debuggability #### + +If your test fails, it should provide as detailed as possible reasons +for the failure in its output. "Timeout" is not a useful error +message. "Timed out after 60 seconds waiting for pod xxx to enter +running state, still in pending state" is much more useful to someone +trying to figure out why your test failed and what to do about it. +Specifically, +[assertion](https://onsi.github.io/gomega/#making-assertions) code +like the following generates rather useless errors: + +``` +Expect(err).NotTo(HaveOccurred()) +``` + +Rather +[annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this: + +``` +Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated) +``` + +On the other hand, overly verbose logging, particularly of non-error conditions, can make +it unnecessarily difficult to figure out whether a test failed and if +so why? So don't log lots of irrelevant stuff either. + +#### Ability to run in non-dedicated test clusters #### + +To reduce end-to-end delay and improve resource utilization when +running e2e tests, we try, where possible, to run large numbers of +tests in parallel against the same test cluster. This means that: + +1. you should avoid making any assumption (implicit or explicit) that +your test is the only thing running against the cluster. For example, +making the assumption that your test can run a pod on every node in a +cluster is not a safe assumption, as some other tests, running at the +same time as yours, might have saturated one or more nodes in the +cluster. Similarly, running a pod in the system namespace, and +assuming that will increase the count of pods in the system +namespace by one is not safe, as some other test might be creating or +deleting pods in the system namespace at the same time as your test. 
+If you do legitimately need to write a test like that, make sure to +label it ["\[Serial\]"](e2e-tests.md#kinds-of-tests) so that it's easy +to identify, and not run in parallel with any other tests. +1. You should avoid doing things to the cluster that make it difficult +for other tests to reliably do what they're trying to do, at the same +time. For example, rebooting nodes, disconnecting network interfaces, +or upgrading cluster software as part of your test is likely to +violate the assumptions that other tests might have made about a +reasonably stable cluster environment. If you need to write such +tests, please label them as +["\[Disruptive\]"](e2e-tests.md#kinds-of-tests) so that it's easy to +identify them, and not run them in parallel with other tests. +1. You should avoid making assumptions about the Kubernetes API that +are not part of the API specification, as your tests will break as +soon as these assumptions become invalid. For example, relying on +specific Events, Event reasons or Event messages will make your tests +very brittle. + +#### Speed of execution #### + +We have hundreds of e2e tests, some of which we run in serial, one +after the other, in some cases. If each test takes just a few minutes +to run, that very quickly adds up to many, many hours of total +execution time. We try to keep such total execution time down to a +few tens of minutes at most. Therefore, try (very hard) to keep the +execution time of your individual tests below 2 minutes, ideally +shorter than that. Concretely, adding inappropriately long 'sleep' +statements or other gratuitous waits to tests is a killer. If under +normal circumstances your pod enters the running state within 10 +seconds, and 99.9% of the time within 30 seconds, it would be +gratuitous to wait 5 minutes for this to happen. Rather just fail +after 30 seconds, with a clear error message as to why your test +failed ("e.g. Pod x failed to become ready after 30 seconds, it +usually takes 10 seconds"). If you do have a truly legitimate reason +for waiting longer than that, or writing a test which takes longer +than 2 minutes to run, comment very clearly in the code why this is +necessary, and label the test as +["\[Slow\]"](e2e-tests.md#kinds-of-tests), so that it's easy to +identify and avoid in test runs that are required to complete +timeously (for example those that are run against every code +submission before it is allowed to be merged). +Note that completing within, say, 2 minutes only when the test +passes is not generally good enough. Your test should also fail in a +reasonable time. We have seen tests that, for example, wait up to 10 +minutes for each of several pods to become ready. Under good +conditions these tests might pass within a few seconds, but if the +pods never become ready (e.g. due to a system regression) they take a +very long time to fail and typically cause the entire test run to time +out, so that no results are produced. Again, this is a lot less +useful than a test that fails reliably within a minute or two when the +system is not working correctly. + +#### Resilience to relatively rare, temporary infrastructure glitches or delays #### + +Remember that your test will be run many thousands of +times, at different times of day and night, probably on different +cloud providers, under different load conditions. And often the +underlying state of these systems is stored in eventually consistent +data stores. 
So, for example, if a resource creation request is +theoretically asynchronous, even if you observe it to be practically +synchronous most of the time, write your test to assume that it's +asynchronous (e.g. make the "create" call, and poll or watch the +resource until it's in the correct state before proceeding). +Similarly, don't assume that API endpoints are 100% available. +They're not. Under high load conditions, API calls might temporarily +fail or time-out. In such cases it's appropriate to back off and retry +a few times before failing your test completely (in which case make +the error message very clear about what happened, e.g. "Retried +http://... 3 times - all failed with xxx". Use the standard +retry mechanisms provided in the libraries detailed below. + +### Some concrete tools at your disposal ### + +Obviously most of the above goals apply to many tests, not just yours. +So we've developed a set of reusable test infrastructure, libraries +and best practices to help you to do the right thing, or at least do +the same thing as other tests, so that if that turns out to be the +wrong thing, it can be fixed in one place, not hundreds, to be the +right thing. + +Here are a few pointers: + ++ [E2e Framework](https://git.k8s.io/kubernetes/test/e2e/framework/framework.go): + Familiarise yourself with this test framework and how to use it. + Amongst others, it automatically creates uniquely named namespaces + within which your tests can run to avoid name clashes, and reliably + automates cleaning up the mess after your test has completed (it + just deletes everything in the namespace). This helps to ensure + that tests do not leak resources. Note that deleting a namespace + (and by implication everything in it) is currently an expensive + operation. So the fewer resources you create, the less cleaning up + the framework needs to do, and the faster your test (and other + tests running concurrently with yours) will complete. Your tests + should always use this framework. Trying other home-grown + approaches to avoiding name clashes and resource leaks has proven + to be a very bad idea. ++ [E2e utils library](https://git.k8s.io/kubernetes/test/e2e/framework/util.go): + This handy library provides tons of reusable code for a host of + commonly needed test functionality, including waiting for resources + to enter specified states, safely and consistently retrying failed + operations, usefully reporting errors, and much more. Make sure + that you're familiar with what's available there, and use it. + Likewise, if you come across a generally useful mechanism that's + not yet implemented there, add it so that others can benefit from + your brilliance. In particular pay attention to the variety of + timeout and retry related constants at the top of that file. Always + try to reuse these constants rather than try to dream up your own + values. Even if the values there are not precisely what you would + like to use (timeout periods, retry counts etc), the benefit of + having them be consistent and centrally configurable across our + entire test suite typically outweighs your personal preferences. ++ **Follow the examples of stable, well-written tests:** Some of our + existing end-to-end tests are better written and more reliable than + others. 
A few examples of well-written tests include: + [Replication Controllers](https://git.k8s.io/kubernetes/test/e2e/apps/rc.go), + [Services](https://git.k8s.io/kubernetes/test/e2e/network/service.go), + [Reboot](https://git.k8s.io/kubernetes/test/e2e/lifecycle/reboot.go). ++ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the + test library and runner upon which our e2e tests are built. Before + you write or refactor a test, read the docs and make sure that you + understand how it works. In particular be aware that every test is + uniquely identified and described (e.g. in test reports) by the + concatenation of its `Describe` clause and nested `It` clauses. + So for example `Describe("Pods",...).... It(""should be scheduled + with cpu and memory limits")` produces a sane test identifier and + descriptor `Pods should be scheduled with cpu and memory limits`, + which makes it clear what's being tested, and hence what's not + working if it fails. Other good examples include: + +``` + CAdvisor should be healthy on every node +``` + +and + +``` + Daemon set should run and stop complex daemon +``` + + On the contrary +(these are real examples), the following are less good test +descriptors: + +``` + KubeProxy should test kube-proxy +``` + +and + +``` +Nodes [Disruptive] Network when a node becomes unreachable +[replication controller] recreates pods scheduled on the +unreachable node AND allows scheduling of pods on a node after +it rejoins the cluster +``` + +An improvement might be + +``` +Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive] +``` + +Note that opening issues for specific better tooling is welcome, and +code implementing that tooling is even more welcome :-). + diff --git a/contributors/devel/testing.md b/contributors/devel/testing.md index 60f83b53..5bb42eeb 100644 --- a/contributors/devel/testing.md +++ b/contributors/devel/testing.md @@ -1,227 +1,3 @@ -# Testing guide +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/testing.md. -**Table of Contents** - -- [Testing guide](#testing-guide) - - [Unit tests](#unit-tests) - - [Run all unit tests](#run-all-unit-tests) - - [Set go flags during unit tests](#set-go-flags-during-unit-tests) - - [Run unit tests from certain packages](#run-unit-tests-from-certain-packages) - - [Run specific unit test cases in a package](#run-specific-unit-test-cases-in-a-package) - - [Stress running unit tests](#stress-running-unit-tests) - - [Unit test coverage](#unit-test-coverage) - - [Benchmark unit tests](#benchmark-unit-tests) - - [Integration tests](#integration-tests) - - [Install etcd dependency](#install-etcd-dependency) - - [Etcd test data](#etcd-test-data) - - [Run integration tests](#run-integration-tests) - - [Run a specific integration test](#run-a-specific-integration-test) - - [End-to-End tests](#end-to-end-tests) - - -This assumes you already read the [development guide](development.md) to -install go, godeps, and configure your git client. All command examples are -relative to the `kubernetes` root directory. - -Before sending pull requests you should at least make sure your changes have -passed both unit and integration tests. - -Kubernetes only merges pull requests when unit, integration, and e2e tests are -passing, so it is often a good idea to make sure the e2e tests work as well. - -## Unit tests - -* Unit tests should be fully hermetic - - Only access resources in the test binary. -* All packages and any significant files require unit tests. 
-* The preferred method of testing multiple scenarios or input is - [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) - - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) -* Unit tests must pass on macOS and Windows platforms. - - Tests using linux-specific features must be skipped or compiled out. - - Skipped is better, compiled out is required when it won't compile. -* Concurrent unit test runs must pass. -* See [coding conventions](../guide/coding-conventions.md). - -### Run all unit tests - -`make test` is the entrypoint for running the unit tests that ensures that -`GOPATH` is set up correctly. If you have `GOPATH` set up correctly, you can -also just use `go test` directly. - -```sh -cd kubernetes -make test # Run all unit tests. -``` - -If any unit test fails with a timeout panic (see [#1594](https://github.com/kubernetes/community/issues/1594)) on the testing package, you can increase the `KUBE_TIMEOUT` value as shown below. - -```sh -make test KUBE_TIMEOUT="-timeout 300s" -``` - -### Set go flags during unit tests - -You can set [go flags](https://golang.org/cmd/go/) by setting the -`GOFLAGS` environment variable. - -### Run unit tests from certain packages - -`make test` accepts packages as arguments; the `k8s.io/kubernetes` prefix is -added automatically to these: - -```sh -make test WHAT=./pkg/api # run tests for pkg/api -``` - -To run multiple targets you need quotes: - -```sh -make test WHAT="./pkg/api ./pkg/kubelet" # run tests for pkg/api and pkg/kubelet -``` - -In a shell, it's often handy to use brace expansion: - -```sh -make test WHAT=./pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet -``` - -### Run specific unit test cases in a package - -You can set the test args using the `KUBE_TEST_ARGS` environment variable. -You can use this to pass the `-run` argument to `go test`, which accepts a -regular expression for the name of the test that should be run. - -```sh -# Runs TestValidatePod in pkg/api/validation with the verbose flag set -make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$' - -# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation -make test WHAT=./pkg/api/validation GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$" -``` - -For other supported test flags, see the [golang -documentation](https://golang.org/cmd/go/#hdr-Testing_flags). - -### Stress running unit tests - -Running the same tests repeatedly is one way to root out flakes. -You can do this efficiently. - -```sh -# Have 2 workers run all tests 5 times each (10 total iterations). -make test PARALLEL=2 ITERATION=5 -``` - -For more advanced ideas please see [flaky-tests.md](flaky-tests.md). - -### Unit test coverage - -Currently, collecting coverage is only supported for the Go unit tests. - -To run all unit tests and generate an HTML coverage report, run the following: - -```sh -make test KUBE_COVER=y -``` - -At the end of the run, an HTML report will be generated with the path -printed to stdout. - -To run tests and collect coverage in only one package, pass its relative path -under the `kubernetes` directory as an argument, for example: - -```sh -make test WHAT=./pkg/kubectl KUBE_COVER=y -``` - -Multiple arguments can be passed, in which case the coverage results will be -combined for all tests run. 
- -### Benchmark unit tests - -To run benchmark tests, you'll typically use something like: - -```sh -go test ./pkg/apiserver -benchmem -run=XXX -bench=BenchmarkWatch -``` - -This will do the following: - -1. `-run=XXX` is a regular expression filter on the name of test cases to run -2. `-bench=BenchmarkWatch` will run test methods with BenchmarkWatch in the name - * See `grep -nr BenchmarkWatch .` for examples -3. `-benchmem` enables memory allocation stats - -See `go help test` and `go help testflag` for additional info. - -## Integration tests - -* Integration tests should only access other resources on the local machine - - Most commonly etcd or a service listening on localhost. -* All significant features require integration tests. - - This includes kubectl commands -* The preferred method of testing multiple scenarios or inputs -is [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) - - Example: [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) -* Each test should create its own master, httpserver and config. - - Example: [TestPodUpdateActiveDeadlineSeconds](https://git.k8s.io/kubernetes/test/integration/pods/pods_test.go) -* See [coding conventions](coding-conventions.md). - -### Install etcd dependency - -Kubernetes integration tests require your `PATH` to include an -[etcd](https://github.com/coreos/etcd/releases) installation. Kubernetes -includes a script to help install etcd on your machine. - -```sh -# Install etcd and add to PATH - -# Option a) install inside kubernetes root -hack/install-etcd.sh # Installs in ./third_party/etcd -echo export PATH="\$PATH:$(pwd)/third_party/etcd" >> ~/.profile # Add to PATH - -# Option b) install manually -grep -E "image.*etcd" cluster/gce/manifests/etcd.manifest # Find version -# Install that version using yum/apt-get/etc -echo export PATH="\$PATH:<LOCATION>" >> ~/.profile # Add to PATH -``` - -### Etcd test data - -Many tests start an etcd server internally, storing test data in the operating system's temporary directory. - -If you see test failures because the temporary directory does not have sufficient space, -or is on a volume with unpredictable write latency, you can override the test data directory -for those internal etcd instances with the `TEST_ETCD_DIR` environment variable. - -### Run integration tests - -The integration tests are run using `make test-integration`. -The Kubernetes integration tests are written using the normal golang testing -package but expect to have a running etcd instance to connect to. The `test-integration.sh` -script wraps `make test` and sets up an etcd instance for the integration tests to use. - -```sh -make test-integration # Run all integration tests. -``` - -This script runs the golang tests in package -[`test/integration`](https://git.k8s.io/kubernetes/test/integration). - -### Run a specific integration test - -You can also use the `KUBE_TEST_ARGS` environment variable with the `make test-integration` -to run a specific integration test case: - -```sh -# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set. -make test-integration WHAT=./test/integration/pods GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$" -``` - -If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API -version and the watch cache test is skipped. - -## End-to-End tests - -Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md). +This file is a placeholder to preserve links. 
Please remove by April 30, 2019 or the release of Kubernetes 1.13, whichever comes first.
\ No newline at end of file diff --git a/contributors/devel/writing-good-e2e-tests.md b/contributors/devel/writing-good-e2e-tests.md index 836479c2..b39208eb 100644 --- a/contributors/devel/writing-good-e2e-tests.md +++ b/contributors/devel/writing-good-e2e-tests.md @@ -1,231 +1,3 @@ -# Writing good e2e tests for Kubernetes # - -## Patterns and Anti-Patterns ## - -### Goals of e2e tests ### - -Beyond the obvious goal of providing end-to-end system test coverage, -there are a few less obvious goals that you should bear in mind when -designing, writing and debugging your end-to-end tests. In -particular, "flaky" tests, which pass most of the time but fail -intermittently for difficult-to-diagnose reasons are extremely costly -in terms of blurring our regression signals and slowing down our -automated merge velocity. Up-front time and effort designing your test -to be reliable is very well spent. Bear in mind that we have hundreds -of tests, each running in dozens of different environments, and if any -test in any test environment fails, we have to assume that we -potentially have some sort of regression. So if a significant number -of tests fail even only 1% of the time, basic statistics dictates that -we will almost never have a "green" regression indicator. Stated -another way, writing a test that is only 99% reliable is just about -useless in the harsh reality of a CI environment. In fact it's worse -than useless, because not only does it not provide a reliable -regression indicator, but it also costs a lot of subsequent debugging -time, and delayed merges. - -#### Debuggability #### - -If your test fails, it should provide as detailed as possible reasons -for the failure in its output. "Timeout" is not a useful error -message. "Timed out after 60 seconds waiting for pod xxx to enter -running state, still in pending state" is much more useful to someone -trying to figure out why your test failed and what to do about it. -Specifically, -[assertion](https://onsi.github.io/gomega/#making-assertions) code -like the following generates rather useless errors: - -``` -Expect(err).NotTo(HaveOccurred()) -``` - -Rather -[annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this: - -``` -Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated) -``` - -On the other hand, overly verbose logging, particularly of non-error conditions, can make -it unnecessarily difficult to figure out whether a test failed and if -so why? So don't log lots of irrelevant stuff either. - -#### Ability to run in non-dedicated test clusters #### - -To reduce end-to-end delay and improve resource utilization when -running e2e tests, we try, where possible, to run large numbers of -tests in parallel against the same test cluster. This means that: - -1. you should avoid making any assumption (implicit or explicit) that -your test is the only thing running against the cluster. For example, -making the assumption that your test can run a pod on every node in a -cluster is not a safe assumption, as some other tests, running at the -same time as yours, might have saturated one or more nodes in the -cluster. Similarly, running a pod in the system namespace, and -assuming that will increase the count of pods in the system -namespace by one is not safe, as some other test might be creating or -deleting pods in the system namespace at the same time as your test. 
-If you do legitimately need to write a test like that, make sure to -label it ["\[Serial\]"](e2e-tests.md#kinds-of-tests) so that it's easy -to identify, and not run in parallel with any other tests. -1. You should avoid doing things to the cluster that make it difficult -for other tests to reliably do what they're trying to do, at the same -time. For example, rebooting nodes, disconnecting network interfaces, -or upgrading cluster software as part of your test is likely to -violate the assumptions that other tests might have made about a -reasonably stable cluster environment. If you need to write such -tests, please label them as -["\[Disruptive\]"](e2e-tests.md#kinds-of-tests) so that it's easy to -identify them, and not run them in parallel with other tests. -1. You should avoid making assumptions about the Kubernetes API that -are not part of the API specification, as your tests will break as -soon as these assumptions become invalid. For example, relying on -specific Events, Event reasons or Event messages will make your tests -very brittle. - -#### Speed of execution #### - -We have hundreds of e2e tests, some of which we run in serial, one -after the other, in some cases. If each test takes just a few minutes -to run, that very quickly adds up to many, many hours of total -execution time. We try to keep such total execution time down to a -few tens of minutes at most. Therefore, try (very hard) to keep the -execution time of your individual tests below 2 minutes, ideally -shorter than that. Concretely, adding inappropriately long 'sleep' -statements or other gratuitous waits to tests is a killer. If under -normal circumstances your pod enters the running state within 10 -seconds, and 99.9% of the time within 30 seconds, it would be -gratuitous to wait 5 minutes for this to happen. Rather just fail -after 30 seconds, with a clear error message as to why your test -failed ("e.g. Pod x failed to become ready after 30 seconds, it -usually takes 10 seconds"). If you do have a truly legitimate reason -for waiting longer than that, or writing a test which takes longer -than 2 minutes to run, comment very clearly in the code why this is -necessary, and label the test as -["\[Slow\]"](e2e-tests.md#kinds-of-tests), so that it's easy to -identify and avoid in test runs that are required to complete -timeously (for example those that are run against every code -submission before it is allowed to be merged). -Note that completing within, say, 2 minutes only when the test -passes is not generally good enough. Your test should also fail in a -reasonable time. We have seen tests that, for example, wait up to 10 -minutes for each of several pods to become ready. Under good -conditions these tests might pass within a few seconds, but if the -pods never become ready (e.g. due to a system regression) they take a -very long time to fail and typically cause the entire test run to time -out, so that no results are produced. Again, this is a lot less -useful than a test that fails reliably within a minute or two when the -system is not working correctly. - -#### Resilience to relatively rare, temporary infrastructure glitches or delays #### - -Remember that your test will be run many thousands of -times, at different times of day and night, probably on different -cloud providers, under different load conditions. And often the -underlying state of these systems is stored in eventually consistent -data stores. 
So, for example, if a resource creation request is -theoretically asynchronous, even if you observe it to be practically -synchronous most of the time, write your test to assume that it's -asynchronous (e.g. make the "create" call, and poll or watch the -resource until it's in the correct state before proceeding). -Similarly, don't assume that API endpoints are 100% available. -They're not. Under high load conditions, API calls might temporarily -fail or time-out. In such cases it's appropriate to back off and retry -a few times before failing your test completely (in which case make -the error message very clear about what happened, e.g. "Retried -http://... 3 times - all failed with xxx". Use the standard -retry mechanisms provided in the libraries detailed below. - -### Some concrete tools at your disposal ### - -Obviously most of the above goals apply to many tests, not just yours. -So we've developed a set of reusable test infrastructure, libraries -and best practices to help you to do the right thing, or at least do -the same thing as other tests, so that if that turns out to be the -wrong thing, it can be fixed in one place, not hundreds, to be the -right thing. - -Here are a few pointers: - -+ [E2e Framework](https://git.k8s.io/kubernetes/test/e2e/framework/framework.go): - Familiarise yourself with this test framework and how to use it. - Amongst others, it automatically creates uniquely named namespaces - within which your tests can run to avoid name clashes, and reliably - automates cleaning up the mess after your test has completed (it - just deletes everything in the namespace). This helps to ensure - that tests do not leak resources. Note that deleting a namespace - (and by implication everything in it) is currently an expensive - operation. So the fewer resources you create, the less cleaning up - the framework needs to do, and the faster your test (and other - tests running concurrently with yours) will complete. Your tests - should always use this framework. Trying other home-grown - approaches to avoiding name clashes and resource leaks has proven - to be a very bad idea. -+ [E2e utils library](https://git.k8s.io/kubernetes/test/e2e/framework/util.go): - This handy library provides tons of reusable code for a host of - commonly needed test functionality, including waiting for resources - to enter specified states, safely and consistently retrying failed - operations, usefully reporting errors, and much more. Make sure - that you're familiar with what's available there, and use it. - Likewise, if you come across a generally useful mechanism that's - not yet implemented there, add it so that others can benefit from - your brilliance. In particular pay attention to the variety of - timeout and retry related constants at the top of that file. Always - try to reuse these constants rather than try to dream up your own - values. Even if the values there are not precisely what you would - like to use (timeout periods, retry counts etc), the benefit of - having them be consistent and centrally configurable across our - entire test suite typically outweighs your personal preferences. -+ **Follow the examples of stable, well-written tests:** Some of our - existing end-to-end tests are better written and more reliable than - others. 
A few examples of well-written tests include: - [Replication Controllers](https://git.k8s.io/kubernetes/test/e2e/apps/rc.go), - [Services](https://git.k8s.io/kubernetes/test/e2e/network/service.go), - [Reboot](https://git.k8s.io/kubernetes/test/e2e/lifecycle/reboot.go). -+ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the - test library and runner upon which our e2e tests are built. Before - you write or refactor a test, read the docs and make sure that you - understand how it works. In particular be aware that every test is - uniquely identified and described (e.g. in test reports) by the - concatenation of its `Describe` clause and nested `It` clauses. - So for example `Describe("Pods",...).... It(""should be scheduled - with cpu and memory limits")` produces a sane test identifier and - descriptor `Pods should be scheduled with cpu and memory limits`, - which makes it clear what's being tested, and hence what's not - working if it fails. Other good examples include: - -``` - CAdvisor should be healthy on every node -``` - -and - -``` - Daemon set should run and stop complex daemon -``` - - On the contrary -(these are real examples), the following are less good test -descriptors: - -``` - KubeProxy should test kube-proxy -``` - -and - -``` -Nodes [Disruptive] Network when a node becomes unreachable -[replication controller] recreates pods scheduled on the -unreachable node AND allows scheduling of pods on a node after -it rejoins the cluster -``` - -An improvement might be - -``` -Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive] -``` - -Note that opening issues for specific better tooling is welcome, and -code implementing that tooling is even more welcome :-). +This file has moved to https://git.k8s.io/community/contributors/devel/sig-testing/writing-good-e2e-tests.md. +This file is a placeholder to preserve links. Please remove by April 30, 2019 or the release of kubernetes 1.13, whichever comes first.
\ No newline at end of file diff --git a/contributors/guide/README.md b/contributors/guide/README.md index e053b949..5af35566 100644 --- a/contributors/guide/README.md +++ b/contributors/guide/README.md @@ -217,14 +217,14 @@ When reviewing PRs from others [The Gentle Art of Patch Review](http://sage.thes ## Testing Testing is the responsibility of all contributors and is in part owned by all SIGs, but is also coordinated by [sig-testing](/sig-testing). -Refer to the [Testing Guide](/contributors/devel/testing.md) for more information. +Refer to the [Testing Guide](/contributors/devel/sig-testing/testing.md) for more information. There are multiple types of tests. The location of the test code varies with type, as do the specifics of the environment needed to successfully run the test: * Unit: These confirm that a particular function behaves as intended. Golang includes a native ability for unit testing via the [testing](https://golang.org/pkg/testing/) package. Unit test source code can be found adjacent to the corresponding source code within a given package. For example: functions defined in [kubernetes/cmd/kubeadm/app/util/version.go](https://git.k8s.io/kubernetes/cmd/kubeadm/app/util/version.go) will have unit tests in [kubernetes/cmd/kubeadm/app/util/version_test.go](https://git.k8s.io/kubernetes/cmd/kubeadm/app/util/version_test.go). These are easily run locally by any developer on any OS. * Integration: These tests cover interactions of package components or interactions between kubernetes components and some other non-kubernetes system resource (eg: etcd). An example would be testing whether a piece of code can correctly store data to or retrieve data from etcd. Integration tests are stored in [kubernetes/test/integration/](https://git.k8s.io/kubernetes/test/integration). Running these can require the developer to set up additional functionality on their development system. -* End-to-end ("e2e"): These are broad tests of overall system behavior and coherence. These are more complicated as they require a functional kubernetes cluster built from the sources to be tested. A separate [document detailing e2e testing](/contributors/devel/e2e-tests.md) and test cases themselves can be found in [kubernetes/test/e2e/](https://git.k8s.io/kubernetes/test/e2e). +* End-to-end ("e2e"): These are broad tests of overall system behavior and coherence. These are more complicated as they require a functional kubernetes cluster built from the sources to be tested. A separate [document detailing e2e testing](/contributors/devel/sig-testing/e2e-tests.md) and test cases themselves can be found in [kubernetes/test/e2e/](https://git.k8s.io/kubernetes/test/e2e). * Conformance: These are a set of testcases, currently a subset of the integration/e2e tests, that the Architecture SIG has approved to define the core set of interoperable features that all Kubernetes deployments must support. For more information on Conformance tests please see the [Conformance Testing](/contributors/devel/conformance-tests.md) Document. Continuous integration will run these tests either as pre-submits on PRs, post-submits against master/release branches, or both.
diff --git a/contributors/guide/coding-conventions.md b/contributors/guide/coding-conventions.md index ebabbcbf..c424b9d8 100644 --- a/contributors/guide/coding-conventions.md +++ b/contributors/guide/coding-conventions.md @@ -72,7 +72,7 @@ tests example, see [TestNamespaceAuthorization](https://git.k8s.io/kubernetes/test/integration/auth/auth_test.go) - Significant features should come with integration (test/integration) and/or -[end-to-end (test/e2e) tests](/contributors/devel/e2e-tests.md) +[end-to-end (test/e2e) tests](/contributors/devel/sig-testing/e2e-tests.md) - Including new kubectl commands and major features of existing commands - Unit tests must pass on macOS and Windows platforms - if you use Linux @@ -86,7 +86,7 @@ required when your code does not compile on Windows). asynchronous thing to happen (e.g. wait for 1 seconds and expect a Pod to be running). Wait and retry instead. - - See the [testing guide](/contributors/devel/testing.md) for additional testing advice. + - See the [testing guide](/contributors/devel/sig-testing/testing.md) for additional testing advice. ## Directory and file conventions diff --git a/contributors/guide/github-workflow.md b/contributors/guide/github-workflow.md index 221a7921..cc1e7e8f 100644 --- a/contributors/guide/github-workflow.md +++ b/contributors/guide/github-workflow.md @@ -149,17 +149,17 @@ make test make test WHAT=./pkg/api/helper GOFLAGS=-v # Run integration tests, requires etcd -# For more info, visit https://git.k8s.io/community/contributors/devel/testing.md#integration-tests +# For more info, visit https://git.k8s.io/community/contributors/devel/sig-testing/testing.md#integration-tests make test-integration # Run e2e tests by building test binaries, turn up a test cluster, run all tests, and tear the cluster down # Equivalent to: go run hack/e2e.go -- -v --build --up --test --down # Note: running all e2e tests takes a LONG time! To run specific e2e tests, visit: -# https://git.k8s.io/community/contributors/devel/e2e-tests.md#building-kubernetes-and-running-the-tests +# https://git.k8s.io/community/contributors/devel/sig-testing/e2e-tests.md#building-kubernetes-and-running-the-tests make test-e2e ``` -See the [testing guide](/contributors/devel/testing.md) and [end-to-end tests](/contributors/devel/e2e-tests.md) +See the [testing guide](/contributors/devel/sig-testing/testing.md) and [end-to-end tests](/contributors/devel/sig-testing/e2e-tests.md) for additional information and scenarios. Run `make help` for additional information on these make targets. |
