From 58bb071825a70a882d5a4159529a5800b18349e0 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Wed, 15 Oct 2014 08:30:02 -0700 Subject: Move developer documentation to docs/devel/ Fix links. --- collab.md | 37 ++++++++++++ development.md | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ flaky-tests.md | 52 +++++++++++++++++ releasing.dot | 113 ++++++++++++++++++++++++++++++++++++ releasing.md | 152 ++++++++++++++++++++++++++++++++++++++++++++++++ releasing.png | Bin 0 -> 30693 bytes releasing.svg | 113 ++++++++++++++++++++++++++++++++++++ 7 files changed, 646 insertions(+) create mode 100644 collab.md create mode 100644 development.md create mode 100644 flaky-tests.md create mode 100644 releasing.dot create mode 100644 releasing.md create mode 100644 releasing.png create mode 100644 releasing.svg diff --git a/collab.md b/collab.md new file mode 100644 index 00000000..c4644048 --- /dev/null +++ b/collab.md @@ -0,0 +1,37 @@ +# On Collaborative Development + +Kubernetes is open source, but many of the people working on it do so as their day job. In order to avoid forcing people to be "at work" effectively 24/7, we want to establish some semi-formal protocols around development. Hopefully these rules make things go more smoothly. If you find that this is not the case, please complain loudly. + +## Patches welcome + +First and foremost: as a potential contributor, your changes and ideas are welcome at any hour of the day or night, weekdays, weekends, and holidays. Please do not ever hesitate to ask a question or send a PR. + +## Timezones and calendars + +For the time being, most of the people working on this project are in the US and on Pacific time. Any times mentioned henceforth will refer to this timezone. Any references to "work days" will refer to the US calendar. + +## Code reviews + +All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. 
But even for maintainers, we want all changes to get at least one review, preferably from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should sit for at least 2 hours to allow for wider review. + +Most PRs will find reviewers organically. If a maintainer intends to be the primary reviewer of a PR, they should set themselves as the assignee on GitHub and say so in a reply to the PR. Only the primary reviewer of a change should actually do the merge, except in rare cases (e.g. they are unavailable in a reasonable timeframe). + +If a PR has gone 2 work days without an owner emerging, please poke the PR thread and ask for a reviewer to be assigned. + +Except for rare cases, such as trivial changes (e.g. typos, comments) or emergencies (e.g. broken builds), maintainers should not merge their own changes. + +Expect reviewers to request that you avoid [common go style mistakes](https://code.google.com/p/go-wiki/wiki/CodeReviewComments) in your PRs. + +## Assigned reviews + +Maintainers can assign reviews to other maintainers, when appropriate. The assignee becomes the shepherd for that PR and is responsible for merging the PR once they are satisfied with it or else closing it. The assignee might request reviews from non-maintainers. + +## Merge hours + +Maintainers will do merges between the hours of 7:00 am Monday and 7:00 pm (19:00h) Friday. PRs that arrive over the weekend or on holidays will only be merged if there is a very good reason for it and if the code review requirements have been met. + +There may be discussion and even approvals granted outside of the above hours, but merges will generally be deferred. 
+ +## Holds + +Any maintainer or core contributor who wants to review a PR but does not have time immediately may put a hold on a PR simply by saying so on the PR discussion and offering an ETA measured in single-digit days at most. Any PR that has a hold shall not be merged until the person who requested the hold acks the review, withdraws their hold, or is overruled by a preponderance of maintainers. diff --git a/development.md b/development.md new file mode 100644 index 00000000..f750c611 --- /dev/null +++ b/development.md @@ -0,0 +1,179 @@ +# Development Guide + +# Releases and Official Builds + +Official releases are built in Docker containers. Details are [here](build/README.md). You can do simple builds and development with just a local Docker installation. If you want to build Go locally outside of Docker, please continue below. + +## Go development environment + +Kubernetes is written in the [Go](http://golang.org) programming language. If you haven't set up a Go development environment, please follow [these instructions](http://golang.org/doc/code.html) to install the go tool and set up a GOPATH. Ensure your version of Go is at least 1.3. + +## Put kubernetes into GOPATH + +We highly recommend putting Kubernetes' code into your GOPATH. For example, the following commands will download Kubernetes' code under the current user's GOPATH (assuming there's only one directory in GOPATH): + +``` +$ echo $GOPATH +/home/user/goproj +$ mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ +$ cd $GOPATH/src/github.com/GoogleCloudPlatform/ +$ git clone git@github.com:GoogleCloudPlatform/kubernetes.git +``` + +The commands above will not work if there is more than one directory in ``$GOPATH``. + +(Obviously, clone your own fork of Kubernetes if you plan to do development.) + +## godep and dependency management + +Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. 
It is not strictly required for building Kubernetes, but it is required when managing dependencies under the Godeps/ tree, and is required by a number of the build and test scripts. Please make sure that ``godep`` is installed and in your ``$PATH``. + +### Installing godep +There are many ways to build and host Go binaries. Here is an easy way to get utilities like ```godep``` installed: + +1. Ensure that [Mercurial](http://mercurial.selenic.com/wiki/Download) is installed on your system. (Some of godep's dependencies use the Mercurial +source control system.) Use ```apt-get install mercurial``` or ```yum install mercurial``` on Linux, or [brew.sh](http://brew.sh) on OS X, or download +directly from Mercurial. +2. Create a new GOPATH for your tools and install godep: +``` +export GOPATH=$HOME/go-tools +mkdir -p $GOPATH +go get github.com/tools/godep +``` + +3. Add $GOPATH/bin to your path. Typically you'd add this to your ~/.profile: +``` +export GOPATH=$HOME/go-tools +export PATH=$PATH:$GOPATH/bin +``` + +### Using godep +Here is a quick summary of `godep`. `godep` helps manage third-party dependencies by copying known versions into Godeps/_workspace. You can use `godep` in three ways: + +1. Use `godep` to call your `go` commands. For example: `godep go test ./...` +2. Use `godep` to modify your `$GOPATH` so that other tools know where to find the dependencies. Specifically: `export GOPATH=$GOPATH:$(godep path)` +3. Use `godep` to copy the saved versions of packages into your `$GOPATH`. This is done with `godep restore`. + +We recommend using options #1 or #2. + +## Hooks + +Before committing any changes, please link/copy these hooks into your .git +directory. This will keep you from accidentally committing non-gofmt'd go code. 
+ +``` +cd kubernetes +ln -s hooks/prepare-commit-msg .git/hooks/prepare-commit-msg +ln -s hooks/commit-msg .git/hooks/commit-msg +``` + +## Unit tests + +``` +cd kubernetes +hack/test-go.sh +``` + +Alternatively, you could also run: + +``` +cd kubernetes +godep go test ./... +``` + +If you only want to run unit tests in one package, you could run ``godep go test`` under the package directory. For example, the following commands will run all unit tests in package kubelet: + +``` +$ cd kubernetes # step into kubernetes' directory. +$ cd pkg/kubelet +$ godep go test +# some output from unit tests +PASS +ok github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet 0.317s +``` + +## Coverage +``` +cd kubernetes +godep go tool cover -html=target/c.out +``` + +## Integration tests + +You need an etcd binary somewhere in your PATH. To install etcd, run: + +``` +cd kubernetes +hack/install-etcd.sh +sudo ln -s $(pwd)/third_party/etcd/bin/etcd /usr/bin/etcd +``` + +``` +cd kubernetes +hack/test-integration.sh +``` + +## End-to-End tests + +You can run an end-to-end test which will bring up a master and two minions, perform some tests, and then tear everything down. Make sure you have followed the getting started steps for your chosen cloud platform (which might involve changing the `KUBERNETES_PROVIDER` environment variable to something other than "gce"). +``` +cd kubernetes +hack/e2e-test.sh +``` + +Pressing control-C should result in an orderly shutdown, but if something goes wrong and you still have some VMs running you can force a cleanup with the magical incantation: +``` +hack/e2e-test.sh 1 1 1 +``` + +## Testing out flaky tests +[Instructions here](docs/devel/flaky-tests.md) + +## Add/Update dependencies + +Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. To add or update a package, please follow the instructions in [godep's documentation](https://github.com/tools/godep). 
+ +To add a new package ``foo/bar``: + +- Make sure the Kubernetes root directory is at $GOPATH/src/github.com/GoogleCloudPlatform/kubernetes +- Run ``godep restore`` to make sure you have all dependencies pulled. +- Download foo/bar into the first directory in GOPATH: ``go get foo/bar``. +- Change code in kubernetes to use ``foo/bar``. +- Run ``godep save ./...`` under kubernetes' root directory. + +To update a package ``foo/bar``: + +- Make sure the Kubernetes root directory is at $GOPATH/src/github.com/GoogleCloudPlatform/kubernetes +- Run ``godep restore`` to make sure you have all dependencies pulled. +- Update the package with ``go get -u foo/bar``. +- Change code in kubernetes accordingly if necessary. +- Run ``godep update foo/bar`` under kubernetes' root directory. + +## Keeping your development fork in sync + +One time after cloning your forked repo: + +``` +git remote add upstream https://github.com/GoogleCloudPlatform/kubernetes.git +``` + +Then each time you want to sync to upstream: + +``` +git fetch upstream +git rebase upstream/master +``` + +## Regenerating the API documentation + +``` +cd kubernetes/api +sudo docker build -t kubernetes/raml2html . +sudo docker run --name="docgen" kubernetes/raml2html +sudo docker cp docgen:/data/kubernetes.html . +``` + +View the API documentation using htmlpreview (works on your fork, too): +``` +http://htmlpreview.github.io/?https://github.com/GoogleCloudPlatform/kubernetes/blob/master/api/kubernetes.html +``` diff --git a/flaky-tests.md b/flaky-tests.md new file mode 100644 index 00000000..d2cc8fad --- /dev/null +++ b/flaky-tests.md @@ -0,0 +1,52 @@ +# Hunting flaky tests in Kubernetes +Sometimes unit tests are flaky. This means that due to (usually) race conditions, they will occasionally fail, even though most of the time they pass. + +We have a goal of 99.9% flake-free tests. This means that there is only one flake in one thousand runs of a test. 
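To see what that goal implies, here is a quick back-of-the-envelope calculation (a sketch using awk; the 0.1% flake rate is the goal stated above):

```shell
# With a 0.1% flake rate, the chance of seeing at least one failure
# somewhere in 1000 runs of the test is 1 - 0.999^1000, about 63%.
awk 'BEGIN { printf "%.3f\n", 1 - 0.999 ^ 1000 }'   # prints 0.632
```

So roughly a thousand runs are needed to have good odds of observing a flake that bad even once, which is why running the test many times in parallel is so useful.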
+ +Running a test 1000 times on your own machine can be tedious and time-consuming. Fortunately, there is a better way to achieve this using Kubernetes. + +_Note: these instructions are mildly hacky for now; as we get run-once semantics and logging, they will get better_ + +There is a testing image ```brendanburns/flake``` on Docker Hub. We will use this image to test our fix. + +Create a replication controller with the following config: +```yaml
id: flakeController
desiredState:
  replicas: 24
  replicaSelector:
    name: flake
  podTemplate:
    desiredState:
      manifest:
        version: v1beta1
        id: ""
        volumes: []
        containers:
        - name: flake
          image: brendanburns/flake
          env:
          - name: TEST_PACKAGE
            value: pkg/tools
          - name: REPO_SPEC
            value: https://github.com/GoogleCloudPlatform/kubernetes
      restartpolicy: {}
    labels:
      name: flake
labels:
  name: flake
``` + +```./cluster/kubecfg.sh -c controller.yaml create replicaControllers``` + +This will spin up 24 instances of the test. They will run to completion, then exit; the kubelet will restart them; eventually you will have sufficient runs for your purposes, and you can stop the replication controller: + +```sh +./cluster/kubecfg.sh stop flakeController +./cluster/kubecfg.sh rm flakeController +``` + +Now examine the machines with ```docker ps -a``` and look for tasks that exited with non-zero exit codes (ignore those that exited -1, since that's what happens when you stop the replication controller). + +Happy flake hunting! diff --git a/releasing.dot b/releasing.dot new file mode 100644 index 00000000..fe8124c3 --- /dev/null +++ b/releasing.dot @@ -0,0 +1,113 @@ +// Build it with: +// $ dot -Tsvg releasing.dot >releasing.svg + +digraph tagged_release { + size = "5,5" + // Arrows go up. + rankdir = BT + subgraph left { + // Group the left nodes together. + ci012abc -> pr101 -> ci345cde -> pr102 + style = invis + } + subgraph right { + // Group the right nodes together. 
+ version_commit -> dev_commit + style = invis + } + { // Align the version commit and the info about it. + rank = same + // Align them with pr101 + pr101 + version_commit + // release_info shows the change in the commit. + release_info + } + { // Align the dev commit and the info about it. + rank = same + // Align them with 345cde + ci345cde + dev_commit + dev_info + } + // Join the nodes from subgraph left. + pr99 -> ci012abc + pr102 -> pr100 + // Do the version node. + pr99 -> version_commit + dev_commit -> pr100 + tag -> version_commit + pr99 [ + label = "Merge PR #99" + shape = box + fillcolor = "#ccccff" + style = "filled" + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; + ci012abc [ + label = "012abc" + shape = circle + fillcolor = "#ffffcc" + style = "filled" + fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" + ]; + pr101 [ + label = "Merge PR #101" + shape = box + fillcolor = "#ccccff" + style = "filled" + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; + ci345cde [ + label = "345cde" + shape = circle + fillcolor = "#ffffcc" + style = "filled" + fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" + ]; + pr102 [ + label = "Merge PR #102" + shape = box + fillcolor = "#ccccff" + style = "filled" + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; + version_commit [ + label = "678fed" + shape = circle + fillcolor = "#ccffcc" + style = "filled" + fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" + ]; + dev_commit [ + label = "456dcb" + shape = circle + fillcolor = "#ffffcc" + style = "filled" + fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" + ]; + pr100 [ + label = "Merge PR #100" + shape = box + fillcolor = "#ccccff" + style = "filled" + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; + release_info [ + label = "pkg/version/base.go:\ngitVersion = \"v0.5\";" + 
shape = none + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; + dev_info [ + label = "pkg/version/base.go:\ngitVersion = \"v0.5-dev\";" + shape = none + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; + tag [ + label = "$ git tag -a v0.5" + fillcolor = "#ffcccc" + style = "filled" + fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" + ]; +} + diff --git a/releasing.md b/releasing.md new file mode 100644 index 00000000..4cdf8827 --- /dev/null +++ b/releasing.md @@ -0,0 +1,152 @@ +# Releasing Kubernetes + +This document explains how to create a Kubernetes release (as in version) and how the version information gets embedded into the built binaries. + +## Origin of the Sources + +Kubernetes may be built from a git tree (using `hack/build-go.sh`), from a tarball (using either `hack/build-go.sh` or `go install`), or directly by the Go native build system (using `go get`). + +When building from git, we want to be able to insert specific information about the build tree at build time. In particular, we want to use the output of `git describe` to generate the version of Kubernetes and the status of the build tree (add a `-dirty` suffix if the tree was modified.) + +When building from a tarball or using the Go build system, we will not have access to the information about the git tree, but we still want to be able to tell whether this build corresponds to an exact release (e.g. v0.3) or is between releases (e.g. at some point in development between v0.3 and v0.4). + +## Version Number Format + +In order to account for these use cases, there are some specific formats that may end up representing the Kubernetes version. Here are a few examples: + +- **v0.5**: This is official version 0.5 and this version will only be used when building from a clean git tree at the v0.5 git tag, or from a tree extracted from the tarball corresponding to that specific release. 
+- **v0.5-15-g0123abcd4567**: This is the `git describe` output and it indicates that we are 15 commits past the v0.5 release and that the SHA1 of the commit where the binaries were built was `0123abcd4567`. It is only possible to have this level of detail in the version information when building from git, not when building from a tarball. +- **v0.5-15-g0123abcd4567-dirty** or **v0.5-dirty**: The extra `-dirty` suffix means that the tree had local modifications or untracked files at the time of the build, so there's no guarantee that the source code matches exactly the state of the tree at the `0123abcd4567` commit or at the `v0.5` git tag (resp.) +- **v0.5-dev**: This means we are building from a tarball or using `go get` or, if we have a git tree, we are using `go install` directly, so it is not possible to inject the git version into the build information. Additionally, this is not an official release, so the `-dev` suffix indicates that the version we are building is after `v0.5` but before `v0.6`. (There is actually an exception where a commit with `v0.5-dev` is not present on `v0.6`; see later for details.) + +## Injecting Version into Binaries + +In order to cover the different build cases, we start by providing information that can be used when using only Go build tools or when we do not have the git version information available. + +To be able to provide a meaningful version in those cases, we set the contents of variables in a Go source file that will be used when no overrides are present. + +We are using `pkg/version/base.go` as the source of versioning in absence of information from git. 
Here is a sample of that file's contents: + +``` + var ( + gitVersion string = "v0.4-dev" // version from git, output of $(git describe) + gitCommit string = "" // sha1 from git, output of $(git rev-parse HEAD) + ) +``` + +This means a build with `go install` or `go get` or a build from a tarball will yield binaries that will identify themselves as `v0.4-dev` and will not be able to provide you with a SHA1. + +To add the extra versioning information when building from git, the `hack/build-go.sh` script will gather that information (using `git describe` and `git rev-parse`) and then create a `-ldflags` string to pass to `go install` and tell the Go linker to override the contents of those variables at build time. It can, for instance, tell it to override `gitVersion` and set it to `v0.4-13-g4567bcdef6789-dirty` and set `gitCommit` to `4567bcdef6789...` which is the complete SHA1 of the (dirty) tree used at build time. + +## Handling Official Versions + +Handling official versions from git is easy: as long as there is an annotated git tag pointing to a specific version, `git describe` will return that tag exactly, which matches the idea of an official version (e.g. `v0.5`). + +Handling it on tarballs is a bit harder since the exact version string must be present in `pkg/version/base.go` for it to get embedded into the binaries. But simply creating a commit with `v0.5` on its own would mean that the commits coming after it would also get the `v0.5` version when built from tarball or `go get` while in fact they do not match `v0.5` (the one that was tagged) exactly. + +To handle that case, creating a new release should involve creating two adjacent commits where the first of them will set the version to `v0.5` and the second will set it to `v0.5-dev`. In that case, even in the presence of merges, there will be a single commit where the exact `v0.5` version will be used and all others around it will either have `v0.4-dev` or `v0.5-dev`. 
+ +The diagram below illustrates it. + +![Diagram of git commits involved in the release](./releasing.png) + +After working on `v0.4-dev` and merging PR 99 we decide it is time to release `v0.5`. So we start a new branch, create one commit to update `pkg/version/base.go` to include `gitVersion = "v0.5"` and `git commit` it. + +We test it and make sure everything is working as expected. + +Before sending a PR for it, we create a second commit on that same branch, updating `pkg/version/base.go` to include `gitVersion = "v0.5-dev"`. That will ensure that further builds (from tarball or `go install`) on that tree will always include the `-dev` suffix and will not have a `v0.5` version (since they do not match the official `v0.5` exactly.) + +We then send PR 100 with both commits in it. + +Once the PR is accepted, we can use `git tag -a` to create an annotated tag *pointing to the one commit* that has `v0.5` in `pkg/version/base.go` and push it to GitHub. (Unfortunately GitHub tags/releases are not annotated tags, so this needs to be done from a git client and pushed to GitHub using SSH.) + +## Parallel Commits + +While we are working on releasing `v0.5`, other development takes place and other PRs get merged. For instance, in the example above, PRs 101 and 102 get merged to the master branch before the versioning PR gets merged. + +This is not a problem; it is only slightly inaccurate. Checking out the tree at commit `012abc`, at commit `345cde`, or at the merge commits of PRs 101 or 102 will yield a version of `v0.4-dev`, *but* those commits are not present in `v0.5`. + +In that sense, there is a small window in which commits will get a `v0.4-dev` or `v0.4-N-gXXX` label: while they are indeed later than `v0.4`, they are not really before `v0.5`, in that `v0.5` does not contain those commits. + +Unfortunately, there is not much we can do about it. 
On the other hand, other +projects seem to live with that and it does not really become a large problem. + +As an example, Docker commit a327d9b91edf has a `v1.1.1-N-gXXX` label but it is +not present in Docker `v1.2.0`: + +``` + $ git describe a327d9b91edf + v1.1.1-822-ga327d9b91edf + + $ git log --oneline v1.2.0..a327d9b91edf + a327d9b91edf Fix data space reporting from Kb/Mb to KB/MB + + (Non-empty output here means the commit is not present on v1.2.0.) +``` + diff --git a/releasing.png b/releasing.png new file mode 100644 index 00000000..935628de Binary files /dev/null and b/releasing.png differ diff --git a/releasing.svg b/releasing.svg new file mode 100644 index 00000000..f703e6e2 --- /dev/null +++ b/releasing.svg @@ -0,0 +1,113 @@ + + + + + + +tagged_release + + +ci012abc + +012abc + + +pr101 + +Merge PR #101 + + +ci012abc->pr101 + + + + +ci345cde + +345cde + + +pr101->ci345cde + + + + +pr102 + +Merge PR #102 + + +ci345cde->pr102 + + + + +pr100 + +Merge PR #100 + + +pr102->pr100 + + + + +version_commit + +678fed + + +dev_commit + +456dcb + + +version_commit->dev_commit + + + + +dev_commit->pr100 + + + + +release_info +pkg/version/base.go: +gitVersion = "v0.5"; + + +dev_info +pkg/version/base.go: +gitVersion = "v0.5-dev"; + + +pr99 + +Merge PR #99 + + +pr99->ci012abc + + + + +pr99->version_commit + + + + +tag + +$ git tag -a v0.5 + + +tag->version_commit + + + + + -- cgit v1.2.3 From eff78d030052bcdce7269c392ac956d2fa6f6f4a Mon Sep 17 00:00:00 2001 From: Kouhei Ueno Date: Fri, 17 Oct 2014 19:45:12 +0900 Subject: Change git repo checkout https --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index f750c611..ccd64386 100644 --- a/development.md +++ b/development.md @@ -17,7 +17,7 @@ $ echo $GOPATH /home/user/goproj $ mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ $ cd $GOPATH/src/github.com/GoogleCloudPlatform/ -$ git clone git@github.com:GoogleCloudPlatform/kubernetes.git +$ git clone 
https://github.com/GoogleCloudPlatform/kubernetes.git ``` The commands above will not work if there is more than one directory in ``$GOPATH``. -- cgit v1.2.3 From cf8e52e4286fc67c50a28432ce398ce2359ed527 Mon Sep 17 00:00:00 2001 From: Przemo Nowaczyk Date: Tue, 28 Oct 2014 20:57:15 +0100 Subject: small docs fixes --- development.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/development.md b/development.md index ccd64386..715ccb8f 100644 --- a/development.md +++ b/development.md @@ -62,9 +62,9 @@ Before committing any changes, please link/copy these hooks into your .git directory. This will keep you from accidentally committing non-gofmt'd go code. ``` -cd kubernetes -ln -s hooks/prepare-commit-msg .git/hooks/prepare-commit-msg -ln -s hooks/commit-msg .git/hooks/commit-msg +cd kubernetes/.git/hooks/ +ln -s ../../hooks/prepare-commit-msg . +ln -s ../../hooks/commit-msg . ``` ## Unit tests -- cgit v1.2.3 From b8c71ec88501b94c4f98d35304a9eefd582c3767 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Thu, 16 Oct 2014 14:45:16 -0700 Subject: Separated user, dev, and design docs. Renamed: logging.md -> devel/logging.md Renamed: access.md -> design/access.md Renamed: identifiers.md -> design/identifiers.md Renamed: labels.md -> design/labels.md Renamed: namespaces.md -> design/namespaces.md Renamed: security.md -> design/security.md Renamed: networking.md -> design/networking.md Added abbreviated user-focused documents in place of most moved docs. Added docs/README.md explaining how docs are organized. Added short, user-oriented documentation on labels. Added a glossary. Fixed up some links. 
--- access.md | 248 +++++++++++++++++++++++++++++++++++++++++++++++++++++ identifiers.md | 90 +++++++++++++++++++++ labels.md | 68 ++++++++++++++++ namespaces.md | 193 ++++++++++++++++++++++++++++++++++++++++ networking.md | 107 +++++++++++++++++++++++++ security.md | 26 ++++++ 6 files changed, 732 insertions(+) create mode 100644 access.md create mode 100644 identifiers.md create mode 100644 labels.md create mode 100644 namespaces.md create mode 100644 networking.md create mode 100644 security.md diff --git a/access.md b/access.md new file mode 100644 index 00000000..7af64ac9 --- /dev/null +++ b/access.md @@ -0,0 +1,248 @@ +# K8s Identity and Access Management Sketch + +This document suggests a direction for identity and access management in the Kubernetes system. + + +## Background + +High-level goals are: + - Have a plan for how identity, authentication, and authorization will fit into the API. + - Have a plan for partitioning resources within a cluster between independent organizational units. + - Ease integration with existing enterprise and hosted scenarios. + +### Actors +Each of these can act as normal users or attackers. + - External Users: People who are accessing applications running on K8s (e.g. a web site served by a webserver running in a container on K8s), but who do not have K8s API access. + - K8s Users: People who access the K8s API (e.g. create K8s API objects like Pods) + - K8s Project Admins: People who manage access for some K8s Users + - K8s Cluster Admins: People who control the machines, networks, or binaries that comprise a K8s cluster. + - K8s Admin means K8s Cluster Admins and K8s Project Admins taken together. + +### Threats +Both intentional attacks and accidental use of privilege are concerns. + +For both cases it may be useful to think about these categories differently: + - Application Path - attack by sending network messages from the internet to the IP/port of any application running on K8s. 
May exploit a weakness in an application or a misconfiguration of K8s. + - K8s API Path - attack by sending network messages to any K8s API endpoint. + - Insider Path - attack on K8s system components. Attacker may have privileged access to networks, machines or K8s software and data. Software errors in K8s system components and administrator error are some types of threat in this category. + +This document is primarily concerned with K8s API paths, and secondarily with Insider paths. The Application path also needs to be secure, but is not the focus of this document. + +### Assets to protect + +External User assets: + - Personal information like private messages, or images uploaded by External Users + - web server logs + +K8s User assets: + - External User assets of each K8s User + - things private to the K8s app, like: + - credentials for accessing other services (docker private repos, storage services, facebook, etc) + - SSL certificates for web servers + - proprietary data and code + +K8s Cluster assets: + - Assets of each K8s User + - Machine Certificates or secrets. + - The value of K8s cluster computing resources (cpu, memory, etc). + +This document is primarily about protecting K8s User assets and K8s cluster assets from other K8s Users and K8s Project and Cluster Admins. + +### Usage environments +Cluster in Small organization: + - K8s Admins may be the same people as K8s Users. + - few K8s Admins. + - prefer ease of use to fine-grained access control/precise accounting, etc. + - Product requirement that it be easy for a potential K8s Cluster Admin to try out setting up a simple cluster. + +Cluster in Large organization: + - K8s Admins typically distinct people from K8s Users. May need to divide K8s Cluster Admin access by roles. + - K8s Users need to be protected from each other. + - Auditing of K8s User and K8s Admin actions important. + - flexible accurate usage accounting and resource controls important. + - Lots of automated access to APIs. 
- Need to integrate with existing enterprise directory, authentication, accounting, auditing, and security policy infrastructure. + +Org-run cluster: + - organization that runs K8s master components is same as the org that runs apps on K8s. + - Minions may be on-premises VMs or physical machines; Cloud VMs; or a mix. + +Hosted cluster: + - Offering K8s API as a service, or offering a PaaS or SaaS built on K8s + - May already offer web services, and need to integrate with existing customer account concept, and existing authentication, accounting, auditing, and security policy infrastructure. + - May want to leverage K8s User accounts and accounting to manage their User accounts (not a priority to support this use case.) + - Precise and accurate accounting of resources needed. Resource controls needed for hard limits (Users given limited slice of data) and soft limits (Users can grow up to some limit and then be expanded). + +K8s ecosystem services: + - There may be companies that want to offer their existing services (Build, CI, A/B-test, release automation, etc) for use with K8s. There should be some story for this case. + +Pod configs should be largely portable between Org-run and hosted configurations. + + +# Design +Related discussion: +- https://github.com/GoogleCloudPlatform/kubernetes/issues/442 +- https://github.com/GoogleCloudPlatform/kubernetes/issues/443 + +This doc describes two security profiles: + - Simple profile: like single-user mode. Make it easy to evaluate K8s without configuring lots of accounts and policies. Protects from unauthorized users, but does not partition authorized users. + - Enterprise profile: Provide mechanisms needed for large numbers of users. Defense in depth. Should integrate with existing enterprise security infrastructure. + +K8s distribution should include templates of config, and documentation, for simple and enterprise profiles. 
The system should be flexible enough for knowledgeable users to create intermediate profiles, but K8s developers should only reason about those two Profiles, not a matrix.
+
+Features in this doc are divided into "Initial Features" and "Improvements". Initial features would be candidates for version 1.0.
+
+## Identity
+### userAccount
+K8s will have a `userAccount` API object.
+- `userAccount` has a UID which is immutable. This is used to associate users with objects and to record actions in audit logs.
+- `userAccount` has a name, which is a human-readable string, unique among userAccounts. It is used to refer to users in Policies, to ensure that the Policies are human readable. It can be changed only when there are no Policy objects or other objects which refer to that name. An email address is a suggested format for this field.
+- `userAccount` is not related to the unix username of processes in Pods created by that userAccount.
+- `userAccount` API objects can have labels.
+
+The system may associate one or more Authentication Methods with a
+`userAccount` (but they are not formally part of the userAccount object.)
+In a simple deployment, the authentication method for a
+user might be an authentication token which is verified by a K8s server. In a
+more complex deployment, the authentication might be delegated to
+another system which is trusted by the K8s API to authenticate users, but where
+the authentication details are unknown to K8s.
+
+Initial Features:
+- there is no superuser `userAccount`
+- `userAccount` objects are statically populated in the K8s API store by reading a config file. Only a K8s Cluster Admin can do this.
+- `userAccount` can have a default `namespace`. If an API call does not specify a `namespace`, the default `namespace` for that caller is assumed.
+- `userAccount` is global. A single human with access to multiple namespaces is recommended to only have one userAccount.
+
+Improvements:
+- Make `userAccount` part of a separate API group from core K8s objects like `pod`. Facilitates plugging in alternate Access Management.
+
+Simple Profile:
+ - single `userAccount`, used by all K8s Users and Project Admins. One access token shared by all.
+
+Enterprise Profile:
+ - every human user has their own `userAccount`.
+ - `userAccount`s have labels that indicate both membership in groups, and ability to act in certain roles.
+ - each service using the API has its own `userAccount` too. (e.g. `scheduler`, `repcontroller`)
+ - automated jobs denormalize the LDAP group info into the local list of users in the K8s userAccount file.
+
+### Unix accounts
+A `userAccount` is not a Unix user account. The fact that a pod is started by a `userAccount` does not mean that the processes in that pod's containers run as a Unix user with a corresponding name or identity.
+
+Initially:
+- The unix accounts available in a container, and used by the processes running in a container, are those that are provided by the combination of the base operating system and the Docker manifest.
+- Kubernetes doesn't enforce any relation between `userAccount` and unix accounts.
+
+Improvements:
+- Kubelet allocates disjoint blocks of root-namespace uids for each container. This may provide some defense-in-depth against container escapes. (https://github.com/docker/docker/pull/4572)
+- requires Docker to integrate user namespace support, and deciding what getpwnam() does for these uids.
+- any features that help users avoid use of privileged containers (https://github.com/GoogleCloudPlatform/kubernetes/issues/391)
+
+### Namespaces
+K8s will have a `namespace` API object. It is similar to a Google Compute Engine `project`. It provides a namespace for objects created by a group of people co-operating together, preventing name collisions with non-cooperating groups. It also serves as a reference point for authorization policies.
+
+Namespaces are described in [namespaces.md](namespaces.md).
+
+In the Enterprise Profile:
+ - a `userAccount` may have permission to access several `namespace`s.
+
+In the Simple Profile:
+ - There is a single `namespace` used by the single user.
+
+Namespaces vs. userAccounts vs. labels:
+- `userAccount`s are intended for audit logging (both name and UID should be logged), and to define who has access to `namespace`s.
+- `labels` (see [docs/labels.md](labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities.
+- `namespace`s prevent name collisions between uncoordinated groups of people, and provide a place to attach common policies for co-operating groups of people.
+
+
+## Authentication
+
+Goals for K8s authentication:
+- Include a built-in authentication system, with no configuration required to use in single-user mode, little configuration required to add several user accounts, and no https proxy required.
+- Allow for authentication to be handled by a system external to Kubernetes, to allow integration with existing enterprise authentication systems. The Kubernetes project itself should avoid accepting contributions of multiple authentication schemes. Instead, a trusted proxy in front of the apiserver can be used to authenticate users.
+  - For organizations whose security requirements only allow FIPS compliant implementations (e.g. Apache) for authentication.
+  - So the proxy can terminate SSL, and isolate the CA-signed certificate from the less trusted, higher-touch apiserver.
+  - For organizations that already have existing SaaS web services (e.g. storage, VMs) and want a common authentication portal.
+- Avoid mixing authentication and authorization, so that authorization policies can be centrally managed, and to allow changes in authentication methods without affecting authorization code.
+
+Initially:
+- Tokens used to authenticate a user.
+- Long lived tokens identify a particular `userAccount`.
+- Administrator utility generates tokens at cluster setup.
+- OAuth 2.0 Bearer Token protocol, http://tools.ietf.org/html/rfc6750
+- No scopes for tokens. Authorization happens in the API server.
+- Tokens dynamically generated by apiserver to identify pods which are making API calls.
+- Tokens checked in a module of the apiserver.
+- Authentication in apiserver can be disabled by flag, to allow testing without authentication enabled, and to allow use of an authenticating proxy. In this mode, a query parameter or header added by the proxy will identify the caller.
+
+Improvements:
+- Refresh of tokens.
+- SSH keys for access inside containers.
+
+To be considered for subsequent versions:
+- Fuller use of OAuth (http://tools.ietf.org/html/rfc6749)
+- Scoped tokens.
+- Tokens that are bound to the channel between the client and the api server
+  - http://www.ietf.org/proceedings/90/slides/slides-90-uta-0.pdf
+  - http://www.browserauth.net
+
+
+## Authorization
+
+K8s authorization should:
+- Allow for a range of maturity levels, from single-user for those test driving the system, to integration with existing enterprise authorization systems.
+- Allow for centralized management of users and policies. In some organizations, this will mean that the definition of users and access policies needs to reside on a system other than k8s and encompass other web services (such as a storage service).
+- Allow processes running in K8s Pods to take on identity, and to allow narrow scoping of permissions for those identities in order to limit damage from software faults.
+- Have Authorization Policies exposed as API objects so that a single config file can create or delete Pods, Controllers, Services, and the identities and policies for those Pods and Controllers.
+- Be separate as much as practical from Authentication, to allow Authentication methods to change over time and space, without impacting Authorization policies.
+
+K8s will implement a relatively simple
+[Attribute-Based Access Control](http://en.wikipedia.org/wiki/Attribute_Based_Access_Control) model.
+The model will be described in more detail in a forthcoming document. The model will
+- Be less complex than XACML
+- Be easily recognizable to those familiar with Amazon IAM Policies.
+- Have a subset/aliases/defaults which allow it to be used in a way comfortable to those users more familiar with Role-Based Access Control.
+
+Authorization policy is set by creating a set of Policy objects.
+
+The API Server will be the Enforcement Point for Policy. For each API call that it receives, it will construct the Attributes needed to evaluate the policy (what user is making the call, what resource they are accessing, what they are trying to do to that resource, etc) and pass those attributes to a Decision Point. The Decision Point code evaluates the Attributes against all the Policies and allows or denies the API call. The system will be modular enough that the Decision Point code can either be linked into the APIserver binary, or be another service that the apiserver calls for each Decision (with appropriate time-limited caching as needed for performance).
+
+Some Policy objects may be applicable only to a single namespace; K8s Project Admins would be able to create those as needed. Other Policy objects may be applicable to all namespaces; a K8s Cluster Admin might create those in order to authorize a new type of controller to be used by all namespaces, or to make a K8s User into a K8s Project Admin.
+
+
+## Accounting
+
+The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442).
A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](resources.md)).
+
+Initially:
+- a `quota` object is immutable.
+- for hosted K8s systems that do billing, Project is the recommended level for billing accounts.
+- Every object that consumes resources should have a `namespace` so that resource usage stats can be rolled up to the `namespace` level.
+- K8s Cluster Admin sets quota objects by writing a config file.
+
+Improvements:
+- allow one namespace to charge the quota for one or more other namespaces. This would be controlled by a policy which allows changing a billing_namespace= label on an object.
+- allow quota to be set by namespace owners for (namespace x label) combinations (e.g. let the "webserver" namespace use 100 cores, but, to prevent accidents, don't allow the "webserver" namespace with "instance=test" to use more than 10 cores).
+- tools to help write consistent quota config files based on number of minions, historical namespace usages, QoS needs, etc.
+- way for K8s Cluster Admin to incrementally adjust Quota objects.
+
+Simple profile:
+ - a single `namespace` with infinite resource limits.
+
+Enterprise profile:
+ - multiple namespaces, each with their own limits.
+
+Issues:
+- need for locking or "eventual consistency" when multiple apiserver goroutines are accessing the object store and handling pod creations.
+
+
+## Audit Logging
+
+API actions can be logged.
+
+Initial implementation:
+- All API calls logged to nginx logs.
+
+Improvements:
+- API server does logging instead.
+- Policies to drop logging for high rate trusted API calls, or by users performing audit or other sensitive functions.
diff --git a/identifiers.md b/identifiers.md
new file mode 100644
index 00000000..1c0660c6
--- /dev/null
+++ b/identifiers.md
@@ -0,0 +1,90 @@
+# Identifiers and Names in Kubernetes
+
+A summary of the goals and recommendations for identifiers in Kubernetes.
Described in [GitHub issue #199](https://github.com/GoogleCloudPlatform/kubernetes/issues/199). + + +## Definitions + +UID +: A non-empty, opaque, system-generated value guaranteed to be unique in time and space; intended to distinguish between historical occurrences of similar entities. + +Name +: A non-empty string guaranteed to be unique within a given scope at a particular time; used in resource URLs; provided by clients at creation time and encouraged to be human friendly; intended to facilitate creation idempotence and space-uniqueness of singleton objects, distinguish distinct entities, and reference particular entities across operations. + +[rfc1035](http://www.ietf.org/rfc/rfc1035.txt)/[rfc1123](http://www.ietf.org/rfc/rfc1123.txt) label (DNS_LABEL) +: An alphanumeric (a-z, A-Z, and 0-9) string, with a maximum length of 63 characters, with the '-' character allowed anywhere except the first or last character, suitable for use as a hostname or segment in a domain name + +[rfc1035](http://www.ietf.org/rfc/rfc1035.txt)/[rfc1123](http://www.ietf.org/rfc/rfc1123.txt) subdomain (DNS_SUBDOMAIN) +: One or more rfc1035/rfc1123 labels separated by '.' with a maximum length of 253 characters + +[rfc4122](http://www.ietf.org/rfc/rfc4122.txt) universally unique identifier (UUID) +: A 128 bit generated value that is extremely unlikely to collide across time and space and requires no central coordination + + +## Objectives for names and UIDs + +1. Uniquely identify (via a UID) an object across space and time + +2. Uniquely name (via a name) an object across space + +3. Provide human-friendly names in API operations and/or configuration files + +4. Allow idempotent creation of API resources (#148) and enforcement of space-uniqueness of singleton objects + +5. Allow DNS names to be automatically generated for some objects + + +## General design + +1. When an object is created via an API, a Name string (a DNS_SUBDOMAIN) must be specified. 
Name must be non-empty and unique within the apiserver. This enables idempotent and space-unique creation operations. Parts of the system (e.g. replication controller) may join strings (e.g. a base name and a random suffix) to create a unique Name. For situations where generating a name is impractical, some or all objects may support a param to auto-generate a name. Generating random names will defeat idempotency.
+   * Examples: "guestbook.user", "backend-x4eb1"
+
+2. When an object is created via an API, a Namespace string (a DNS_SUBDOMAIN? format TBD via #1114) may be specified. Depending on the API receiver, namespaces might be validated (e.g. apiserver might ensure that the namespace actually exists). If a namespace is not specified, one will be assigned by the API receiver. This assignment policy might vary across API receivers (e.g. apiserver might have a default, kubelet might generate something semi-random).
+   * Example: "api.k8s.example.com"
+
+3. Upon acceptance of an object via an API, the object is assigned a UID (a UUID). UID must be non-empty and unique across space and time.
+   * Example: "01234567-89ab-cdef-0123-456789abcdef"
+
+
+## Case study: Scheduling a pod
+
+Pods can be placed onto a particular node in a number of ways. This case
+study demonstrates how the above design can be applied to satisfy the
+objectives.
+
+### A pod scheduled by a user through the apiserver
+
+1. A user submits a pod with Namespace="" and Name="guestbook" to the apiserver.
+
+2. The apiserver validates the input.
+   1. A default Namespace is assigned.
+   2. The pod name must be space-unique within the Namespace.
+   3. Each container within the pod has a name which must be space-unique within the pod.
+
+3. The pod is accepted.
+   1. A new UID is assigned.
+
+4. The pod is bound to a node.
+   1. The kubelet on the node is passed the pod's UID, Namespace, and Name.
+
+5. Kubelet validates the input.
+
+6. Kubelet runs the pod.
+   1.
Each container is started up with enough metadata to distinguish the pod from whence it came. + 2. Each attempt to run a container is assigned a UID (a string) that is unique across time. + * This may correspond to Docker's container ID. + +### A pod placed by a config file on the node + +1. A config file is stored on the node, containing a pod with UID="", Namespace="", and Name="cadvisor". + +2. Kubelet validates the input. + 1. Since UID is not provided, kubelet generates one. + 2. Since Namespace is not provided, kubelet generates one. + 1. The generated namespace should be deterministic and cluster-unique for the source, such as a hash of the hostname and file path. + * E.g. Namespace="file-f4231812554558a718a01ca942782d81" + +3. Kubelet runs the pod. + 1. Each container is started up with enough metadata to distinguish the pod from whence it came. + 2. Each attempt to run a container is assigned a UID (a string) that is unique across time. + 1. This may correspond to Docker's container ID. diff --git a/labels.md b/labels.md new file mode 100644 index 00000000..ff923931 --- /dev/null +++ b/labels.md @@ -0,0 +1,68 @@ +# Labels + +_Labels_ are key/value pairs identifying client/user-defined attributes (and non-primitive system-generated attributes) of API objects, which are stored and returned as part of the [metadata of those objects](api-conventions.md). Labels can be used to organize and to select subsets of objects according to these attributes. + +Each object can have a set of key/value labels set on it, with at most one label with a particular key. +``` +"labels": { + "key1" : "value1", + "key2" : "value2" +} +``` + +Unlike [names and UIDs](identifiers.md), labels do not provide uniqueness. In general, we expect many objects to carry the same label(s). + +Via a _label selector_, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes. 
+
+Label selectors may also be used to associate policies with sets of objects.
+
+We also [plan](https://github.com/GoogleCloudPlatform/kubernetes/issues/560) to make labels available inside pods and [lifecycle hooks](container-environment.md).
+
+[Namespacing of label keys](https://github.com/GoogleCloudPlatform/kubernetes/issues/1491) is under discussion.
+
+Valid labels follow a slightly modified RFC952 format: 24 characters or fewer, all lowercase, beginning with an alphabetic character, with dashes (-) allowed in between, and ending with an alphanumeric character.
+
+## Motivation
+
+Service deployments and batch processing pipelines are often multi-dimensional entities (e.g., multiple partitions or deployments, multiple release tracks, multiple tiers, multiple micro-services per tier). Management often requires cross-cutting operations, which breaks encapsulation of strictly hierarchical representations, especially rigid hierarchies determined by the infrastructure rather than by users. Labels enable users to map their own organizational structures onto system objects in a loosely coupled fashion, without requiring clients to store these mappings.
+
+## Label selectors
+
+Label selectors permit very simple filtering by label keys and values. The simplicity of label selectors is deliberate. It is intended to facilitate transparency for humans, easy set overlap detection, efficient indexing, and reverse-indexing (i.e., finding all label selectors matching an object's labels - https://github.com/GoogleCloudPlatform/kubernetes/issues/1348).
+
+Currently the system supports selection by exact match of a map of keys and values. Matching objects must have all of the specified labels (both keys and values), though they may have additional labels as well.
+
+We are in the process of extending the label selection specification (see [selector.go](../blob/master/pkg/labels/selector.go) and https://github.com/GoogleCloudPlatform/kubernetes/issues/341) to support conjunctions of requirements of the following forms:
+```
+key1 in (value11, value12, ...)
+key1 not in (value11, value12, ...)
+key1 exists
+```
+
+LIST and WATCH operations may specify label selectors to filter the sets of objects returned using a query parameter: `?labels=key1%3Dvalue1,key2%3Dvalue2,...`. We may extend such filtering to DELETE operations in the future.
+
+Kubernetes also currently supports two objects that use label selectors to keep track of their members, `service`s and `replicationController`s:
+- `service`: A [service](services.md) is a configuration unit for the proxies that run on every worker node. It is named and points to one or more pods.
+- `replicationController`: A [replication controller](replication-controller.md) ensures that a specified number of pod "replicas" are running at any one time. If there are too many, it'll kill some. If there are too few, it'll start more.
+
+The set of pods that a `service` targets is defined with a label selector. Similarly, the population of pods that a `replicationController` is monitoring is also defined with a label selector.
+
+For management convenience and consistency, `services` and `replicationControllers` may themselves have labels and would generally carry the labels their corresponding pods have in common.
+
+In the future, label selectors will be used to identify other types of distributed service workers, such as worker pool members or peers in a distributed application.
+
+Individual labels are used to specify identifying metadata, and to convey the semantic purposes/roles of pods or containers.
Examples of typical pod label keys include `service`, `environment` (e.g., with values `dev`, `qa`, or `production`), `tier` (e.g., with values `frontend` or `backend`), and `track` (e.g., with values `daily` or `weekly`), but you are free to develop your own conventions. + +Sets identified by labels and label selectors could be overlapping (think Venn diagrams). For instance, a service might target all pods with `tier in (frontend), environment in (prod)`. Now say you have 10 replicated pods that make up this tier. But you want to be able to 'canary' a new version of this component. You could set up a `replicationController` (with `replicas` set to 9) for the bulk of the replicas with labels `tier=frontend, environment=prod, track=stable` and another `replicationController` (with `replicas` set to 1) for the canary with labels `tier=frontend, environment=prod, track=canary`. Now the service is covering both the canary and non-canary pods. But you can mess with the `replicationControllers` separately to test things out, monitor the results, etc. + +Note that the superset described in the previous example is also heterogeneous. In long-lived, highly available, horizontally scaled, distributed, continuously evolving service applications, heterogeneity is inevitable, due to canaries, incremental rollouts, live reconfiguration, simultaneous updates and auto-scaling, hardware upgrades, and so on. + +Pods (and other objects) may belong to multiple sets simultaneously, which enables representation of service substructure and/or superstructure. In particular, labels are intended to facilitate the creation of non-hierarchical, multi-dimensional deployment structures. They are useful for a variety of management purposes (e.g., configuration, deployment) and for application introspection and analysis (e.g., logging, monitoring, alerting, analytics). 
Without the ability to form sets by intersecting labels, many implicitly related, overlapping flat sets would need to be created, for each subset and/or superset desired, which would lose semantic information and be difficult to keep consistent. Purely hierarchically nested sets wouldn't readily support slicing sets across different dimensions. + +Pods may be removed from these sets by changing their labels. This flexibility may be used to remove pods from service for debugging, data recovery, etc. + +Since labels can be set at pod creation time, no separate set add/remove operations are necessary, which makes them easier to use than manual set management. Additionally, since labels are directly attached to pods and label selectors are fairly simple, it's easy for users and for clients and tools to determine what sets they belong to (i.e., they are reversible). OTOH, with sets formed by just explicitly enumerating members, one would (conceptually) need to search all sets to determine which ones a pod belonged to. + +## Labels vs. annotations + +We'll eventually index and reverse-index labels for efficient queries and watches, use them to sort and group in UIs and CLIs, etc. We don't want to pollute labels with non-identifying, especially large and/or structured, data. Non-identifying information should be recorded using [annotations](annotations.md). 
diff --git a/namespaces.md b/namespaces.md
new file mode 100644
index 00000000..b80c6825
--- /dev/null
+++ b/namespaces.md
@@ -0,0 +1,193 @@
+# Kubernetes Proposal - Namespaces
+
+**Related PR:**
+
+| Topic | Link |
+| ---- | ---- |
+| Identifiers.md | https://github.com/GoogleCloudPlatform/kubernetes/pull/1216 |
+| Access.md | https://github.com/GoogleCloudPlatform/kubernetes/pull/891 |
+| Indexing | https://github.com/GoogleCloudPlatform/kubernetes/pull/1183 |
+| Cluster Subdivision | https://github.com/GoogleCloudPlatform/kubernetes/issues/442 |
+
+## Background
+
+High level goals:
+
+* Enable an easy-to-use mechanism to logically scope Kubernetes resources
+* Ensure extension resources to Kubernetes can share the same logical scope as core Kubernetes resources
+* Ensure it aligns with the access control proposal
+* Ensure the system scales (e.g. O(log n)) with increasing numbers of scopes
+
+## Use cases
+
+Actors:
+
+1. k8s admin - administers a kubernetes cluster
+2. k8s service - k8s daemon operates on behalf of another user (i.e. controller-manager)
+3. k8s policy manager - enforces policies imposed on k8s cluster
+4. k8s user - uses a kubernetes cluster to schedule pods
+
+User stories:
+
+1. Ability to set an immutable namespace on k8s resources
+2. Ability to list k8s resources scoped to a namespace
+3. Restrict a namespace identifier to a DNS-compatible string to support compound naming conventions
+4. Ability for a k8s policy manager to enforce a k8s user's access to a set of namespaces
+5. Ability to set/unset a default namespace for use by the kubecfg client
+6. Ability for a k8s service to monitor resource changes across namespaces
+7. Ability for a k8s service to list resources across namespaces
+
+## Proposed Design
+
+### Model Changes
+
+Introduce a new attribute *Namespace* for each resource that must be scoped in a Kubernetes cluster.
+
+A *Namespace* is a DNS-compatible subdomain.
+
+```
+// TypeMeta is shared by all objects sent to, or returned from the client
+type TypeMeta struct {
+	Kind string `json:"kind,omitempty" yaml:"kind,omitempty"`
+	Uid string `json:"uid,omitempty" yaml:"uid,omitempty"`
+	CreationTimestamp util.Time `json:"creationTimestamp,omitempty" yaml:"creationTimestamp,omitempty"`
+	SelfLink string `json:"selfLink,omitempty" yaml:"selfLink,omitempty"`
+	ResourceVersion uint64 `json:"resourceVersion,omitempty" yaml:"resourceVersion,omitempty"`
+	APIVersion string `json:"apiVersion,omitempty" yaml:"apiVersion,omitempty"`
+	Namespace string `json:"namespace,omitempty" yaml:"namespace,omitempty"`
+	Name string `json:"name,omitempty" yaml:"name,omitempty"`
+}
+```
+
+An identifier, *UID*, is unique across time and space and is intended to distinguish between historical occurrences of similar entities.
+
+A *Name* is unique within a given *Namespace* at a particular time, used in resource URLs; provided by clients at creation time
+and encouraged to be human friendly; intended to facilitate creation idempotence and space-uniqueness of singleton objects, distinguish
+distinct entities, and reference particular entities across operations.
+
+As of this writing, the following resources MUST have a *Namespace* and *Name*:
+
+* pod
+* service
+* replicationController
+* endpoint
+
+A *policy* MAY be associated with a *Namespace*.
+
+If a *policy* has an associated *Namespace*, the resource paths it enforces are scoped to that particular *Namespace*.
+
+## k8s API server
+
+In support of namespace isolation, the Kubernetes API server will address resources by the following conventions:
+
+The typical actors for the following requests are the k8s user or the k8s service.
+
+| Action | HTTP Verb | Path | Description |
+| ---- | ---- | ---- | ---- |
+| CREATE | POST | /api/{version}/ns/{ns}/{resourceType}/ | Create instance of {resourceType} in namespace {ns} |
+| GET | GET | /api/{version}/ns/{ns}/{resourceType}/{name} | Get instance of {resourceType} in namespace {ns} with {name} |
+| UPDATE | PUT | /api/{version}/ns/{ns}/{resourceType}/{name} | Update instance of {resourceType} in namespace {ns} with {name} |
+| DELETE | DELETE | /api/{version}/ns/{ns}/{resourceType}/{name} | Delete instance of {resourceType} in namespace {ns} with {name} |
+| LIST | GET | /api/{version}/ns/{ns}/{resourceType} | List instances of {resourceType} in namespace {ns} |
+| WATCH | GET | /api/{version}/watch/ns/{ns}/{resourceType} | Watch for changes to a {resourceType} in namespace {ns} |
+
+The typical actors for the following requests are the k8s service or the k8s admin, as enforced by k8s Policy.
+
+| Action | HTTP Verb | Path | Description |
+| ---- | ---- | ---- | ---- |
+| WATCH | GET | /api/{version}/watch/{resourceType} | Watch for changes to a {resourceType} across all namespaces |
+| LIST | GET | /api/{version}/list/{resourceType} | List instances of {resourceType} across all namespaces |
+
+The legacy API patterns for k8s are aliases for interacting with the *default* namespace, as follows.
+
+| Action | HTTP Verb | Path | Description |
+| ---- | ---- | ---- | ---- |
+| CREATE | POST | /api/{version}/{resourceType}/ | Create instance of {resourceType} in namespace *default* |
+| GET | GET | /api/{version}/{resourceType}/{name} | Get instance of {resourceType} in namespace *default* |
+| UPDATE | PUT | /api/{version}/{resourceType}/{name} | Update instance of {resourceType} in namespace *default* |
+| DELETE | DELETE | /api/{version}/{resourceType}/{name} | Delete instance of {resourceType} in namespace *default* |
+
+The k8s API server verifies that the *Namespace* on resource creation matches the *{ns}* on the path.
+
+The k8s API server will enable efficient mechanisms to filter model resources based on the *Namespace*. This may require
+the creation of an index on *Namespace* that could support query by namespace with optional label selectors.
+
+The k8s API server will associate a resource with a *Namespace* if not populated by the end-user, based on the *Namespace* context
+of the incoming request. If the *Namespace* of the resource being created or updated does not match the *Namespace* on the request,
+then the k8s API server will reject the request.
+
+TODO: Update to discuss k8s api server proxy patterns
+
+## k8s storage
+
+A namespace provides a unique identifier space and therefore must be in the storage path of a resource.
+
+In etcd, we want to continue to support efficient WATCH across namespaces.
+
+Resources that persist content in etcd will have storage paths as follows:
+
+/registry/{resourceType}/{resource.Namespace}/{resource.Name}
+
+This enables a k8s service to WATCH /registry/{resourceType} for changes across namespaces for a particular {resourceType}.
+
+Upon scheduling a pod to a particular host, the pod's namespace must be in the key path as follows:
+
+/host/{host}/pod/{pod.Namespace}/{pod.Name}
+
+## k8s Authorization service
+
+This design assumes the existence of an authorization service that filters incoming requests to the k8s API Server in order
+to enforce user authorization to a particular k8s resource. It performs this action by associating the *subject* of a request
+with a *policy* to an associated HTTP path and verb. This design encodes the *namespace* in the resource path in order to enable
+external policy servers to function by resource path alone. If a request is made by an identity that is not allowed by
+policy to access the resource, the request is terminated. Otherwise, it is forwarded to the apiserver.
+
+## k8s controller-manager
+
+The controller-manager will provision pods in the same namespace as the associated replicationController.
+
+## k8s Kubelet
+
+There is no major change to the kubelet introduced by this proposal.
+
+### kubecfg client
+
+kubecfg supports the following:
+
+```
+kubecfg [OPTIONS] ns {namespace}
+```
+
+To set a namespace to use across multiple operations:
+
+```
+$ kubecfg ns ns1
+```
+
+To view the current namespace:
+
+```
+$ kubecfg ns
+Using namespace ns1
+```
+
+To reset to the default namespace:
+
+```
+$ kubecfg ns default
+```
+
+In addition, each kubecfg request may explicitly specify a namespace for the operation via the following OPTION:
+
+--ns
+
+When loading resource files specified by the -c OPTION, the kubecfg client will ensure the namespace is set in the
+message body to match the client-specified default.
+
+If no default namespace is applied, the client will assume the following default namespace:
+
+* default
+
+The kubecfg client would store default namespace information in the same manner it caches authentication information today,
+as a file on the user's file system.
+
diff --git a/networking.md b/networking.md
new file mode 100644
index 00000000..167b7382
--- /dev/null
+++ b/networking.md
@@ -0,0 +1,107 @@
+# Networking
+
+## Model and motivation
+
+Kubernetes deviates from the default Docker networking model. The goal is for each pod to have an IP in a flat shared networking namespace that has full communication with other physical computers and containers across the network. IP-per-pod creates a clean, backward-compatible model where pods can be treated much like VMs or physical hosts from the perspectives of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.
+ +OTOH, dynamic port allocation requires supporting both static ports (e.g., for externally accessible services) and dynamically allocated ports, requires partitioning centrally allocated and locally acquired dynamic ports, complicates scheduling (since ports are a scarce resource), is inconvenient for users, complicates application configuration, is plagued by port conflicts and reuse and exhaustion, requires non-standard approaches to naming (e.g., etcd rather than DNS), requires proxies and/or redirection for programs using standard naming/addressing mechanisms (e.g., web browsers), requires watching and cache invalidation for address/port changes for instances in addition to watching group membership changes, and obstructs container/pod migration (e.g., using CRIU). NAT introduces additional complexity by fragmenting the addressing space, which breaks self-registration mechanisms, among other problems. + +With the IP-per-pod model, all user containers within a pod behave as if they are on the same host with regard to networking. They can all reach each other’s ports on localhost. Ports which are published to the host interface are done so in the normal Docker way. All containers in all pods can talk to all other containers in all other pods by their 10-dot addresses. + +In addition to avoiding the aforementioned problems with dynamic port allocation, this approach reduces friction for applications moving from the world of uncontainerized apps on physical or virtual hosts to containers within pods. People running application stacks together on the same host have already figured out how to make ports not conflict (e.g., by configuring them through environment variables) and have arranged for clients to find them. 
+ +The approach does reduce isolation between containers within a pod -- ports could conflict, and there couldn't be private ports across containers within a pod, but applications requiring their own port spaces could just run as separate pods and processes requiring private communication could run within the same container. Besides, the premise of pods is that containers within a pod share some resources (volumes, cpu, ram, etc.) and therefore expect and tolerate reduced isolation. Additionally, the user can control what containers belong to the same pod whereas, in general, they don't control what pods land together on a host. + +When any container calls SIOCGIFADDR, it sees the IP that any peer container would see them coming from -- each pod has its own IP address that other pods can know. By making IP addresses and ports the same within and outside the containers and pods, we create a NAT-less, flat address space. "ip addr show" should work as expected. This would enable all existing naming/discovery mechanisms to work out of the box, including self-registration mechanisms and applications that distribute IP addresses. (We should test that with etcd and perhaps one other option, such as Eureka (used by Acme Air) or Consul.) We should be optimizing for inter-pod network communication. Within a pod, containers are more likely to use communication through volumes (e.g., tmpfs) or IPC. + +This is different from the standard Docker model. In that mode, each container gets an IP in the 172-dot space and would only see that 172-dot address from SIOCGIFADDR. If these containers connect to another container the peer would see the connect coming from a different IP than the container itself knows. In short - you can never self-register anything from a container, because a container can not be reached on its private IP. + +An alternative we considered was an additional layer of addressing: pod-centric IP per container. 
Each container would have its own local IP address, visible only within that pod. This would perhaps make it easier for containerized applications to move from physical/virtual hosts to pods, but would be more complex to implement (e.g., requiring a bridge per pod, split-horizon/VP DNS) and to reason about, due to the additional layer of address translation, and would break self-registration and IP distribution mechanisms. + +## Current implementation + +For the Google Compute Engine cluster configuration scripts, [advanced routing](https://developers.google.com/compute/docs/networking#routing) is set up so that each VM has an extra 256 IP addresses that get routed to it. This is in addition to the 'main' IP address assigned to the VM that is NAT-ed for Internet access. The networking bridge (called `cbr0` to differentiate it from `docker0`) is set up outside of Docker proper and only does NAT for egress network traffic that isn't aimed at the virtual network. + +Ports mapped in from the 'main IP' (and hence the internet if the right firewall rules are set up) are proxied in user mode by Docker. In the future, this should be done with `iptables` by either the Kubelet or Docker: [Issue #15](https://github.com/GoogleCloudPlatform/kubernetes/issues/15). + +We start Docker with: + DOCKER_OPTS="--bridge cbr0 --iptables=false" + +We set up this bridge on each node with SaltStack, in [container_bridge.py](cluster/saltbase/salt/_states/container_bridge.py). + + cbr0: + container_bridge.ensure: + - cidr: {{ grains['cbr-cidr'] }} + ... + grains: + roles: + - kubernetes-pool + cbr-cidr: $MINION_IP_RANGE + +We make these addresses routable in GCE: + + gcutil addroute ${MINION_NAMES[$i]} ${MINION_IP_RANGES[$i]} \ + --norespect_terminal_width \ + --project ${PROJECT} \ + --network ${NETWORK} \ + --next_hop_instance ${ZONE}/instances/${MINION_NAMES[$i]} & + +The minion IP ranges are /24s in the 10-dot space. + +GCE itself does not know anything about these IPs, though. 
+
+These are not externally routable, though, so containers that need to communicate with the outside world need to use host networking. If we set up an external IP that forwards to the VM, it will only forward to the VM's primary IP (which is assigned to no pod). So we use docker's -p flag to map published ports to the main interface. This has the side effect of disallowing two pods from exposing the same port. (More discussion on this in [Issue #390](https://github.com/GoogleCloudPlatform/kubernetes/issues/390).)
+
+We create a container to use for the pod network namespace -- a single loopback device and a single veth device. All the user's containers get their network namespaces from this pod networking container.
+
+Docker allocates IP addresses from a bridge we create on each node, using its “container” networking mode.
+
+1. Create a normal (in the networking sense) container which uses a minimal image and runs a command that blocks forever. This is not a user-defined container, and gets a special well-known name.
+  - creates a new network namespace (netns) and loopback device
+  - creates a new pair of veth devices and binds them to the netns
+  - auto-assigns an IP from docker’s IP range
+
+2. Create the user containers and specify the name of the network container as their “net” argument. Docker finds the PID of the command running in the network container and attaches to the netns of that PID.
+
+### Other networking implementation examples
+With the primary aim of providing the IP-per-pod model, other implementations exist to serve the purpose outside of GCE.
+  - [OpenVSwitch with GRE/VxLAN](../ovs-networking.md)
+  - [Flannel](https://github.com/coreos/flannel#flannel)
+
+## Challenges and future work
+
+### Docker API
+
+Right now, docker inspect doesn't show the networking configuration of the containers, since they derive it from another container. That information should be exposed somehow.
+
+### External IP assignment
+
+We want to be able to assign IP addresses externally from Docker ([Docker issue #6743](https://github.com/dotcloud/docker/issues/6743)) so that we don't need to statically allocate fixed-size IP ranges to each node, so that IP addresses can be made stable across network container restarts ([Docker issue #2801](https://github.com/dotcloud/docker/issues/2801)), and to facilitate pod migration. Right now, if the network container dies, all the user containers must be stopped and restarted because the netns of the network container will change on restart, and any subsequent user container restart will join that new netns, thereby not being able to see its peers. Additionally, a change in IP address would encounter DNS caching/TTL problems. External IP assignment would also simplify DNS support (see below).
+
+### Naming, discovery, and load balancing
+
+In addition to enabling self-registration with 3rd-party discovery mechanisms, we'd like to set up DDNS automatically ([Issue #146](https://github.com/GoogleCloudPlatform/kubernetes/issues/146)). hostname, $HOSTNAME, etc. should return a name for the pod ([Issue #298](https://github.com/GoogleCloudPlatform/kubernetes/issues/298)), and gethostbyname should be able to resolve the names of other pods. Probably we need to set up a DNS resolver to do the latter ([Docker issue #2267](https://github.com/dotcloud/docker/issues/2267)), so that we don't need to keep /etc/hosts files up to date dynamically.
+
+[Service](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md) endpoints are currently found through environment variables. Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_PORT) are supported, and resolve to ports opened by the service proxy.
We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. Clients should instead use the [service portal IP](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md) (which the above environment variables will resolve to). However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service portal IP in DNS, and for that to become the preferred resolution protocol. + +We'd also like to accommodate other load-balancing solutions (e.g., HAProxy), non-load-balanced services ([Issue #260](https://github.com/GoogleCloudPlatform/kubernetes/issues/260)), and other types of groups (worker pools, etc.). Providing the ability to Watch a label selector applied to pod addresses would enable efficient monitoring of group membership, which could be directly consumed or synced with a discovery mechanism. Event hooks ([Issue #140](https://github.com/GoogleCloudPlatform/kubernetes/issues/140)) for join/leave events would probably make this even easier. + +### External routability + +We want traffic between containers to use the pod IP addresses across nodes. Say we have Node A with a container IP space of 10.244.1.0/24 and Node B with a container IP space of 10.244.2.0/24. And we have Container A1 at 10.244.1.1 and Container B1 at 10.244.2.1. We want Container A1 to talk to Container B1 directly with no NAT. B1 should see the "source" in the IP packets of 10.244.1.1 -- not the "primary" host IP for Node A. That means that we want to turn off NAT for traffic between containers (and also between VMs and containers). 
+
+We'd also like to make pods directly routable from the external internet. However, we can't yet support the extra container IPs that we've provisioned talking to the internet directly. So, we don't map external IPs to the container IPs. Instead, we solve that problem by having traffic that isn't to the internal network (! 10.0.0.0/8) get NATed through the primary host IP address so that it can get 1:1 NATed by the GCE networking when talking to the internet. Similarly, incoming traffic from the internet has to get NATed/proxied through the host IP.
+
+So we end up with 3 cases:
+
+1. Container -> Container or Container <-> VM. These should use 10. addresses directly and there should be no NAT.
+
+2. Container -> Internet. These have to get mapped to the primary host IP so that GCE knows how to egress that traffic. There are actually 2 layers of NAT here: Container IP -> Internal Host IP -> External Host IP. The first layer happens in the guest with iptables and the second happens as part of GCE networking. The first one (Container IP -> internal host IP) does dynamic port allocation while the second maps ports 1:1.
+
+3. Internet -> Container. This also has to go through the primary host IP and also has 2 layers of NAT, ideally. However, the path currently is a proxy with (External Host IP -> Internal Host IP -> Docker) -> (Docker -> Container IP). Once [issue #15](https://github.com/GoogleCloudPlatform/kubernetes/issues/15) is closed, it should be External Host IP -> Internal Host IP -> Container IP. But to get that second arrow we have to set up the port forwarding iptables rules per mapped port.
+
+Another approach could be to create a new host interface alias for each pod, if we had a way to route an external IP to it. This would eliminate the scheduling constraints resulting from using the host's IP address.
+
+### IPv6
+
+IPv6 would also be a nice option, but we can't depend on it yet.
Docker support is in progress: [Docker issue #2974](https://github.com/dotcloud/docker/issues/2974), [Docker issue #6923](https://github.com/dotcloud/docker/issues/6923), [Docker issue #6975](https://github.com/dotcloud/docker/issues/6975). Additionally, direct IPv6 assignment to instances doesn't appear to be supported by major cloud providers (e.g., AWS EC2, GCE) yet. We'd happily take pull requests from people running Kubernetes on bare metal, though. :-)
diff --git a/security.md b/security.md
new file mode 100644
index 00000000..22034bdf
--- /dev/null
+++ b/security.md
@@ -0,0 +1,26 @@
+# Security in Kubernetes
+
+General design principles and guidelines related to security of containers, APIs, and infrastructure in Kubernetes.
+
+
+## Objectives
+
+1. Ensure a clear isolation between a container and the underlying host it runs on
+2. Limit the ability of the container to negatively impact the infrastructure or other containers
+3. [Principle of Least Privilege](http://en.wikipedia.org/wiki/Principle_of_least_privilege) - ensure components are only authorized to perform the actions they need, and limit the scope of a compromise by limiting the capabilities of individual components
+4. Reduce the number of systems that have to be hardened and secured by defining clear boundaries between components
+
+
+## Design Points
+
+### Isolate the data store from the minions and supporting infrastructure
+
+Access to the central data store (etcd) in Kubernetes allows an attacker to run arbitrary containers on hosts, to gain access to any protected information stored in either volumes or in pods (such as access tokens or shared secrets provided as environment variables), to intercept and redirect traffic from running services by inserting middlemen, or to simply delete the entire history of the cluster.
+
+As a general principle, access to the central data store should be restricted to the components that need full control over the system and which can apply appropriate authorization and authentication of change requests. In the future, etcd may offer granular access control, but that granularity will require an administrator to understand the schema of the data to properly apply security. An administrator must be able to properly secure Kubernetes at a policy level, rather than at an implementation level, and schema changes over time should not risk unintended security leaks.
+
+Both the Kubelet and Kube Proxy need information related to their specific roles - for the Kubelet, the set of pods it should be running, and for the Proxy, the set of services and endpoints to load balance. The Kubelet also needs to provide information about running pods and historical termination data. The access pattern for both Kubelet and Proxy to load their configuration is an efficient "wait for changes" request over HTTP. It should be possible to limit the Kubelet and Proxy to only access the information they need to perform their roles and no more.
+
+The controller manager for Replication Controllers and other future controllers act on behalf of a user via delegation to perform automated maintenance on Kubernetes resources. Their ability to access or modify resource state should be strictly limited to their intended duties and they should be prevented from accessing information not pertinent to their role. For example, a replication controller needs only to create a copy of a known pod configuration, to determine the running state of an existing pod, or to delete an existing pod that it created - it does not need to know the contents or current state of a pod, nor have access to any data in the pod's attached volumes.
+
+The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster.
At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time).
\ No newline at end of file
--
cgit v1.2.3

From 3306aecc13331d26f59b36e282278707769d1f90 Mon Sep 17 00:00:00 2001
From: Eric Tune
Date: Thu, 16 Oct 2014 14:45:16 -0700
Subject: Separated user, dev, and design docs.

Renamed: logging.md -> devel/logging.md
Renamed: access.md -> design/access.md
Renamed: identifiers.md -> design/identifiers.md
Renamed: labels.md -> design/labels.md
Renamed: namespaces.md -> design/namespaces.md
Renamed: security.md -> design/security.md
Renamed: networking.md -> design/networking.md

Added an abbreviated, user-focused document in place of most moved docs.
Added docs/README.md explaining how docs are organized.
Added short, user-oriented documentation on labels.
Added a glossary.
Fixed up some links.
---
 logging.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 logging.md

diff --git a/logging.md b/logging.md
new file mode 100644
index 00000000..9b6bfa2a
--- /dev/null
+++ b/logging.md
@@ -0,0 +1,26 @@
+Logging Conventions
+===================
+
+The following are the conventions for which glog levels to use. glog is globally preferred to "log" for better runtime control.
+ +* glog.Errorf() - Always an error +* glog.Warningf() - Something unexpected, but probably not an error +* glog.Infof() has multiple levels: + * glog.V(0) - Generally useful for this to ALWAYS be visible to an operator + * Programmer errors + * Logging extra info about a panic + * CLI argument handling + * glog.V(1) - A reasonable default log level if you don't want verbosity. + * Information about config (listening on X, watching Y) + * Errors that repeat frequently that relate to conditions that can be corrected (pod detected as unhealthy) + * glog.V(2) - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems. + * Logging HTTP requests and their exit code + * System state changing (killing pod) + * Controller state change events (starting pods) + * Scheduler log messages + * glog.V(3) - Extended information about changes + * More info about system state changes + * glog.V(4) - Debug level verbosity (for now) + * Logging in particularly thorny parts of code where you may want to come back later and check it + +As per the comments, the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4). If you wish to change the log level, you can pass in `-v=X` where X is the desired maximum level to log. -- cgit v1.2.3 From d5bbcd262cf01b76475426aa0100f012f7471cc0 Mon Sep 17 00:00:00 2001 From: Meir Fischer Date: Sun, 9 Nov 2014 22:46:07 -0500 Subject: Fix bad selector file link --- labels.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/labels.md b/labels.md index ff923931..df904d3a 100644 --- a/labels.md +++ b/labels.md @@ -32,7 +32,7 @@ Label selectors permit very simple filtering by label keys and values. The simpl Currently the system supports selection by exact match of a map of keys and values. 
Matching objects must have all of the specified labels (both keys and values), though they may have additional labels as well. -We are in the process of extending the label selection specification (see [selector.go](../blob/master/pkg/labels/selector.go) and https://github.com/GoogleCloudPlatform/kubernetes/issues/341) to support conjunctions of requirements of the following forms: +We are in the process of extending the label selection specification (see [selector.go](/pkg/labels/selector.go) and https://github.com/GoogleCloudPlatform/kubernetes/issues/341) to support conjunctions of requirements of the following forms: ``` key1 in (value11, value12, ...) key1 not in (value11, value12, ...) -- cgit v1.2.3 From e60fd03ae144559d597553de28f391c27ad50a4c Mon Sep 17 00:00:00 2001 From: Maria Nita Date: Tue, 11 Nov 2014 17:21:38 +0100 Subject: Update path to files in development doc --- development.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/development.md b/development.md index 715ccb8f..220c9371 100644 --- a/development.md +++ b/development.md @@ -2,7 +2,7 @@ # Releases and Official Builds -Official releases are built in Docker containers. Details are [here](build/README.md). You can do simple builds and development with just a local Docker installation. If want to build go locally outside of docker, please continue below. +Official releases are built in Docker containers. Details are [here](../../build/README.md). You can do simple builds and development with just a local Docker installation. If want to build go locally outside of docker, please continue below. ## Go development environment @@ -104,7 +104,7 @@ You need an etcd somewhere in your PATH. 
To install etcd, run: ``` cd kubernetes -hack/install-etcd.sh +hack/travis/install-etcd.sh sudo ln -s $(pwd)/third_party/etcd/bin/etcd /usr/bin/etcd ``` -- cgit v1.2.3 From fcaa1651e4ba4c6b73284acdd45e18c19ec74a5d Mon Sep 17 00:00:00 2001 From: Deyuan Deng Date: Sun, 2 Nov 2014 20:13:43 -0500 Subject: Fix DESIGN.md link, and etcd installation instruction. --- development.md | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/development.md b/development.md index 220c9371..38635ace 100644 --- a/development.md +++ b/development.md @@ -100,18 +100,7 @@ godep go tool cover -html=target/c.out ## Integration tests -You need an etcd somewhere in your PATH. To install etcd, run: - -``` -cd kubernetes -hack/travis/install-etcd.sh -sudo ln -s $(pwd)/third_party/etcd/bin/etcd /usr/bin/etcd -``` - -``` -cd kubernetes -hack/test-integration.sh -``` +You need an [etcd](https://github.com/coreos/etcd/releases/tag/v0.4.6) in your path, please make sure it is installed and in your ``$PATH``. ## End-to-End tests -- cgit v1.2.3 From 397e99fc2120ecd698696eb98b0dbc3019874a72 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Mon, 17 Nov 2014 11:20:31 -0800 Subject: Update development.md It looks like magic incantation `hack/e2e-test.sh 1 1 1` is not longer supported. 
--- development.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/development.md b/development.md index 38635ace..3b831ef4 100644 --- a/development.md +++ b/development.md @@ -110,11 +110,13 @@ cd kubernetes hack/e2e-test.sh ``` -Pressing control-C should result in an orderly shutdown but if something goes wrong and you still have some VMs running you can force a cleanup with the magical incantation: +Pressing control-C should result in an orderly shutdown but if something goes wrong and you still have some VMs running you can force a cleanup with this command: ``` -hack/e2e-test.sh 1 1 1 +go run e2e.go --down ``` +See the flag definitions in `hack/e2e.go` for more options, such as reusing an existing cluster. + ## Testing out flaky tests [Instructions here](docs/devel/flaky-tests.md) -- cgit v1.2.3 From cc78c66a925dd4d35a683ffb50348403f5c2de06 Mon Sep 17 00:00:00 2001 From: Joe Beda Date: Tue, 25 Nov 2014 10:32:27 -0800 Subject: Convert gcutil to gcloud compute --- networking.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/networking.md b/networking.md index 167b7382..3f52d388 100644 --- a/networking.md +++ b/networking.md @@ -40,11 +40,12 @@ We set up this bridge on each node with SaltStack, in [container_bridge.py](clus We make these addresses routable in GCE: - gcutil addroute ${MINION_NAMES[$i]} ${MINION_IP_RANGES[$i]} \ - --norespect_terminal_width \ - --project ${PROJECT} \ - --network ${NETWORK} \ - --next_hop_instance ${ZONE}/instances/${MINION_NAMES[$i]} & + gcloud compute routes add "${MINION_NAMES[$i]}" \ + --project "${PROJECT}" \ + --destination-range "${MINION_IP_RANGES[$i]}" \ + --network "${NETWORK}" \ + --next-hop-instance "${MINION_NAMES[$i]}" \ + --next-hop-instance-zone "${ZONE}" & The minion IP ranges are /24s in the 10-dot space. 
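As a sketch of how such per-minion /24 ranges could be laid out, assuming an illustrative 10.244.0.0/16 base (real clusters take `MINION_IP_RANGE` from their configuration):

```go
package main

import "fmt"

// minionCIDR assigns the i-th minion a /24 out of a /16 in the 10-dot
// space, e.g. minion 1 -> 10.244.1.0/24. Illustrative only: real
// deployments take these ranges from cluster configuration.
func minionCIDR(i int) string {
	if i < 0 || i > 255 {
		panic("a /16 only holds 256 /24 ranges")
	}
	return fmt.Sprintf("10.244.%d.0/24", i)
}

func main() {
	for i := 1; i <= 3; i++ {
		// each value would be passed as --destination-range for that minion's route
		fmt.Println(minionCIDR(i))
	}
}
```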
--
cgit v1.2.3

From 3a3112c0e24b348c045adcfaba08ac57051f9d15 Mon Sep 17 00:00:00 2001
From: Tim Hockin
Date: Thu, 20 Nov 2014 14:27:11 +0800
Subject: Loosen DNS 952 for labels

---
 labels.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/labels.md b/labels.md
index df904d3a..8415376d 100644
--- a/labels.md
+++ b/labels.md
@@ -18,9 +18,17 @@ Label selectors may also be used to associate policies with sets of objects.
 
 We also [plan](https://github.com/GoogleCloudPlatform/kubernetes/issues/560) to make labels available inside pods and [lifecycle hooks](container-environment.md).
 
-[Namespacing of label keys](https://github.com/GoogleCloudPlatform/kubernetes/issues/1491) is under discussion.
-
-Valid labels follow a slightly modified RFC952 format: 24 characters or less, all lowercase, begins with alpha, dashes (-) are allowed, and ends with alphanumeric.
+Valid label keys are comprised of two segments - prefix and name - separated
+by a slash (`/`). The name segment is required and must be a DNS label: 63
+characters or less, all lowercase, beginning and ending with an alphanumeric
+character (`[a-z0-9]`), with dashes (`-`) and alphanumerics between. The
+prefix and slash are optional. If specified, the prefix must be a DNS
+subdomain (a series of DNS labels separated by dots (`.`), not longer than 253
+characters in total).
+
+If the prefix is omitted, the label key is presumed to be private to the user.
+System components which use labels must specify a prefix. The `kubernetes.io`
+prefix is reserved for kubernetes core components.
 
 ## Motivation
--
cgit v1.2.3

From d935c1cbe379794974179ebebdd8dad1821035f4 Mon Sep 17 00:00:00 2001
From: goltermann
Date: Mon, 1 Dec 2014 19:07:46 -0800
Subject: Adding docs for prioritization of issues.
---
 issues.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 issues.md

diff --git a/issues.md b/issues.md
new file mode 100644
index 00000000..491dba49
--- /dev/null
+++ b/issues.md
@@ -0,0 +1,21 @@
+GitHub Issues for the Kubernetes Project
+========================================
+
+A quick overview of how we will review and prioritize incoming issues at https://github.com/GoogleCloudPlatform/kubernetes/issues
+
+Priorities
+----------
+
+We will use GitHub issue labels for prioritization. The absence of a priority label means the bug has not been reviewed and prioritized yet.
+
+Priorities are "moment in time" labels; what is low priority today could be high priority tomorrow, and vice versa. As we move to v1.0, we may decide certain bug fixes aren't actually needed yet, or that others really do need to be pulled in.
+
+Here we define the priorities for up until v1.0. Once the Kubernetes project hits 1.0, we will revisit the scheme and update as appropriate.
+
+Definitions
+-----------
+* P0 - something broken for users, build broken, or critical security issue. Someone must drop everything and work on it.
+* P1 - must fix for earliest possible OSS binary release (every two weeks)
+* P2 - must fix for v1.0 release - will block the release
+* P3 - post v1.0
+* untriaged - anything without a Priority/PX label will be considered untriaged
\ No newline at end of file
--
cgit v1.2.3

From 5390d1bf88c6255d8e94333420ed9124ca17231f Mon Sep 17 00:00:00 2001
From: goltermann
Date: Tue, 2 Dec 2014 14:54:57 -0800
Subject: Create pull-requests.md

---
 pull-requests.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 pull-requests.md

diff --git a/pull-requests.md b/pull-requests.md
new file mode 100644
index 00000000..ed12b839
--- /dev/null
+++ b/pull-requests.md
@@ -0,0 +1,16 @@
+Pull Request Process
+====================
+
+An overview of how we will manage old or out-of-date pull requests.
+ +Process +------- + +We will close any pull requests older than two weeks. + +Exceptions can be made for PRs that have active review comments, or that are awaiting other dependent PRs. Closed pull requests are easy to recreate, and little work is lost by closing a pull request that subsequently needs to be reopened. + +We want to limit the total number of PRs in flight to: +* Maintain a clean project +* Remove old PRs that would be difficult to rebase as the underlying code has changed over time +* Encourage code velocity -- cgit v1.2.3 From 5de98eeb18e6714216edc35bb6fb9fe220e7878b Mon Sep 17 00:00:00 2001 From: Sam Ghods Date: Sun, 30 Nov 2014 21:31:52 -0800 Subject: Remove unused YAML tags and GetYAML/SetYAML methods Unneeded after move to ghodss/yaml. --- namespaces.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/namespaces.md b/namespaces.md index b80c6825..761daa1a 100644 --- a/namespaces.md +++ b/namespaces.md @@ -48,14 +48,14 @@ A *Namespace* is a DNS compatible subdomain. 
``` // TypeMeta is shared by all objects sent to, or returned from the client type TypeMeta struct { - Kind string `json:"kind,omitempty" yaml:"kind,omitempty"` - Uid string `json:"uid,omitempty" yaml:"uid,omitempty"` - CreationTimestamp util.Time `json:"creationTimestamp,omitempty" yaml:"creationTimestamp,omitempty"` - SelfLink string `json:"selfLink,omitempty" yaml:"selfLink,omitempty"` - ResourceVersion uint64 `json:"resourceVersion,omitempty" yaml:"resourceVersion,omitempty"` - APIVersion string `json:"apiVersion,omitempty" yaml:"apiVersion,omitempty"` - Namespace string `json:"namespace,omitempty" yaml:"namespace,omitempty"` - Name string `json:"name,omitempty" yaml:"name,omitempty"` + Kind string `json:"kind,omitempty"` + Uid string `json:"uid,omitempty"` + CreationTimestamp util.Time `json:"creationTimestamp,omitempty"` + SelfLink string `json:"selfLink,omitempty"` + ResourceVersion uint64 `json:"resourceVersion,omitempty"` + APIVersion string `json:"apiVersion,omitempty"` + Namespace string `json:"namespace,omitempty"` + Name string `json:"name,omitempty"` } ``` -- cgit v1.2.3 From e95c87269a04c1d9ac0ae1bd87fdfa1ae39184a7 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Mon, 8 Dec 2014 16:01:35 -0800 Subject: Expand e2e instructions. --- development.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/development.md b/development.md index 3b831ef4..14a4df87 100644 --- a/development.md +++ b/development.md @@ -115,7 +115,24 @@ Pressing control-C should result in an orderly shutdown but if something goes wr go run e2e.go --down ``` -See the flag definitions in `hack/e2e.go` for more options, such as reusing an existing cluster. +See the flag definitions in `hack/e2e.go` for more options, such as reusing an existing cluster, here is an overview: + +```sh +# Create a fresh cluster. Deletes a cluster first, if it exists +go run e2e.go --up + +# Test if a cluster is up. 
+go run e2e.go --isup + +# Push code to an existing cluster +go run e2e.go --push + +# Run all tests +go run e2e.go --test + +# Run tests matching a glob. +go run e2e.go --tests=... +``` ## Testing out flaky tests [Instructions here](docs/devel/flaky-tests.md) -- cgit v1.2.3 From c6a38a5921afbff7ec0f0af5fc2e031bbeb8e69f Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Mon, 8 Dec 2014 19:48:02 -0800 Subject: address comments. --- development.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/development.md b/development.md index 14a4df87..81695803 100644 --- a/development.md +++ b/development.md @@ -115,9 +115,13 @@ Pressing control-C should result in an orderly shutdown but if something goes wr go run e2e.go --down ``` +### Flag options See the flag definitions in `hack/e2e.go` for more options, such as reusing an existing cluster, here is an overview: ```sh +# Build binaries for testing +go run e2e.go --build + # Create a fresh cluster. Deletes a cluster first, if it exists go run e2e.go --up @@ -127,6 +131,9 @@ go run e2e.go --isup # Push code to an existing cluster go run e2e.go --push +# Push to an existing cluster, or bring up a cluster if it's down. +go run e2e.go --pushup + # Run all tests go run e2e.go --test @@ -134,6 +141,22 @@ go run e2e.go --test go run e2e.go --tests=... ``` +### Combining flags +```sh +# Flags can be combined, and their actions will take place in this order: +# -build, -push|-up|-pushup, -test|-tests=..., -down +# e.g.: +go run e2e.go -build -pushup -test -down + +# -v (verbose) can be added if you want streaming output instead of only +# seeing the output of failed commands. + +# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for +# cleaning up after a failed test or viewing logs. 
+go run e2e.go -ctl='get events' +go run e2e.go -ctl='delete pod foobar' +``` + ## Testing out flaky tests [Instructions here](docs/devel/flaky-tests.md) -- cgit v1.2.3 From cfaed2e3d0677c44d39db16aa96d3a5c25bdbfbb Mon Sep 17 00:00:00 2001 From: MikeJeffrey Date: Fri, 12 Dec 2014 11:05:30 -0800 Subject: Create README.md in docs/devel --- README.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 00000000..82804564 --- /dev/null +++ b/README.md @@ -0,0 +1,19 @@ +# Developing Kubernetes + +Docs in this directory relate to developing Kubernetes. + +* **On Collaborative Development** ([collab.md](collab.md)): info on pull requests and code reviews. + +* **Development Guide** ([development.md](development.md)): Setting up your environment; tests. + +* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. + Here's how to run your tests many times. + +* **GitHub Issues** ([issues.md](issues.md)): How incoming issues are reviewed and prioritized. + +* **Logging Conventions** ([logging.md](logging.md)): Glog levels. + +* **Pull Request Process** ([pull-requests.md](pull-requests.md)): When and why pull requests are closed. + +* **Releasing Kubernetes** ([releasing.md](releasing.md)): How to create a Kubernetes release (as in version) + and how the version information gets embedded into the built binaries.
-- cgit v1.2.3 From ada3dfce7d8fc274fb12958d3b5f36036e203b80 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Wed, 19 Nov 2014 10:17:12 -0500 Subject: Admission control proposal --- admission_control.md | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 admission_control.md diff --git a/admission_control.md b/admission_control.md new file mode 100644 index 00000000..60e10198 --- /dev/null +++ b/admission_control.md @@ -0,0 +1,145 @@ +# Kubernetes Proposal - Admission Control + +**Related PR:** + +| Topic | Link | +| ----- | ---- | + +## Background + +High level goals: + +* Enable an easy-to-use mechanism to provide admission control to cluster +* Enable a provider to support multiple admission control strategies or author their own +* Ensure any rejected request can propagate errors back to the caller with why the request failed +* Enable usage of cluster resources to satisfy admission control criteria +* Enable admission controller criteria to change without requiring restart of kube-apiserver + +Policy is focused on answering if a user is authorized to perform an action. + +Admission Control is focused on if the system will accept an authorized action. + +The Kubernetes cluster may choose to dismiss an authorized action based on any number of admission control strategies they choose to author and deploy: + +1. Quota enforcement of allocated desired usage +2. Pod black-lister to restrict running specific images on the cluster +3. Privileged container checker +4. Host port reservation +5. Volume validation - e.g. may or may not use hostDir, etc. +6. Min/max constraint checker for pod requested resources +7. ... + +This proposal therefore attempts to enumerate the basic design, and describe how any number of admission controllers could be injected. 
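To make the strategy list above concrete, here is a small Go sketch of one hypothetical strategy, a pod black-lister (strategy 2). Everything here — the variable, the function, and the image prefixes — is invented for illustration; the proposal does not define this API:

```go
package main

import (
	"fmt"
	"strings"
)

// blackListedPrefixes is a hypothetical set of image prefixes the cluster
// refuses to run.
var blackListedPrefixes = []string{"example.com/forbidden/"}

// admitPodImage returns nil to admit, or an error explaining the denial,
// so the rejection reason can propagate back to the caller.
func admitPodImage(image string) error {
	for _, prefix := range blackListedPrefixes {
		if strings.HasPrefix(image, prefix) {
			return fmt.Errorf("image %q is black-listed on this cluster", image)
		}
	}
	return nil
}

func main() {
	for _, img := range []string{"example.com/forbidden/miner:v1", "example.com/app:v2"} {
		if err := admitPodImage(img); err != nil {
			fmt.Println("denied:", err)
		} else {
			fmt.Println("admitted:", img)
		}
	}
}
```

Note that the check returns an error rather than a bare boolean, matching the goal above that any rejected request propagates a reason back to the caller.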
+ +## kube-apiserver + +The kube-apiserver takes the following OPTIONAL arguments to enable admission control + +| Option | Behavior | +| ------ | -------- | +| admission_controllers | List of addresses (ip:port, dns name) to invoke for admission control | +| admission_controller_service | Service label selector to resolve for admission control (namespace/labelKey/labelValue) | + +If the list of addresses to invoke for admission control are provided as a label selector, the kube-apiserver will update the list +of admission control services at a regular interval. + +Upon an incoming request, the kube-apiserver performs the following basic flow: + +1. Authorize the request, if authorized, continue +2. Invoke the Admission Control REST API for each defined address, if all return true, continue +3. RESTStorage processes request +4. Data is persisted in store + +If there is no configured admission control address, then by default, all requests are admitted. + +Admission control is enforced on POST/PUT operations, but is ignored on GET/DELETE operations. + +## Admission Control REST API + +An admission controller satisfies a stable REST API invoked by the kube-apiserver to satisfy requests. + +| Action | HTTP Verb | Path | Description | +| ---- | ---- | ---- | ---- | +| CREATE | POST | /admissionController | Send a request for admission to evaluate for admittance or denial | + +The message body to the admissionController includes the following: + +1. requesting user identity +2. action to perform +3. proposed resource to create/modify (if any) + +If the request for admission is satisfied, return a HTTP 200. + +If the request for admission is denied, return a HTTP 403, the response must include a reason for why the response failed. + +## System Design + +The following demonstrates potential cluster setups using an external list of admission control endpoints. 
+ + Request + + + | + | + +---------------|----------+ + | API Server | | + |---------------|----------| + | v | + | +--------+ | + | | Policy | | +---------------------+---+ + | ++-------+ | |Endpoints | + +---------+ | | | |-------------------------| + |Scheduler|<---+| v | |E1. Quota Enforcer | + +---------+ | +----------------------+ | |E2. Capacity Planner | + | | Admission Controller +-------->|E3. White-lister | + | +----+-----------------+ | |... | + | | | +-------------------------+ + | +v-------------+ | + | | REST Storage | | + | +--------------+ | + +----------------+---------+ + | + v + +--------------+ + | Data Store | + |--------------| + | | + | | + | | + +--------------+ + +The following demonstrates potential cluster setup that uses services to fulfill admission control. + +In this context, the cluster itself is used to provide HA admission control, and pods may choose to +invoke the API Server to determine if a request is or is not admissible. + + + Request +--------+ + + |Pods... | + | |--------| + | | <-------+ + +---------------|----------+ | | | + | API Server | | | | | + |---------------|----------| +---+----+ | + | v | | | + | +--------+ |<---------------+ | + | | Policy | | +-------------------------------------+ + | ++-------+ | |Service (ns=infra, labelKey=admitter)| + +---------+ | | | |-------------------------------------| + |Scheduler|<---+| v | |Service1 | + +---------+ | +----------------------+ | |Service2 | + | | Admission Controller +-------->|Service3 | + | +----+-----------------+ | |... 
| + | | | +-------------------------------------+ + | +v-------------+ | + | | REST Storage | | + | +--------------+ | + +----------------+---------+ + | + v + +--------------+ + | Data Store | + |--------------| + | | + | | + | | + +--------------+ \ No newline at end of file -- cgit v1.2.3 From 7cc008cd36abab4dfabc12ed2b31de94326e1256 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Tue, 16 Dec 2014 14:36:39 -0800 Subject: fix godep instructions --- development.md | 46 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 41 insertions(+), 5 deletions(-) diff --git a/development.md b/development.md index 81695803..c73f339c 100644 --- a/development.md +++ b/development.md @@ -48,13 +48,49 @@ export PATH=$PATH:$GOPATH/bin ``` ### Using godep -Here is a quick summary of `godep`. `godep` helps manage third party dependencies by copying known versions into Godeps/_workspace. You can use `godep` in three ways: +Here is a quick summary of `godep`. `godep` helps manage third party dependencies by copying known versions into Godeps/_workspace. Here is the recommended way to set up your system. There are other ways that may work, but this is the easiest one I know of. -1. Use `godep` to call your `go` commands. For example: `godep go test ./...` -2. Use `godep` to modify your `$GOPATH` so that other tools know where to find the dependencies. Specifically: `export GOPATH=$GOPATH:$(godep path)` -3. Use `godep` to copy the saved versions of packages into your `$GOPATH`. This is done with `godep restore`. +1. Devote a directory to this endeavor: -We recommend using options #1 or #2. +``` +export KPATH=$HOME/code/kubernetes +mkdir -p $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +git clone https://path/to/your/fork . +# Or copy your existing local repo here. IMPORTANT: making a symlink doesn't work. +``` + +2. Set up your GOPATH. 
+ +``` +# Option A: this will let your builds see packages that exist elsewhere on your system. +export GOPATH=$KPATH:$GOPATH +# Option B: This will *not* let your local builds see packages that exist elsewhere on your system. +export GOPATH=$KPATH +# Option B is recommended if you're going to mess with the dependencies. +``` + +3. Populate your new $GOPATH. + +``` +cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +godep restore +``` + +4. To add a dependency, you can do ```go get path/to/dependency``` as usual. + +5. To package up a dependency, do + +``` +cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +godep save ./... +# Sanity check that your Godeps.json file is ok by re-restoring: +godep restore +``` + +I (lavalamp) have sometimes found it expedient to manually fix the /Godeps/godeps.json file to minimize the changes. + +Please send dependency updates in separate commits within your PR, for easier reviewing. ## Hooks -- cgit v1.2.3 From 14464583f81e60a36ef50082822aebf9fabe5ca3 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Mon, 15 Dec 2014 14:32:32 -0500 Subject: Version 2.0 of proposal --- admission_control.md | 171 ++++++++++++++++----------------------------------- 1 file changed, 53 insertions(+), 118 deletions(-) diff --git a/admission_control.md b/admission_control.md index 60e10198..e3f04894 100644 --- a/admission_control.md +++ b/admission_control.md @@ -4,6 +4,7 @@ | Topic | Link | | ----- | ---- | +| Separate validation from RESTStorage | https://github.com/GoogleCloudPlatform/kubernetes/issues/2977 | ## Background @@ -12,24 +13,16 @@ High level goals: * Enable an easy-to-use mechanism to provide admission control to cluster * Enable a provider to support multiple admission control strategies or author their own * Ensure any rejected request can propagate errors back to the caller with why the request failed -* Enable usage of cluster resources to satisfy admission control criteria -* Enable admission controller criteria to change 
without requiring restart of kube-apiserver -Policy is focused on answering if a user is authorized to perform an action. +Authorization via policy is focused on answering whether a user is authorized to perform an action. Admission Control is focused on if the system will accept an authorized action. -The Kubernetes cluster may choose to dismiss an authorized action based on any number of admission control strategies they choose to author and deploy: +Kubernetes may choose to dismiss an authorized action based on any number of admission control strategies. -1. Quota enforcement of allocated desired usage -2. Pod black-lister to restrict running specific images on the cluster -3. Privileged container checker -4. Host port reservation -5. Volume validation - e.g. may or may not use hostDir, etc. -6. Min/max constraint checker for pod requested resources -7. ... +This proposal documents the basic design, and describes how any number of admission control plug-ins could be injected. -This proposal therefore attempts to enumerate the basic design, and describe how any number of admission controllers could be injected. +Implementations of specific admission control strategies are handled in separate documents. ## kube-apiserver @@ -37,109 +30,51 @@ The kube-apiserver takes the following OPTIONAL arguments to enable admission co | Option | Behavior | | ------ | -------- | -| admission_controllers | List of addresses (ip:port, dns name) to invoke for admission control | -| admission_controller_service | Service label selector to resolve for admission control (namespace/labelKey/labelValue) | - -If the list of addresses to invoke for admission control are provided as a label selector, the kube-apiserver will update the list -of admission control services at a regular interval. - -Upon an incoming request, the kube-apiserver performs the following basic flow: - -1. Authorize the request, if authorized, continue -2.
Invoke the Admission Control REST API for each defined address, if all return true, continue -3. RESTStorage processes request -4. Data is persisted in store - -If there is no configured admission control address, then by default, all requests are admitted. - -Admission control is enforced on POST/PUT operations, but is ignored on GET/DELETE operations. - -## Admission Control REST API - -An admission controller satisfies a stable REST API invoked by the kube-apiserver to satisfy requests. - -| Action | HTTP Verb | Path | Description | -| ---- | ---- | ---- | ---- | -| CREATE | POST | /admissionController | Send a request for admission to evaluate for admittance or denial | - -The message body to the admissionController includes the following: - -1. requesting user identity -2. action to perform -3. proposed resource to create/modify (if any) - -If the request for admission is satisfied, return a HTTP 200. - -If the request for admission is denied, return a HTTP 403, the response must include a reason for why the response failed. - -## System Design - -The following demonstrates potential cluster setups using an external list of admission control endpoints. - - Request - + - | - | - +---------------|----------+ - | API Server | | - |---------------|----------| - | v | - | +--------+ | - | | Policy | | +---------------------+---+ - | ++-------+ | |Endpoints | - +---------+ | | | |-------------------------| - |Scheduler|<---+| v | |E1. Quota Enforcer | - +---------+ | +----------------------+ | |E2. Capacity Planner | - | | Admission Controller +-------->|E3. White-lister | - | +----+-----------------+ | |... | - | | | +-------------------------+ - | +v-------------+ | - | | REST Storage | | - | +--------------+ | - +----------------+---------+ - | - v - +--------------+ - | Data Store | - |--------------| - | | - | | - | | - +--------------+ - -The following demonstrates potential cluster setup that uses services to fulfill admission control. 
- -In this context, the cluster itself is used to provide HA admission control, and pods may choose to -invoke the API Server to determine if a request is or is not admissible. - - - Request +--------+ - + |Pods... | - | |--------| - | | <-------+ - +---------------|----------+ | | | - | API Server | | | | | - |---------------|----------| +---+----+ | - | v | | | - | +--------+ |<---------------+ | - | | Policy | | +-------------------------------------+ - | ++-------+ | |Service (ns=infra, labelKey=admitter)| - +---------+ | | | |-------------------------------------| - |Scheduler|<---+| v | |Service1 | - +---------+ | +----------------------+ | |Service2 | - | | Admission Controller +-------->|Service3 | - | +----+-----------------+ | |... | - | | | +-------------------------------------+ - | +v-------------+ | - | | REST Storage | | - | +--------------+ | - +----------------+---------+ - | - v - +--------------+ - | Data Store | - |--------------| - | | - | | - | | - +--------------+ \ No newline at end of file +| admission_control | Comma-delimited, ordered list of admission control choices to invoke prior to modifying or deleting an object. | +| admission_control_config_file | File with admission control configuration parameters to boot-strap plug-in. | + +An **AdmissionControl** plug-in is an implementation of the following interface: + +``` +package admission + +// Attributes is an interface used by a plug-in to make an admission decision on a individual request. +type Attributes interface { + GetClient() client.Interface + GetNamespace() string + GetKind() string + GetOperation() string + GetObject() runtime.Object +} + +// Interface is an abstract, pluggable interface for Admission Control decisions. +type Interface interface { + // Admit makes an admission decision based on the request attributes + // An error is returned if it denies the request. 
+    Admit(a Attributes) (err error) +} +``` + +A **plug-in** must be compiled with the binary, and is registered as an available option by providing a name and an implementation +of admission.Interface. + +``` +func init() { + admission.RegisterPlugin("AlwaysDeny", func(config io.Reader) (admission.Interface, error) { return NewAlwaysDeny(), nil }) +} +``` + +Invocation of admission control is handled by the **APIServer** and not individual **RESTStorage** implementations. + +This design assumes that **Issue 2977** is adopted, and as a consequence, the general framework of the APIServer request/response flow +will ensure the following: + +1. Incoming request +2. Authenticate user +3. Authorize user +4. If operation=create|update, then validate(object) +5. If operation=create|update|delete, then admissionControl.AdmissionControl(requestAttributes) + a. invoke each admission.Interface object in sequence +6. Object is persisted + +If at any step, there is an error, the request is canceled. -- cgit v1.2.3 From c02a212f141e56d816f7c252dce911b52ac65d30 Mon Sep 17 00:00:00 2001 From: Sergey Evstifeev Date: Mon, 22 Dec 2014 15:53:32 +0100 Subject: Fix broken flaky-tests.md documentation link The link ended up pointing at ./docs/devel/docs/devel/flaky-tests.md instead of ./docs/devel/flaky-tests.md --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index c73f339c..8cccbcd9 100644 --- a/development.md +++ b/development.md @@ -194,7 +194,7 @@ go run e2e.go -ctl='delete pod foobar' ``` ## Testing out flaky tests -[Instructions here](docs/devel/flaky-tests.md) +[Instructions here](flaky-tests.md) -- cgit v1.2.3 From 606dcf108b265a6a886ac025b076789b9fbf27ff Mon Sep 17 00:00:00 2001 From: Clayton Coleman Date: Mon, 11 Aug 2014 21:23:37 -0400 Subject: Proposal: Isolate kubelet from etcd Discusses the current security risks posed by the kubelet->etcd pattern and discusses some options.
Triggered by #846 and referenced in #859 --- isolation_between_nodes_and_master.md | 48 +++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) create mode 100644 isolation_between_nodes_and_master.md diff --git a/isolation_between_nodes_and_master.md b/isolation_between_nodes_and_master.md new file mode 100644 index 00000000..a91927d8 --- /dev/null +++ b/isolation_between_nodes_and_master.md @@ -0,0 +1,48 @@ +# Design: Limit direct access to etcd from within Kubernetes + +All nodes have effective access of "root" on the entire Kubernetes cluster today because they have access to etcd, the central data store. The kubelet, the service proxy, and the nodes themselves have a connection to etcd that can be used to read or write any data in the system. In a cluster with many hosts, any container or user that gains the ability to write to the network device that can reach etcd, on any host, also gains that access. + +* The Kubelet and Kube Proxy currently rely on an efficient "wait for changes over HTTP" interface to get their current state and avoid missing changes + * This interface is implemented by etcd as the "watch" operation on a given key containing useful data + + +## Options: + +1. Do nothing +2. Introduce an HTTP proxy that limits the ability of nodes to access etcd + 1. Prevent writes of data from the kubelet + 2. Prevent reading data not associated with the client responsibilities + 3. Introduce a security token granting access +3. Introduce an API on the apiserver that returns the data a node's Kubelet and Kube Proxy need + 1. Remove the ability of nodes to access etcd via network configuration + 2. Provide an alternate implementation for the Kubelet's event writing code + 3. Implement efficient "watch for changes over HTTP" to offer comparable function with etcd + 4. Ensure that the apiserver can scale at or above the capacity of the etcd system. + 5. Implement authorization scoping for the nodes that limits the data they can view +4.
Implement granular access control in etcd + 1. Authenticate HTTP clients with client certificates, tokens, or BASIC auth and authorize them for read only access + 2. Allow read access of certain subpaths based on what the requestor's tokens are + + +## Evaluation: + +Option 1 would be considered unacceptable for deployment in a multi-tenant or security conscious environment. It would be acceptable in a low security deployment where all software is trusted. It would be acceptable in proof of concept environments on a single machine. + +Option 2 would require implementing an HTTP proxy that, for 2-1, could block POST/PUT/DELETE requests (and potentially HTTP method tunneling parameters accepted by etcd). 2-2 would be more complicated and would require filtering operations based on deep understanding of the etcd API *and* the underlying schema. It would be possible, but would involve extra software. + +Option 3 would involve extending the existing apiserver to return pods associated with a given node over an HTTP "watch for changes" mechanism, which is already implemented. Proper security would involve checking that the caller is authorized to access that data - one imagines a per node token, key, or SSL certificate that could be used to authenticate and then authorize access to only the data belonging to that node. The current event publishing mechanism from the kubelet would also need to be replaced with a secure API endpoint or a change to a polling model. The apiserver would also need to be able to function in a horizontally scalable mode by changing or fixing the "operations" queue to work in a stateless, scalable model. In practice, the amount of traffic even a large Kubernetes deployment would drive towards an apiserver would be tens of requests per second (500 hosts, 1 request per host every minute), which is negligible if well implemented.
Implementing this would also decouple the data store schema from the nodes, allowing a different data store technology to be added in the future without affecting existing nodes. This would also expose that data to other consumers for their own purposes (monitoring, implementing service discovery). + +Option 4 would involve extending etcd to [support access control](https://github.com/coreos/etcd/issues/91). Administrators would need to authorize nodes to connect to etcd, and expose network routability directly to etcd. The mechanism for handling this authentication and authorization would be different than the authorization used by Kubernetes controllers and API clients. It would not be possible to completely replace etcd as a data store without also implementing a new Kubelet config endpoint. + + +## Preferred solution: + +Implement the first parts of option 3 - an efficient watch API for the pod, service, and endpoints data for the Kubelet and Kube Proxy. Authorization and authentication are planned in the future - when a solution is available, implement a custom authorization scope that allows API access to be restricted to only the data about a single node or the service endpoint data. + +In general, option 4 is desirable in addition to option 3 as a mechanism to further secure the store to infrastructure components that must access it. + + +## Caveats + +In all four options, compromise of a host will allow an attacker to imitate that host. For attack vectors that are reproducible from inside containers (privilege escalation), an attacker can distribute himself to other hosts by requesting new containers be spun up. In scenario 1, the cluster is totally compromised immediately. In 2-1, the attacker can view all information about the cluster including keys or authorization data defined with pods. 
In 2-2 and 3, the attacker must still distribute himself in order to get access to a large subset of information, and cannot see other data that is potentially located in etcd like side storage or system configuration. For attack vectors that are not exploits, but instead allow network access to etcd, an attacker in 2-2 has no ability to spread his influence, and is instead restricted to the subset of information on the host. For 3-5, they can do nothing they could not do already (request access to the nodes / services endpoint) because the token is not visible to them on the host. + -- cgit v1.2.3 From d45be03704bdc6a74ca4a43edbadace9cfd0c586 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Tue, 6 Jan 2015 18:05:33 +0000 Subject: Minor doc/comment fixes that came up while reading through some code. --- development.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/development.md b/development.md index 8cccbcd9..2b7476e8 100644 --- a/development.md +++ b/development.md @@ -161,6 +161,9 @@ go run e2e.go --build # Create a fresh cluster. Deletes a cluster first, if it exists go run e2e.go --up +# Create a fresh cluster at a specific release version. +go run e2e.go --up --version=0.7.0 + # Test if a cluster is up. go run e2e.go --isup -- cgit v1.2.3 From 84569936d964a0e2ad2ddeaf76001979f635d9b7 Mon Sep 17 00:00:00 2001 From: Joe Beda Date: Wed, 7 Jan 2015 12:35:38 -0800 Subject: Design doc for clustering. This is related to #2303 and steals from #2435.
--- clustering.md | 56 +++++++++++++++++++++++++++++++++++++++++++++ clustering/.gitignore | 1 + clustering/Makefile | 16 +++++++++++++ clustering/README.md | 9 ++++++++ clustering/dynamic.png | Bin 0 -> 87530 bytes clustering/dynamic.seqdiag | 24 +++++++++++++++++++ clustering/static.png | Bin 0 -> 45845 bytes clustering/static.seqdiag | 16 +++++++++++++ 8 files changed, 122 insertions(+) create mode 100644 clustering.md create mode 100644 clustering/.gitignore create mode 100644 clustering/Makefile create mode 100644 clustering/README.md create mode 100644 clustering/dynamic.png create mode 100644 clustering/dynamic.seqdiag create mode 100644 clustering/static.png create mode 100644 clustering/static.seqdiag diff --git a/clustering.md b/clustering.md new file mode 100644 index 00000000..659bed7d --- /dev/null +++ b/clustering.md @@ -0,0 +1,56 @@ +# Clustering in Kubernetes + + +## Overview +The term "clustering" refers to the process of having all members of the kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles. This document attempts to lay out the user experiences for clustering that Kubernetes aims to address. + +Once a cluster is established, the following is true: + +1. **Master -> Node** The master needs to know which nodes can take work and what their current status is wrt capacity. + 1. **Location** The master knows the name and location of all of the nodes in the cluster. + 2. **Target AuthN** A way to securely talk to the kubelet on that node. Currently we call out to the kubelet over HTTP. This should be over HTTPS and the master should know what CA to trust for that node. + 3. **Caller AuthN/Z** Currently, this is only used to collect statistics as authorization isn't critical. This may change in the future though. +2. **Node -> Master** The nodes currently talk to the master to know which pods have been assigned to them and to publish events. + 1. 
**Location** The nodes must know where the master is. + 2. **Target AuthN** Since the master is assigning work to the nodes, it is critical that they verify whom they are talking to. + 3. **Caller AuthN/Z** The nodes publish events and so must be authenticated to the master. Ideally this authentication is specific to each node so that authorization can be narrowly scoped. The details of the work to run (including things like environment variables) might be considered sensitive and should be locked down also. + +## Current Implementation + +A central authority (generally the master) is responsible for determining the set of machines which are members of the cluster. Calls to create and remove worker nodes in the cluster are restricted to this single authority, and any other requests to add or remove worker nodes are rejected. (1.i). + +Communication from the master to nodes is currently over HTTP and is not secured or authenticated in any way. (1.ii, 1.iii). + +The location of the master is communicated out of band to the nodes. For GCE, this is done via Salt. Other cluster instructions/scripts use other methods. (2.i) + +Currently most communication from the node to the master is over HTTP. When it is done over HTTPS, there is currently no verification of the cert of the master (2.ii). + +Currently, the node/kubelet is authenticated to the master via a token shared across all nodes. This token is distributed out of band (using Salt for GCE) and is optional. If it is not present then the kubelet is unable to publish events to the master. (2.iii) + +Our current mix of out of band communication doesn't meet all of our needs from a security point of view and is difficult to set up and configure. + +## Proposed Solution + +The proposed solution will provide a range of options for setting up and maintaining a secure Kubernetes cluster.
We want to allow both for centrally controlled systems (leveraging pre-existing trust and configuration systems) and for more ad-hoc, automagic systems that are incredibly easy to set up. + +The building blocks of an easier solution: + +* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will work to explicitly distribute and trust the CAs that should be trusted for each link. We will also use client certificates for all AuthN. +* [optional] **API driven CA** Optionally, we will run a CA in the master that will mint certificates for the nodes/kubelets. There will be pluggable policies that will automatically approve certificate requests here as appropriate. + * **CA approval policy** This is a pluggable policy object that can automatically approve CA signing requests. Stock policies will include `always-reject`, `queue` and `insecure-always-approve`. With `queue` there would be an API for evaluating and accepting/rejecting requests. Cloud providers could implement a policy here that verifies other out of band information and automatically approves/rejects based on other external factors. +* **Scoped Kubelet Accounts** These accounts are per-minion and (optionally) give a minion permission to register itself. +* [optional] **Bootstrap API endpoint** This is a helper service hosted outside of the Kubernetes cluster that helps with initial discovery of the master. + +### Static Clustering + +In this sequence diagram there is an out of band admin entity that is creating all certificates and distributing them. It is also making sure that the kubelets know where to find the master. This provides for a lot of control but is more difficult to set up as lots of information must be communicated outside of Kubernetes. + +![Static Sequence Diagram](clustering/static.png) + +### Dynamic Clustering + +This diagram shows dynamic clustering using the bootstrap API endpoint.
That API endpoint is used to both find the location of the master and communicate the root CA for the master. + +This flow has the admin manually approving the kubelet signing requests. This is the `queue` policy defined above. This manual intervention could be replaced by code that can verify the signing requests via other means. + +![Dynamic Sequence Diagram](clustering/dynamic.png) diff --git a/clustering/.gitignore b/clustering/.gitignore new file mode 100644 index 00000000..67bcd6cb --- /dev/null +++ b/clustering/.gitignore @@ -0,0 +1 @@ +DroidSansMono.ttf diff --git a/clustering/Makefile b/clustering/Makefile new file mode 100644 index 00000000..3f95bc07 --- /dev/null +++ b/clustering/Makefile @@ -0,0 +1,16 @@ +FONT := DroidSansMono.ttf + +PNGS := $(patsubst %.seqdiag,%.png,$(wildcard *.seqdiag)) + +.PHONY: all +all: $(PNGS) + +.PHONY: watch +watch: + fswatch *.seqdiag | xargs -n 1 sh -c "make || true" + +$(FONT): + curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/DroidSansMono.ttf + +%.png: %.seqdiag $(FONT) + seqdiag -a -f '$(FONT)' $< diff --git a/clustering/README.md b/clustering/README.md new file mode 100644 index 00000000..04abb1bc --- /dev/null +++ b/clustering/README.md @@ -0,0 +1,9 @@ +This directory contains diagrams for the clustering design doc. + +This depends on the `seqdiag` [utility](http://blockdiag.com/en/seqdiag/index.html). Assuming you have a non-borked python install, this should be installable with + +```bash +pip install seqdiag +``` + +Just call `make` to regenerate the diagrams.
diff --git a/clustering/dynamic.png b/clustering/dynamic.png new file mode 100644 index 00000000..9f2ff9db Binary files /dev/null and b/clustering/dynamic.png differ diff --git a/clustering/dynamic.seqdiag b/clustering/dynamic.seqdiag new file mode 100644 index 00000000..95bb395e --- /dev/null +++ b/clustering/dynamic.seqdiag @@ -0,0 +1,24 @@ +seqdiag { + activation = none; + + + user[label = "Admin User"]; + bootstrap[label = "Bootstrap API\nEndpoint"]; + master; + kubelet[stacked]; + + user -> bootstrap [label="createCluster", return="cluster ID"]; + user <-- bootstrap [label="returns\n- bootstrap-cluster-uri"]; + + user ->> master [label="start\n- bootstrap-cluster-uri"]; + master => bootstrap [label="setMaster\n- master-location\n- master-ca"]; + + user ->> kubelet [label="start\n- bootstrap-cluster-uri"]; + kubelet => bootstrap [label="get-master", return="returns\n- master-location\n- master-ca"]; + kubelet ->> master [label="signCert\n- unsigned-kubelet-cert", return="returns\n- kubelet-cert"]; + user => master [label="getSignRequests"]; + user => master [label="approveSignRequests"]; + kubelet <<-- master [label="returns\n- kubelet-cert"]; + + kubelet => master [label="register\n- kubelet-location"] +} diff --git a/clustering/static.png b/clustering/static.png new file mode 100644 index 00000000..a01ebbe8 Binary files /dev/null and b/clustering/static.png differ diff --git a/clustering/static.seqdiag b/clustering/static.seqdiag new file mode 100644 index 00000000..bdc54b76 --- /dev/null +++ b/clustering/static.seqdiag @@ -0,0 +1,16 @@ +seqdiag { + activation = none; + + admin[label = "Manual Admin"]; + ca[label = "Manual CA"]; + master; + kubelet[stacked]; + + admin => ca [label="create\n- master-cert"]; + admin ->> master [label="start\n- ca-root\n- master-cert"]; + + admin => ca [label="create\n- kubelet-cert"]; + admin ->> kubelet [label="start\n- ca-root\n- kubelet-cert\n- master-location"]; + + kubelet => master [label="register\n- kubelet-location"]; +}
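The "Move to TLS" building block above boils down to pinning an explicitly distributed cluster CA instead of the system CA set. A minimal Go sketch of that idea follows; the helper names `trustOnlyCA` and `selfSignedCAPEM` are illustrative, not part of any Kubernetes API, and the CA here is a throwaway in-memory cert generated purely for the demo:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// trustOnlyCA returns a TLS config that trusts exactly the given
// PEM-encoded CA certificate rather than the system CA set, which is
// the behavior the "Move to TLS" bullet calls for.
func trustOnlyCA(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no valid CA certificates found")
	}
	return &tls.Config{RootCAs: pool}, nil
}

// selfSignedCAPEM generates a throwaway CA certificate so the example
// is runnable without any out-of-band distribution step.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-cluster-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := selfSignedCAPEM()
	if err != nil {
		panic(err)
	}
	cfg, err := trustOnlyCA(caPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("root CAs configured:", cfg.RootCAs != nil)
}
```

A client built with this config will refuse a master whose serving cert does not chain to the distributed CA, which is exactly the verification step the current HTTPS path skips.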
-- cgit v1.2.3 From 5c7bc51c532fe12fb14a4838b02c23998a69802c Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Thu, 8 Jan 2015 11:15:40 -0500 Subject: Update design doc with final PR merge --- admission_control.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/admission_control.md b/admission_control.md index e3f04894..88afda73 100644 --- a/admission_control.md +++ b/admission_control.md @@ -40,7 +40,6 @@ package admission // Attributes is an interface used by a plug-in to make an admission decision on a individual request. type Attributes interface { - GetClient() client.Interface GetNamespace() string GetKind() string GetOperation() string @@ -60,7 +59,7 @@ of admission.Interface. ``` func init() { - admission.RegisterPlugin("AlwaysDeny", func(config io.Reader) (admission.Interface, error) { return NewAlwaysDeny(), nil }) + admission.RegisterPlugin("AlwaysDeny", func(client client.Interface, config io.Reader) (admission.Interface, error) { return NewAlwaysDeny(), nil }) } ``` @@ -73,7 +72,7 @@ will ensure the following: 2. Authenticate user 3. Authorize user 4. If operation=create|update, then validate(object) -5. If operation=create|update|delete, then admissionControl.AdmissionControl(requestAttributes) +5. If operation=create|update|delete, then admission.Admit(requestAttributes) a. invoke each admission.Interface object in sequence 6. Object is persisted -- cgit v1.2.3 From 59e0bba24631462700ad9db6b41fecc730a807e7 Mon Sep 17 00:00:00 2001 From: Joe Beda Date: Fri, 9 Jan 2015 09:11:26 -0800 Subject: Tweaks based on comments --- clustering.md | 8 ++++++-- clustering/Makefile | 2 +- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/clustering.md b/clustering.md index 659bed7d..f447ef10 100644 --- a/clustering.md +++ b/clustering.md @@ -8,13 +8,16 @@ Once a cluster is established, the following is true: 1. **Master -> Node** The master needs to know which nodes can take work and what their current status is wrt capacity. 1. 
**Location** The master knows the name and location of all of the nodes in the cluster. + * For the purposes of this doc, location and name should be enough information so that the master can open a TCP connection to the Node. Most probably we will make this either an IP address or a DNS name. It is going to be important to be consistent here (master must be able to reach kubelet on that DNS name) so that we can verify certificates appropriately. 2. **Target AuthN** A way to securely talk to the kubelet on that node. Currently we call out to the kubelet over HTTP. This should be over HTTPS and the master should know what CA to trust for that node. - 3. **Caller AuthN/Z** Currently, this is only used to collect statistics as authorization isn't critical. This may change in the future though. + 3. **Caller AuthN/Z** This would be the master verifying itself (and permissions) when calling the node. Currently, this is only used to collect statistics as authorization isn't critical. This may change in the future though. 2. **Node -> Master** The nodes currently talk to the master to know which pods have been assigned to them and to publish events. 1. **Location** The nodes must know where the master is. 2. **Target AuthN** Since the master is assigning work to the nodes, it is critical that they verify whom they are talking to. 3. **Caller AuthN/Z** The nodes publish events and so must be authenticated to the master. Ideally this authentication is specific to each node so that authorization can be narrowly scoped. The details of the work to run (including things like environment variables) might be considered sensitive and should be locked down also. +**Note:** While the description here refers to a singular Master, in the future we should enable multiple Masters operating in an HA mode.
While the "Master" is currently the combination of the API Server, Scheduler and Controller Manager, we will restrict ourselves to thinking about the main API and policy engine -- the API Server. + ## Current Implementation A central authority (generally the master) is responsible for determining the set of machines which are members of the cluster. Calls to create and remove worker nodes in the cluster are restricted to this single authority, and any other requests to add or remove worker nodes are rejected. (1.i). @@ -35,10 +38,11 @@ The proposed solution will provide a range of options for setting up and maintai The building blocks of an easier solution: -* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will work to explicitly distributing and trusting the CAs that should be trusted for each link. We will also use client certificates for all AuthN. +* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will explicitly identify the trust chain (the set of trusted CAs) as opposed to trusting the system CAs. We will also use client certificates for all AuthN. * [optional] **API driven CA** Optionally, we will run a CA in the master that will mint certificates for the nodes/kubelets. There will be pluggable policies that will automatically approve certificate requests here as appropriate. * **CA approval policy** This is a pluggable policy object that can automatically approve CA signing requests. Stock policies will include `always-reject`, `queue` and `insecure-always-approve`. With `queue` there would be an API for evaluating and accepting/rejecting requests. Cloud providers could implement a policy here that verifies other out of band information and automatically approves/rejects based on other external factors. * **Scoped Kubelet Accounts** These accounts are per-minion and (optionally) give a minion permission to register itself.
+ * To start with, we'd have the kubelets generate a cert/account in the form of `kubelet:`. We would then hard-code policy such that that particular account is given appropriate permissions. Over time, we can make the policy engine more generic. * [optional] **Bootstrap API endpoint** This is a helper service hosted outside of the Kubernetes cluster that helps with initial discovery of the master. ### Static Clustering diff --git a/clustering/Makefile b/clustering/Makefile index 3f95bc07..c4095421 100644 --- a/clustering/Makefile +++ b/clustering/Makefile @@ -10,7 +10,7 @@ watch: fswatch *.seqdiag | xargs -n 1 sh -c "make || true" $(FONT): - curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/DroidSansMono.ttf + curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/$(FONT) %.png: %.seqdiag $(FONT) seqdiag -a -f '$(FONT)' $< -- cgit v1.2.3 From 3c3d2468b90ea9050b40a4ae97eae7b8d8c16c22 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Fri, 9 Jan 2015 22:38:14 +0000 Subject: Update the doc on how to test for flakiness to actually work and to use kubectl. --- flaky-tests.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/flaky-tests.md b/flaky-tests.md index d2cc8fad..ccd32afb 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -12,6 +12,8 @@ There is a testing image ```brendanburns/flake``` up on the docker hub. We will Create a replication controller with the following config: ```yaml id: flakeController +kind: ReplicationController +apiVersion: v1beta1 desiredState: replicas: 24 replicaSelector:
They will run to completion, then exit, the kubelet will restart them, eventually you will have sufficient -runs for your purposes, and you can stop the replication controller: +runs for your purposes, and you can stop the replication controller by setting the ```replicas``` field to 0 and then running: ```sh -./cluster/kubecfg.sh stop flakeController -./cluster/kubecfg.sh rm flakeController +./cluster/kubectl.sh update -f controller.yaml +./cluster/kubectl.sh delete -f controller.yaml ``` Now examine the machines with ```docker ps -a``` and look for tasks that exited with non-zero exit codes (ignore those that exited -1, since that's what happens when you stop the replica controller) -- cgit v1.2.3 From 43612a093e26501be9cbea1150c4bae42925cf51 Mon Sep 17 00:00:00 2001 From: Deyuan Deng Date: Sat, 10 Jan 2015 17:24:20 -0500 Subject: Doc fixes --- development.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/development.md b/development.md index 2b7476e8..bd18b828 100644 --- a/development.md +++ b/development.md @@ -137,6 +137,10 @@ godep go tool cover -html=target/c.out ## Integration tests You need an [etcd](https://github.com/coreos/etcd/releases/tag/v0.4.6) in your path, please make sure it is installed and in your ``$PATH``. +``` +cd kubernetes +hack/test-integration.sh +``` ## End-to-End tests -- cgit v1.2.3 From ce93f7027812035c48e20a539ff9d74940f284bd Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Wed, 14 Jan 2015 21:27:13 -0800 Subject: Add a gendocs pre-submit hook. --- development.md | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/development.md b/development.md index bd18b828..0e2f6fbf 100644 --- a/development.md +++ b/development.md @@ -238,16 +238,9 @@ git fetch upstream git rebase upstream/master ``` -## Regenerating the API documentation +## Regenerating the CLI documentation ``` -cd kubernetes/api -sudo docker build -t kubernetes/raml2html . 
-sudo docker run --name="docgen" kubernetes/raml2html -sudo docker cp docgen:/data/kubernetes.html . +hack/run-gendocs.sh ``` -View the API documentation using htmlpreview (works on your fork, too): -``` -http://htmlpreview.github.io/?https://github.com/GoogleCloudPlatform/kubernetes/blob/master/api/kubernetes.html -``` -- cgit v1.2.3 From 61d146fd7faae5047f02c46f8454a186a8d99daf Mon Sep 17 00:00:00 2001 From: Parth Oberoi Date: Tue, 20 Jan 2015 05:02:13 +0530 Subject: typo fixed --- collab.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/collab.md b/collab.md index c4644048..633b7682 100644 --- a/collab.md +++ b/collab.md @@ -12,7 +12,7 @@ For the time being, most of the people working on this project are in the US and ## Code reviews -All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. But even for maintainers, we want all changes to get at least one review, preferably from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should sit for at least a 2 hours to allow for wider review. +All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. But even for maintainers, we want all changes to get at least one review, preferably from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should sit for at least 2 hours to allow for wider review. Most PRs will find reviewers organically. If a maintainer intends to be the primary reviewer of a PR they should set themselves as the assignee on GitHub and say so in a reply to the PR. Only the primary reviewer of a change should actually do the merge, except in rare cases (e.g. 
they are unavailable in a reasonable timeframe). -- cgit v1.2.3 From bab87d954eded80b96f38ba9f38c4d3a32fd15d7 Mon Sep 17 00:00:00 2001 From: Clayton Coleman Date: Tue, 20 Jan 2015 13:55:17 -0500 Subject: Clarify name must be lowercase in docs, to match code We restrict DNS_SUBDOMAIN to lowercase for sanity. --- identifiers.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/identifiers.md b/identifiers.md index 1c0660c6..260c237a 100644 --- a/identifiers.md +++ b/identifiers.md @@ -12,10 +12,10 @@ Name : A non-empty string guaranteed to be unique within a given scope at a particular time; used in resource URLs; provided by clients at creation time and encouraged to be human friendly; intended to facilitate creation idempotence and space-uniqueness of singleton objects, distinguish distinct entities, and reference particular entities across operations. [rfc1035](http://www.ietf.org/rfc/rfc1035.txt)/[rfc1123](http://www.ietf.org/rfc/rfc1123.txt) label (DNS_LABEL) -: An alphanumeric (a-z, A-Z, and 0-9) string, with a maximum length of 63 characters, with the '-' character allowed anywhere except the first or last character, suitable for use as a hostname or segment in a domain name +: An alphanumeric (a-z, and 0-9) string, with a maximum length of 63 characters, with the '-' character allowed anywhere except the first or last character, suitable for use as a hostname or segment in a domain name [rfc1035](http://www.ietf.org/rfc/rfc1035.txt)/[rfc1123](http://www.ietf.org/rfc/rfc1123.txt) subdomain (DNS_SUBDOMAIN) -: One or more rfc1035/rfc1123 labels separated by '.' with a maximum length of 253 characters +: One or more lowercase rfc1035/rfc1123 labels separated by '.' 
with a maximum length of 253 characters [rfc4122](http://www.ietf.org/rfc/rfc4122.txt) universally unique identifier (UUID) : A 128 bit generated value that is extremely unlikely to collide across time and space and requires no central coordination -- cgit v1.2.3 From 972ee6e91f7a4be56e06490a8d8eb8cca2bc8eeb Mon Sep 17 00:00:00 2001 From: Parth Oberoi Date: Wed, 21 Jan 2015 14:29:56 +0530 Subject: typo fixed ';' unexpected ';' after environment on line 7 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 82804564..ab41448d 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ Docs in this directory relate to developing Kubernetes. * **On Collaborative Development** ([collab.md](collab.md)): info on pull requests and code reviews. -* **Development Guide** ([development.md](development.md)): Setting up your environment; tests. +* **Development Guide** ([development.md](development.md)): Setting up your environment tests. * **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. Here's how to run your tests many times. 
-- cgit v1.2.3 From d0eebeeb6c173c6c2ad74d84cc2468b20a4d3d1f Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Thu, 18 Dec 2014 13:58:23 -0500 Subject: Resource controller proposal --- resource_controller.md | 231 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100644 resource_controller.md diff --git a/resource_controller.md b/resource_controller.md new file mode 100644 index 00000000..2150f6dc --- /dev/null +++ b/resource_controller.md @@ -0,0 +1,231 @@ +# Kubernetes Proposal: ResourceController + +**Related PR:** + +| Topic | Link | +| ----- | ---- | +| Admission Control Proposal | https://github.com/GoogleCloudPlatform/kubernetes/pull/2501 | +| Separate validation from RESTStorage | https://github.com/GoogleCloudPlatform/kubernetes/issues/2977 | + +## Background + +This document proposes a system for enforcing resource limits as part of admission control. + +## Model Changes + +A new resource, **ResourceController**, is introduced to enumerate resource usage constraints scoped to a Kubernetes namespace. + +Authorized users are able to set the **ResourceController.Spec** fields to enumerate desired constraints. 
+ +``` +// ResourceController is an enumerated set of resource constraints enforced as part of the admission control plug-in +type ResourceController struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + // Spec represents the imposed constraints for allowed resources + Spec ResourceControllerSpec `json:"spec,omitempty"` + // Status represents the observed allocated resources to inform constraints + Status ResourceControllerStatus `json:"status,omitempty"` +} + +type ResourceControllerSpec struct { + // Allowed represents the available resources allowed in a quota + Allowed ResourceList `json:"allowed,omitempty"` +} + +type ResourceControllerStatus struct { + // Allowed represents the available resources allowed in a quota + Allowed ResourceList `json:"allowed,omitempty"` + // Allocated represents the allocated resources leveraged against your quota + Allocated ResourceList `json:"allocated,omitempty"` +} + +// ResourceControllerList is a collection of resource controllers. +type ResourceControllerList struct { + TypeMeta `json:",inline"` + ListMeta `json:"metadata,omitempty"` + Items []ResourceController `json:"items"` +} +``` + +Authorized users are able to provide a **ResourceObservation** to control a **ResourceController.Status**. + +``` +// ResourceObservation is written by a resource-controller to update ResourceController.Status +type ResourceObservation struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + + // Status represents the observed allocated resources to inform constraints + Status ResourceControllerStatus `json:"status,omitempty"` +} +``` + +## AdmissionControl plugin: ResourceLimits + +The **ResourceLimits** plug-in introspects all incoming admission requests. + +It makes decisions by introspecting the incoming object, current status, and enumerated constraints on **ResourceController**.
+ +The following constraints are proposed as enforceable: + +| Key | Type | Description | +| ------ | -------- | -------- | +| kubernetes.io/namespace/pods | int | Maximum number of pods per namespace | +| kubernetes.io/namespace/replicationControllers | int | Maximum number of replicationControllers per namespace | +| kubernetes.io/namespace/services | int | Maximum number of services per namespace | +| kubernetes.io/pods/containers | int | Maximum number of containers per pod | +| kubernetes.io/pods/containers/memory/max | int | Maximum amount of memory per container in a pod | +| kubernetes.io/pods/containers/memory/min | int | Minimum amount of memory per container in a pod | +| kubernetes.io/pods/containers/cpu/max | int | Maximum amount of CPU per container in a pod | +| kubernetes.io/pods/containers/cpu/min | int | Minimum amount of CPU per container in a pod | +| kubernetes.io/pods/cpu/max | int | Maximum CPU usage across all containers per pod | +| kubernetes.io/pods/cpu/min | int | Minimum CPU usage across all containers per pod | +| kubernetes.io/pods/memory/max | int | Maximum memory usage across all containers in pod | +| kubernetes.io/pods/memory/min | int | Minimum memory usage across all containers in pod | +| kubernetes.io/replicationController/replicas | int | Maximum number of replicas per replication controller | + +If the incoming resource would cause a violation of the enumerated constraints, the request is denied with a set of +messages explaining what constraints were the source of the denial. + +If a constraint is not enumerated by a **ResourceController**, it is not tracked. + +If a constraint spans resources -- for example, one that tracks the total number of some **kind** in a **namespace** -- +the plug-in will post a **ResourceObservation** with the new incremented **Allocated** usage for that constraint +using a compare-and-swap to ensure transactional integrity.
It is possible that the allocated usage will be persisted +on a create operation, but the create can fail later in the request flow for some other unknown reason. For this scenario, +the allocated usage will appear greater than the actual usage; the **kube-resource-controller** is responsible for +synchronizing the observed allocated usage with actual usage. For delete requests, we will not decrement usage right away, +and will always rely on the **kube-resource-controller** to bring the observed value in line. This is needed until +etcd supports atomic transactions across multiple resources. + +## kube-apiserver + +The server is updated to be aware of **ResourceController** and **ResourceObservation** objects. + +The constraints are only enforced if the kube-apiserver is started as follows: + +``` +$ kube-apiserver -admission_control=ResourceLimits +``` + +## kube-resource-controller + +This is a new daemon that observes **ResourceController** objects in the cluster, and updates their status with current cluster state. + +The daemon runs a synchronization loop to do the following: + +For each resource controller, perform the following steps: + + 1. Reconcile **ResourceController.Status.Allowed** with **ResourceController.Spec.Allowed** + 2. Reconcile **ResourceController.Status.Allocated** with constraints enumerated in **ResourceController.Status.Allowed** + 3. If there was a change, atomically update **ResourceObservation** to force an update to **ResourceController.Status** + +At step 1, the **kube-resource-controller** supports an administrator-supplied override to ensure that the desired +set of constraints does not conflict with any configured global constraints. For example, do not let +a **kubernetes.io/pods/memory/max** for any pod in any namespace exceed 8GB. These global constraints could be supplied +via an alternate location in **etcd**, for example, a **ResourceController** in an **infra** namespace that is populated on +bootstrap.
+ +At step 2, for fields that track total number of {kind} in a namespace, we query the cluster to ensure that the observed status +is in line with the actual tracked status. This is a stop-gap until etcd supports transactions across resource updates, to ensure +that when a resource is deleted, we can update the observed status. + +## kubectl + +kubectl is modified to support the **ResourceController** resource. + +```kubectl describe``` provides a human-readable output of current constraints and usage in the namespace. + +For example, + +``` +$ kubectl namespace myspace +$ kubectl create -f examples/resource-controller/resource-controller.json +$ kubectl get resourceControllers +NAME LABELS +limits +$ kubectl describe resourceController limits +Name: limits +Key Enforced Allocated +---- ----- ---- +Max pods 15 13 +Max replication controllers 2 2 +Max services 5 0 +Max containers per pod 2 0 +Max replica size 10 0 +... +``` + +## Scenario: How this works in practice + +Admin user wants to impose resource constraints in namespace ```dev``` to enforce the following: + +1. A pod cannot use more than 8GB of RAM. +2. The namespace cannot run more than 100 pods at a time. + +To enforce these constraints, the Admin does the following: + +``` +$ cat resource-controller.json +{ + "id": "limits", + "kind": "ResourceController", + "apiVersion": "v1beta1", + "spec": { + "allowed": { + "kubernetes.io/namespace/pods": 100, + "kubernetes.io/pods/memory/max": 8000 + } + }, + "labels": {} +} +$ kubectl namespace dev +$ kubectl create -f resource-controller.json +``` + +The **kube-resource-controller** sees that a new **ResourceController** resource was created, and updates its +status with the current observations in the namespace.
The Admin describes the resource controller to see the current status: + +``` +$ kubectl describe resourceController limits +Name: limits +Key Enforced Allocated +---- ----- ---- +Max pods 100 50 +Max memory per pod 8000 4000 +``` + +The Admin sees that the current ```dev``` namespace is using 50 pods, and the largest pod consumes 4GB of RAM. + +The Developer using this namespace works with the system until he discovers he has exceeded his limits: + +``` +$ kubectl namespace dev +$ kubectl create -f pod.json +Unable to create pod. You have exceeded the max of 100 pods in the namespace. +``` + +or + +``` +$ kubectl namespace dev +$ kubectl create -f pod.json +Unable to create pod. It exceeds the max memory usage per pod of 8000 MB. +``` + +The Developer can observe his constraints as appropriate: +``` +$ kubectl describe resourceController limits +Name: limits +Key Enforced Allocated +---- ----- ---- +Max pods 100 100 +Max memory per pod 8000 4000 +``` + +As a consequence, he can reduce his current number of running pods or the memory requirements of the pod in order to proceed, +or he could contact the Admin for his namespace to allocate him more resources. + -- cgit v1.2.3 From 1203b0e6e4d4b83672ea999f158ef77cb9d7ad6b Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Thu, 22 Jan 2015 22:31:28 -0500 Subject: Design document for LimitRange --- admission_control_limit_range.md | 122 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 admission_control_limit_range.md diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md new file mode 100644 index 00000000..69fe144b --- /dev/null +++ b/admission_control_limit_range.md @@ -0,0 +1,122 @@ +# Admission control plugin: LimitRanger + +## Background + +This document proposes a system for enforcing min/max limits per resource as part of admission control.
+ +## Model Changes + +A new resource, **LimitRange**, is introduced to enumerate min/max limits for a resource type scoped to a +Kubernetes namespace. + +``` +const ( + // Limit that applies to all pods in a namespace + LimitTypePod string = "Pod" + // Limit that applies to all containers in a namespace + LimitTypeContainer string = "Container" +) + +// LimitRangeItem defines a min/max usage limit for any resource that matches on kind +type LimitRangeItem struct { + // Type of resource that this limit applies to + Type string `json:"type,omitempty"` + // Max usage constraints on this kind by resource name + Max ResourceList `json:"max,omitempty"` + // Min usage constraints on this kind by resource name + Min ResourceList `json:"min,omitempty"` +} + +// LimitRangeSpec defines a min/max usage limit for resources that match on kind +type LimitRangeSpec struct { + // Limits is the list of LimitRangeItem objects that are enforced + Limits []LimitRangeItem `json:"limits"` +} + +// LimitRange sets resource usage limits for each kind of resource in a Namespace +type LimitRange struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + + // Spec defines the limits enforced + Spec LimitRangeSpec `json:"spec,omitempty"` +} + +// LimitRangeList is a list of LimitRange items. +type LimitRangeList struct { + TypeMeta `json:",inline"` + ListMeta `json:"metadata,omitempty"` + + // Items is a list of LimitRange objects + Items []LimitRange `json:"items"` +} +``` + +## AdmissionControl plugin: LimitRanger + +The **LimitRanger** plug-in introspects all incoming admission requests. + +It makes decisions by evaluating the incoming object against all defined **LimitRange** objects in the request context namespace. 
+ +The following min/max limits are imposed: + +**Type: Container** + +| ResourceName | Description | +| ------------ | ----------- | +| cpu | Min/Max amount of cpu per container | +| memory | Min/Max amount of memory per container | + +**Type: Pod** + +| ResourceName | Description | +| ------------ | ----------- | +| cpu | Min/Max amount of cpu per pod | +| memory | Min/Max amount of memory per pod | + +If the incoming object would cause a violation of the enumerated constraints, the request is denied with a set of +messages explaining what constraints were the source of the denial. + +If a constraint is not enumerated by a **LimitRange** it is not tracked. + +## kube-apiserver + +The server is updated to be aware of **LimitRange** objects. + +The constraints are only enforced if the kube-apiserver is started as follows: + +``` +$ kube-apiserver -admission_control=LimitRanger +``` + +## kubectl + +kubectl is modified to support the **LimitRange** resource. + +```kubectl describe``` provides a human-readable output of limits. + +For example, + +``` +$ kubectl namespace myspace +$ kubectl create -f examples/limitrange/limit-range.json +$ kubectl get limits +NAME +limits +$ kubectl describe limits limits +Name: limits +Type Resource Min Max +---- -------- --- --- +Pod memory 1Mi 1Gi +Pod cpu 250m 2 +Container cpu 250m 2 +Container memory 1Mi 1Gi +``` + +## Future Enhancements: Define limits for a particular pod or container. + +In the current proposal, the **LimitRangeItem** matches purely on **LimitRangeItem.Type**. + +It is expected we will want to define limits for particular pods or containers by name/uid and label/field selector. + +To make a **LimitRangeItem** more restrictive, we intend to add these additional restrictions at a future point in time.
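The admission decision the **LimitRanger** plug-in makes can be sketched as a pure function over types shaped like the ones above. Quantities are flattened to plain integers (millicores / bytes) for brevity -- the real API uses a richer quantity type -- and `checkLimits` is an illustrative name, not the plug-in's actual entry point:

```go
package main

import "fmt"

// ResourceList maps a resource name ("cpu", "memory") to a quantity,
// simplified here to an int64 (millicores for cpu, bytes for memory).
type ResourceList map[string]int64

// LimitRangeItem mirrors the proposal: min/max usage for one type.
type LimitRangeItem struct {
	Type string
	Max  ResourceList
	Min  ResourceList
}

// checkLimits returns one error for every resource of the request that
// falls outside the [Min, Max] window; an empty result means admit.
func checkLimits(item LimitRangeItem, usage ResourceList) []error {
	var errs []error
	for name, max := range item.Max {
		if used, ok := usage[name]; ok && used > max {
			errs = append(errs, fmt.Errorf("%s %s usage %d exceeds max %d", item.Type, name, used, max))
		}
	}
	for name, min := range item.Min {
		if used, ok := usage[name]; ok && used < min {
			errs = append(errs, fmt.Errorf("%s %s usage %d below min %d", item.Type, name, used, min))
		}
	}
	return errs
}

func main() {
	item := LimitRangeItem{
		Type: "Container",
		Min:  ResourceList{"cpu": 250, "memory": 1 << 20},
		Max:  ResourceList{"cpu": 2000, "memory": 1 << 30},
	}
	// A container asking for 3 cores: denied with an explanatory message.
	for _, err := range checkLimits(item, ResourceList{"cpu": 3000, "memory": 1 << 22}) {
		fmt.Println("denied:", err)
	}
}
```

As in the proposal, constraints not enumerated in the item are simply not tracked: a resource name absent from both `Min` and `Max` never produces an error.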
-- cgit v1.2.3 From a44f8f8aaa9177f8f1cdf7e37e74437695fc36fd Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Fri, 23 Jan 2015 12:38:59 -0500 Subject: ResourceQuota proposal --- admission_control_resource_quota.md | 146 ++++++++++++++++++++++++++++++++++++ 1 file changed, 146 insertions(+) create mode 100644 admission_control_resource_quota.md diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md new file mode 100644 index 00000000..c5cc60c4 --- /dev/null +++ b/admission_control_resource_quota.md @@ -0,0 +1,146 @@ +# Admission control plugin: ResourceQuota + +## Background + +This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control. + +## Model Changes + +A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace. + +A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status. + +``` +// The following identify resource constants for Kubernetes object types +const ( + // Pods, number + ResourcePods ResourceName = "pods" + // Services, number + ResourceServices ResourceName = "services" + // ReplicationControllers, number + ResourceReplicationControllers ResourceName = "replicationcontrollers" + // ResourceQuotas, number + ResourceQuotas ResourceName = "resourcequotas" +) + +// ResourceQuotaSpec defines the desired hard limits to enforce for Quota +type ResourceQuotaSpec struct { + // Hard is the set of desired hard limits for each named resource + Hard ResourceList `json:"hard,omitempty"` +} + +// ResourceQuotaStatus defines the enforced hard limits and observed use +type ResourceQuotaStatus struct { + // Hard is the set of enforced hard limits for each named resource + Hard ResourceList `json:"hard,omitempty"` + // Used is the current observed total usage of the resource in the namespace + Used ResourceList `json:"used,omitempty"` +} + +// ResourceQuota sets aggregate quota 
restrictions enforced per namespace +type ResourceQuota struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + + // Spec defines the desired quota + Spec ResourceQuotaSpec `json:"spec,omitempty"` + + // Status defines the actual enforced quota and its current usage + Status ResourceQuotaStatus `json:"status,omitempty"` +} + +// ResourceQuotaUsage captures system observed quota status per namespace +// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage +type ResourceQuotaUsage struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + + // Status defines the actual enforced quota and its current usage + Status ResourceQuotaStatus `json:"status,omitempty"` +} + +// ResourceQuotaList is a list of ResourceQuota items +type ResourceQuotaList struct { + TypeMeta `json:",inline"` + ListMeta `json:"metadata,omitempty"` + + // Items is a list of ResourceQuota objects + Items []ResourceQuota `json:"items"` +} + +``` + +## AdmissionControl plugin: ResourceQuota + +The **ResourceQuota** plug-in introspects all incoming admission requests. + +It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request +namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied. + +The following resource limits are imposed as part of core Kubernetes: + +| ResourceName | Description | +| ------------ | ----------- | +| cpu | Total cpu usage | +| memory | Total memory usage | +| pods | Total number of pods | +| services | Total number of services | +| replicationcontrollers | Total number of replication controllers | +| resourcequotas | Total number of resource quotas | + +Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes. + +This means the resource must have a fully-qualified name (i.e. 
mycompany.org/shinynewresource) + +If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a +**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read +**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally) +into the system. + +## kube-apiserver + +The server is updated to be aware of **ResourceQuota** objects. + +The quota is only enforced if the kube-apiserver is started as follows: + +``` +$ kube-apiserver -admission_control=ResourceQuota +``` + +## kube-controller-manager + +A new controller is defined that runs a synch loop to run usage stats across the namespace. + +If the observed usage is different than the recorded usage, the controller sends a **ResourceQuotaUsage** resource +to the server to atomically update. + +The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down. + +To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate +usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate +this being the resource most closely running at the prescribed quota limits. + +## kubectl + +kubectl is modified to support the **ResourceQuota** resource. + +```kubectl describe``` provides a human-readable output of quota. 
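The `examples/resourcequota/resource-quota.json` file referenced in the session below is likewise not reproduced here. Based on **ResourceQuotaSpec**, a sketch consistent with the `describe` output that follows could be (again, the `v1beta1` version string and exact field casing are assumptions):

```json
{
  "id": "myquota",
  "kind": "ResourceQuota",
  "apiVersion": "v1beta1",
  "spec": {
    "hard": {
      "cpu": "20",
      "memory": "1.5Gb",
      "pods": "10",
      "replicationControllers": "10",
      "services": "3"
    }
  }
}
```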
+ +For example, + +``` +$ kubectl namespace myspace +$ kubectl create -f examples/resourcequota/resource-quota.json +$ kubectl get quota +NAME +myquota +$ kubectl describe quota myquota +Name: myquota +Resource Used Hard +-------- ---- ---- +cpu 100m 20 +memory 0 1.5Gb +pods 1 10 +replicationControllers 1 10 +services 2 3 +``` \ No newline at end of file -- cgit v1.2.3 From 24f580084eb3c2acf9a9ec7e83e6129cdc5065dc Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Fri, 23 Jan 2015 12:39:53 -0500 Subject: Remove resource_controller proposal --- resource_controller.md | 231 ------------------------------------------------- 1 file changed, 231 deletions(-) delete mode 100644 resource_controller.md diff --git a/resource_controller.md b/resource_controller.md deleted file mode 100644 index 2150f6dc..00000000 --- a/resource_controller.md +++ /dev/null @@ -1,231 +0,0 @@ -# Kubernetes Proposal: ResourceController - -**Related PR:** - -| Topic | Link | -| ----- | ---- | -| Admission Control Proposal | https://github.com/GoogleCloudPlatform/kubernetes/pull/2501 | -| Separate validation from RESTStorage | https://github.com/GoogleCloudPlatform/kubernetes/issues/2977 | - -## Background - -This document proposes a system for enforcing resource limits as part of admission control. - -## Model Changes - -A new resource, **ResourceController**, is introduced to enumerate resource usage constraints scoped to a Kubernetes namespace. - -Authorized users are able to set the **ResourceController.Spec** fields to enumerate desired constraints. 
- -``` -// ResourceController is an enumerated set of resources constraints enforced as part admission control plug-in -type ResourceController struct { - TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty"` - // Spec represents the imposed constraints for allowed resources - Spec ResourceControllerSpec `json:"spec,omitempty"` - // Status represents the observed allocated resources to inform constraints - Status ResourceControllerStatus `json:"status,omitempty"` -} - -type ResourceControllerSpec struct { - // Allowed represents the available resources allowed in a quota - Allowed ResourceList `json:"allowed,omitempty"` -} - -type ResourceControllerStatus struct { - // Allowed represents the available resources allowed in a quota - Allowed ResourceList `json:"allowed,omitempty"` - // Allocated represents the allocated resources leveraged against your quota - Allocated ResourceList `json:"allocated,omitempty"` -} - -// ResourceControllerList is a collection of resource controllers. -type ResourceControllerList struct { - TypeMeta `json:",inline"` - ListMeta `json:"metadata,omitempty"` - Items []ResourceController `json:"items"` -} -``` - -Authorized users are able to provide a **ResourceObservation** to control a **ResourceController.Status**. - -``` -// ResourceObservation is written by a resource-controller to update ResourceController.Status -type ResourceObservation struct { - TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty"` - - // Status represents the observed allocated resources to inform constraints - Status ResourceControllerStatus `json:"status,omitempty"` -} -``` - -## AdmissionControl plugin: ResourceLimits - -The **ResourceLimits** plug-in introspects all incoming admission requests. - -It makes decisions by introspecting the incoming object, current status, and enumerated constraints on **ResourceController**. 
- -The following constraints are proposed as enforceable: - -| Key | Type | Description | -| ------ | -------- | -------- | -| kubernetes.io/namespace/pods | int | Maximum number of pods per namespace | -| kubernetes.io/namespace/replicationControllers | int | Maximum number of replicationControllers per namespace | -| kubernetes.io/namespace/services | int | Maximum number of services per namespace | -| kubernetes.io/pods/containers | int | Maximum number of containers per pod | -| kubernetes.io/pods/containers/memory/max | int | Maximum amount of memory per container in a pod | -| kubernetes.io/pods/containers/memory/min | int | Minimum amount of memory per container in a pod | -| kubernetes.io/pods/containers/cpu/max | int | Maximum amount of CPU per container in a pod | -| kubernetes.io/pods/containers/cpu/min | int | Minimum amount of CPU per container in a pod | -| kubernetes.io/pods/cpu/max | int | Maximum CPU usage across all containers per pod | -| kubernetes.io/pods/cpu/min | int | Minimum CPU usage across all containers per pod | -| kubernetes.io/pods/memory/max | int | Maximum memory usage across all containers in pod | -| kubernetes.io/pods/memory/min | int | Minimum memory usage across all containers in pod | -| kubernetes.io/replicationController/replicas | int | Maximum number of replicas per replication controller | - -If the incoming resource would cause a violation of the enumerated constraints, the request is denied with a set of -messages explaining what constraints were the source of the denial. - -If a constraint is not enumerated by a **ResourceController** it is not tracked. - -If a constraint spans resources, for example, it tracks the total number of some **kind** in a **namespace**, -the plug-in will post a **ResourceObservation** with the new incremented **Allocated*** usage for that constraint -using a compare-and-swap to ensure transactional integrity. 
It is possible that the allocated usage will be persisted -on a create operation, but the create can fail later in the request flow for some other unknown reason. For this scenario, -the allocated usage will appear greater than the actual usage, the **kube-resource-controller** is responsible for -synchronizing the observed allocated usage with actual usage. For delete requests, we will not decrement usage right away, -and will always rely on the **kube-resource-controller** to bring the observed value in line. This is needed until -etcd supports atomic transactions across multiple resources. - -## kube-apiserver - -The server is updated to be aware of **ResourceController** and **ResourceObservation** objects. - -The constraints are only enforced if the kube-apiserver is started as follows: - -``` -$ kube-apiserver -admission_control=ResourceLimits -``` - -## kube-resource-controller - -This is a new daemon that observes **ResourceController** objects in the cluster, and updates their status with current cluster state. - -The daemon runs a synchronization loop to do the following: - -For each resource controller, perform the following steps: - - 1. Reconcile **ResourceController.Status.Allowed** with **ResourceController.Spec.Allowed** - 2. Reconcile **ResourceController.Status.Allocated** with constraints enumerated in **ResourceController.Status.Allowed** - 3. If there was a change, atomically update **ResourceObservation** to force an update to **ResourceController.Status** - -At step 1, allow the **kube-resource-controller** to support an administrator supplied override to enforce that what the -set of constraints desired to not conflict with any configured global constraints. For example, do not let -a **kubernetes.io/pods/memory/max** for any pod in any namespace exceed 8GB. These global constraints could be supplied -via an alternate location in **etcd**, for example, a **ResourceController** in an **infra** namespace that is populated on -bootstrap. 
- -At step 2, for fields that track total number of {kind} in a namespace, we query the cluster to ensure that the observed status -is in-line with the actual tracked status. This is a stop-gap to etcd supporting transactions across resource updates to ensure -that when a resource is deleted, we can update the observed status. - -## kubectl - -kubectl is modified to support the **ResourceController** resource. - -```kubectl describe``` provides a human-readable output of current constraints and usage in the namespace. - -For example, - -``` -$ kubectl namespace myspace -$ kubectl create -f examples/resource-controller/resource-controller.json -$ kubectl get resourceControllers -NAME LABELS -limits -$ kubectl describe resourceController limits -Name: limits -Key Enforced Allocated ----- ----- ---- -Max pods 15 13 -Max replication controllers 2 2 -Max services 5 0 -Max containers per pod 2 0 -Max replica size 10 0 -... -``` - -## Scenario: How this works in practice - -Admin user wants to impose resource constraints in namespace ```dev``` to enforce the following: - -1. A pod cannot use more than 8GB of RAM -2. The namespace cannot run more than 100 pods at a time. - -To enforce this constraint, the Admin does the following: - -``` -$ cat resource-controller.json -{ - "id": "limits", - "kind": "ResourceController", - "apiVersion": "v1beta1", - "spec": { - "allowed": { - "kubernetes.io/namespace/pods": 100, - "kubernetes.io/pods/memory/max": 8000, - } - }, - "labels": {} -} -$ kubectl namespace dev -$ kubectl create -f resource-controller.json -``` - -The **kube-resource-controller** sees that a new **ResourceController** resource was created, and updates its -status with the current observations in the namespace. 
- -The Admin describes the resource controller to see the current status: - -``` -$ kubectl describe resourceController limits -Name: limits -Key Enforced Allocated ----- ----- ---- -Max pods 100 50 -Max memory per pod 8000 4000 -```` - -The Admin sees that the current ```dev``` namespace is using 50 pods, and the largest pod consumes 4GB of RAM. - -The Developer that uses this namespace uses the system until he discovers he has exceeded his limits: - -``` -$ kubectl namespace dev -$ kubectl create -f pod.json -Unable to create pod. You have exceeded your max pods in the namespace of 100. -``` - -or - -``` -$ kubectl namespace dev -$ kubectl create -f pod.json -Unable to create pod. It exceeds the max memory usage per pod of 8000 MB. -``` - -The Developer can observe his constraints as appropriate: -``` -$ kubectl describe resourceController limits -Name: limits -Key Enforced Allocated ----- ----- ---- -Max pods 100 100 -Max memory per pod 8000 4000 -```` - -And as a consequence reduce his current number of running pods, or memory requirements of the pod to proceed. -Or he could contact the Admin for his namespace to allocate him more resources. - -- cgit v1.2.3 From 89f9224cc11190711f35230398dac3c30f590a06 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Fri, 23 Jan 2015 12:41:44 -0500 Subject: Doc tweaks --- admission_control_resource_quota.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index c5cc60c4..08bc6bec 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -77,7 +77,7 @@ The **ResourceQuota** plug-in introspects all incoming admission requests. It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request namespace. 
If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
 
-The following resource limits are imposed as part of core Kubernetes:
+The following resource limits are imposed as part of core Kubernetes at the namespace level:
 
 | ResourceName | Description |
 | ------------ | ----------- |
@@ -97,6 +97,10 @@ If the incoming request does not cause the total usage to exceed any of the enum
 **ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
 into the system.
 
+To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document. As a result,
+it is encouraged to cap the total number of individual quotas tracked in the **Namespace** at 1 by explicitly
+capping it in the **ResourceQuota** document.
+
 ## kube-apiserver
 
 The server is updated to be aware of **ResourceQuota** objects.
@@ -109,7 +113,9 @@ $ kube-apiserver -admission_control=ResourceQuota
 
 ## kube-controller-manager
 
-A new controller is defined that runs a synch loop to run usage stats across the namespace.
+A new controller is defined that runs a sync loop to calculate quota usage across the namespace.
+
+**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object.
 
 If the observed usage is different than the recorded usage, the controller sends a **ResourceQuotaUsage** resource
 to the server to atomically update.
-- cgit v1.2.3
From d48dabc0cd457f54117b4fae601e0a0415a0fb6c Mon Sep 17 00:00:00 2001
From: Victor Marmol
Date: Fri, 23 Jan 2015 15:51:01 -0800
Subject: Update developer docs to use hack/ for e2e.
--- development.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/development.md b/development.md index 0e2f6fbf..f4770db3 100644 --- a/development.md +++ b/development.md @@ -152,7 +152,7 @@ hack/e2e-test.sh Pressing control-C should result in an orderly shutdown but if something goes wrong and you still have some VMs running you can force a cleanup with this command: ``` -go run e2e.go --down +go run hack/e2e.go --down ``` ### Flag options @@ -160,28 +160,28 @@ See the flag definitions in `hack/e2e.go` for more options, such as reusing an e ```sh # Build binaries for testing -go run e2e.go --build +go run hack/e2e.go --build # Create a fresh cluster. Deletes a cluster first, if it exists -go run e2e.go --up +go run hack/e2e.go --up # Create a fresh cluster at a specific release version. -go run e2e.go --up --version=0.7.0 +go run hack/e2e.go --up --version=0.7.0 # Test if a cluster is up. -go run e2e.go --isup +go run hack/e2e.go --isup # Push code to an existing cluster -go run e2e.go --push +go run hack/e2e.go --push # Push to an existing cluster, or bring up a cluster if it's down. -go run e2e.go --pushup +go run hack/e2e.go --pushup # Run all tests -go run e2e.go --test +go run hack/e2e.go --test # Run tests matching a glob. -go run e2e.go --tests=... +go run hack/e2e.go --tests=... ``` ### Combining flags -- cgit v1.2.3 From f7b6bd0a26a9fae8d3b90378d02ca6a3f7b2c548 Mon Sep 17 00:00:00 2001 From: Joe Beda Date: Mon, 26 Jan 2015 10:34:44 -0800 Subject: Small tweaks to sequence diagram generation. Fix up name of font download and no transparency so it is easier to iterate. 
--- clustering/Makefile | 4 ++-- clustering/dynamic.png | Bin 87530 -> 72373 bytes clustering/static.png | Bin 45845 -> 36583 bytes 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/clustering/Makefile b/clustering/Makefile index c4095421..298479f1 100644 --- a/clustering/Makefile +++ b/clustering/Makefile @@ -10,7 +10,7 @@ watch: fswatch *.seqdiag | xargs -n 1 sh -c "make || true" $(FONT): - curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/$(FONT).ttf + curl -sLo $@ https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/$(FONT) %.png: %.seqdiag $(FONT) - seqdiag -a -f '$(FONT)' $< + seqdiag --no-transparency -a -f '$(FONT)' $< diff --git a/clustering/dynamic.png b/clustering/dynamic.png index 9f2ff9db..92b40fee 100644 Binary files a/clustering/dynamic.png and b/clustering/dynamic.png differ diff --git a/clustering/static.png b/clustering/static.png index a01ebbe8..bcdeca7e 100644 Binary files a/clustering/static.png and b/clustering/static.png differ -- cgit v1.2.3 From 050db5a2f886f39b18cfe36ea768976bb91fdf55 Mon Sep 17 00:00:00 2001 From: Joe Beda Date: Mon, 26 Jan 2015 13:50:26 -0800 Subject: Add Dockerfile for sequence diagram generation --- clustering/Dockerfile | 12 ++++++++++++ clustering/Makefile | 13 +++++++++++++ clustering/README.md | 17 +++++++++++++++++ 3 files changed, 42 insertions(+) create mode 100644 clustering/Dockerfile diff --git a/clustering/Dockerfile b/clustering/Dockerfile new file mode 100644 index 00000000..3353419d --- /dev/null +++ b/clustering/Dockerfile @@ -0,0 +1,12 @@ +FROM debian:jessie + +RUN apt-get update +RUN apt-get -qy install python-seqdiag make curl + +WORKDIR /diagrams + +RUN curl -sLo DroidSansMono.ttf https://googlefontdirectory.googlecode.com/hg/apache/droidsansmono/DroidSansMono.ttf + +ADD . 
/diagrams + +CMD bash -c 'make >/dev/stderr && tar cf - *.png' \ No newline at end of file diff --git a/clustering/Makefile b/clustering/Makefile index 298479f1..f6aa53ed 100644 --- a/clustering/Makefile +++ b/clustering/Makefile @@ -14,3 +14,16 @@ $(FONT): %.png: %.seqdiag $(FONT) seqdiag --no-transparency -a -f '$(FONT)' $< + +# Build the stuff via a docker image +.PHONY: docker +docker: + docker build -t clustering-seqdiag . + docker run --rm clustering-seqdiag | tar xvf - + +docker-clean: + docker rmi clustering-seqdiag || true + docker images -q --filter "dangling=true" | xargs docker rmi + +fix-clock-skew: + boot2docker ssh sudo date -u -D "%Y%m%d%H%M.%S" --set "$(shell date -u +%Y%m%d%H%M.%S)" diff --git a/clustering/README.md b/clustering/README.md index 04abb1bc..7e9d79c8 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -7,3 +7,20 @@ pip install seqdiag ``` Just call `make` to regenerate the diagrams. + +## Building with Docker +If you are on a Mac or your pip install is messed up, you can easily build with docker. + +``` +make docker +``` + +The first run will be slow but things should be fast after that. + +To clean up the docker containers that are created (and other cruft that is left around) you can run `make docker-clean`. + +If you are using boot2docker and get warnings about clock skew (or if things aren't building for some reason) then you can fix that up with `make fix-clock-skew`. + +## Automatically rebuild on file changes + +If you have the fswatch utility installed, you can have it monitor the file system and automatically rebuild when files have changed. Just do a `make watch`. \ No newline at end of file -- cgit v1.2.3 From c1937164730775dbadad1542ed4119a1f56e0494 Mon Sep 17 00:00:00 2001 From: Mrunal Patel Date: Wed, 28 Jan 2015 15:03:06 -0800 Subject: Replace "net" by "pod infra" in docs and format strings. 
--- networking.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/networking.md b/networking.md index 3f52d388..d90f56b1 100644 --- a/networking.md +++ b/networking.md @@ -62,7 +62,7 @@ Docker allocates IP addresses from a bridge we create on each node, using its - creates a new pair of veth devices and binds them to the netns - auto-assigns an IP from docker’s IP range -2. Create the user containers and specify the name of the network container as their “net” argument. Docker finds the PID of the command running in the network container and attaches to the netns of that PID. +2. Create the user containers and specify the name of the pod infra container as their “POD” argument. Docker finds the PID of the command running in the pod infra container and attaches to the netns and ipcns of that PID. ### Other networking implementation examples With the primary aim of providing IP-per-pod-model, other implementations exist to serve the purpose outside of GCE. @@ -77,7 +77,7 @@ Right now, docker inspect doesn't show the networking configuration of the conta ### External IP assignment -We want to be able to assign IP addresses externally from Docker ([Docker issue #6743](https://github.com/dotcloud/docker/issues/6743)) so that we don't need to statically allocate fixed-size IP ranges to each node, so that IP addresses can be made stable across network container restarts ([Docker issue #2801](https://github.com/dotcloud/docker/issues/2801)), and to facilitate pod migration. Right now, if the network container dies, all the user containers must be stopped and restarted because the netns of the network container will change on restart, and any subsequent user container restart will join that new netns, thereby not being able to see its peers. Additionally, a change in IP address would encounter DNS caching/TTL problems. External IP assignment would also simplify DNS support (see below). 
+We want to be able to assign IP addresses externally from Docker ([Docker issue #6743](https://github.com/dotcloud/docker/issues/6743)) so that we don't need to statically allocate fixed-size IP ranges to each node, so that IP addresses can be made stable across pod infra container restarts ([Docker issue #2801](https://github.com/dotcloud/docker/issues/2801)), and to facilitate pod migration. Right now, if the pod infra container dies, all the user containers must be stopped and restarted because the netns of the pod infra container will change on restart, and any subsequent user container restart will join that new netns, thereby not being able to see its peers. Additionally, a change in IP address would encounter DNS caching/TTL problems. External IP assignment would also simplify DNS support (see below). ### Naming, discovery, and load balancing -- cgit v1.2.3 From ab574621c1398afe67a4a58018bd46a4b1908a27 Mon Sep 17 00:00:00 2001 From: csrwng Date: Thu, 22 Jan 2015 09:32:30 -0500 Subject: [Proposal] Security Contexts --- security_context.md | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 158 insertions(+) create mode 100644 security_context.md diff --git a/security_context.md b/security_context.md new file mode 100644 index 00000000..87d67aa7 --- /dev/null +++ b/security_context.md @@ -0,0 +1,158 @@ +# Security Contexts +## Abstract +A security context is a set of constraints that are applied to a container in order to achieve the following goals (from [security design](security.md)): + +1. Ensure a clear isolation between container and the underlying host it runs on +2. 
Limit the ability of the container to negatively impact the infrastructure or other containers + +## Background + +The problem of securing containers in Kubernetes has come up [before](https://github.com/GoogleCloudPlatform/kubernetes/issues/398) and the potential problems with container security are [well known](http://opensource.com/business/14/7/docker-security-selinux). Although it is not possible to completely isolate Docker containers from their hosts, new features like [user namespaces](https://github.com/docker/libcontainer/pull/304) make it possible to greatly reduce the attack surface. + +## Motivation + +### Container isolation + +In order to improve container isolation from host and other containers running on the host, containers should only be +granted the access they need to perform their work. To this end it should be possible to take advantage of Docker +features such as the ability to [add or remove capabilities](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration) and [assign MCS labels](https://docs.docker.com/reference/run/#security-configuration) +to the container process. + +Support for user namespaces has recently been [merged](https://github.com/docker/libcontainer/pull/304) into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers. + +### External integration with shared storage +In order to support external integration with shared storage, processes running in a Kubernetes cluster +should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established. +Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks. 
+
+## Constraints and Assumptions
+* It is out of the scope of this document to prescribe a specific set
+  of constraints to isolate containers from their host. Different use cases need different
+  settings.
+* The concept of a security context should not be tied to a particular security mechanism or platform
+  (e.g. SELinux, AppArmor)
+* Applying a different security context to a scope (namespace or pod) requires a solution such as the one proposed for
+  [service accounts](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297).
+
+## Use Cases
+
+In order of increasing complexity, the following are example use cases that would
+be addressed with security contexts:
+
+1. Kubernetes is used to run a single cloud application. In order to protect
+   nodes from containers:
+   * All containers run as a single non-root user
+   * Privileged containers are disabled
+   * All containers run with a particular MCS label
+   * Kernel capabilities like CHOWN and MKNOD are removed from containers
+
+2. Just like case #1, except that I have more than one application running on
+   the Kubernetes cluster.
+   * Each application is run in its own namespace to avoid name collisions
+   * For each application a different uid and MCS label is used
+
+3. Kubernetes is used as the base for a PaaS with
+   multiple projects, each project represented by a namespace.
+   * Each namespace is associated with a range of uids/gids on the node that
+     are mapped to uids/gids on containers using Linux user namespaces.
+   * Certain pods in each namespace have special privileges to perform system
+     actions such as talking back to the server for deployment, running docker
+     builds, etc.
+   * External NFS storage is assigned to each namespace and permissions set
+     using the range of uids/gids assigned to that namespace.
+
+## Proposed Design
+
+### Overview
+A *security context* consists of a set of constraints that determine how a container
It has a 1:1 correspondence to a +[service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). A *security context provider* is passed to the Kubelet so it can have a chance +to mutate Docker API calls in order to apply the security context. + +It is recommended that this design be implemented in two phases: + +1. Implement the security context provider extension point in the Kubelet + so that a default security context can be applied on container run and creation. +2. Implement a security context structure that is part of a service account. The + default context provider can then be used to apply a security context based + on the service account associated with the pod. + +### Security Context Provider + +The Kubelet will have an interface that points to a `SecurityContextProvider`. The `SecurityContextProvider` is invoked before creating and running a given container: + +```go +type SecurityContextProvider interface { + // ModifyContainerConfig is called before the Docker createContainer call. + // The security context provider can make changes to the Config with which + // the container is created. + // An error is returned if it's not possible to secure the container as + // requested with a security context. + ModifyContainerConfig(pod *api.BoundPod, container *api.Container, config *docker.Config) error + + // ModifyHostConfig is called before the Docker runContainer call. + // The security context provider can make changes to the HostConfig, affecting + // security options, whether the container is privileged, volume binds, etc. + // An error is returned if it's not possible to secure the container as requested + // with a security context. + ModifyHostConfig(pod *api.BoundPod, container *api.Container, hostConfig *docker.HostConfig) +} +``` +If the value of the SecurityContextProvider field on the Kubelet is nil, the kubelet will create and run the container as it does today. 
+ +### Security Context + +A security context has a 1:1 correspondence to a service account and it can be included as +part of the service account resource. Following is an example of an initial implementation: + +```go +type SecurityContext struct { + // user is the uid to use when running the container + User int + + // allowPrivileged indicates whether this context allows privileged mode containers + AllowPrivileged bool + + // allowedVolumeTypes lists the types of volumes that a container can bind + AllowedVolumeTypes []string + + // addCapabilities is the list of Linux kernel capabilities to add + AddCapabilities []string + + // removeCapabilities is the list of Linux kernel capabilities to remove + RemoveCapabilities []string + + // SELinux specific settings (optional) + SELinux *SELinuxContext + + // AppArmor specific settings (optional) + AppArmor *AppArmorContext + + // FUTURE: + // With Linux user namespace support, it should be possible to map + // a range of container uids/gids to arbitrary host uids/gids + // UserMappings []IDMapping + // GroupMappings []IDMapping +} + +type SELinuxContext struct { + // MCS label/SELinux level to run the container under + Level string + + // SELinux type label for container processes + Type string + + // FUTURE: + // LabelVolumeMountsExclusive []Volume + // LabelVolumeMountsShared []Volume +} + +type AppArmorContext struct { + // AppArmor profile + Profile string +} +``` + +#### Security Context Lifecycle + +The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention). 
If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use.
\ No newline at end of file
-- 
cgit v1.2.3


From 84665f86076376dabf8ed2c69130197b168a42c1 Mon Sep 17 00:00:00 2001
From: Salvatore Dario Minonne
Date: Fri, 30 Jan 2015 15:27:41 +0100
Subject: Fix dockerfile for etcd.2.0.0

---
 development.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/development.md b/development.md
index f4770db3..aa2878f1 100644
--- a/development.md
+++ b/development.md
@@ -136,7 +136,7 @@ godep go tool cover -html=target/c.out
 
 ## Integration tests
 
-You need an [etcd](https://github.com/coreos/etcd/releases/tag/v0.4.6) in your path, please make sure it is installed and in your ``$PATH``.
+You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.0) in your path; please make sure it is installed and in your ``$PATH``.
 ```
 cd kubernetes
 hack/test-integration.sh
 ```
@@ -243,4 +243,3 @@ git rebase upstream/master
 ```
 hack/run-gendocs.sh
 ```
-
-- 
cgit v1.2.3


From ea4c801002746183d77d291dc538cc1523d2ae59 Mon Sep 17 00:00:00 2001
From: kargakis
Date: Tue, 3 Feb 2015 15:09:46 +0100
Subject: Add links to logging libraries in question

---
 logging.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/logging.md b/logging.md
index 9b6bfa2a..82b6a0c8 100644
--- a/logging.md
+++ b/logging.md
@@ -1,7 +1,7 @@
 Logging Conventions
 ===================
 
-The following conventions for the glog levels to use. glog is globally prefered to "log" for better runtime control.
+The following are conventions for the glog levels to use. [glog](http://godoc.org/github.com/golang/glog) is globally preferred to [log](http://golang.org/pkg/log/) for better runtime control.
* glog.Errorf() - Always an error * glog.Warningf() - Something unexpected, but probably not an error -- cgit v1.2.3 From 4c9e6d37b6276e38b1d24d5299545ee1c7ca0472 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Tue, 3 Feb 2015 22:38:01 +0000 Subject: Fix the broken links in the labels and access design docs. --- access.md | 4 ++-- labels.md | 12 ++++++------ 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/access.md b/access.md index 7af64ac9..8a2f1edd 100644 --- a/access.md +++ b/access.md @@ -151,7 +151,7 @@ In the Simple Profile: Namespaces versus userAccount vs Labels: - `userAccount`s are intended for audit logging (both name and UID should be logged), and to define who has access to `namespace`s. -- `labels` (see [docs/labels.md](labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities. +- `labels` (see [docs/labels.md](/docs/labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities. - `namespace`s prevent name collisions between uncoordinated groups of people, and provide a place to attach common policies for co-operating groups of people. @@ -212,7 +212,7 @@ Policy objects may be applicable only to a single namespace or to all namespaces ## Accounting -The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](resources.md)). +The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](/docs/resources.md)). 
Initially: - a `quota` object is immutable. diff --git a/labels.md b/labels.md index 8415376d..bc151f7c 100644 --- a/labels.md +++ b/labels.md @@ -1,6 +1,6 @@ # Labels -_Labels_ are key/value pairs identifying client/user-defined attributes (and non-primitive system-generated attributes) of API objects, which are stored and returned as part of the [metadata of those objects](api-conventions.md). Labels can be used to organize and to select subsets of objects according to these attributes. +_Labels_ are key/value pairs identifying client/user-defined attributes (and non-primitive system-generated attributes) of API objects, which are stored and returned as part of the [metadata of those objects](/docs/api-conventions.md). Labels can be used to organize and to select subsets of objects according to these attributes. Each object can have a set of key/value labels set on it, with at most one label with a particular key. ``` @@ -10,13 +10,13 @@ Each object can have a set of key/value labels set on it, with at most one label } ``` -Unlike [names and UIDs](identifiers.md), labels do not provide uniqueness. In general, we expect many objects to carry the same label(s). +Unlike [names and UIDs](/docs/identifiers.md), labels do not provide uniqueness. In general, we expect many objects to carry the same label(s). Via a _label selector_, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes. Label selectors may also be used to associate policies with sets of objects. -We also [plan](https://github.com/GoogleCloudPlatform/kubernetes/issues/560) to make labels available inside pods and [lifecycle hooks](container-environment.md). +We also [plan](https://github.com/GoogleCloudPlatform/kubernetes/issues/560) to make labels available inside pods and [lifecycle hooks](/docs/container-environment.md). Valid label keys are comprised of two segments - prefix and name - separated by a slash (`/`). 
The name segment is required and must be a DNS label: 63 @@ -50,8 +50,8 @@ key1 exists LIST and WATCH operations may specify label selectors to filter the sets of objects returned using a query parameter: `?labels=key1%3Dvalue1,key2%3Dvalue2,...`. We may extend such filtering to DELETE operations in the future. Kubernetes also currently supports two objects that use label selectors to keep track of their members, `service`s and `replicationController`s: -- `service`: A [service](services.md) is a configuration unit for the proxies that run on every worker node. It is named and points to one or more pods. -- `replicationController`: A [replication controller](replication-controller.md) ensures that a specified number of pod "replicas" are running at any one time. If there are too many, it'll kill some. If there are too few, it'll start more. +- `service`: A [service](/docs/services.md) is a configuration unit for the proxies that run on every worker node. It is named and points to one or more pods. +- `replicationController`: A [replication controller](/docs/replication-controller.md) ensures that a specified number of pod "replicas" are running at any one time. If there are too many, it'll kill some. If there are too few, it'll start more. The set of pods that a `service` targets is defined with a label selector. Similarly, the population of pods that a `replicationController` is monitoring is also defined with a label selector. @@ -73,4 +73,4 @@ Since labels can be set at pod creation time, no separate set add/remove operati ## Labels vs. annotations -We'll eventually index and reverse-index labels for efficient queries and watches, use them to sort and group in UIs and CLIs, etc. We don't want to pollute labels with non-identifying, especially large and/or structured, data. Non-identifying information should be recorded using [annotations](annotations.md). 
+We'll eventually index and reverse-index labels for efficient queries and watches, use them to sort and group in UIs and CLIs, etc. We don't want to pollute labels with non-identifying, especially large and/or structured, data. Non-identifying information should be recorded using [annotations](/docs/annotations.md). -- cgit v1.2.3 From 3b687b605b7bd6abe51b91e4ec18678e05849814 Mon Sep 17 00:00:00 2001 From: csrwng Date: Mon, 9 Feb 2015 14:17:51 -0500 Subject: Specify intent for container isolation and add details for id mapping --- security_context.md | 86 ++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 59 insertions(+), 27 deletions(-) diff --git a/security_context.md b/security_context.md index 87d67aa7..400d30e9 100644 --- a/security_context.md +++ b/security_context.md @@ -98,6 +98,7 @@ type SecurityContextProvider interface { ModifyHostConfig(pod *api.BoundPod, container *api.Container, hostConfig *docker.HostConfig) } ``` + If the value of the SecurityContextProvider field on the Kubelet is nil, the kubelet will create and run the container as it does today. ### Security Context @@ -106,53 +107,84 @@ A security context has a 1:1 correspondence to a service account and it can be i part of the service account resource. 
Following is an example of an initial implementation: ```go + +// SecurityContext specifies the security constraints associated with a service account type SecurityContext struct { // user is the uid to use when running the container User int - // allowPrivileged indicates whether this context allows privileged mode containers + // AllowPrivileged indicates whether this context allows privileged mode containers AllowPrivileged bool - // allowedVolumeTypes lists the types of volumes that a container can bind + // AllowedVolumeTypes lists the types of volumes that a container can bind AllowedVolumeTypes []string - // addCapabilities is the list of Linux kernel capabilities to add + // AddCapabilities is the list of Linux kernel capabilities to add AddCapabilities []string - // removeCapabilities is the list of Linux kernel capabilities to remove + // RemoveCapabilities is the list of Linux kernel capabilities to remove RemoveCapabilities []string - // SELinux specific settings (optional) - SELinux *SELinuxContext - - // AppArmor specific settings (optional) - AppArmor *AppArmorContext + // Isolation specifies the type of isolation required for containers + // in this security context + Isolation ContainerIsolationSpec +} + +// ContainerIsolationSpec indicates intent for container isolation +type ContainerIsolationSpec struct { + // Type is the container isolation type (None, Private) + Type ContainerIsolationType - // FUTURE: - // With Linux user namespace support, it should be possible to map - // a range of container uids/gids to arbitrary host uids/gids - // UserMappings []IDMapping - // GroupMappings []IDMapping + // FUTURE: IDMapping specifies how users and groups from the host will be mapped + IDMapping *IDMapping } -type SELinuxContext struct { - // MCS label/SELinux level to run the container under - Level string - - // SELinux type label for container processes - Type string - - // FUTURE: - // LabelVolumeMountsExclusive []Volume - // LabelVolumeMountsShared 
[]Volume
+// ContainerIsolationType is the type of container isolation for a security context
+type ContainerIsolationType string
+
+const (
+	// ContainerIsolationNone means that no additional constraints are added to
+	// containers to isolate them from their host
+	ContainerIsolationNone ContainerIsolationType = "None"
+
+	// ContainerIsolationPrivate means that containers are isolated in process
+	// and storage from their host and other containers.
+	ContainerIsolationPrivate ContainerIsolationType = "Private"
+)
+
+// IDMapping specifies the requested user and group mappings for containers
+// associated with a specific security context
+type IDMapping struct {
+	// SharedUsers is the set of user ranges that must be unique to the entire cluster
+	SharedUsers []IDMappingRange
+
+	// SharedGroups is the set of group ranges that must be unique to the entire cluster
+	SharedGroups []IDMappingRange
+
+	// PrivateUsers are mapped to users on the host node, but are not necessarily
+	// unique to the entire cluster
+	PrivateUsers []IDMappingRange
+
+	// PrivateGroups are mapped to groups on the host node, but are not necessarily
+	// unique to the entire cluster
+	PrivateGroups []IDMappingRange
 }
 
-type AppArmorContext struct {
-	// AppArmor profile
-	Profile string
+// IDMappingRange specifies a mapping between container IDs and node IDs
+type IDMappingRange struct {
+	// ContainerID is the starting container ID
+	ContainerID int
+
+	// HostID is the starting host ID
+	HostID int
+
+	// Length is the length of the ID range
+	Length int
 }
+
 ```
+
 #### Security Context Lifecycle
 
 The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention).
If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use. \ No newline at end of file -- cgit v1.2.3 From b133560996a739a9e5160147da264df28d0d7e95 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Tue, 10 Feb 2015 09:35:11 +0000 Subject: Add steps to the development guide for how to use godep to update an existing dependency. Also change from the numbered lists from markdown that didn't work due to the intervening code blocks to just raw text numbered lists. --- development.md | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/development.md b/development.md index aa2878f1..3d05f71f 100644 --- a/development.md +++ b/development.md @@ -31,17 +31,18 @@ Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. ### Installing godep There are many ways to build and host go binaries. Here is an easy way to get utilities like ```godep``` installed: -1. Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is installed on your system. (some of godep's dependencies use the mercurial +1) Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is installed on your system. (some of godep's dependencies use the mercurial source control system). Use ```apt-get install mercurial``` or ```yum install mercurial``` on Linux, or [brew.sh](http://brew.sh) on OS X, or download directly from mercurial. -2. Create a new GOPATH for your tools and install godep: + +2) Create a new GOPATH for your tools and install godep: ``` export GOPATH=$HOME/go-tools mkdir -p $GOPATH go get github.com/tools/godep ``` -3. Add $GOPATH/bin to your path. Typically you'd add this to your ~/.profile: +3) Add $GOPATH/bin to your path. 
Typically you'd add this to your ~/.profile: ``` export GOPATH=$HOME/go-tools export PATH=$PATH:$GOPATH/bin @@ -50,8 +51,7 @@ export PATH=$PATH:$GOPATH/bin ### Using godep Here is a quick summary of `godep`. `godep` helps manage third party dependencies by copying known versions into Godeps/_workspace. Here is the recommended way to set up your system. There are other ways that may work, but this is the easiest one I know of. -1. Devote a directory to this endeavor: - +1) Devote a directory to this endeavor: ``` export KPATH=$HOME/code/kubernetes mkdir -p $KPATH/src/github.com/GoogleCloudPlatform/kubernetes @@ -60,8 +60,7 @@ git clone https://path/to/your/fork . # Or copy your existing local repo here. IMPORTANT: making a symlink doesn't work. ``` -2. Set up your GOPATH. - +2) Set up your GOPATH. ``` # Option A: this will let your builds see packages that exist elsewhere on your system. export GOPATH=$KPATH:$GOPATH @@ -70,24 +69,27 @@ export GOPATH=$KPATH # Option B is recommended if you're going to mess with the dependencies. ``` -3. Populate your new $GOPATH. - +3) Populate your new $GOPATH. ``` cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes godep restore ``` -4. To add a dependency, you can do ```go get path/to/dependency``` as usual. - -5. To package up a dependency, do - +4) Next, you can either add a new dependency or update an existing one. ``` +# To add a new dependency, run: cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +go get path/to/dependency godep save ./... 
-# Sanity check that your Godeps.json file is ok by re-restoring: -godep restore + +# To update an existing dependency, do +cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +go get -u path/to/dependency +godep update path/to/dependency ``` +5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by re-restoring: ```godep restore``` + I (lavalamp) have sometimes found it expedient to manually fix the /Godeps/godeps.json file to minimize the changes. Please send dependency updates in separate commits within your PR, for easier reviewing. -- cgit v1.2.3 From 66a676bf589605da619b9b6ac6123491787df441 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Wed, 11 Feb 2015 12:16:16 -0800 Subject: Fix bad config in flaky test documentation and add script to help check for flakes. --- flaky-tests.md | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/flaky-tests.md b/flaky-tests.md index ccd32afb..e352e110 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -11,7 +11,7 @@ There is a testing image ```brendanburns/flake``` up on the docker hub. We will Create a replication controller with the following config: ```yaml -id: flakeController +id: flakecontroller kind: ReplicationController apiVersion: v1beta1 desiredState: @@ -41,14 +41,26 @@ labels: ```./cluster/kubectl.sh create -f controller.yaml``` -This will spin up 100 instances of the test. They will run to completion, then exit, the kubelet will restart them, eventually you will have sufficient -runs for your purposes, and you can stop the replication controller by setting the ```replicas``` field to 0 and then running: +This will spin up 24 instances of the test. They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test. +You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. 
Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently. +You can use this script to automate checking for failures, assuming your cluster is running on GCE and has four nodes: ```sh -./cluster/kubectl.sh update -f controller.yaml -./cluster/kubectl.sh delete -f controller.yaml +echo "" > output.txt +for i in {1..4}; do + echo "Checking kubernetes-minion-${i}" + echo "kubernetes-minion-${i}:" >> output.txt + gcloud compute ssh "kubernetes-minion-${i}" --command="sudo docker ps -a" >> output.txt +done +grep "Exited ([^0])" output.txt ``` -Now examine the machines with ```docker ps -a``` and look for tasks that exited with non-zero exit codes (ignore those that exited -1, since that's what happens when you stop the replica controller) +Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running: + +```sh +./cluster/kubectl.sh stop replicationcontroller flakecontroller +``` + +If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller. Happy flake hunting! -- cgit v1.2.3 From 4df971f078f655daad6e51dabd2fc05da8811e33 Mon Sep 17 00:00:00 2001 From: Saad Ali Date: Wed, 11 Feb 2015 18:04:30 -0800 Subject: Documentation for Event Compression --- event_compression.md | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 event_compression.md diff --git a/event_compression.md b/event_compression.md new file mode 100644 index 00000000..ab33a509 --- /dev/null +++ b/event_compression.md @@ -0,0 +1,79 @@ +# Kubernetes Event Compression + +This document captures the design of event compression. + + +## Background + +Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. 
For example, when pulling a non-existing image, Kubelet will repeatedly generate ```image_not_existing``` and ```container_is_waiting``` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)).
+
+## Proposal
+Each binary that generates events (for example, ```kubelet```) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event.
+
+Event compression should be best effort (not guaranteed). That is, in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries.
+
+## Design
+Instead of a single Timestamp, each event object [contains](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/types.go#L1111) the following fields:
+ * ```FirstTimestamp util.Time```
+   * The date/time of the first occurrence of the event.
+ * ```LastTimestamp util.Time```
+   * The date/time of the most recent occurrence of the event.
+   * On first occurrence, this is equal to the FirstTimestamp.
+ * ```Count int```
+   * The number of occurrences of this event between FirstTimestamp and LastTimestamp.
+   * On first occurrence, this is 1.
+
+Each binary that generates events will:
+ * Maintain a new global hash table to keep track of previously generated events (see ```pkg/client/record/events_cache.go```).
+ * The code that “records/writes” events (see ```StartRecording``` in ```pkg/client/record/event.go```) uses the global hash table to check if any new event has been seen previously.
+   * The key for the hash table is generated from the event object minus timestamps/count/transient fields (see ```pkg/client/record/events_cache.go```); specifically, the following event fields are used to construct a unique key for an event:
+     * ```event.Source.Component```
+     * ```event.Source.Host```
+     * ```event.InvolvedObject.Kind```
+     * ```event.InvolvedObject.Namespace```
+     * ```event.InvolvedObject.Name```
+     * ```event.InvolvedObject.UID```
+     * ```event.InvolvedObject.APIVersion```
+     * ```event.Reason```
+     * ```event.Message```
+ * If the key for a new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate:
+   * Instead of the usual POST/create event API, the new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count.
+   * The event is also updated in the global hash table with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update).
+ * If the key for a new event does not match the key for any previously generated event (meaning none of the above fields match between the new event and any previously generated events), then the event is considered to be new/unique:
+   * The usual POST/create event API is called to create a new event entry in etcd.
+   * An entry for the event is also added to the global hash table.
+
+## Issues/Risks
+ * Hash table cleanup
+   * If the component (e.g. kubelet) runs for a long period of time and generates a ton of unique events, the hash table could grow very large in memory.
+   * *Future consideration:* remove entries from the hash table that are older than some specified time.
+ * Event history is not preserved across application restarts
+   * Each component keeps track of event history in memory; a restart causes event history to be cleared.
+ * That means that compression will not occur across component restarts. + * Similarly, if in the future events are aged out of the hash table, then events will only be compressed until they age out of the hash table, at which point any new instance of the event will cause a new entry to be created in etcd. + +## Example +Sample kubectl output +``` +FIRSTTIME LASTTIME COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE +Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-3.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-2.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-2.c.saad-dev-vms.internal} Starting kubelet. 
+Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-influx-grafana-controller-0133o Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 elasticsearch-logging-controller-fplln Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 kibana-logging-controller-gziey Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 skydns-ls6k1 Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods +Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey BoundPod implicitly required container POD pulled {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest" +Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-minion-4.c.saad-dev-vms.internal + +``` + +This demonstrates what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries. 
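The key/compress mechanics described above can be sketched in a few lines of Go. This is an illustration, not the actual ```events_cache.go``` code: the `event` struct abbreviates the real `api.Event` fields, and `key`/`record` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// event carries only the fields that participate in the aggregation key;
// FirstTimestamp/LastTimestamp are deliberately excluded, and Count is
// the occurrence counter described above.
type event struct {
	SourceComponent, SourceHost            string
	ObjKind, ObjNamespace, ObjName, ObjUID string
	Reason, Message                        string
	Count                                  int
}

// key builds the hash-table key from the identity fields only, so two
// occurrences that differ just by timestamp map to the same entry.
func key(e event) string {
	return strings.Join([]string{
		e.SourceComponent, e.SourceHost,
		e.ObjKind, e.ObjNamespace, e.ObjName, e.ObjUID,
		e.Reason, e.Message,
	}, "|")
}

// record folds a new occurrence into cache. It returns true when the event
// was a duplicate (count bumped, which would be a PUT/update against etcd)
// and false when it was new (which would be a POST/create).
func record(cache map[string]*event, e event) bool {
	k := key(e)
	if prev, ok := cache[k]; ok {
		prev.Count++
		return true
	}
	e.Count = 1
	cache[k] = &e
	return false
}

func main() {
	cache := map[string]*event{}
	e := event{
		SourceComponent: "scheduler",
		ObjKind:         "Pod",
		ObjNamespace:    "default",
		ObjName:         "skydns-ls6k1",
		Reason:          "failedScheduling",
		Message:         "Error scheduling: no minions available to schedule pods",
	}
	fmt.Println(record(cache, e)) // false: first occurrence, created
	fmt.Println(record(cache, e)) // true: duplicate, count bumped
	fmt.Println(cache[key(e)].Count) // 2
}
```

Because timestamps and count are left out of the key, repeated occurrences collapse into one entry, while any change to an identity field (reason, message, involved object, etc.) produces a fresh entry.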
+
+## Related Pull Requests/Issues
+ * Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events
+ * PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API
+ * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events into a single event
+ * PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4306): Compress recurring events into a single event to optimize etcd storage
-- 
cgit v1.2.3


From cbbd382b3f5bfb7207f7a83aea99b1967067f002 Mon Sep 17 00:00:00 2001
From: Clayton Coleman
Date: Mon, 2 Feb 2015 14:35:33 -0500
Subject: Kubernetes pod and namespace security model

This proposed update to docs/design/security.md includes proposals on
how to ensure containers have consistent Linux security behavior across
nodes, how containers authenticate and authorize to the master and other
components, and how secret data could be distributed to pods to allow
that authentication.

References concepts from #3910, #2030, and #2297 as well as upstream
issues around the Docker vault and Docker secrets.
---
 security.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 93 insertions(+), 4 deletions(-)

diff --git a/security.md b/security.md
index 22034bdf..27f07cd6 100644
--- a/security.md
+++ b/security.md
@@ -1,17 +1,106 @@
 # Security in Kubernetes
 
-General design principles and guidelines related to security of containers, APIs, and infrastructure in Kubernetes.
+Kubernetes should define a reasonable set of security best practices that allows processes to be isolated from each other, from the cluster infrastructure, and which preserves important boundaries between those who manage the cluster and those who use the cluster.
+While Kubernetes today is not primarily a multi-tenant system, the long term evolution of Kubernetes will increasingly rely on proper boundaries between users and administrators. The code running on the cluster must be appropriately isolated and secured to prevent malicious parties from affecting the entire cluster.
 
-## Objectives
-1. Ensure a clear isolation between container and the underlying host it runs on
+
+## High Level Goals
+
+1. Ensure a clear isolation between the container and the underlying host it runs on
 2. Limit the ability of the container to negatively impact the infrastructure or other containers
 3. [Principle of Least Privilege](http://en.wikipedia.org/wiki/Principle_of_least_privilege) - ensure components are only authorized to perform the actions they need, and limit the scope of a compromise by limiting the capabilities of individual components
 4. Reduce the number of systems that have to be hardened and secured by defining clear boundaries between components
+5. Allow users of the system to be cleanly separated from administrators
+6. Allow administrative functions to be delegated to users where necessary
+7. Allow applications to be run on the cluster that have "secret" data (keys, certs, passwords) which is properly abstracted from "public" data.
+
+
+## Use cases
+
+### Roles:
+
+We define "user" as a unique identity accessing the Kubernetes API server, which may be a human or an automated process. Human users fall into the following categories:
+
+1. k8s admin - administers a kubernetes cluster and has access to the underlying components of the system
+2. k8s project administrator - administers the security of a small subset of the cluster
+3. k8s developer - launches pods on a kubernetes cluster and consumes cluster resources
+
+Automated process users fall into the following categories:
+
+1. 
k8s container user - a user that processes running inside a container (on the cluster) can use to access other cluster resources independent of the human users attached to a project
+2. k8s infrastructure user - the user that kubernetes infrastructure components use to perform cluster functions with clearly defined roles
+
+
+### Description of roles:
+
+* Developers:
+  * write pod specs.
+  * make some of their own images, and use some "community" docker images
+  * know which pods need to talk to which other pods
+  * decide which pods should share files with other pods, and which should not.
+  * reason about application level security, such as containing the effects of a local-file-read exploit in a webserver pod.
+  * do not often reason about operating system or organizational security.
+  * are not necessarily comfortable reasoning about the security properties of a system at the level of detail of Linux Capabilities, SELinux, AppArmor, etc.
+
+* Project Admins:
+  * allocate identity and roles within a namespace
+  * reason about organizational security within a namespace
+  * don't give a developer permissions that are not needed for their role.
+  * protect files on shared storage from unnecessary cross-team access
+  * are less focused on application security
+
+* Administrators:
+  * are less focused on application security. Focused on operating system security.
+  * protect the node from bad actors in containers, and properly-configured innocent containers from bad actors in other containers.
+  * are comfortable reasoning about the security properties of a system at the level of detail of Linux Capabilities, SELinux, AppArmor, etc.
+  * decide who can use which Linux Capabilities, run privileged containers, use hostDir, etc.
+    * e.g. a team that manages Ceph or a MySQL server might be trusted to have raw access to storage devices in some organizations, but teams that develop the applications at higher layers would not.
+
+
+## Proposed Design
+
+A pod runs in a *security context* under a *service account* that is defined by an administrator or project administrator, and the *secrets* a pod has access to are limited by that *service account*.
+
+
+1. The API should authenticate and authorize user actions [authn and authz](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/access.md)
+2. All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and be authorized to perform only the functions they require against the API.
+3. Most infrastructure components should use the API as a way of exchanging data and changing the system, and only the API should have access to the underlying data store (etcd)
+4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297)
+   1. If the user who started a long-lived process is removed from access to the cluster, the process should be able to continue without interruption
+   2. If a user who started processes is removed from the cluster, administrators may wish to terminate their processes in bulk
+   3. When containers run with a service account, the user that created / triggered the service account behavior must be associated with the container's action
+5. When container processes run on the cluster, they should run in a [security context](https://github.com/GoogleCloudPlatform/kubernetes/pull/3910) that isolates those processes via Linux user security, user namespaces, and permissions.
+   1. Administrators should be able to configure the cluster to automatically confine all container processes to run as a non-root, randomly assigned UID
+   2.
Administrators should be able to ensure that container processes within the same namespace are all assigned the same Unix UID
+   3. Administrators should be able to limit which developers and project administrators have access to higher privilege actions
+   4. Project administrators should be able to run pods within a namespace under different security contexts, and developers must be able to specify which of the available security contexts they may use
+   5. Developers should be able to run their own images or images from the community and expect those images to run correctly
+   6. Developers may need to ensure their images work within higher security requirements specified by administrators
+   7. When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met.
+   8. When application developers want to share filesystem data via distributed filesystems, the Unix user IDs on those filesystems must be consistent across different container processes
+6. Developers should be able to define [secrets](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) that are automatically added to the containers when pods are run
+   1. Secrets are files injected into the container whose values should not be displayed within a pod. Examples:
+      1. An SSH private key for git cloning remote data
+      2. A client certificate for accessing a remote system
+      3. A private key and certificate for a web server
+      4. A .kubeconfig file with embedded cert / token data for accessing the Kubernetes master
+      5. A .dockercfg file for pulling images from a protected registry
+   2. Developers should be able to define the pod spec so that a secret lands in a specific location
+   3. Project administrators should be able to limit developers within a namespace from viewing or modifying secrets (anyone who can launch an arbitrary pod can view secrets)
+   4.
Secrets are generally not copied from one namespace to another when a developer's application definitions are copied
+
+
+### Related design discussion
+
+* Authorization and authentication https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/access.md
+* Secret distribution via files https://github.com/GoogleCloudPlatform/kubernetes/pull/2030
+* Docker secrets https://github.com/docker/docker/pull/6697
+* Docker vault https://github.com/docker/docker/issues/10310
 
+## Specific Design Points
 
-## Design Points
+### TODO: authorization, authentication
 
 ### Isolate the data store from the minions and supporting infrastructure
-- 
cgit v1.2.3


From 83e0629cedcf0fa400034ab154c72c2543b03d5f Mon Sep 17 00:00:00 2001
From: Marek Grabowski
Date: Sat, 14 Feb 2015 00:11:38 +0100
Subject: Added instruction for profiling apiserver

---
 README.md    |  2 ++
 profiling.md | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 profiling.md

diff --git a/README.md b/README.md
index ab41448d..bf398e9f 100644
--- a/README.md
+++ b/README.md
@@ -17,3 +17,5 @@ Docs in this directory relate to developing Kubernetes.
 * **Releasing Kubernetes** ([releasing.md](releasing.md)): How to create a Kubernetes release (as in version) and how the version information gets embedded into the built binaries.
+
+* **Profiling Kubernetes** ([profiling.md](profiling.md)): How to plug the Go pprof profiler into Kubernetes.

diff --git a/profiling.md b/profiling.md
new file mode 100644
index 00000000..68d1cc24
--- /dev/null
+++ b/profiling.md
@@ -0,0 +1,30 @@
+# Profiling Kubernetes
+
+This document explains how to plug in the profiler and how to profile Kubernetes services.
+
+## Profiling library
+
+Go ships with the built-in 'net/http/pprof' profiling library and profiling web service. The service works by binding the debug/pprof/ subtree of a running webserver to the profiler. Reading from subpages of debug/pprof returns pprof-formatted profiles of the running binary.
The output can be processed offline by the tool of choice, or used as input to the handy 'go tool pprof', which can graphically represent the result.
+
+## Adding profiling to the APIserver
+
+TL;DR: Add the lines:
+```
+	m.mux.HandleFunc("/debug/pprof/", pprof.Index)
+	m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
+	m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
+```
+to the init(c *Config) method in 'pkg/master/master.go' and import the 'net/http/pprof' package.
+
+In most use cases it's enough to do 'import _ "net/http/pprof"', which automatically registers a handler in the default http.Server. A slight inconvenience is that the APIserver uses the default server for intra-cluster communication, so plugging the profiler into it is not really useful. In 'pkg/master/server/server.go' more servers are created and started as separate goroutines. The one that usually serves external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in the Handler variable. It is created from an HTTP multiplexer, so the only thing that needs to be done is adding the profiler handler functions to this multiplexer. This is exactly what the lines after the TL;DR do.
+
+## Connecting to the profiler
+Even with the profiler running, I found it not entirely straightforward to use 'go tool pprof' with it. The problem is that, at least for dev purposes, the certificates generated for the APIserver are not signed by anyone trusted, and because secureServer serves only secure traffic it isn't straightforward to connect to the service. The best workaround I found is creating an ssh tunnel from the kubernetes_master open unsecured port to some external server, and using this server as a proxy. To save everyone looking for the correct ssh flags, it is done by running:
+```
+	ssh kubernetes_master -L:localhost:8080
+```
+or an analogous one for your cloud provider. Afterwards you can e.g.
run
+```
+	go tool pprof http://localhost:/debug/pprof/profile
+```
+to get a 30-second CPU profile.
-- 
cgit v1.2.3


From ec77204e813546253b7af509aad039e989b53ad4 Mon Sep 17 00:00:00 2001
From: Saad Ali
Date: Tue, 17 Feb 2015 16:36:08 -0800
Subject: Update Event Compression Design Doc with LRU Cache

---
 event_compression.md | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/event_compression.md b/event_compression.md
index ab33a509..99dda143 100644
--- a/event_compression.md
+++ b/event_compression.md
@@ -23,10 +23,10 @@ Instead of a single Timestamp, each event object [contains](https://github.com/G
   * The number of occurrences of this event between FirstTimestamp and LastTimestamp
   * On first occurrence, this is 1.
 
-Each binary that generates events will:
- * Maintain a new global hash table to keep track of previously generated events (see ```pkg/client/record/events_cache.go```).
- * The code that “records/writes” events (see ```StartRecording``` in ```pkg/client/record/event.go```), uses the global hash table to check if any new event has been seen previously.
- * The key for the hash table is generated from the event object minus timestamps/count/transient fields (see ```pkg/client/record/events_cache.go```), specifically the following events fields are used to construct a unique key for an event:
+Each binary that generates events:
+ * Maintains a historical record of previously generated events:
+   * Implemented with a ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client/record/events_cache.go).
+ * The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following events fields are used to construct a unique key for an event: * ```event.Source.Component``` * ```event.Source.Host``` * ```event.InvolvedObject.Kind``` @@ -36,26 +36,24 @@ Each binary that generates events will: * ```event.InvolvedObject.APIVersion``` * ```event.Reason``` * ```event.Message``` - * If the key for a new event matches the key for a previously generated events (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate: - * Instead of the usual POST/create event API, the new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. - * The event is also updated in the global hash table with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). - * If the key for a new event does not match the key for any previously generated event (meaning none of the above fields match between the new event and any previously generated events), then the event is considered to be new/unique: + * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache. + * When an event is generated, the previously generated events cache is checked (see [```pkg/client/record/event.go```](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/client/record/event.go)). 
+ * If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd: + * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. + * The event is also updated in the previously generated events cache with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). + * If the key for the new event does not match the key for any previously generated event (meaning none of the above fields match between the new event and any previously generated events), then the event is considered to be new/unique and a new event entry is created in etcd: * The usual POST/create event API is called to create a new event entry in etcd. - * An entry for the event is also added to the global hash table. + * An entry for the event is also added to the previously generated events cache. ## Issues/Risks - * Hash table clean up - * If the component (e.g. kubelet) runs for a long period of time and generates a ton of unique events, the hash table could grow very large in memory. - * *Future consideration:* remove entries from the hash table that are older than some specified time. - * Event history is not preserved across application restarts - * Each component keeps track of event history in memory, a restart causes event history to be cleared. - * That means that compression will not occur across component restarts. - * Similarly, if in the future events are aged out of the hash table, then events will only be compressed until they age out of the hash table, at which point any new instance of the event will cause a new entry to be created in etcd. 
+ * Compression is not guaranteed, because each component keeps track of event history in memory
+ * An application restart clears event history, so event history is not preserved across application restarts and compression will not occur across component restarts.
+ * Because an LRU cache is used to keep track of previously generated events, if too many unique events are generated, old events will be evicted from the cache, so events will only be compressed until they age out of the events cache, at which point any new instance of the event will cause a new entry to be created in etcd.
 
 ## Example
 Sample kubectl output
 ```
-FIRSTTIME LASTTIME COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE
+FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE
 Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet.
 Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet.
 Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-3.c.saad-dev-vms.internal} Starting kubelet.
@@ -77,3 +75,4 @@ This demonstrates what would have been 20 separate entries (indicating schedulin
  * PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API
  * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events into a single event
  * PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress recurring events into a single event to optimize etcd storage
+ * PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map
-- 
cgit v1.2.3


From 35402355a74c93d81df5052148e071c1c2bd6feb Mon Sep 17 00:00:00 2001
From: Paul Morie
Date: Tue, 17 Feb 2015 20:18:38 -0500
Subject: Secrets proposal

---
 secrets.md | 547 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 547 insertions(+)
 create mode 100644 secrets.md

diff --git a/secrets.md b/secrets.md
new file mode 100644
index 00000000..6d561eec
--- /dev/null
+++ b/secrets.md
@@ -0,0 +1,547 @@
+# Secret Distribution
+
+## Abstract
+
+A proposal for the distribution of secrets (passwords, keys, etc.) to the Kubelet and to
+containers inside Kubernetes using a custom volume type.
+
+## Motivation
+
+Secrets are needed in containers to access internal resources like the Kubernetes master or
+external resources such as git repositories, databases, etc. Users may also want behaviors in the
+kubelet that depend on secret data (credentials for image pull from a docker registry) associated
+with pods.
+
+Goals of this design:
+
+1. Describe a secret resource
+2. Define the various challenges attendant to managing secrets on the node
+3.
Define a mechanism for consuming secrets in containers without modification
+
+## Constraints and Assumptions
+
+* This design does not prescribe a method for storing secrets; storage of secrets should be
+  pluggable to accommodate different use-cases
+* Encryption of secret data and node security are orthogonal concerns
+* It is assumed that node and master are secure and that compromising their security could also
+  compromise secrets:
+  * If a node is compromised, the only secrets that could potentially be exposed should be the
+    secrets belonging to containers scheduled onto it
+  * If the master is compromised, all secrets in the cluster may be exposed
+* Secret rotation is an orthogonal concern, but it should be facilitated by this proposal
+
+## Use Cases
+
+1. As a user, I want to store secret artifacts for my applications and consume them securely in
+   containers, so that I can keep the configuration for my applications separate from the images
+   that use them:
+   1. As a cluster operator, I want to allow a pod to access the Kubernetes master using a custom
+      `.kubeconfig` file, so that I can securely reach the master
+   2. As a cluster operator, I want to allow a pod to access a Docker registry using credentials
+      from a `.dockercfg` file, so that containers can push images
+   3. As a cluster operator, I want to allow a pod to access a git repository using SSH keys,
+      so that I can push and fetch to and from the repository
+2. As a user, I want to allow containers to consume supplemental information about services such
+   as username and password which should be kept secret, so that I can share secrets about a
+   service amongst the containers in my application securely
+3. As a user, I want to associate a pod with a `ServiceAccount` that consumes a secret and have
+   the kubelet implement some reserved behaviors based on the types of secrets the service account
+   consumes:
+   1. Use credentials for a docker registry to pull the pod's docker image
+   2.
Present a kubernetes auth token to the pod or transparently decorate traffic between the pod
+      and master service
+4. As a user, I want to be able to indicate that a secret expires and for that secret's value to
+   be rotated once it expires, so that the system can help me follow good practices
+
+### Use-Case: Configuration artifacts
+
+Many configuration files contain secrets intermixed with other configuration information. For
+example, a user's application may contain a properties file that contains database credentials,
+SaaS API tokens, etc. Users should be able to consume configuration artifacts in their containers
+and be able to control the path on the container's filesystems where the artifact will be
+presented.
+
+### Use-Case: Metadata about services
+
+Most pieces of information about how to use a service are secrets. For example, a service that
+provides a MySQL database needs to provide the username, password, and database name to consumers
+so that they can authenticate and use the correct database. Containers in pods consuming the MySQL
+service would also consume the secrets associated with the MySQL service.
+
+### Use-Case: Secrets associated with service accounts
+
+[Service Accounts](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) are proposed as a
+mechanism to decouple capabilities and security contexts from individual human users. A
+`ServiceAccount` contains references to some number of secrets. A `Pod` can specify that it is
+associated with a `ServiceAccount`. Secrets should have a `Type` field to allow the Kubelet and
+other system components to take action based on the secret's type.
+
+#### Example: service account consumes auth token secret
+
+As an example, the service account proposal discusses service accounts consuming secrets which
+contain kubernetes auth tokens. When a Kubelet starts a pod associated with a service account
+which consumes this type of secret, the Kubelet may take a number of actions:
+
+1.
Expose the secret in a `.kubernetes_auth` file in a well-known location in the container's
+   file system
+2. Configure that node's `kube-proxy` to decorate HTTP requests from that pod to the
+   `kubernetes-master` service with the auth token, e.g. by adding a header to the request
+   (see the [LOAS Daemon](https://github.com/GoogleCloudPlatform/kubernetes/issues/2209) proposal)
+
+#### Example: service account consumes docker registry credentials
+
+Another example use case is where a pod is associated with a secret containing docker registry
+credentials. The Kubelet could use these credentials for the docker pull to retrieve the image.
+
+### Use-Case: Secret expiry and rotation
+
+Rotation is considered a good practice for many types of secret data. It should be possible to
+express that a secret has an expiry date; this would make it possible to implement a system
+component that could regenerate expired secrets. As an example, consider a component that rotates
+expired secrets. The rotator could periodically regenerate the values for expired secrets of
+common types and update their expiry dates.
+
+## Deferral: Consuming secrets as environment variables
+
+Some images will expect to receive configuration items as environment variables instead of files.
+We should consider what the best way to allow this is; there are a few different options:
+
+1. Force the user to adapt files into environment variables. Users can store secrets that need to
+   be presented as environment variables in a format that is easy to consume from a shell:
+
+        $ cat /etc/secrets/my-secret.txt
+        export MY_SECRET_ENV=MY_SECRET_VALUE
+
+   The user could `source` the file at `/etc/secrets/my-secret.txt` prior to executing the command
+   for the image, either inline in the command or in an init script.
+
+2. Give secrets an attribute that allows users to express the intent that the platform should
+   generate the above syntax in the file used to present a secret.
The user could consume these
+   files in the same manner as the above option.
+
+3. Give secrets attributes that allow the user to express that the secret should be presented to
+   the container as an environment variable. The container's environment would contain the
+   desired values, and the software in the container could use them without modification to the
+   command or setup script.
+
+For our initial work, we will treat all secrets as files to narrow the problem space. There will
+be a future proposal that handles exposing secrets as environment variables.
+
+## Flow analysis of secret data with respect to the API server
+
+There are two fundamentally different use-cases for access to secrets:
+
+1. CRUD operations on secrets by their owners
+2. Read-only access to the secrets needed for a particular node by the kubelet
+
+### Use-Case: CRUD operations by owners
+
+In use cases for CRUD operations, the user experience for secrets should be no different than for
+other API resources.
+
+#### Data store backing the REST API
+
+The data store backing the REST API should be pluggable because different cluster operators will
+have different preferences for the central store of secret data. Some possibilities for storage:
+
+1. An etcd collection alongside the storage for other API resources
+2. A collocated [HSM](http://en.wikipedia.org/wiki/Hardware_security_module)
+3. An external datastore such as an external etcd, RDBMS, etc.
+
+#### Size limit for secrets
+
+There should be a size limit for secrets in order to:
+
+1. Prevent DOS attacks against the API server
+2. Allow kubelet implementations that prevent secret data from touching the node's filesystem
+
+The size limit should satisfy the following conditions:
+
+1. Large enough to store common artifact types (encryption keypairs, certificates, small
+   configuration files)
+2.
Small enough to avoid large impact on node resource consumption (storage, RAM for tmpfs, etc) + +To begin discussion, we propose an initial value for this size limit of **1MB**. + +#### Other limitations on secrets + +Defining a policy for limitations on how a secret may be referenced by another API resource and how +constraints should be applied throughout the cluster is tricky due to the number of variables +involved: + +1. Should there be a maximum number of secrets a pod can reference via a volume? +2. Should there be a maximum number of secrets a service account can reference? +3. Should there be a total maximum number of secrets a pod can reference via its own spec and its + associated service account? +4. Should there be a total size limit on the amount of secret data consumed by a pod? +5. How will cluster operators want to be able to configure these limits? +6. How will these limits impact API server validations? +7. How will these limits affect scheduling? + +For now, we will not implement validations around these limits. Cluster operators will decide how +much node storage is allocated to secrets. It will be the operator's responsibility to ensure that +the allocated storage is sufficient for the workload scheduled onto a node. + +### Use-Case: Kubelet read of secrets for node + +The use-case where the kubelet reads secrets has several additional requirements: + +1. Kubelets should only be able to receive secret data which is required by pods scheduled onto + the kubelet's node +2. Kubelets should have read-only access to secret data +3. Secret data should not be transmitted over the wire insecurely +4. Kubelets must ensure pods do not have access to each other's secrets + +#### Read of secret data by the Kubelet + +The Kubelet should only be allowed to read secrets which are consumed by pods scheduled onto that +Kubelet's node and their associated service accounts. 
Authorization of the Kubelet to read this
+data would be delegated to an authorization plugin and associated policy rule.
+
+#### Secret data on the node: data at rest
+
+Consideration must be given to whether secret data should be allowed to be at rest on the node:
+
+1. If secret data is not allowed to be at rest, the size of secret data becomes another draw on
+   the node's RAM - should it affect scheduling?
+2. If secret data is allowed to be at rest, should it be encrypted?
+   1. If so, how should this be done?
+   2. If not, what threats exist? What types of secret are appropriate to store this way?
+
+For the sake of limiting complexity, we propose that initially secret data should not be allowed
+to be at rest on a node; secret data should be stored on a node-level tmpfs filesystem. This
+filesystem can be subdivided into directories for use by the kubelet and by the volume plugin.
+
+#### Secret data on the node: resource consumption
+
+The Kubelet will be responsible for creating the per-node tmpfs file system for secret storage.
+It is hard to make a prescriptive declaration about how much storage is appropriate to reserve for
+secrets because different installations will vary widely in available resources, desired pod to
+node density, overcommit policy, and other operational dimensions. That being the case, we propose
+for simplicity that the amount of secret storage be controlled by a new parameter to the kubelet
+with a default value of **64MB**. It is the cluster operator's responsibility to handle choosing
+the right storage size for their installation and configuring their Kubelets correctly.
+
+Configuring each Kubelet is not the ideal story for operator experience; it is more intuitive that
+the cluster-wide storage size be readable from a central configuration store like the one proposed
+in [#1553](https://github.com/GoogleCloudPlatform/kubernetes/issues/1553).
When such a store +exists, the Kubelet could be modified to read this configuration item from the store. + +When the Kubelet is modified to advertise node resources (as proposed in +[#4441](https://github.com/GoogleCloudPlatform/kubernetes/issues/4441)), the capacity calculation +for available memory should factor in the potential size of the node-level tmpfs in order to avoid +memory overcommit on the node. + +#### Secret data on the node: isolation + +Every pod will have a [security context](https://github.com/GoogleCloudPlatform/kubernetes/pull/3910). +Secret data on the node should be isolated according to the security context of the container. The +Kubelet volume plugin API will be changed so that a volume plugin receives the security context of +a volume along with the volume spec. This will allow volume plugins to implement setting the +security context of volumes they manage. + +## Community work: + +Several proposals / upstream patches are notable as background for this proposal: + +1. [Docker vault proposal](https://github.com/docker/docker/issues/10310) +2. [Specification for image/container standardization based on volumes](https://github.com/docker/docker/issues/9277) +3. [Kubernetes service account proposal](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) +4. [Secrets proposal for docker (1)](https://github.com/docker/docker/pull/6075) +5. [Secrets proposal for docker (2)](https://github.com/docker/docker/pull/6697) + +## Proposed Design + +We propose a new `Secret` resource which is mounted into containers with a new volume type. Secret +volumes will be handled by a volume plugin that does the actual work of fetching the secret and +storing it. Secrets contain multiple pieces of data that are presented as different files within +the secret volume (example: SSH key pair). 
+ +In order to remove the burden from the end user in specifying every file that a secret consists of, +it should be possible to mount all files provided by a secret with a single ```VolumeMount``` entry +in the container specification. + +### Secret API Resource + +A new resource for secrets will be added to the API: + +```go +type Secret struct { + TypeMeta + ObjectMeta + + // Keys in this map are the paths relative to the volume + // presented to a container for this secret data. + Data map[string][]byte + Type SecretType +} + +type SecretType string + +const ( + SecretTypeOpaque SecretType = "opaque" // Opaque (arbitrary data; default) + SecretTypeKubernetesAuthToken SecretType = "kubernetes-auth" // Kubernetes auth token + SecretTypeDockerRegistryAuth SecretType = "docker-reg-auth" // Docker registry auth + // FUTURE: other type values +) + +const MaxSecretSize = 1 * 1024 * 1024 +``` + +A Secret can declare a type in order to provide type information to system components that work +with secrets. The default type is `opaque`, which represents arbitrary user-owned data. + +Secrets are validated against `MaxSecretSize`. + +A new REST API and registry interface will be added to accompany the `Secret` resource. The +default implementation of the registry will store `Secret` information in etcd. Future registry +implementations could store the `TypeMeta` and `ObjectMeta` fields in etcd and store the secret +data in another data store entirely, or store the whole object in another data store. + +#### Other validations related to secrets + +Initially there will be no validations for the number of secrets a pod references, or the number of +secrets that can be associated with a service account. These may be added in the future as the +finer points of secrets and resource allocation are fleshed out. 
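The size validation described above ("Secrets are validated against `MaxSecretSize`") can be sketched as follows. The helper name `validateSecretSize` and the summing-over-all-entries rule are assumptions; the proposal only states that secrets are validated against the limit, not how.

```go
package main

import (
	"errors"
	"fmt"
)

const MaxSecretSize = 1 * 1024 * 1024 // 1MB, the proposed limit

// Secret mirrors the proposed API resource (metadata fields omitted).
type Secret struct {
	Data map[string][]byte
	Type string
}

// validateSecretSize is a hypothetical validation helper: it sums the
// sizes of all data entries and rejects secrets over MaxSecretSize.
func validateSecretSize(s *Secret) error {
	total := 0
	for _, value := range s.Data {
		total += len(value)
	}
	if total > MaxSecretSize {
		return errors.New("secret data exceeds MaxSecretSize")
	}
	return nil
}

func main() {
	small := &Secret{Data: map[string][]byte{"token": []byte("abc123")}, Type: "opaque"}
	big := &Secret{Data: map[string][]byte{"blob": make([]byte, 2*1024*1024)}, Type: "opaque"}

	fmt.Println(validateSecretSize(small)) // <nil>
	fmt.Println(validateSecretSize(big))   // secret data exceeds MaxSecretSize
}
```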
+ +### Secret Volume Source + +A new `SecretSource` type of volume source will be added to the ```VolumeSource``` struct in the +API: + +```go +type VolumeSource struct { + // Other fields omitted + + // SecretSource represents a secret that should be presented in a volume + SecretSource *SecretSource `json:"secret"` +} + +type SecretSource struct { + Target ObjectReference +} +``` + +Secret volume sources are validated to ensure that the specified object reference actually points +to an object of type `Secret`. + +### Secret Volume Plugin + +A new Kubelet volume plugin will be added to handle volumes with a secret source. This plugin will +require access to the API server to retrieve secret data and therefore the volume `Host` interface +will have to change to expose a client interface: + +```go +type Host interface { + // Other methods omitted + + // GetKubeClient returns a client interface + GetKubeClient() client.Interface +} +``` + +The secret volume plugin will be responsible for: + +1. Returning a `volume.Builder` implementation from `NewBuilder` that: + 1. Retrieves the secret data for the volume from the API server + 2. Places the secret data onto the container's filesystem + 3. Sets the correct security attributes for the volume based on the pod's `SecurityContext` +2. Returning a `volume.Cleaner` implementation from `NewCleaner` that cleans the volume from the + container's filesystem + +### Kubelet: Node-level secret storage + +The Kubelet must be modified to accept a new parameter for the secret storage size and to create +a tmpfs file system of that size to store secret data. Rough accounting of specific changes: + +1. The Kubelet should have a new field added called `secretStorageSize`; units are megabytes +2. `NewMainKubelet` should accept a value for secret storage size +3. The Kubelet server should have a new flag added for secret storage size +4.
The Kubelet's `setupDataDirs` method should be changed to create the secret storage + +### Kubelet: New behaviors for secrets associated with service accounts + +For use-cases where the Kubelet's behavior is affected by the secrets associated with a pod's +`ServiceAccount`, the Kubelet will need to be changed. For example, if secrets of type +`docker-reg-auth` affect how the pod's images are pulled, the Kubelet will need to be changed +to accommodate this. Subsequent proposals can address this on a type-by-type basis. + +## Examples + +For clarity, let's examine some detailed examples of some common use-cases in terms of the +suggested changes. All of these examples are assumed to be created in a namespace called +`example`. + +### Use-Case: Pod with ssh keys + +To create a pod that uses an ssh key stored as a secret, we first need to create a secret: + +```json +{ + "apiVersion": "v1beta2", + "kind": "Secret", + "id": "ssh-key-secret", + "data": { + "id_rsa.pub": "dmFsdWUtMQ0K", + "id_rsa": "dmFsdWUtMg0KDQo=" + } +} +``` + +**Note:** The values of secret data are encoded as base64-encoded strings.
+ +Now we can create a pod which references the secret with the ssh key and consumes it in a volume: + +```json +{ + "id": "secret-test-pod", + "kind": "Pod", + "apiVersion":"v1beta2", + "labels": { + "name": "secret-test" + }, + "desiredState": { + "manifest": { + "version": "v1beta1", + "id": "secret-test-pod", + "containers": [{ + "name": "ssh-test-container", + "image": "mySshImage", + "volumeMounts": [{ + "name": "secret-volume", + "mountPath": "/etc/secret-volume", + "readOnly": true + }] + }], + "volumes": [{ + "name": "secret-volume", + "source": { + "secret": { + "target": { + "kind": "Secret", + "namespace": "example", + "name": "ssh-key-secret" + } + } + } + }] + } + } +} +``` + +When the container's command runs, the pieces of the key will be available in: + + /etc/secret-volume/id_rsa.pub + /etc/secret-volume/id_rsa + +The container is then free to use the secret data to establish an ssh connection. + +### Use-Case: Pods with prod / test credentials + +Let's compare examples where a pod consumes a secret containing prod credentials and another pod +consumes a secret with test environment credentials.
+ +The secrets: + +```json +[{ + "apiVersion": "v1beta2", + "kind": "Secret", + "id": "prod-db-secret", + "data": { + "username": "dmFsdWUtMQ0K", + "password": "dmFsdWUtMg0KDQo=" + } +}, +{ + "apiVersion": "v1beta2", + "kind": "Secret", + "id": "test-db-secret", + "data": { + "username": "dmFsdWUtMQ0K", + "password": "dmFsdWUtMg0KDQo=" + } +}] +``` + +The pods: + +```json +[{ + "id": "prod-db-client-pod", + "kind": "Pod", + "apiVersion":"v1beta2", + "labels": { + "name": "prod-db-client" + }, + "desiredState": { + "manifest": { + "version": "v1beta1", + "id": "prod-db-pod", + "containers": [{ + "name": "db-client-container", + "image": "myClientImage", + "volumeMounts": [{ + "name": "secret-volume", + "mountPath": "/etc/secret-volume", + "readOnly": true + }] + }], + "volumes": [{ + "name": "secret-volume", + "source": { + "secret": { + "target": { + "kind": "Secret", + "namespace": "example", + "name": "prod-db-secret" + } + } + } + }] + } + } +}, +{ + "id": "test-db-client-pod", + "kind": "Pod", + "apiVersion":"v1beta2", + "labels": { + "name": "test-db-client" + }, + "desiredState": { + "manifest": { + "version": "v1beta1", + "id": "test-db-pod", + "containers": [{ + "name": "db-client-container", + "image": "myClientImage", + "volumeMounts": [{ + "name": "secret-volume", + "mountPath": "/etc/secret-volume", + "readOnly": true + }] + }], + "volumes": [{ + "name": "secret-volume", + "source": { + "secret": { + "target": { + "kind": "Secret", + "namespace": "example", + "name": "test-db-secret" + } + } + } + }] + } + } +}] +``` + +The specs for the two pods differ only in the value of the object referred to by the secret volume +source. 
Both containers will have the following files present on their filesystems: + + /etc/secret-volume/username + /etc/secret-volume/password -- cgit v1.2.3 From 3cb657fac08b3c302cb244066c69e73469ccafd1 Mon Sep 17 00:00:00 2001 From: Zach Loafman Date: Wed, 18 Feb 2015 07:51:36 -0800 Subject: Document current ways to run a single e2e --- development.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/development.md b/development.md index 3d05f71f..302f4af8 100644 --- a/development.md +++ b/development.md @@ -182,8 +182,11 @@ go run hack/e2e.go --pushup # Run all tests go run hack/e2e.go --test -# Run tests matching a glob. -go run hack/e2e.go --tests=... +# Run tests matching the regex "Pods.*env" +go run hack/e2e.go -v -test --test_args="--ginkgo.focus=Pods.*env" + +# Alternately, if you have the e2e cluster up and no desire to see the event stream, you can run ginkgo-e2e.sh directly: +hack/ginkgo-e2e.sh --ginkgo.focus=Pods.*env ``` ### Combining flags -- cgit v1.2.3 From d20061eeff1b9cfa0774b9259143ca7f7c859791 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Wed, 18 Feb 2015 22:04:56 +0000 Subject: Combine the two documentation sections on how to use godeps. --- development.md | 32 +++++++------------------------- 1 file changed, 7 insertions(+), 25 deletions(-) diff --git a/development.md b/development.md index 302f4af8..67ef5916 100644 --- a/development.md +++ b/development.md @@ -49,7 +49,7 @@ export PATH=$PATH:$GOPATH/bin ``` ### Using godep -Here is a quick summary of `godep`. `godep` helps manage third party dependencies by copying known versions into Godeps/_workspace. Here is the recommended way to set up your system. There are other ways that may work, but this is the easiest one I know of. +Here's a quick walkthrough of one way to use godeps to add or update a Kubernetes dependency into Godeps/_workspace. For more details, please see the instructions in [godep's documentation](https://github.com/tools/godep). 
1) Devote a directory to this endeavor: ``` @@ -69,7 +69,7 @@ export GOPATH=$KPATH # Option B is recommended if you're going to mess with the dependencies. ``` -3) Populate your new $GOPATH. +3) Populate your new GOPATH. ``` cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes godep restore @@ -77,20 +77,22 @@ godep restore 4) Next, you can either add a new dependency or update an existing one. ``` -# To add a new dependency, run: +# To add a new dependency, do: cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes go get path/to/dependency +# Change code in Kubernetes to use the dependency. godep save ./... -# To update an existing dependency, do +# To update an existing dependency, do: cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes go get -u path/to/dependency +# Change code in Kubernetes accordingly if necessary. godep update path/to/dependency ``` 5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by re-restoring: ```godep restore``` -I (lavalamp) have sometimes found it expedient to manually fix the /Godeps/godeps.json file to minimize the changes. +It is sometimes expedient to manually fix the /Godeps/godeps.json file to minimize the changes. Please send dependency updates in separate commits within your PR, for easier reviewing. @@ -208,26 +210,6 @@ go run e2e.go -ctl='delete pod foobar' ## Testing out flaky tests [Instructions here](flaky-tests.md) -## Add/Update dependencies - -Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. To add or update a package, please follow the instructions on [godep's document](https://github.com/tools/godep). - -To add a new package ``foo/bar``: - -- Make sure the kubernetes' root directory is in $GOPATH/github.com/GoogleCloudPlatform/kubernetes -- Run ``godep restore`` to make sure you have all dependancies pulled. -- Download foo/bar into the first directory in GOPATH: ``go get foo/bar``. -- Change code in kubernetes to use ``foo/bar``. 
-- Run ``godep save ./...`` under kubernetes' root directory. - -To update a package ``foo/bar``: - -- Make sure the kubernetes' root directory is in $GOPATH/github.com/GoogleCloudPlatform/kubernetes -- Run ``godep restore`` to make sure you have all dependancies pulled. -- Update the package with ``go get -u foo/bar``. -- Change code in kubernetes accordingly if necessary. -- Run ``godep update foo/bar`` under kubernetes' root directory. - ## Keeping your development fork in sync One time after cloning your forked repo: -- cgit v1.2.3 From 4ee6432fbc1cadf071dc71634391b4ceacadc425 Mon Sep 17 00:00:00 2001 From: gmarek Date: Thu, 19 Feb 2015 14:50:54 +0100 Subject: Add info about contention profiling to profiling.md --- profiling.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/profiling.md b/profiling.md index 68d1cc24..142ef11e 100644 --- a/profiling.md +++ b/profiling.md @@ -23,8 +23,12 @@ Even when running profiler I found not really straightforward to use 'go tool pp ``` ssh kubernetes_master -L:localhost:8080 ``` -or analogous one for you Cloud provider. Afterwards you can e.g. run +or analogous one for you Cloud provider. Afterwards you can e.g. run ``` go tool pprof http://localhost:/debug/pprof/profile ``` to get 30 sec. CPU profile. + +## Contention profiling + +To enable contetion profiling you need to add line ```rt.SetBlockProfileRate(1)``` to ones added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input go to ```go tool pprof```. 
-- cgit v1.2.3 From e6e17729be57537bc49aa8734c2bfdb202ebdbd4 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Thu, 19 Feb 2015 10:25:13 -0500 Subject: Minor addendums to secrets proposal --- secrets.md | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/secrets.md b/secrets.md index 6d561eec..ce02f930 100644 --- a/secrets.md +++ b/secrets.md @@ -29,6 +29,8 @@ Goals of this design: secrets belonging to containers scheduled onto it * If the master is compromised, all secrets in the cluster may be exposed * Secret rotation is an orthogonal concern, but it should be facilitated by this proposal +* A user who can consume a secret in a container can know the value of the secret; secrets must + be provisioned judiciously ## Use Cases @@ -270,10 +272,12 @@ type Secret struct { TypeMeta ObjectMeta - // Keys in this map are the paths relative to the volume - // presented to a container for this secret data. - Data map[string][]byte - Type SecretType + // Data contains the secret data. Each key must be a valid DNS_SUBDOMAIN. + // The serialized form of the secret data is a base64 encoded string. + Data map[string][]byte `json:"data,omitempty"` + + // Used to facilitate programmatic handling of secret data. + Type SecretType `json:"type,omitempty"` } type SecretType string @@ -291,7 +295,8 @@ const MaxSecretSize = 1 * 1024 * 1024 A Secret can declare a type in order to provide type information to system components that work with secrets. The default type is `opaque`, which represents arbitrary user-owned data. -Secrets are validated against `MaxSecretSize`. +Secrets are validated against `MaxSecretSize`. The keys in the `Data` field must be valid DNS +subdomains. A new REST API and registry interface will be added to accompany the `Secret` resource. The default implementation of the registry will store `Secret` information in etcd.
Future registry @@ -325,6 +330,11 @@ type SecretSource struct { Secret volume sources are validated to ensure that the specified object reference actually points to an object of type `Secret`. +In the future, the `SecretSource` will be extended to allow: + +1. Fine-grained control over which pieces of secret data are exposed in the volume +2. The paths and filenames for how secret data are exposed + ### Secret Volume Plugin A new Kubelet volume plugin will be added to handle volumes with a secret source. This plugin will @@ -382,13 +392,14 @@ To create a pod that uses an ssh key stored as a secret, we first need to create "kind": "Secret", "id": "ssh-key-secret", "data": { - "id_rsa.pub": "dmFsdWUtMQ0K", - "id_rsa": "dmFsdWUtMg0KDQo=" + "id-rsa.pub": "dmFsdWUtMQ0K", + "id-rsa": "dmFsdWUtMg0KDQo=" } } ``` -**Note:** The values of secret data are encoded as base64-encoded strings. +**Note:** The values of secret data are encoded as base64-encoded strings. Newlines are not +valid within these strings and must be omitted. Now we can create a pod which references the secret with the ssh key and consumes it in a volume: @@ -432,8 +443,8 @@ Now we can create a pod which references the secret with the ssh key and consume When the container's command runs, the pieces of the key will be available in: - /etc/secret-volume/id_rsa.pub - /etc/secret-volume/id_rsa + /etc/secret-volume/id-rsa.pub + /etc/secret-volume/id-rsa The container is then free to use the secret data to establish an ssh connection. -- cgit v1.2.3 From 0e7dfbc995f979400dae9c740fe8fa8ea6203c49 Mon Sep 17 00:00:00 2001 From: Jeff Grafton Date: Thu, 19 Feb 2015 18:40:28 -0800 Subject: Update development doc on how to generate code coverage reports. 
--- development.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/development.md b/development.md index 67ef5916..615b4d55 100644 --- a/development.md +++ b/development.md @@ -133,11 +133,28 @@ ok github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet 0.317s ``` ## Coverage + +Currently, collecting coverage is only supported for the Go unit tests. + +To run all unit tests and generate an HTML coverage report, run the following: + +``` +cd kubernetes +KUBE_COVER=y hack/test-go.sh +``` + +At the end of the run, an HTML report will be generated with the path printed to stdout. + +To run tests and collect coverage in only one package, pass its relative path under the `kubernetes` directory as an argument, for example: ``` cd kubernetes -godep go tool cover -html=target/c.out +KUBE_COVER=y hack/test-go.sh pkg/kubectl ``` +Multiple arguments can be passed, in which case the coverage results will be combined for all tests run. + +Coverage results for the project can also be viewed on [Coveralls](https://coveralls.io/r/GoogleCloudPlatform/kubernetes), and are continuously updated as commits are merged. Additionally, all pull requests which spawn a Travis build will report unit test coverage results to Coveralls. + ## Integration tests You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.0) in your path, please make sure it is installed and in your ``$PATH``. -- cgit v1.2.3 From a65c29ed5cd9092522b0b6e4791984e2dabe5bce Mon Sep 17 00:00:00 2001 From: gmarek Date: Fri, 20 Feb 2015 09:39:13 +0100 Subject: apply comments --- profiling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/profiling.md b/profiling.md index 142ef11e..1e14b5c4 100644 --- a/profiling.md +++ b/profiling.md @@ -31,4 +31,4 @@ to get 30 sec. CPU profile.
## Contention profiling -To enable contetion profiling you need to add line ```rt.SetBlockProfileRate(1)``` to ones added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input go to ```go tool pprof```. +To enable contention profiling you need to add the line ```rt.SetBlockProfileRate(1)``` in addition to ```m.mux.HandleFunc(...)``` added before (```rt``` stands for ```runtime``` in ```master.go```). This enables the 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```. -- cgit v1.2.3 From d1ed142faf5e511f742cab9a77e21ee1ca58edc2 Mon Sep 17 00:00:00 2001 From: Andy Goldstein Date: Thu, 8 Jan 2015 15:41:38 -0500 Subject: Add streaming command execution & port forwarding Add streaming command execution & port forwarding via HTTP connection upgrades (currently using SPDY). --- command_execution_port_forwarding.md | 144 +++++++++++++++++++++++++++++++++++ 1 file changed, 144 insertions(+) create mode 100644 command_execution_port_forwarding.md diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md new file mode 100644 index 00000000..3b9aeec7 --- /dev/null +++ b/command_execution_port_forwarding.md @@ -0,0 +1,144 @@ +# Container Command Execution & Port Forwarding in Kubernetes + +## Abstract + +This describes an approach for providing support for: + +- executing commands in containers, with stdin/stdout/stderr streams attached +- port forwarding to containers + +## Background + +There are several related issues/PRs: + +- [Support attach](https://github.com/GoogleCloudPlatform/kubernetes/issues/1521) +- [Real container ssh](https://github.com/GoogleCloudPlatform/kubernetes/issues/1513) +- [Provide easy debug network access to services](https://github.com/GoogleCloudPlatform/kubernetes/issues/1863) +- [OpenShift container command execution proposal](https://github.com/openshift/origin/pull/576) + +## Motivation + +Users and
administrators are accustomed to being able to access their systems +via SSH to run remote commands, get shell access, and do port forwarding. + +Supporting SSH to containers in Kubernetes is a difficult task. You must +specify a "user" and a hostname to make an SSH connection, and `sshd` requires +real users (resolvable by NSS and PAM). Because a container belongs to a pod, +and the pod belongs to a namespace, you need to specify namespace/pod/container +to uniquely identify the target container. Unfortunately, a +namespace/pod/container is not a real user as far as SSH is concerned. Also, +most Linux systems limit user names to 32 characters, which is unlikely to be +large enough to contain namespace/pod/container. We could devise some scheme to +map each namespace/pod/container to a 32-character user name, adding entries to +`/etc/passwd` (or LDAP, etc.) and keeping those entries fully in sync all the +time. Alternatively, we could write custom NSS and PAM modules that allow the +host to resolve a namespace/pod/container to a user without needing to keep +files or LDAP in sync. + +As an alternative to SSH, we are using a multiplexed streaming protocol that +runs on top of HTTP. There are no requirements about users being real users, +nor is there any limitation on user name length, as the protocol is under our +control. The only downside is that standard tooling that expects to use SSH +won't be able to work with this mechanism, unless adapters can be written. 
+ +## Constraints and Assumptions + +- SSH support is not currently in scope +- CGroup confinement is ultimately desired, but implementing that support is not currently in scope +- SELinux confinement is ultimately desired, but implementing that support is not currently in scope + +## Use Cases + +- As a user of a Kubernetes cluster, I want to run arbitrary commands in a container, attaching my local stdin/stdout/stderr to the container +- As a user of a Kubernetes cluster, I want to be able to connect to local ports on my computer and have them forwarded to ports in the container + +## Process Flow + +### Remote Command Execution Flow +1. The client connects to the Kubernetes Master to initiate a remote command execution +request +2. The Master proxies the request to the Kubelet where the container lives +3. The Kubelet executes nsenter + the requested command and streams stdin/stdout/stderr back and forth between the client and the container + +### Port Forwarding Flow +1. The client connects to the Kubernetes Master to initiate a port forwarding +request +2. The Master proxies the request to the Kubelet where the container lives +3. The client listens on each specified local port, awaiting local connections +4. The client connects to one of the local listening ports +5. The client notifies the Kubelet of the new connection +6. The Kubelet executes nsenter + socat and streams data back and forth between the client and the port in the container + + +## Design Considerations + +### Streaming Protocol + +The current multiplexed streaming protocol used is SPDY. This is not the +long-term desire, however. As soon as there is viable support for HTTP/2 in Go, +we will switch to that. + +### Master as First Level Proxy + +Clients should not be allowed to communicate directly with the Kubelet for +security reasons. Therefore, the Master is currently the only suggested entry +point to be used for remote command execution and port forwarding.
This is not +necessarily desirable, as it means that all remote command execution and port +forwarding traffic must travel through the Master, potentially impacting other +API requests. + +In the future, it might make more sense to retrieve an authorization token from +the Master, and then use that token to initiate a remote command execution or +port forwarding request with a load balanced proxy service dedicated to this +functionality. This would keep the streaming traffic out of the Master. + +### Kubelet as Backend Proxy + +The kubelet is currently responsible for handling remote command execution and +port forwarding requests. Just like with the Master described above, this means +that all remote command execution and port forwarding streaming traffic must +travel through the Kubelet, which could result in a degraded ability to service +other requests. + +In the future, it might make more sense to use a separate service on the node. + +Alternatively, we could possibly inject a process into the container that only +listens for a single request, expose that process's listening port on the node, +and then issue a redirect to the client such that it would connect to the first +level proxy, which would then proxy directly to the injected process's exposed +port. This would minimize the amount of proxying that takes place. + +### Scalability + +There are at least 2 different ways to execute a command in a container: +`docker exec` and `nsenter`. While `docker exec` might seem like an easier and +more obvious choice, it has some drawbacks. + +#### `docker exec` + +We could expose `docker exec` (i.e. have Docker listen on an exposed TCP port +on the node), but this would require proxying from the edge and securing the +Docker API. `docker exec` calls go through the Docker daemon, meaning that all +stdin/stdout/stderr traffic is proxied through the Daemon, adding an extra hop. 
+Additionally, you can't isolate 1 malicious `docker exec` call from normal +usage, meaning an attacker could initiate a denial of service or other attack +and take down the Docker daemon, or the node itself. + +We expect remote command execution and port forwarding requests to be long +running and/or high bandwidth operations, and routing all the streaming data +through the Docker daemon feels like a bottleneck we can avoid. + +#### `nsenter` + +The implementation currently uses `nsenter` to run commands in containers, +joining the appropriate container namespaces. `nsenter` runs directly on the +node and is not proxied through any single daemon process. + +### Security + +Authentication and authorization hasn't specifically been tested yet with this +functionality. We need to make sure that users are not allowed to execute +remote commands or do port forwarding to containers they aren't allowed to +access. + +Additional work is required to ensure that multiple command execution or port forwarding connections from different clients are not able to see each other's data. This can most likely be achieved via SELinux labeling and unique process contexts. 
\ No newline at end of file -- cgit v1.2.3 From 684bb8868e54d701edab9b6eeaa3500fb6a931e1 Mon Sep 17 00:00:00 2001 From: Deyuan Deng Date: Fri, 20 Feb 2015 10:44:02 -0500 Subject: Admission doc cleanup --- admission_control.md | 6 +++--- admission_control_limit_range.md | 20 ++++++++++---------- admission_control_resource_quota.md | 25 +++++++++++++------------ 3 files changed, 26 insertions(+), 25 deletions(-) diff --git a/admission_control.md b/admission_control.md index 88afda73..1e1c1e53 100644 --- a/admission_control.md +++ b/admission_control.md @@ -1,6 +1,6 @@ # Kubernetes Proposal - Admission Control -**Related PR:** +**Related PR:** | Topic | Link | | ----- | ---- | @@ -35,7 +35,7 @@ The kube-apiserver takes the following OPTIONAL arguments to enable admission co An **AdmissionControl** plug-in is an implementation of the following interface: -``` +```go package admission // Attributes is an interface used by a plug-in to make an admission decision on a individual request. @@ -57,7 +57,7 @@ type Interface interface { A **plug-in** must be compiled with the binary, and is registered as an available option by providing a name, and implementation of admission.Interface. -``` +```go func init() { admission.RegisterPlugin("AlwaysDeny", func(client client.Interface, config io.Reader) (admission.Interface, error) { return NewAlwaysDeny(), nil }) } diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 69fe144b..e3a56c87 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -9,7 +9,7 @@ This document proposes a system for enforcing min/max limits per resource as par A new resource, **LimitRange**, is introduced to enumerate min/max limits for a resource type scoped to a Kubernetes namespace. 
-``` +```go const ( // Limit that applies to all pods in a namespace LimitTypePod string = "Pod" @@ -54,7 +54,7 @@ type LimitRangeList struct { ## AdmissionControl plugin: LimitRanger -The **LimitRanger** plug-in introspects all incoming admission requests. +The **LimitRanger** plug-in introspects all incoming admission requests. It makes decisions by evaluating the incoming object against all defined **LimitRange** objects in the request context namespace. @@ -97,20 +97,20 @@ kubectl is modified to support the **LimitRange** resource. For example, -``` +```shell $ kubectl namespace myspace $ kubectl create -f examples/limitrange/limit-range.json $ kubectl get limits NAME limits $ kubectl describe limits limits -Name: limits -Type Resource Min Max ----- -------- --- --- -Pod memory 1Mi 1Gi -Pod cpu 250m 2 -Container cpu 250m 2 -Container memory 1Mi 1Gi +Name: limits +Type Resource Min Max +---- -------- --- --- +Pod memory 1Mi 1Gi +Pod cpu 250m 2 +Container memory 1Mi 1Gi +Container cpu 250m 2 ``` ## Future Enhancements: Define limits for a particular pod or container. diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 08bc6bec..ebad0728 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -10,7 +10,7 @@ A new resource, **ResourceQuota**, is introduced to enumerate hard resource limi A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status. 
-``` +```go // The following identify resource constants for Kubernetes object types const ( // Pods, number @@ -139,14 +139,15 @@ $ kubectl namespace myspace $ kubectl create -f examples/resourcequota/resource-quota.json $ kubectl get quota NAME -myquota -$ kubectl describe quota myquota -Name: myquota -Resource Used Hard --------- ---- ---- -cpu 100m 20 -memory 0 1.5Gb -pods 1 10 -replicationControllers 1 10 -services 2 3 -``` \ No newline at end of file +quota +$ kubectl describe quota quota +Name: quota +Resource Used Hard +-------- ---- ---- +cpu 0m 20 +memory 0 1Gi +pods 5 10 +replicationcontrollers 5 20 +resourcequotas 1 1 +services 3 5 +``` -- cgit v1.2.3 From 9b36d8d8ed359be5e43215c7304adf6a53e78409 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Tue, 11 Nov 2014 10:52:31 -0800 Subject: Service account proposal. COMMIT_BLOCKED_ON_GENDOCS --- security.md | 4 +- security_context.md | 8 +-- service_accounts.md | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 171 insertions(+), 5 deletions(-) create mode 100644 service_accounts.md diff --git a/security.md b/security.md index 27f07cd6..ba699739 100644 --- a/security.md +++ b/security.md @@ -97,6 +97,8 @@ A pod runs in a *security context* under a *service account* that is defined by * Secret distribution via files https://github.com/GoogleCloudPlatform/kubernetes/pull/2030 * Docker secrets https://github.com/docker/docker/pull/6697 * Docker vault https://github.com/docker/docker/issues/10310 +* Service Accounts: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md +* Secret volumes https://github.com/GoogleCloudPlatform/kubernetes/4126 ## Specific Design Points @@ -112,4 +114,4 @@ Both the Kubelet and Kube Proxy need information related to their specific roles The controller manager for Replication Controllers and other future controllers act on behalf of a user via delegation to perform automated maintenance on Kubernetes resources. 
Their ability to access or modify resource state should be strictly limited to their intended duties and they should be prevented from accessing information not pertinent to their role. For example, a replication controller needs only to create a copy of a known pod configuration, to determine the running state of an existing pod, or to delete an existing pod that it created - it does not need to know the contents or current state of a pod, nor have access to any data in the pods attached volumes. -The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster. At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time). \ No newline at end of file +The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster. At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. 
It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time). diff --git a/security_context.md b/security_context.md index 400d30e9..7dc10e69 100644 --- a/security_context.md +++ b/security_context.md @@ -172,13 +172,13 @@ type IDMapping struct { // IDMappingRange specifies a mapping between container IDs and node IDs type IDMappingRange struct { - // ContainerID is the starting container ID + // ContainerID is the starting container UID or GID ContainerID int - // HostID is the starting host ID + // HostID is the starting host UID or GID HostID int - // Length is the length of the ID range + // Length is the length of the UID/GID range Length int } @@ -187,4 +187,4 @@ type IDMappingRange struct { #### Security Context Lifecycle -The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention). If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use. \ No newline at end of file +The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention). 
If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use.
diff --git a/service_accounts.md b/service_accounts.md
new file mode 100644
index 00000000..5d86f244
--- /dev/null
+++ b/service_accounts.md
@@ -0,0 +1,164 @@
+# Service Accounts
+
+## Motivation
+
+Processes in Pods may need to call the Kubernetes API. For example:
+ - scheduler
+ - replication controller
+ - minion controller
+ - a map-reduce type framework which has a controller that then tries to make a dynamically determined number of workers and watch them
+ - continuous build and push system
+ - monitoring system
+
+They also may interact with services other than the Kubernetes API, such as:
+ - an image repository, such as docker -- both when the images are pulled to start the containers, and for writing
+   images in the case of pods that generate images.
+ - accessing other cloud services, such as blob storage, in the context of a large, integrated cloud offering (hosted
+   or private).
+ - accessing files in an NFS volume attached to the pod
+
+## Design Overview
+A service account binds together several things:
+ - a *name*, understood by users, and perhaps by peripheral systems, for an identity
+ - a *principal* that can be authenticated and [authorized](../authorization.md)
+ - a [security context](./security_contexts.md), which defines the Linux Capabilities, User IDs, Group IDs, and other
+   capabilities and controls on interaction with the file system and OS.
+ - a set of [secrets](./secrets.md), which a container may use to
+   access various networked resources.
+
+## Design Discussion
+
+A new object Kind is added:
+```go
+type ServiceAccount struct {
+    TypeMeta   `json:",inline" yaml:",inline"`
+    ObjectMeta `json:"metadata,omitempty" yaml:"metadata,omitempty"`
+
+    username        string
+    securityContext ObjectReference   // (reference to a securityContext object)
+    secrets         []ObjectReference // (references to secret objects)
+}
+```
+
+The name ServiceAccount is chosen because it is widely used already (e.g. by Kerberos and LDAP)
+to refer to this type of account. Note that it has no relation to Kubernetes Service objects.
+
+The ServiceAccount object does not include any information that could not be defined separately:
+ - username can be defined however users are defined.
+ - securityContext and secrets are only referenced and are created using the REST API.
+
+The purpose of the serviceAccount object is twofold:
+ - to bind usernames to securityContexts and secrets, so that the username can be used to refer to them succinctly
+   in contexts where explicitly naming securityContexts and secrets would be inconvenient
+ - to provide an interface to simplify allocation of new securityContexts and secrets.
+These features are explained later.
+
+### Names
+
+From the standpoint of the Kubernetes API, a `user` is any principal which can authenticate to the Kubernetes API.
+This includes a human running `kubectl` on her desktop and a container in a Pod on a Node making API calls.
+
+There is already a notion of a username in Kubernetes, which is populated into a request context after authentication.
+However, there is no API object representing a user. While this may evolve, it is expected that in mature installations,
+the canonical storage of user identifiers will be handled by a system external to Kubernetes.
+
+Kubernetes does not dictate how to divide up the space of user identifier strings. User names can be
+simple Unix-style short usernames (e.g.
`alice`), or may be qualified to allow for federated identity (
+`alice@example.com` vs `alice@example.org`). Naming convention may distinguish service accounts from user
+accounts (e.g. `alice@example.com` vs `build-service-account-a3b7f0@foo-namespace.service-accounts.example.com`),
+but Kubernetes does not require this.
+
+Kubernetes also does not require that there be a distinction between human and Pod users. It will be possible
+to set up a cluster where Alice the human talks to the Kubernetes API as username `alice` and starts pods that
+also talk to the API as user `alice` and write files to NFS as user `alice`. But this is not recommended.
+
+Instead, it is recommended that Pods and Humans have distinct identities, and reference implementations will
+make this distinction.
+
+The distinction is useful for a number of reasons:
+ - the requirements for humans and automated processes are different:
+   - Humans need a wide range of capabilities to do their daily activities. Automated processes often have more narrowly-defined activities.
+   - Humans may better tolerate the exceptional conditions created by expiration of a token. Remembering to handle
+     this in a program is more annoying. So, either long-lasting credentials or automated rotation of credentials is
+     needed.
+   - A Human typically keeps credentials on a machine that is not part of the cluster and so not subject to automatic
+     management. A VM with a role/service-account can have its credentials automatically managed.
+ - the identity of a Pod cannot in general be mapped to a single human.
+   - If policy allows, it may be created by one human, and then updated by another, and another, until its behavior cannot be attributed to a single human.
+
+**TODO**: consider getting rid of separate serviceAccount object and just rolling its parts into the SecurityContext or
+Pod Object.
+
+The `secrets` field is a list of references to /secret objects that a process started as that service account should
+have access to in order to assert that role.
+
+The secrets are not inline with the serviceAccount object. This way, most or all users can have permission to `GET /serviceAccounts` so they can remind themselves
+what serviceAccounts are available for use.
+
+Nothing will prevent creation of a serviceAccount with two secrets of type `SecretTypeKubernetesAuth`, or secrets of two
+different types. Kubelet and client libraries will have some behavior, TBD, to handle the case of multiple secrets of a
+given type (pick first or provide all and try each in order, etc).
+
+When a serviceAccount and a matching secret exist, then a `User.Info` for the serviceAccount and a `BearerToken` from the secret
+are added to the map of tokens used by the authentication process in the apiserver, and similarly for other types. (We
+might have some types that do not do anything on apiserver but just get pushed to the kubelet.)
+
+### Pods
+The `PodSpec` is extended to have a `Pods.Spec.ServiceAccountUsername` field. If this is unset, then a
+default value is chosen. If it is set, then the corresponding value of `Pods.Spec.SecurityContext` is set by the
+Service Account Finalizer (see below).
+
+TBD: how policy limits which users can make pods with which service accounts.
+
+### Authorization
+Kubernetes API Authorization Policies refer to users. Pods created with a `Pods.Spec.ServiceAccountUsername` typically
+get a `Secret` which allows them to authenticate to the Kubernetes APIserver as a particular user. So any
+policy that is desired can be applied to them.
+
+A higher-level workflow is needed to coordinate creation of serviceAccounts, secrets and relevant policy objects.
+Users are free to extend Kubernetes to put this business logic wherever is convenient for them, though the
+Service Account Finalizer is one place where this can happen (see below).
+
+### Kubelet
+
+The kubelet will treat as "not ready to run" (needing a finalizer to act on it) any Pod which has an empty
+SecurityContext.
+
+The kubelet will set a default, restrictive security context for any pods created from non-Apiserver config
+sources (http, file).
+
+Kubelet watches the apiserver for secrets which are needed by pods bound to it.
+
+**TODO**: how to only let the kubelet see secrets it needs to know.
+
+### The service account finalizer
+
+There are several ways to use Pods with SecurityContexts and Secrets.
+
+One way is to explicitly specify the securityContext and all secrets of a Pod when the pod is initially created,
+like this:
+
+**TODO**: example of pod with explicit refs.
+
+Another way is with the *Service Account Finalizer*, an optional plugin process which handles
+business logic around service accounts.
+
+The Service Account Finalizer watches Pods, Namespaces, and ServiceAccount definitions.
+
+First, if it finds pods which have a `Pod.Spec.ServiceAccountUsername` but no `Pod.Spec.SecurityContext` set,
+then it copies in the referenced securityContext and secrets references for the corresponding `serviceAccount`.
+
+Second, if ServiceAccount definitions change, it may take some actions.
+**TODO**: decide what actions it takes when a serviceAccount definition changes. Does it stop pods, or just
+allow someone to list ones that are out of spec? In general, people may want to customize this.
+
+Third, if a new namespace is created, it may create a new serviceAccount for that namespace. This may include
+a new username (e.g. `NAMESPACE-default-service-account@serviceaccounts.$CLUSTERID.kubernetes.io`), a new
+securityContext, a newly generated secret to authenticate that serviceAccount to the Kubernetes API, and default
+policies for that service account.
+**TODO**: more concrete example. What are typical default permissions for a default service account (e.g.
readonly access +to services in the same namespace and read-write access to events in that namespace?) + +Finally, it may provide an interface to automate creation of new serviceAccounts. In that case, the user may want +to GET serviceAccounts to see what has been created. + -- cgit v1.2.3 From 0d339383f4c17664472df0eb03289161062bda11 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 19 Feb 2015 22:03:36 -0800 Subject: minor fixups as I review secrets --- secrets.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/secrets.md b/secrets.md index ce02f930..ac8776bd 100644 --- a/secrets.md +++ b/secrets.md @@ -283,9 +283,9 @@ type Secret struct { type SecretType string const ( - SecretTypeOpaque SecretType = "opaque" // Opaque (arbitrary data; default) - SecretTypeKubernetesAuthToken SecretType = "kubernetes-auth" // Kubernetes auth token - SecretTypeDockerRegistryAuth SecretType = "docker-reg-auth" // Docker registry auth + SecretTypeOpaque SecretType = "Opaque" // Opaque (arbitrary data; default) + SecretTypeKubernetesAuthToken SecretType = "KubernetesAuth" // Kubernetes auth token + SecretTypeDockerRegistryAuth SecretType = "DockerRegistryAuth" // Docker registry auth // FUTURE: other type values ) -- cgit v1.2.3 From dee17e393e2937729a784b34b96fd4418e046f24 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 23 Feb 2015 10:57:51 -0800 Subject: comments on base64-ness of secrets --- secrets.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/secrets.md b/secrets.md index ac8776bd..dc596183 100644 --- a/secrets.md +++ b/secrets.md @@ -273,7 +273,8 @@ type Secret struct { ObjectMeta // Data contains the secret data. Each key must be a valid DNS_SUBDOMAIN. - // The serialized form of the secret data is a base64 encoded string. + // The serialized form of the secret data is a base64 encoded string, + // representing the arbitrary (possibly non-string) data value here. 
Data map[string][]byte `json:"data,omitempty"` // Used to facilitate programatic handling of secret data. @@ -398,8 +399,9 @@ To create a pod that uses an ssh key stored as a secret, we first need to create } ``` -**Note:** The values of secret data are encoded as base64-encoded strings. Newlines are not -valid within these strings and must be omitted. +**Note:** The serialized JSON and YAML values of secret data are encoded as +base64 strings. Newlines are not valid within these strings and must be +omitted. Now we can create a pod which references the secret with the ssh key and consumes it in a volume: -- cgit v1.2.3 From 26159771f22b3e58a0f34be13f1d9ac54e942acf Mon Sep 17 00:00:00 2001 From: Ben McCann Date: Mon, 23 Feb 2015 13:55:02 -0800 Subject: Update links to security contexts and service accounts to point to actual docs instead of pull requests now that those proposals have been merged --- secrets.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/secrets.md b/secrets.md index ce02f930..60e825f2 100644 --- a/secrets.md +++ b/secrets.md @@ -72,7 +72,7 @@ service would also consume the secrets associated with the MySQL service. ### Use-Case: Secrets associated with service accounts -[Service Accounts](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) are proposed as a +[Service Accounts](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md) are proposed as a mechanism to decouple capabilities and security contexts from individual human users. A `ServiceAccount` contains references to some number of secrets. A `Pod` can specify that it is associated with a `ServiceAccount`. Secrets should have a `Type` field to allow the Kubelet and @@ -236,7 +236,7 @@ memory overcommit on the node. #### Secret data on the node: isolation -Every pod will have a [security context](https://github.com/GoogleCloudPlatform/kubernetes/pull/3910). 
+Every pod will have a [security context](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/security_context.md). Secret data on the node should be isolated according to the security context of the container. The Kubelet volume plugin API will be changed so that a volume plugin receives the security context of a volume along with the volume spec. This will allow volume plugins to implement setting the @@ -248,7 +248,7 @@ Several proposals / upstream patches are notable as background for this proposal 1. [Docker vault proposal](https://github.com/docker/docker/issues/10310) 2. [Specification for image/container standardization based on volumes](https://github.com/docker/docker/issues/9277) -3. [Kubernetes service account proposal](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) +3. [Kubernetes service account proposal](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md) 4. [Secrets proposal for docker (1)](https://github.com/docker/docker/pull/6075) 5. [Secrets proposal for docker (2)](https://github.com/docker/docker/pull/6697) -- cgit v1.2.3 From db1eb9d48c41fe19eca08bb0b0d0e34f37f4f925 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Tue, 24 Feb 2015 22:05:24 -0500 Subject: Fix nits in security proposal --- security.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/security.md b/security.md index ba699739..7bdca440 100644 --- a/security.md +++ b/security.md @@ -38,7 +38,7 @@ Automated process users fall into the following categories: * write pod specs. * making some of their own images, and using some "community" docker images * know which pods need to talk to which other pods - * decide which pods should be share files with other pods, and which should not. + * decide which pods should share files with other pods, and which should not. 
* reason about application level security, such as containing the effects of a local-file-read exploit in a webserver pod. * do not often reason about operating system or organizational security. * are not necessarily comfortable reasoning about the security properties of a system at the level of detail of Linux Capabilities, SELinux, AppArmor, etc. @@ -66,11 +66,11 @@ A pod runs in a *security context* under a *service account* that is defined by 1. The API should authenticate and authorize user actions [authn and authz](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/access.md) 2. All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and be authorized to perform only the functions they require against the API. 3. Most infrastructure components should use the API as a way of exchanging data and changing the system, and only the API should have access to the underlying data store (etcd) -4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) +4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md) 1. If the user who started a long-lived process is removed from access to the cluster, the process should be able to continue without interruption 2. If the user who started processes are removed from the cluster, administrators may wish to terminate their processes in bulk - 3. When containers run with a service account, the user that created / triggered the service account behavior must be associated to the container's action -5. 
When container processes runs on the cluster, they should run in a [security context](https://github.com/GoogleCloudPlatform/kubernetes/pull/3910) that isolates those processes via Linux user security, user namespaces, and permissions. + 3. When containers run with a service account, the user that created / triggered the service account behavior must be associated with the container's action +5. When container processes run on the cluster, they should run in a [security context](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions. 1. Administrators should be able to configure the cluster to automatically confine all container processes as a non-root, randomly assigned UID 2. Administrators should be able to ensure that container processes within the same namespace are all assigned the same unix user UID 3. Administrators should be able to limit which developers and project administrators have access to higher privilege actions @@ -79,7 +79,7 @@ A pod runs in a *security context* under a *service account* that is defined by 6. Developers may need to ensure their images work within higher security requirements specified by administrators 7. When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met. 8. When application developers want to share filesytem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes -6. Developers should be able to define [secrets](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297) that are automatically added to the containers when pods are run +6. Developers should be able to define [secrets](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/secrets.md) that are automatically added to the containers when pods are run 1. 
Secrets are files injected into the container whose values should not be displayed within a pod. Examples: 1. An SSH private key for git cloning remote data 2. A client certificate for accessing a remote system @@ -87,7 +87,7 @@ A pod runs in a *security context* under a *service account* that is defined by 4. A .kubeconfig file with embedded cert / token data for accessing the Kubernetes master 5. A .dockercfg file for pulling images from a protected registry 2. Developers should be able to define the pod spec so that a secret lands in a specific location - 3. Project administrators should be able to limit developers within a namespace from viewing or modify secrets (anyone who can launch an arbitrary pod can view secrets) + 3. Project administrators should be able to limit developers within a namespace from viewing or modifying secrets (anyone who can launch an arbitrary pod can view secrets) 4. Secrets are generally not copied from one namespace to another when a developer's application definitions are copied -- cgit v1.2.3 From 4cc000f3f4bcb0fa92dd35adb5a59a1b650967cc Mon Sep 17 00:00:00 2001 From: markturansky Date: Tue, 3 Mar 2015 15:06:18 -0500 Subject: Persistent storage proposal --- persistent-storage.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) create mode 100644 persistent-storage.md diff --git a/persistent-storage.md b/persistent-storage.md new file mode 100644 index 00000000..c29319aa --- /dev/null +++ b/persistent-storage.md @@ -0,0 +1,85 @@ +# PersistentVolume + +This document proposes a model for managing persistent, cluster-scoped storage for applications requiring long lived data. + +### tl;dr + +Two new API kinds: + +A `PersistentVolume` is created by a cluster admin and is a piece of persistent storage exposed as a volume. It is analogous to a node. + +A `PersistentVolumeClaim` is a user's request for a persistent volume to use in a pod. It is analogous to a pod. 
+
+One new system component:
+
+`PersistentVolumeManager` watches for new volumes to manage in the system, analogous to the node controller. The volume manager also watches for claims by users and binds them to available volumes. This
+component is a singleton that manages all persistent volumes in the cluster.
+
+Kubernetes makes no guarantees at runtime that the underlying storage exists or is available. High availability is left to the storage provider.
+
+### Goals
+
+* Allow administrators to describe available storage
+* Allow pod authors to discover and request persistent volumes to use with pods
+* Enforce security through access control lists and securing storage to the same namespace as the pod volume
+* Enforce quotas through admission control
+* Enforce scheduler rules by resource counting
+* Ensure developers can rely on storage being available without being closely bound to a particular disk, server, network, or storage device.
+
+
+#### Describe available storage
+
+Cluster administrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system.
+All persistent volumes are managed and made available by the volume manager. The manager also watches for new claims for storage and binds them to an available, matching volume.
+
+Many means of dynamic provisioning will eventually be implemented for various storage types.
+
+```
+
+ $ cluster/kubectl.sh get pv
+
+```
+
+##### API Implementation:
+
+| Action | HTTP Verb | Path | Description |
+| ---- | ---- | ---- | ---- |
+| CREATE | POST | /api/{version}/persistentvolumes/ | Create instance of PersistentVolume in system namespace |
+| GET | GET | /api/{version}/persistentvolumes/{name} | Get instance of PersistentVolume in system namespace with {name} |
+| UPDATE | PUT | /api/{version}/persistentvolumes/{name} | Update instance of PersistentVolume in system namespace with {name} |
+| DELETE | DELETE | /api/{version}/persistentvolumes/{name} | Delete instance of PersistentVolume in system namespace with {name} |
+| LIST | GET | /api/{version}/persistentvolumes | List instances of PersistentVolume in system namespace |
+| WATCH | GET | /api/{version}/watch/persistentvolumes | Watch for changes to a PersistentVolume in system namespace |
+
+
+
+#### Request Storage
+
+
+Kubernetes users request a persistent volume for their pod by creating a *PersistentVolumeClaim*. Their request for storage is described by their requirements for resource and mount capabilities.
+
+Requests for volumes are bound to available volumes by the volume manager, if a suitable match is found. Requests for resources can go unfulfilled.
+
+Users attach their claim to their pod using a new *PersistentVolumeClaimVolumeSource* volume source.
+
+
+##### Users require a full API to manage their claims.
+
+
+| Action | HTTP Verb | Path | Description |
+| ---- | ---- | ---- | ---- |
+| CREATE | POST | /api/{version}/ns/{ns}/persistentvolumeclaims/ | Create instance of PersistentVolumeClaim in namespace {ns} |
+| GET | GET | /api/{version}/ns/{ns}/persistentvolumeclaims/{name} | Get instance of PersistentVolumeClaim in namespace {ns} with {name} |
+| UPDATE | PUT | /api/{version}/ns/{ns}/persistentvolumeclaims/{name} | Update instance of PersistentVolumeClaim in namespace {ns} with {name} |
+| DELETE | DELETE | /api/{version}/ns/{ns}/persistentvolumeclaims/{name} | Delete instance of PersistentVolumeClaim in namespace {ns} with {name} |
+| LIST | GET | /api/{version}/ns/{ns}/persistentvolumeclaims | List instances of PersistentVolumeClaim in namespace {ns} |
+| WATCH | GET | /api/{version}/watch/ns/{ns}/persistentvolumeclaims | Watch for changes to PersistentVolumeClaim in namespace {ns} |
+
+
+
+#### Scheduling constraints
+
+Scheduling constraints are to be handled similarly to pod resource constraints. Pods will need to be annotated or decorated with the number of resources they require on a node. Similarly, a node will need to list how many it has used or available.
+
+TBD
+
-- cgit v1.2.3


From 021e5a3ec46c27cde7bba6fbae539cb4ba048a21 Mon Sep 17 00:00:00 2001
From: Eric Tune
Date: Tue, 3 Mar 2015 14:29:39 -0800
Subject: Added a doc with coding advice.

--- coding-conventions.md | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 coding-conventions.md diff --git a/coding-conventions.md b/coding-conventions.md new file mode 100644 index 00000000..3d493803 --- /dev/null +++ b/coding-conventions.md @@ -0,0 +1,7 @@ +Coding style advice for contributors + - Bash + - https://google-styleguide.googlecode.com/svn/trunk/shell.xml + - Go + - https://github.com/golang/go/wiki/CodeReviewComments + - https://gist.github.com/lavalamp/4bd23295a9f32706a48f + -- cgit v1.2.3 From 318adb3b233f4a6b515c04bd7d8c5d94d531a534 Mon Sep 17 00:00:00 2001 From: Paul Weil Date: Wed, 4 Mar 2015 08:52:55 -0500 Subject: auto-scaler proposal --- autoscaling.md | 254 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 254 insertions(+) create mode 100644 autoscaling.md diff --git a/autoscaling.md b/autoscaling.md new file mode 100644 index 00000000..029a6a82 --- /dev/null +++ b/autoscaling.md @@ -0,0 +1,254 @@ +## Abstract +Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the +number of pods deployed within the system automatically. + +## Motivation + +Applications experience peaks and valleys in usage. In order to respond to increases and decreases in load, administrators +scale their applications by adding computing resources. In the cloud computing environment this can be +done automatically based on statistical analysis and thresholds. 
+ +### Goals + +* Provide a concrete proposal for implementing auto-scaling pods within Kubernetes +* Implementation proposal should be in line with current discussions in existing issues: + * Resize verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) + * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) + * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) + * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) + +## Constraints and Assumptions + +* This proposal is for horizontal scaling only. Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072) +* `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. The `ReplicationController` responsibilities are +constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#responsibilities-of-the-replication-controller) +* Auto-scalers will be loosely coupled with data gathering components in order to allow a wide variety of input sources +* Auto-scalable resources will support a resize verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) +such that the auto-scaler does not directly manipulate the underlying resource. +* Initially, most thresholds will be set by application administrators. It should be possible for an autoscaler to be +written later that sets thresholds automatically based on past behavior (CPU used vs incoming requests). 
+* The auto-scaler must be aware of user defined actions so it does not override them unintentionally (for instance someone +explicitly setting the replica count to 0 should mean that the auto-scaler does not try to scale the application up) +* It should be possible to write and deploy a custom auto-scaler without modifying existing auto-scalers +* Auto-scalers must be able to monitor multiple replication controllers while only targeting a single scalable +object (for now a ReplicationController, but in the future it could be a job or any resource that implements resize) + +## Use Cases + +### Scaling based on traffic + +The current, most obvious, use case is scaling an application based on network traffic like requests per second. Most +applications will expose one or more network endpoints for clients to connect to. Many of those endpoints will be load +balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to +server traffic for applications. This is the primary, but not sole, source of data for making decisions. + +Within Kubernetes a [kube proxy](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md#ips-and-portals) +running on each node directs service requests to the underlying implementation. + +While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage +traffic to backends. OpenShift, for instance, adds a "route" resource for defining external to internal traffic flow. +The "routers" are HAProxy or Apache load balancers that aggregate many different services and pods and can serve as a +data source for the number of backends. + +### Scaling based on predictive analysis + +Scaling may also occur based on predictions of system state like anticipated load, historical data, etc. 
Hand in hand +with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically. + +### Scaling based on arbitrary data + +Administrators may wish to scale the application based on any number of arbitrary data points such as job execution time or +duration of active sessions. There are any number of reasons an administrator may wish to increase or decrease capacity which +means the auto-scaler must be a configurable, extensible component. + +## Specification + +In order to facilitate talking about auto-scaling the following definitions are used: + +* `ReplicationController` - the first building block of auto scaling. Pods are deployed and scaled by a `ReplicationController`. +* kube proxy - The proxy handles internal inter-pod traffic, an example of a data source to drive an auto-scaler +* L3/L7 proxies - A routing layer handling outside to inside traffic requests, an example of a data source to drive an auto-scaler +* auto-scaler - scales replicas up and down by using the `resize` endpoint provided by scalable resources (`ReplicationController`) + + +### Auto-Scaler + +The Auto-Scaler is a state reconciler responsible for checking data against configured scaling thresholds +and calling the `resize` endpoint to change the number of replicas. The scaler will +use a client/cache implementation to receive watch data from the data aggregators and respond to them by +scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to the +namespace just as a `ReplicationController` or `Service`. + +Since an auto-scaler is a durable object it is best represented as a resource. + +```go + //The auto scaler interface + type AutoScalerInterface interface { + //ScaleApplication adjusts a resource's replica count. Calls resize endpoint. + //Args to this are based on what the endpoint + //can support. 
See https://github.com/GoogleCloudPlatform/kubernetes/issues/1629
+ ScaleApplication(num int) error
+ }
+
+ type AutoScaler struct {
+ //common construct
+ TypeMeta
+ //common construct
+ ObjectMeta
+
+ //Spec defines the configuration options that drive the behavior for this auto-scaler
+ Spec AutoScalerSpec
+
+ //Status defines the current status of this auto-scaler.
+ Status AutoScalerStatus
+ }
+
+ type AutoScalerSpec struct {
+ //AutoScaleThresholds holds a collection of AutoScaleThresholds that drive the auto scaler
+ AutoScaleThresholds []AutoScaleThreshold
+
+ //Enabled turns auto scaling on or off
+ Enabled bool
+
+ //MaxAutoScaleCount defines the max replicas that the auto scaler can use.
+ //This value must be greater than 0 and >= MinAutoScaleCount
+ MaxAutoScaleCount int
+
+ //MinAutoScaleCount defines the minimum number of replicas that the auto scaler can reduce to,
+ //0 means that the application is allowed to idle
+ MinAutoScaleCount int
+
+ //TargetSelector provides the resizeable target(s). Right now this is a ReplicationController;
+ //in the future it could be a job or any resource that implements resize.
+ TargetSelector map[string]string
+
+ //MonitorSelector defines a set of capacity that the auto-scaler is monitoring
+ //(replication controllers). Monitored objects are used by thresholds to examine
+ //statistics.
Example: get statistic X for object Y to see if threshold is passed
+ MonitorSelector map[string]string
+ }
+
+ type AutoScalerStatus struct {
+ // TODO: open for discussion on what meaningful information can be reported in the status
+ // The status may return the replica count here but we may want more information
+ // such as if the count reflects a threshold being passed
+ }
+
+
+ //AutoScaleThresholdInterface abstracts the data analysis from the auto-scaler
+ //example: scale by 1 (Increment) when RequestsPerSecond (Type) passes the
+ //comparison (Comparison) of 50 (Value) for 30 seconds (Duration)
+ type AutoScaleThresholdInterface interface {
+ //called by the auto-scaler to determine if this threshold is met or not
+ ShouldScale() bool
+ }
+
+
+ //AutoScaleThreshold is a single statistic used to drive the auto-scaler in scaling decisions
+ type AutoScaleThreshold struct {
+ // Type is the type of threshold being used, intention or value
+ Type AutoScaleThresholdType
+
+ // ValueConfig holds the config for value based thresholds
+ ValueConfig AutoScaleValueThresholdConfig
+
+ // IntentionConfig holds the config for intention based thresholds
+ IntentionConfig AutoScaleIntentionThresholdConfig
+ }
+
+ // AutoScaleIntentionThresholdConfig holds configuration for intention based thresholds.
+ // An intention based threshold defines no increment; the scaler will adjust by 1 accordingly
+ // and maintain once the intention is reached. Also, no selector is defined; the intention
+ // should dictate the selector used for statistics. The same goes for duration, although we
+ // may want a configurable duration later so intentions are more customizable.
+ type AutoScaleIntentionThresholdConfig struct {
+ // Intent is the lexicon of what intention is requested
+ Intent AutoScaleIntentionType
+
+ // Value is intention dependent in terms of above, below, equal and represents
+ // the value to check against
+ Value float64
+ }
+
+ // AutoScaleValueThresholdConfig holds configuration for value based thresholds
+ type AutoScaleValueThresholdConfig struct {
+ //Increment determines how the auto-scaler should scale up or down (positive number to
+ //scale up based on this threshold, negative number to scale down by this threshold)
+ Increment int
+ //Selector represents the retrieval mechanism for a statistic value from statistics
+ //storage. Once statistics are better defined the retrieval mechanism may change.
+ //Ultimately, the selector returns a representation of a statistic that can be
+ //compared against the threshold value.
+ Selector map[string]string
+ //Duration is the time lapse after which this threshold is considered passed
+ Duration time.Duration
+ //Value is the number at which, after the duration is passed, this threshold is considered
+ //to be triggered
+ Value float64
+ //Comparison component to be applied to the value.
+ Comparison string
+ }
+
+ // AutoScaleThresholdType is either intention based or value based
+ type AutoScaleThresholdType string
+
+ // AutoScaleIntentionType is a lexicon for intentions such as "cpu-utilization",
+ // "max-rps-per-endpoint"
+ type AutoScaleIntentionType string
+```
+
+#### Boundary Definitions
+The `AutoScaleThreshold` definitions provide the boundaries for the auto-scaler. By defining comparisons that form a range
+along with positive and negative increments, you may define bi-directional scaling.
For example, the upper bound may be
+specified as "when requests per second rise above 50 for 30 seconds, scale the application up by 1" and a lower bound may
+be specified as "when requests per second fall below 25 for 30 seconds, scale the application down by 1 (implemented by using -1)".
+
+### Data Aggregator
+
+This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing
+time series statistics.
+
+Data aggregation is opaque to the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds`
+that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation
+must feed a common data structure to ease the development of `AutoScaleThreshold`s, but it does not matter to the
+auto-scaler whether this occurs in a push or pull implementation, whether or not the data is stored at a granular level,
+or what algorithm is used to determine the final statistics value. Ultimately, the auto-scaler only requires that a statistic
+resolves to a value that can be checked against a configured threshold.
+
+Of note: if the statistics gathering mechanisms can be initialized with a registry, other components storing statistics can
+potentially piggyback on this registry.
+
+### Multi-target Scaling Policy
+If multiple resizable targets satisfy the `TargetSelector` criteria, the auto-scaler should be configurable as to which
+target(s) are resized. To begin with, if multiple targets are found, the auto-scaler will scale the largest target up
+or down as appropriate. In the future this may be more configurable.
+
+### Interactions with a deployment
+
+In a deployment it is likely that multiple replication controllers must be monitored.
For instance, in a [rolling deployment](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#rolling-updates) +there will be multiple replication controllers, with one scaling up and another scaling down. This means that an +auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector` +is what provides this ability. By using a selector that spans the entire service the auto-scaler can monitor capacity +of multiple replication controllers and check that capacity against the `AutoScalerSpec.MaxAutoScaleCount` and +`AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`. + +In the course of a deployment it is up to the deployment orchestration to decide how to manage the labels +on the replication controllers if it needs to ensure that only specific replication controllers are targeted by +the auto-scaler. By default, the auto-scaler will scale the largest replication controller that meets the target label +selector criteria. + +During deployment orchestration the auto-scaler may be making decisions to scale its target up or down. In order to prevent +the scaler from fighting with a deployment process that is scaling one replication controller up and scaling another one +down the deployment process must assume that the current replica count may be changed by objects other than itself and +account for this in the scale up or down process. Therefore, the deployment process may no longer target an exact number +of instances to be deployed. It must be satisfied that the replica count for the deployment meets or exceeds the number +of requested instances. + +Auto-scaling down in a deployment scenario is a special case. 
In order for the deployment to complete successfully the +deployment orchestration must ensure that the desired number of instances that are supposed to be deployed has been met. +If the auto-scaler is trying to scale the application down (due to no traffic, or other statistics) then the deployment +process and auto-scaler are fighting to increase and decrease the count of the targeted replication controller. In order +to prevent this, deployment orchestration should notify the auto-scaler that a deployment is occurring. This will +temporarily disable negative decrement thresholds until the deployment process is completed. It is more important for +an auto-scaler to be able to grow capacity during a deployment than to shrink the number of instances precisely. + -- cgit v1.2.3 From 9f5ea46527fa41d04ef4226931730e4c61b6bf22 Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 4 Mar 2015 17:03:55 -0800 Subject: Add documentation about the Kubernetes Github Flow. Added an animation (and a link to it) detailing the standard Kubernetes Github Flow. --- development.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/development.md b/development.md index 615b4d55..a20834e9 100644 --- a/development.md +++ b/development.md @@ -8,7 +8,7 @@ Official releases are built in Docker containers. Details are [here](../../buil Kubernetes is written in [Go](http://golang.org) programming language. If you haven't set up Go development environment, please follow [this instruction](http://golang.org/doc/code.html) to install go tool and set up GOPATH. Ensure your version of Go is at least 1.3. -## Put kubernetes into GOPATH +## Clone kubernetes into GOPATH We highly recommend to put kubernetes' code into your GOPATH. 
For example, the following commands will download kubernetes' code under the current user's GOPATH (Assuming there's only one directory in GOPATH.): @@ -22,7 +22,9 @@ $ git clone https://github.com/GoogleCloudPlatform/kubernetes.git The commands above will not work if there are more than one directory in ``$GOPATH``. -(Obviously, clone your own fork of Kubernetes if you plan to do development.) +If you plan to do development, read about the +[Kubernetes Github Flow](https://docs.google.com/a/google.com/presentation/d/1WDGN_ggq1Ae3eeQmbSCMyUG1UhhRH6UZTy0pePq09Xo/pub?start=false&loop=false&delayms=3000), +and then clone your own fork of Kubernetes as described there. ## godep and dependency management -- cgit v1.2.3 From c0c7a57db64ed2496ff4f0aa066e69b029fe423f Mon Sep 17 00:00:00 2001 From: markturansky Date: Thu, 5 Mar 2015 11:51:52 -0500 Subject: Added more detail and explained workflow/lifecycle of a persistent volume using examples --- persistent-storage.md | 156 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 138 insertions(+), 18 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index c29319aa..bafdb343 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -1,4 +1,4 @@ -# PersistentVolume +# Persistent Storage This document proposes a model for managing persistent, cluster-scoped storage for applications requiring long lived data. @@ -6,14 +6,17 @@ This document proposes a model for managing persistent, cluster-scoped storage f Two new API kinds: -A `PersistentVolume` is created by a cluster admin and is a piece of persistent storage exposed as a volume. It is analogous to a node. +A `PersistentVolume` (PV) is a storage resource provisioned by an administrator. It is analogous to a node. -A `PersistentVolumeClaim` is a user's request for a persistent volume to use in a pod. It is analogous to a pod. +A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to use in a pod. 
It is analogous to a pod. One new system component: -`PersistentVolumeManager` watches for new volumes to manage in the system, analogous to the node controller. The volume manager also watches for claims by users and binds them to available volumes. This -component is a singleton that manages all persistent volumes in the cluster. +`PersistentVolumeManager` is a singleton running in master that manages all PVs in the system, analogous to the node controller. The volume manager watches the API for newly created volumes to manage. The manager also watches for claims by users and binds them to available volumes. + +One new volume: + +`PersistentVolumeClaimVolumeSource` references the user's PVC in the same namespace. This volume finds the bound PV and mounts that volume for the pod. A `PersistentVolumeClaimVolumeSource` is, essentially, a wrapper around another type of volume that is owned by someone else (the system). Kubernetes makes no guarantees at runtime that the underlying storage exists or is available. High availability is left to the storage provider. @@ -29,18 +32,12 @@ Kubernetes makes no guarantees at runtime that the underlying storage exists or #### Describe available storage -Cluster adminstrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system. -All persistent volumes are managed and made available by the volume manager. The manager also watches for new claims for storage and binds them to an available, matching volume. +Cluster adminstrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system. All persistent volumes are managed and made available by the volume manager. 
The manager also watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. Many means of dynamic provisioning will be eventually be implemented for various storage types. -``` - - $ cluster/kubectl.sh get pv -``` - -##### API Implementation: +##### PersistentVolume API | Action | HTTP Verb | Path | Description | | ---- | ---- | ---- | ---- | @@ -52,18 +49,16 @@ Many means of dynamic provisioning will be eventually be implemented for various | WATCH | GET | /api/{version}/watch/persistentvolumes | Watch for changes to a PersistentVolume in system namespace | - #### Request Storage - -Kubernetes users request a persistent volume for their pod by creating a *PersistentVolumeClaim*. Their request for storage is described by their requirements for resource and mount capabilities. +Kubernetes users request persistent storage for their pod by creating a ```PersistentVolumeClaim```. Their request for storage is described by their requirements for resources and mount capabilities. Requests for volumes are bound to available volumes by the volume manager, if a suitable match is found. Requests for resources can go unfulfilled. -Users attach their claim to their pod using a new *PersistentVolumeClaimVolumeSource* volume source. +Users attach their claim to their pod using a new ```PersistentVolumeClaimVolumeSource``` volume source. -##### Users require a full API to manage their claims. +##### PersistentVolumeClaim API | Action | HTTP Verb | Path | Description | @@ -83,3 +78,128 @@ Scheduling constraints are to be handled similar to pod resource constraints. P TBD + +### Example + +#### Admin provisions storage + +An administrator provisions storage by posting PVs to the API. Various way to automate this task can be scripted. Dynamic provisioning is a future feature that can maintain levels of PVs. 
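
The example below posts a single PV by hand; the scripting mentioned above can be as simple as looping over a directory of manifests. The following is an illustrative sketch only: the `post_pvs` helper name, the `KUBECTL` override, and the one-manifest-per-file layout are assumptions, not part of this proposal.

```shell
# Hypothetical admin helper: post every PersistentVolume manifest found in a
# directory to the API. KUBECTL is overridable; the cluster wrapper script is
# only a default, not a requirement.
post_pvs() {
  kubectl_cmd="${KUBECTL:-cluster/kubectl.sh}"
  for manifest in "$1"/*.yaml; do
    [ -e "$manifest" ] || continue   # tolerate an empty directory
    "$kubectl_cmd" create -f "$manifest" || return 1
  done
}
```

An admin could then drop new definitions, such as the `pv0001` example below, into that directory and re-run the helper.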
+ +``` +POST: + +kind: PersistentVolume +apiVersion: v1beta3 +metadata: + name: pv0001 +spec: + capacity: + storage: 10 + persistentDisk: + pdName: "abc123" + fsType: "ext4" + +-------------------------------------------------- + +cluster/kubectl.sh get pv + +NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM +pv0001 map[] 10737418240 RWO Pending + + +``` + +#### Users request storage + +A user requests storage by posting a PVC to the API. Their request contains the AccessModes they wish their volume to have and the minimum size needed. + +The user must be within a namespace to create PVCs. + +``` + +POST: +kind: PersistentVolumeClaim +apiVersion: v1beta3 +metadata: + name: myclaim-1 +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3 + +-------------------------------------------------- + +cluster/kubectl.sh get pvc + + +NAME LABELS STATUS VOLUME +myclaim-1 map[] pending + +``` + + +#### Matching and binding + + The ```PersistentVolumeManager``` attempts to find an available volume that most closely matches the user's request. If one exists, they are bound by putting a reference on the PV to the PVC. Requests can go unfulfilled if a suitable match is not found. + +``` + +cluster/kubectl.sh get pv + +NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM +pv0001 map[] 10737418240 RWO Bound myclaim-1 / f4b3d283-c0ef-11e4-8be4-80e6500a981e + + +cluster/kubectl.sh get pvc + +NAME LABELS STATUS VOLUME +myclaim-1 map[] Bound b16e91d6-c0ef-11e4-8be4-80e6500a981e + + +``` + +#### Claim usage + +The claim holder can use their claim as a volume. The ```PersistentVolumeClaimVolumeSource``` knows to fetch the PV backing the claim and mount its volume for a pod. + +The claim holder owns the claim and its data for as long as the claim exists. The pod using the claim can be deleted, but the claim remains in the user's namespace. It can be used again and again by many pods. 
+ +``` +POST: + +kind: Pod +apiVersion: v1beta3 +metadata: + name: mypod +spec: + containers: + - image: dockerfile/nginx + name: myfrontend + volumeMounts: + - mountPath: "/var/www/html" + name: mypd + volumes: + - name: mypd + source: + persistentVolumeClaim: + accessMode: ReadWriteOnce + claimRef: + name: myclaim-1 + +``` + +#### Releasing a claim and Recycling a volume + +When a claim holder is finished with their data, they can delete their claim. + +``` + +cluster/kubectl.sh delete pvc myclaim-1 + +``` + +The ```PersistentVolumeManager``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. + +Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. \ No newline at end of file -- cgit v1.2.3 From 46a8a0873c60de634831e14cc294b7e390e09326 Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Thu, 5 Mar 2015 11:23:03 -0800 Subject: Make slides visible to the public, fix a typo. Moved to account quintonh@gmail.com to make it visible to the public without any login. Correct "push request" to "pull request". --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index a20834e9..7eccbcc8 100644 --- a/development.md +++ b/development.md @@ -23,7 +23,7 @@ $ git clone https://github.com/GoogleCloudPlatform/kubernetes.git The commands above will not work if there are more than one directory in ``$GOPATH``. If you plan to do development, read about the -[Kubernetes Github Flow](https://docs.google.com/a/google.com/presentation/d/1WDGN_ggq1Ae3eeQmbSCMyUG1UhhRH6UZTy0pePq09Xo/pub?start=false&loop=false&delayms=3000), +[Kubernetes Github Flow](https://docs.google.com/presentation/d/1HVxKSnvlc2WJJq8b9KCYtact5ZRrzDzkWgKEfm0QO_o/pub?start=false&loop=false&delayms=3000), and then clone your own fork of Kubernetes as described there. 
## godep and dependency management -- cgit v1.2.3 From b1152d31d161c30d0f48448996018e3cf3e59440 Mon Sep 17 00:00:00 2001 From: Young Date: Sun, 8 Mar 2015 15:38:21 +0000 Subject: simple typo --- persistent-storage.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index bafdb343..d9824c2a 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -32,7 +32,7 @@ Kubernetes makes no guarantees at runtime that the underlying storage exists or #### Describe available storage -Cluster adminstrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system. All persistent volumes are managed and made available by the volume manager. The manager also watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. +Cluster administrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system. All persistent volumes are managed and made available by the volume manager. The manager also watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. Many means of dynamic provisioning will be eventually be implemented for various storage types. @@ -202,4 +202,4 @@ cluster/kubectl.sh delete pvc myclaim-1 The ```PersistentVolumeManager``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. -Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. 
\ No newline at end of file +Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. -- cgit v1.2.3 From f09c5510822ab57c0d08d1196fa7c4d24f0a0c37 Mon Sep 17 00:00:00 2001 From: markturansky Date: Mon, 9 Mar 2015 12:21:54 -0400 Subject: Edited to reflect that PVs have no namespace --- persistent-storage.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index bafdb343..a4c1c9ce 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -34,6 +34,8 @@ Kubernetes makes no guarantees at runtime that the underlying storage exists or Cluster adminstrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system. All persistent volumes are managed and made available by the volume manager. The manager also watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. +PVs are system objects and, thus, have no namespace. + Many means of dynamic provisioning will be eventually be implemented for various storage types. 
@@ -41,12 +43,12 @@ Many means of dynamic provisioning will be eventually be implemented for various | Action | HTTP Verb | Path | Description | | ---- | ---- | ---- | ---- | -| CREATE | POST | /api/{version}/persistentvolumes/ | Create instance of PersistentVolume in system namespace | -| GET | GET | /api/{version}persistentvolumes/{name} | Get instance of PersistentVolume in system namespace with {name} | -| UPDATE | PUT | /api/{version}/persistentvolumes/{name} | Update instance of PersistentVolume in system namespace with {name} | -| DELETE | DELETE | /api/{version}/persistentvolumes/{name} | Delete instance of PersistentVolume in system namespace with {name} | -| LIST | GET | /api/{version}/persistentvolumes | List instances of PersistentVolume in system namespace | -| WATCH | GET | /api/{version}/watch/persistentvolumes | Watch for changes to a PersistentVolume in system namespace | +| CREATE | POST | /api/{version}/persistentvolumes/ | Create instance of PersistentVolume | +| GET | GET | /api/{version}persistentvolumes/{name} | Get instance of PersistentVolume with {name} | +| UPDATE | PUT | /api/{version}/persistentvolumes/{name} | Update instance of PersistentVolume with {name} | +| DELETE | DELETE | /api/{version}/persistentvolumes/{name} | Delete instance of PersistentVolume with {name} | +| LIST | GET | /api/{version}/persistentvolumes | List instances of PersistentVolume | +| WATCH | GET | /api/{version}/watch/persistentvolumes | Watch for changes to a PersistentVolume | #### Request Storage -- cgit v1.2.3 From 2b45ccdae8899976b76378c0aa3216fc258f2c14 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 9 Mar 2015 21:38:51 -0700 Subject: Add a doc on making PRs easier to review --- faster_reviews.md | 177 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 177 insertions(+) create mode 100644 faster_reviews.md diff --git a/faster_reviews.md b/faster_reviews.md new file mode 100644 index 00000000..142ac946 --- /dev/null +++ 
b/faster_reviews.md @@ -0,0 +1,177 @@ +# How to get faster PR reviews + +Most of what is written here is not at all specific to Kubernetes, but it bears +being written down in the hope that it will occasionally remind people of "best +practices" around code reviews. + +You've just had a brilliant idea on how to make Kubernetes better. Let's call +that idea "FeatureX". Feature X is not even that complicated. You have a +pretty good idea of how to implement it. You jump in and implement it, fixing a +bunch of stuff along the way. You send your PR - this is awesome! And it sits. +And sits. A week goes by and nobody reviews it. Finally someone offers a few +comments, which you fix up and wait for more review. And you wait. Another +week or two goes by. This is horrible. + +What went wrong? One particular problem that comes up frequently is this - your +PR is too big to review. You've touched 39 files and have 8657 insertions. +When your would-be reviewers pull up the diffs they run away - this PR is going +to take 4 hours to review and they don't have 4 hours right now. They'll get to it +later, just as soon as they have more free time (ha!). + +Let's talk about how to avoid this. + +## 1. Don't build a cathedral in one PR + +Are you sure FeatureX is something the Kubernetes team wants or will accept, or +that it is implemented to fit with other changes in flight? Are you willing to +bet a few days or weeks of work on it? If you have any doubt at all about the +usefulness of your feature or the design - make a proposal doc or a sketch PR +or both. Write or code up just enough to express the idea and the design and +why you made those choices, then get feedback on this. Now, when we ask you to +change a bunch of facets of the design, you don't have to re-write it all. + +## 2. Smaller diffs are exponentially better + +Small PRs get reviewed faster and are more likely to be correct than big ones. +Let's face it - attention wanes over time. 
If your PR takes 60 minutes to +review, I almost guarantee that the reviewer's eye for details is not as keen in +the last 30 minutes as it was in the first. This leads to multiple rounds of +review when one might have sufficed. In some cases the review is delayed in its +entirety by the need for a large contiguous block of time to sit and read your +code. + +Whenever possible, break up your PRs into multiple commits. Making a series of +discrete commits is a powerful way to express the evolution of an idea or the +different ideas that make up a single feature. There's a balance to be struck, +obviously. If your commits are too small they become more cumbersome to deal +with. Strive to group logically distinct ideas into commits. + +For example, if you found that FeatureX needed some "prefactoring" to fit in, +make a commit that JUST does that prefactoring. Then make a new commit for +FeatureX. Don't lump unrelated things together just because you didn't think +about prefactoring. If you need to, fork a new branch, do the prefactoring +there and send a PR for that. If you can explain why you are doing seemingly +no-op work ("it makes the FeatureX change easier, I promise") we'll probably be +OK with it. + +Obviously, a PR with 25 commits is still very cumbersome to review, so use +common sense. + +## 3. Multiple small PRs are often better than multiple commits + +If you can extract whole ideas from your PR and send those as PRs of their own, +you can avoid the painful problem of continually rebasing. Kubernetes is a +fast-moving codebase - lock in your changes ASAP, and make merges be someone +else's problem. + +Obviously, we want every PR to be useful on its own, so you'll have to use +common sense in deciding what can be a PR vs what should be a commit in a larger +PR. Rule of thumb - if this commit or set of commits is directly related to +FeatureX and nothing else, it should probably be part of the FeatureX PR. 
If +you can plausibly imagine someone finding value in this commit outside of +FeatureX, try it as a PR. + +Don't worry about flooding us with PRs. We'd rather have 100 small, obvious PRs +than 10 unreviewable monoliths. + +## 4. Don't rename, reformat, comment, etc in the same PR + +Often, as you are implementing FeatureX, you find things that are just wrong. +Bad comments, poorly named functions, bad structure, weak type-safety. You +should absolutely fix those things (or at least file issues, please) - but not +in this PR. See the above points - break unrelated changes out into different +PRs or commits. Otherwise your diff will have WAY too many changes, and your +reviewer won't see the forest because of all the trees. + +## 5. Comments matter + +Read up on GoDoc - follow those general rules. If you're writing code and you +think there is any possible chance that someone might not understand why you did +something (or that you won't remember what you yourself did), comment it. If +you think there's something pretty obvious that we could follow up on, add a +TODO. Many code-review comments are about this exact issue. + +## 5. Tests are almost always required + +Nothing is more frustrating than doing a review, only to find that the tests are +inadequate or even entirely absent. Very few PRs can touch code and NOT touch +tests. If you don't know how to test FeatureX - ask! We'll be happy to help +you design things for easy testing or to suggest appropriate test cases. + +## 6. Look for opportunities to generify + +If you find yourself writing something that touches a lot of modules, think hard +about the dependencies you are introducing between packages. Can some of what +you're doing be made more generic and moved up and out of the FeatureX package? +Do you need to use a function or type from an otherwise unrelated package? If +so, promote! We have places specifically for hosting more generic code. 

Likewise if FeatureX is similar in form to FeatureW which was checked in last
month and it happens to exactly duplicate some tricky stuff from FeatureW,
consider prefactoring core logic out and using it in both FeatureW and FeatureX.
But do that in a different commit or PR, please.

## 8. Fix feedback in a new commit

Your reviewer has finally sent you some feedback on FeatureX. You make a bunch
of changes and ... what? You could patch those into your commits with git
"squash" or "fixup" logic. But that makes your changes hard to verify. Unless
your whole PR is pretty trivial, you should instead put your fixups into a new
commit and re-push. Your reviewer can then look at that commit on its own - so
much faster to review than starting over.

We might still ask you to squash commits at the very end, for the sake of a clean
history.

## 9. KISS, YAGNI, MVP, etc.

Sometimes we need to remind each other of core tenets of software design - Keep
It Simple, You Aren't Gonna Need It, Minimum Viable Product, and so on. Adding
features "because we might need it later" is antithetical to software that
ships. Add the things you need NOW and (ideally) leave room for things you
might need later - but don't implement them now.

## 10. Push back

We understand that it is hard to imagine, but sometimes we make mistakes. It's
OK to push back on changes requested during a review. If you have a good reason
for doing something a certain way, you are absolutely allowed to debate the
merits of a requested change. You might be overruled, but you might also
prevail. We're mostly pretty reasonable people. Mostly.

## 11. I'm still getting stalled - help?!

So, you've done all that and you still aren't getting any PR love? Here are some
things you can do that might help kick a stalled process along:

 * Make sure that your PR has an assigned reviewer (assignee in GitHub).
If
   this is not the case, reply to the PR comment stream asking for one to be
   assigned.

 * Ping the assignee (@username) on the PR comment stream asking for an
   estimate of when they can get to it.

 * Ping the assignee by email (many of us have email addresses that are well
   published or are the same as our GitHub handle @google.com or @redhat.com).

If you think you have fixed all the issues in a round of review, and you haven't
heard back, you should ping the reviewer (assignee) on the comment stream with a
"please take another look" (PTAL) or similar comment indicating you are done and
you think it is ready for re-review. In fact, this is probably a good habit for
all PRs.

One phenomenon of open-source projects (where anyone can comment on any issue)
is the dog-pile - your PR gets so many comments from so many people it becomes
hard to follow. In this situation you can ask the primary reviewer
(assignee) whether they want you to fork a new PR to clear out all the comments.
Remember: you don't HAVE to fix every issue raised by every person who feels
like commenting, but you should at least answer reasonable comments with an
explanation.

## Final: Use common sense

Obviously, none of these points are hard rules. There is no document that can
take the place of common sense and good taste. Use your best judgement, but put
a bit of thought into how your work can be made easier to review. If you do
these things your PRs will flow much more easily.

-- cgit v1.2.3


From 9249c265badd7e73f415fa8a539c4f6b9ed07b5a Mon Sep 17 00:00:00 2001
From: markturansky
Date: Tue, 10 Mar 2015 10:18:24 -0400
Subject: Added verbiage about events

---
 persistent-storage.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/persistent-storage.md b/persistent-storage.md
index a4c1c9ce..586f75bf 100644
--- a/persistent-storage.md
+++ b/persistent-storage.md
@@ -81,6 +81,13 @@ Scheduling constraints are to be handled similar to pod resource constraints.
P
TBD

+#### Events
+
+The implementation of persistent storage will not require events to communicate the state of a claim to the user. The CLI for bound claims contains a reference to the backing persistent volume. This is always present in the API and CLI, making a separate event that communicates the same information unnecessary.
+
+Events that communicate the state of a mounted volume are left to the volume plugins.
+
+
 ### Example

 #### Admin provisions storage
-- cgit v1.2.3


From aa00e8e7f157bbf96f06ea4c92f1e69b866eca34 Mon Sep 17 00:00:00 2001
From: Salvatore Dario Minonne
Date: Mon, 9 Mar 2015 18:44:31 +0100
Subject: updating labels.md and design/labels.md

---
 labels.md | 76 ---------------------------------------------------------------
 1 file changed, 76 deletions(-)
 delete mode 100644 labels.md

diff --git a/labels.md b/labels.md
deleted file mode 100644
index bc151f7c..00000000
--- a/labels.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# Labels
-
-_Labels_ are key/value pairs identifying client/user-defined attributes (and non-primitive system-generated attributes) of API objects, which are stored and returned as part of the [metadata of those objects](/docs/api-conventions.md). Labels can be used to organize and to select subsets of objects according to these attributes.
-
-Each object can have a set of key/value labels set on it, with at most one label with a particular key.
-```
-"labels": {
-  "key1" : "value1",
-  "key2" : "value2"
-}
-```
-
-Unlike [names and UIDs](/docs/identifiers.md), labels do not provide uniqueness. In general, we expect many objects to carry the same label(s).
-
-Via a _label selector_, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes.
-
-Label selectors may also be used to associate policies with sets of objects.
-
-We also [plan](https://github.com/GoogleCloudPlatform/kubernetes/issues/560) to make labels available inside pods and [lifecycle hooks](/docs/container-environment.md).
-
-Valid label keys are comprised of two segments - prefix and name - separated
-by a slash (`/`). The name segment is required and must be a DNS label: 63
-characters or less, all lowercase, beginning and ending with an alphanumeric
-character (`[a-z0-9]`), with dashes (`-`) and alphanumerics between. The
-prefix and slash are optional. If specified, the prefix must be a DNS
-subdomain (a series of DNS labels separated by dots (`.`), not longer than 253
-characters in total).
-
-If the prefix is omitted, the label key is presumed to be private to the user.
-System components which use labels must specify a prefix. The `kubernetes.io`
-prefix is reserved for kubernetes core components.
-
-## Motivation
-
-Service deployments and batch processing pipelines are often multi-dimensional entities (e.g., multiple partitions or deployments, multiple release tracks, multiple tiers, multiple micro-services per tier). Management often requires cross-cutting operations, which breaks encapsulation of strictly hierarchical representations, especially rigid hierarchies determined by the infrastructure rather than by users. Labels enable users to map their own organizational structures onto system objects in a loosely coupled fashion, without requiring clients to store these mappings.
-
-## Label selectors
-
-Label selectors permit very simple filtering by label keys and values. The simplicity of label selectors is deliberate. It is intended to facilitate transparency for humans, easy set overlap detection, efficient indexing, and reverse-indexing (i.e., finding all label selectors matching an object's labels - https://github.com/GoogleCloudPlatform/kubernetes/issues/1348).
-
-Currently the system supports selection by exact match of a map of keys and values. Matching objects must have all of the specified labels (both keys and values), though they may have additional labels as well.
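
The label-key syntax described above can be checked with a short stand-alone Go sketch. This is illustrative only, assuming a direct reading of the rules in this document; it is not the validation code Kubernetes actually uses:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dnsLabel matches one DNS label: lowercase alphanumerics and dashes,
// beginning and ending with an alphanumeric (lengths are checked separately).
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validLabelKey checks a label key against the rules above: an optional
// DNS-subdomain prefix plus a slash, followed by a required DNS-label name.
func validLabelKey(key string) bool {
	name := key
	if i := strings.IndexByte(key, '/'); i >= 0 {
		prefix := key[:i]
		name = key[i+1:]
		if prefix == "" || len(prefix) > 253 {
			return false
		}
		for _, seg := range strings.Split(prefix, ".") {
			if len(seg) > 63 || !dnsLabel.MatchString(seg) {
				return false
			}
		}
	}
	return len(name) <= 63 && dnsLabel.MatchString(name)
}

func main() {
	fmt.Println(validLabelKey("environment"))            // true
	fmt.Println(validLabelKey("kubernetes.io/hostname")) // true
	fmt.Println(validLabelKey("-not-a-label"))           // false
}
```
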
-
-We are in the process of extending the label selection specification (see [selector.go](/pkg/labels/selector.go) and https://github.com/GoogleCloudPlatform/kubernetes/issues/341) to support conjunctions of requirements of the following forms:
-```
-key1 in (value11, value12, ...)
-key1 not in (value11, value12, ...)
-key1 exists
-```
-
-LIST and WATCH operations may specify label selectors to filter the sets of objects returned using a query parameter: `?labels=key1%3Dvalue1,key2%3Dvalue2,...`. We may extend such filtering to DELETE operations in the future.
-
-Kubernetes also currently supports two objects that use label selectors to keep track of their members, `service`s and `replicationController`s:
-- `service`: A [service](/docs/services.md) is a configuration unit for the proxies that run on every worker node. It is named and points to one or more pods.
-- `replicationController`: A [replication controller](/docs/replication-controller.md) ensures that a specified number of pod "replicas" are running at any one time. If there are too many, it'll kill some. If there are too few, it'll start more.
-
-The set of pods that a `service` targets is defined with a label selector. Similarly, the population of pods that a `replicationController` is monitoring is also defined with a label selector.
-
-For management convenience and consistency, `services` and `replicationControllers` may themselves have labels and would generally carry the labels their corresponding pods have in common.
-
-In the future, label selectors will be used to identify other types of distributed service workers, such as worker pool members or peers in a distributed application.
-
-Individual labels are used to specify identifying metadata, and to convey the semantic purposes/roles of pods or containers.
Examples of typical pod label keys include `service`, `environment` (e.g., with values `dev`, `qa`, or `production`), `tier` (e.g., with values `frontend` or `backend`), and `track` (e.g., with values `daily` or `weekly`), but you are free to develop your own conventions. - -Sets identified by labels and label selectors could be overlapping (think Venn diagrams). For instance, a service might target all pods with `tier in (frontend), environment in (prod)`. Now say you have 10 replicated pods that make up this tier. But you want to be able to 'canary' a new version of this component. You could set up a `replicationController` (with `replicas` set to 9) for the bulk of the replicas with labels `tier=frontend, environment=prod, track=stable` and another `replicationController` (with `replicas` set to 1) for the canary with labels `tier=frontend, environment=prod, track=canary`. Now the service is covering both the canary and non-canary pods. But you can mess with the `replicationControllers` separately to test things out, monitor the results, etc. - -Note that the superset described in the previous example is also heterogeneous. In long-lived, highly available, horizontally scaled, distributed, continuously evolving service applications, heterogeneity is inevitable, due to canaries, incremental rollouts, live reconfiguration, simultaneous updates and auto-scaling, hardware upgrades, and so on. - -Pods (and other objects) may belong to multiple sets simultaneously, which enables representation of service substructure and/or superstructure. In particular, labels are intended to facilitate the creation of non-hierarchical, multi-dimensional deployment structures. They are useful for a variety of management purposes (e.g., configuration, deployment) and for application introspection and analysis (e.g., logging, monitoring, alerting, analytics). 
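
The canary example above can be sketched with plain maps standing in for pods. The `matches` and `countMatching` helpers are hypothetical illustrations of the exact-match rule, not the real selector code:

```go
package main

import "fmt"

// matches reports whether a pod's labels satisfy an exact-match selector:
// every key/value pair in the selector must be present on the pod; extra
// pod labels are ignored.
func matches(selector, podLabels map[string]string) bool {
	for k, v := range selector {
		if podLabels[k] != v {
			return false
		}
	}
	return true
}

// countMatching counts the pods whose labels satisfy the selector.
func countMatching(selector map[string]string, pods []map[string]string) int {
	n := 0
	for _, p := range pods {
		if matches(selector, p) {
			n++
		}
	}
	return n
}

func main() {
	// Nine stable replicas and one canary, as in the example above.
	pods := []map[string]string{}
	for i := 0; i < 9; i++ {
		pods = append(pods, map[string]string{"tier": "frontend", "environment": "prod", "track": "stable"})
	}
	pods = append(pods, map[string]string{"tier": "frontend", "environment": "prod", "track": "canary"})

	service := map[string]string{"tier": "frontend", "environment": "prod"}
	stable := map[string]string{"tier": "frontend", "environment": "prod", "track": "stable"}
	canary := map[string]string{"tier": "frontend", "environment": "prod", "track": "canary"}

	// The service's selector covers both tracks; each controller covers one.
	fmt.Println(countMatching(service, pods), countMatching(stable, pods), countMatching(canary, pods)) // 10 9 1
}
```

Note how the service's set (10 pods) is the union of the two controllers' overlapping-but-disjoint track sets (9 and 1).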
Without the ability to form sets by intersecting labels, many implicitly related, overlapping flat sets would need to be created, for each subset and/or superset desired, which would lose semantic information and be difficult to keep consistent. Purely hierarchically nested sets wouldn't readily support slicing sets across different dimensions. - -Pods may be removed from these sets by changing their labels. This flexibility may be used to remove pods from service for debugging, data recovery, etc. - -Since labels can be set at pod creation time, no separate set add/remove operations are necessary, which makes them easier to use than manual set management. Additionally, since labels are directly attached to pods and label selectors are fairly simple, it's easy for users and for clients and tools to determine what sets they belong to (i.e., they are reversible). OTOH, with sets formed by just explicitly enumerating members, one would (conceptually) need to search all sets to determine which ones a pod belonged to. - -## Labels vs. annotations - -We'll eventually index and reverse-index labels for efficient queries and watches, use them to sort and group in UIs and CLIs, etc. We don't want to pollute labels with non-identifying, especially large and/or structured, data. Non-identifying information should be recorded using [annotations](/docs/annotations.md). -- cgit v1.2.3 From 472bf52e671efb9ef69fc5de2776bd2a7ea1cb8a Mon Sep 17 00:00:00 2001 From: Phaneendra Chiruvella Date: Sun, 15 Mar 2015 22:20:26 +0530 Subject: update link to common golang style mistakes --- collab.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/collab.md b/collab.md index 633b7682..f9f12e25 100644 --- a/collab.md +++ b/collab.md @@ -20,7 +20,7 @@ If a PR has gone 2 work days without an owner emerging, please poke the PR threa Except for rare cases, such as trivial changes (e.g. typos, comments) or emergencies (e.g. 
broken builds), maintainers should not merge their own changes. -Expect reviewers to request that you avoid [common go style mistakes](https://code.google.com/p/go-wiki/wiki/CodeReviewComments) in your PRs. +Expect reviewers to request that you avoid [common go style mistakes](https://github.com/golang/go/wiki/CodeReviewComments) in your PRs. ## Assigned reviews -- cgit v1.2.3 From d2499d4bdc149cfc2744cf3d9a54ee7be8c4841e Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 13 Mar 2015 13:06:20 -0700 Subject: Add a doc explaining how to make API changes Covers compatibility, internal API, versioned APIs, tests, fuzzer, semantic deep equal, etc. I wrote this as I worked on the next big multi-port service change. --- api_changes.md | 289 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 289 insertions(+) create mode 100644 api_changes.md diff --git a/api_changes.md b/api_changes.md new file mode 100644 index 00000000..c1005278 --- /dev/null +++ b/api_changes.md @@ -0,0 +1,289 @@ +# So you want to change the API? + +The Kubernetes API has two major components - the internal structures and +the versioned APIs. The versioned APIs are intended to be stable, while the +internal structures are implemented to best reflect the needs of the Kubernetes +code itself. + +What this means for API changes is that you have to be somewhat thoughtful in +how you approach changes, and that you have to touch a number of pieces to make +a complete change. This document aims to guide you through the process, though +not all API changes will need all of these steps. + +## Operational overview + +It's important to have a high level understanding of the API system used in +Kubernetes in order to navigate the rest of this document. + +As mentioned above, the internal representation of an API object is decoupled +from any one API version. This provides a lot of freedom to evolve the code, +but it requires robust infrastructure to convert between representations. 
There +are multiple steps in processing an API operation - even something as simple as +a GET involves a great deal of machinery. + +The conversion process is logically a "star" with the internal form at the +center. Every versioned API can be converted to the internal form (and +vice-versa), but versioned APIs do not convert to other versioned APIs directly. +This sounds like a heavy process, but in reality we don't intend to keep more +than a small number of versions alive at once. While all of the Kubernetes code +operates on the internal structures, they are always converted to a versioned +form before being written to storage (disk or etcd) or being sent over a wire. +Clients should consume and operate on the versioned APIs exclusively. + +To demonstrate the general process, let's walk through a (hypothetical) example: + + 1. A user POSTs a `Pod` object to `/api/v7beta1/...` + 2. The JSON is unmarshalled into a `v7beta1.Pod` structure + 3. Default values are applied to the `v7beta1.Pod` + 4. The `v7beta1.Pod` is converted to an `api.Pod` structure + 5. The `api.Pod` is validated, and any errors are returned to the user + 6. The `api.Pod` is converted to a `v6.Pod` (because v6 is the latest stable + version) + 7. The `v6.Pod` is marshalled into JSON and written to etcd + +Now that we have the `Pod` object stored, a user can GET that object in any +supported api version. For example: + + 1. A user GETs the `Pod` from `/api/v5/...` + 2. The JSON is read from etcd and unmarshalled into a `v6.Pod` structure + 3. Default values are applied to the `v6.Pod` + 4. The `v6.Pod` is converted to an `api.Pod` structure + 5. The `api.Pod` is converted to a `v5.Pod` structure + 6. The `v5.Pod` is marshalled into JSON and sent to the user + +The implication of this process is that API changes must be done carefully and +backward-compatibly. + +## On compatibility + +Before talking about how to make API changes, it is worthwhile to clarify what +we mean by API compatibility. 
An API change is considered backward-compatible +if it: + * adds new functionality that is not required for correct behavior + * does not change existing semantics + * does not change existing defaults + +Put another way: + +1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before + your change must work the same after your change. +2. Any API call that uses your change must not cause problems (e.g. crash or + degrade behavior) when issued against servers that do not include your change. +3. It must be possible to round-trip your change (convert to different API + versions and back) with no loss of information. + +If your change does not meet these criteria, it is not considered strictly +compatible. There are times when this might be OK, but mostly we want changes +that meet this definition. If you think you need to break compatibility, you +should talk to the Kubernetes team first. + +Let's consider some examples. In a hypothetical API (assume we're at version +v6), the `Frobber` struct looks something like this: + +```go +// API v6. +type Frobber struct { + Height int `json:"height"` + Param string `json:"param"` +} +``` + +You want to add a new `Width` field. It is generally safe to add new fields +without changing the API version, so you can simply change it to: + +```go +// Still API v6. +type Frobber struct { + Height int `json:"height"` + Width int `json:"width"` + Param string `json:"param"` +} +``` + +The onus is on you to define a sane default value for `Width` such that rule #1 +above is true - API calls and stored objects that used to work must continue to +work. + +For your next change you want to allow multiple `Param` values. You can not +simply change `Param string` to `Params []string` (without creating a whole new +API version) - that fails rules #1 and #2. You can instead do something like: + +```go +// Still API v6, but kind of clumsy. 
+type Frobber struct {
+ Height int `json:"height"`
+ Width int `json:"width"`
+ Param string `json:"param"` // the first param
+ ExtraParams []string `json:"params"` // additional params
+}
+```
+
+Now you can satisfy the rules: API calls that provide the old style `Param`
+will still work, while servers that don't understand `ExtraParams` can ignore
+it. This is somewhat unsatisfying as an API, but it is strictly compatible.
+
+Part of the reason for versioning APIs and for using internal structs that are
+distinct from any one version is to handle growth like this. The internal
+representation can be implemented as:
+
+```go
+// Internal, soon to be v7beta1.
+type Frobber struct {
+ Height int
+ Width int
+ Params []string
+}
+```
+
+The code that converts to/from versioned APIs can decode this into the somewhat
+uglier (but compatible!) structures. Eventually, a new API version, let's call
+it v7beta1, will be forked and it can use the clean internal structure.
+
+We've seen how to satisfy rules #1 and #2. Rule #3 means that you cannot
+extend one versioned API without also extending the others. For example, an
+API call might POST an object in API v7beta1 format, which uses the cleaner
+`Params` field, but the API server might store that object in trusty old v6
+form (since v7beta1 is "beta"). When the user reads the object back in the
+v7beta1 API it would be unacceptable to have lost all but `Params[0]`. This
+means that, even though it is ugly, a compatible change must be made to the v6
+API.
+
+As another interesting example, enumerated values provide a unique challenge.
+Adding a new value to an enumerated set is *not* a compatible change. Clients
+which assume they know how to handle all possible values of a given field will
+not be able to handle the new values. However, removing a value from an
+enumerated set *can* be a compatible change, if handled properly (treat the
+removed value as deprecated but allowed).
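
As a hedged sketch of what the conversion between the clumsy v6 form and the clean internal form might look like (the type and function names here are illustrative, not the real conversion machinery), note how the round-trip requirement of rule #3 is preserved:

```go
package main

import (
	"fmt"
	"reflect"
)

// v6Frobber is the compatible-but-clumsy versioned form from the text.
type v6Frobber struct {
	Height      int
	Width       int
	Param       string   // the first param
	ExtraParams []string // additional params
}

// internalFrobber is the clean internal form.
type internalFrobber struct {
	Height int
	Width  int
	Params []string
}

// toInternal folds Param and ExtraParams into the single Params slice.
func toInternal(in v6Frobber) internalFrobber {
	out := internalFrobber{Height: in.Height, Width: in.Width}
	if in.Param != "" {
		out.Params = append(out.Params, in.Param)
	}
	out.Params = append(out.Params, in.ExtraParams...)
	return out
}

// toV6 splits Params back into the first param and the rest.
func toV6(in internalFrobber) v6Frobber {
	out := v6Frobber{Height: in.Height, Width: in.Width}
	if len(in.Params) > 0 {
		out.Param = in.Params[0]
		out.ExtraParams = in.Params[1:]
	}
	return out
}

func main() {
	orig := v6Frobber{Height: 2, Width: 3, Param: "a", ExtraParams: []string{"b", "c"}}
	// Rule #3: converting to the internal form and back loses no information.
	round := toV6(toInternal(orig))
	fmt.Println(reflect.DeepEqual(orig, round)) // true
}
```
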
+
+## Changing versioned APIs
+
+For most changes, you will probably find it easiest to change the versioned
+APIs first. This forces you to think about how to make your change in a
+compatible way. Rather than doing each step in every version, it's usually
+easier to do each versioned API one at a time, or to do all of one version
+before starting "all the rest".
+
+### Edit types.go
+
+The struct definitions for each API are in `pkg/api/<version>/types.go`. Edit
+those files to reflect the change you want to make. Note that all non-inline
+fields in versioned APIs must have description tags - these are used to generate
+documentation.
+
+### Edit defaults.go
+
+If your change includes new fields for which you will need default values, you
+need to add cases to `pkg/api/<version>/defaults.go`. Of course, since you
+have added code, you have to add a test: `pkg/api/<version>/defaults_test.go`.
+
+Don't forget to run the tests!
+
+### Edit conversion.go
+
+Given that you have not yet changed the internal structs, this might feel
+premature, and that's because it is. You don't yet have anything to convert to
+or from. We will revisit this in the "internal" section. If you're doing this
+all in a different order (i.e. you started with the internal structs), then you
+should jump to that topic below. In the very rare case that you are making an
+incompatible change you might or might not want to do this now, but you will
+have to do more later. The files you want are
+`pkg/api/<version>/conversion.go` and `pkg/api/<version>/conversion_test.go`.
+
+## Changing the internal structures
+
+Now it is time to change the internal structs so your versioned changes can be
+used.
+
+### Edit types.go
+
+Similar to the versioned APIs, the definitions for the internal structs are in
+`pkg/api/types.go`. Edit those files to reflect the change you want to make.
+Keep in mind that the internal structs must be able to express *all* of the
+versioned APIs.
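
The defaulting step from "Edit defaults.go" above might look like this stand-alone sketch. The function name and the default value of 10 are illustrative assumptions; the real defaulting code lives in the per-version defaults file and is wired into the decoding machinery:

```go
package main

import "fmt"

// Frobber as in the earlier example; Width is the newly added field.
type Frobber struct {
	Height int
	Width  int
	Param  string
}

// defaultFrobber fills in a sane default for Width so that API calls and
// stored objects that predate the field keep working (rule #1).
func defaultFrobber(f *Frobber) {
	if f.Width == 0 {
		f.Width = 10
	}
}

func main() {
	old := Frobber{Height: 5, Param: "x"} // created before Width existed
	defaultFrobber(&old)
	fmt.Println(old.Width) // 10

	explicit := Frobber{Height: 5, Width: 3, Param: "x"}
	defaultFrobber(&explicit)
	fmt.Println(explicit.Width) // 3: explicit values are never overwritten
}
```
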
+
+## Edit validation.go
+
+Most changes made to the internal structs need some form of input validation.
+Validation is currently done on internal objects in
+`pkg/api/validation/validation.go`. This validation is one of the first
+opportunities we have to make a great user experience - good error messages and
+thorough validation help ensure that users are giving you what you expect and,
+when they don't, that they know why and how to fix it. Think hard about the
+contents of `string` fields, the bounds of `int` fields and the
+requiredness/optionalness of fields.
+
+Of course, code needs tests - `pkg/api/validation/validation_test.go`.
+
+## Edit version conversions
+
+At this point you have both the versioned API changes and the internal
+structure changes done. If there are any notable differences - field names,
+types, structural change in particular - you must add some logic to convert
+versioned APIs to and from the internal representation. If you see errors from
+the `serialization_test`, it may indicate the need for explicit conversions.
+
+The conversion code resides with each versioned API -
+`pkg/api/<version>/conversion.go`. Unsurprisingly, this also requires you to
+add tests to `pkg/api/<version>/conversion_test.go`.
+
+## Update the fuzzer
+
+Part of our testing regimen for APIs is to "fuzz" (fill with random values) API
+objects and then convert them to and from the different API versions. This is
+a great way of exposing places where you lost information or made bad
+assumptions. If you have added any fields which need very careful formatting
+(the test does not run validation) or if you have made assumptions such as
+"this slice will always have at least 1 element", you may get an error or even
+a panic from the `serialization_test`. If so, look at the diff it produces (or
+the backtrace in case of a panic) and figure out what you forgot. Encode that
+into the fuzzer's custom fuzz functions.
+
+The fuzzer can be found in `pkg/api/testing/fuzzer.go`.
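
A stand-alone sketch of the kind of validation described above, using the running `Frobber` example. The function name and error messages are illustrative, not taken from the real validation package:

```go
package main

import (
	"errors"
	"fmt"
)

// Frobber as in the earlier examples.
type Frobber struct {
	Height int
	Width  int
	Param  string
}

// validateFrobber illustrates the checks discussed above: bounds on int
// fields, requiredness of string fields, and error messages that tell the
// user what is wrong and how to fix it.
func validateFrobber(f *Frobber) []error {
	var errs []error
	if f.Height <= 0 {
		errs = append(errs, errors.New("height: must be greater than 0"))
	}
	if f.Param == "" {
		errs = append(errs, errors.New("param: required"))
	}
	return errs
}

func main() {
	bad := Frobber{Height: -1}
	for _, err := range validateFrobber(&bad) {
		fmt.Println(err)
	}
}
```
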
+
+## Update the semantic comparisons
+
+VERY VERY rarely is this needed, but when it hits, it hurts. In some rare
+cases we end up with objects (e.g. resource quantities) that have morally
+equivalent values with different bitwise representations (e.g. value 10 with a
+base-2 formatter is the same as value 10 with a base-10 formatter). The only way
+Go knows how to do deep-equality is through field-by-field bitwise comparisons.
+This is a problem for us.
+
+The first thing you should do is try not to do that. If you really can't avoid
+this, I'd like to introduce you to our semantic DeepEqual routine. It supports
+custom overrides for specific types - you can find that in `pkg/api/helpers.go`.
+
+There's one other time when you might have to touch this: unexported fields.
+You see, while Go's `reflect` package is allowed to touch unexported fields, we
+mere mortals are not - this includes semantic DeepEqual. Fortunately, most of
+our API objects are "dumb structs" all the way down - all fields are exported
+(start with a capital letter) and there are no unexported fields. But sometimes
+you want to include an object in our API that does have unexported fields
+somewhere in it (for example, `time.Time` has unexported fields). If this hits
+you, you may have to touch the semantic DeepEqual customization functions.
+
+## Implement your change
+
+Now you have the API all changed - go implement whatever it is that you're
+doing!
+
+## Write end-to-end tests
+
+This is, sadly, still sort of painful. Talk to us and we'll try to help you
+figure out the best way to make sure your cool feature keeps working forever.
+
+## Examples and docs
+
+At last, your change is done, all unit tests pass, e2e passes, you're done,
+right? Actually, no. You just changed the API. If you are touching an
+existing facet of the API, you have to try *really* hard to make sure that
+*all* the examples and docs are updated.
There's no easy way to do this, due
+in part to JSON and YAML silently dropping unknown fields. You're clever -
+you'll figure it out. Put `grep` or `ack` to good use.
+
+If you added functionality, you should consider documenting it and/or writing
+an example to illustrate your change.
+
+## Adding new REST objects
+
+TODO(smarterclayton): write this.
-- cgit v1.2.3


From 59748dcb90e9ce05eca6608fbe8c3c32898edd87 Mon Sep 17 00:00:00 2001
From: Wojciech Tyczynski
Date: Mon, 16 Mar 2015 13:20:03 +0100
Subject: Remove BoundPod structure

---
 security_context.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/security_context.md b/security_context.md
index 7dc10e69..cd10202e 100644
--- a/security_context.md
+++ b/security_context.md
@@ -83,19 +83,19 @@ The Kubelet will have an interface that points to a `SecurityContextProvider`. T

```go
type SecurityContextProvider interface {
- // ModifyContainerConfig is called before the Docker createContainer call.
- // The security context provider can make changes to the Config with which
- // the container is created.
- // An error is returned if it's not possible to secure the container as
- // requested with a security context.
- ModifyContainerConfig(pod *api.BoundPod, container *api.Container, config *docker.Config) error
+ // ModifyContainerConfig is called before the Docker createContainer call.
+ // The security context provider can make changes to the Config with which
+ // the container is created.
+ // An error is returned if it's not possible to secure the container as
+ // requested with a security context.
+ ModifyContainerConfig(pod *api.Pod, container *api.Container, config *docker.Config) error

 // ModifyHostConfig is called before the Docker runContainer call.
 // The security context provider can make changes to the HostConfig, affecting
 // security options, whether the container is privileged, volume binds, etc.
// An error is returned if it's not possible to secure the container as requested
- // with a security context.
- ModifyHostConfig(pod *api.BoundPod, container *api.Container, hostConfig *docker.HostConfig)
+ // with a security context.
+ ModifyHostConfig(pod *api.Pod, container *api.Container, hostConfig *docker.HostConfig)
 }
 ```
-- cgit v1.2.3


From 9786c3c7634b5b6a54fcf9258660ae36c694d8d2 Mon Sep 17 00:00:00 2001
From: Yu-Ju Hong
Date: Tue, 17 Mar 2015 12:30:47 -0700
Subject: Add -v to `go run hack/e2e.go -ctl` commands

---
 development.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/development.md b/development.md
index 7eccbcc8..ef7c7ce8 100644
--- a/development.md
+++ b/development.md
@@ -215,15 +215,16 @@ hack/ginkgo-e2e.sh --ginkgo.focus=Pods.*env

 # Flags can be combined, and their actions will take place in this order:
 # -build, -push|-up|-pushup, -test|-tests=..., -down
 # e.g.:
-go run e2e.go -build -pushup -test -down
+go run hack/e2e.go -build -pushup -test -down

 # -v (verbose) can be added if you want streaming output instead of only
 # seeing the output of failed commands.

 # -ctl can be used to quickly call kubectl against your e2e cluster. Useful for
-# cleaning up after a failed test or viewing logs.
-go run e2e.go -ctl='get events'
-go run e2e.go -ctl='delete pod foobar'
+# cleaning up after a failed test or viewing logs. Use -v to avoid suppressing
+# kubectl output.
+go run hack/e2e.go -v -ctl='get events' +go run hack/e2e.go -v -ctl='delete pod foobar' ``` ## Testing out flaky tests -- cgit v1.2.3 From 6a22c4b38d1a80440fa462f1d11bd5f24be087e4 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Mon, 9 Mar 2015 14:34:12 -0400 Subject: Update namespaces design --- namespaces.md | 386 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 264 insertions(+), 122 deletions(-) diff --git a/namespaces.md b/namespaces.md index 761daa1a..0e89bf56 100644 --- a/namespaces.md +++ b/namespaces.md @@ -1,193 +1,335 @@ -# Kubernetes Proposal - Namespaces +# Namespaces -**Related PR:** +## Abstract -| Topic | Link | -| ---- | ---- | -| Identifiers.md | https://github.com/GoogleCloudPlatform/kubernetes/pull/1216 | -| Access.md | https://github.com/GoogleCloudPlatform/kubernetes/pull/891 | -| Indexing | https://github.com/GoogleCloudPlatform/kubernetes/pull/1183 | -| Cluster Subdivision | https://github.com/GoogleCloudPlatform/kubernetes/issues/442 | +A Namespace is a mechanism to partition resources created by users into +a logically named group. -## Background +## Motivation -High level goals: +A single cluster should be able to satisfy the needs of multiple user communities. -* Enable an easy-to-use mechanism to logically scope Kubernetes resources -* Ensure extension resources to Kubernetes can share the same logical scope as core Kubernetes resources -* Ensure it aligns with access control proposal -* Ensure system has log n scale with increasing numbers of scopes +Each user community wants to be able to work in isolation from other communities. + +Each user community has its own: + +1. resources (pods, services, replication controllers, etc.) +2. policies (who can or cannot perform actions in their community) +3. constraints (this community is allowed this much quota, etc.) + +A cluster operator may create a Namespace for each unique user community. + +The Namespace provides a unique scope for: + +1. 
named resources (to avoid basic naming collisions) +2. delegated management authority to trusted users +3. ability to limit community resource consumption ## Use cases -Actors: +1. As a cluster operator, I want to support multiple user communities on a single cluster. +2. As a cluster operator, I want to delegate authority to partitions of the cluster to trusted users + in those communities. +3. As a cluster operator, I want to limit the amount of resources each community can consume in order + to limit the impact to other communities using the cluster. +4. As a cluster user, I want to interact with resources that are pertinent to my user community in + isolation of what other user communities are doing on the cluster. + +## Design + +### Data Model + +A *Namespace* defines a logically named group for multiple *Kind*s of resources. + +``` +type Namespace struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + + Spec NamespaceSpec `json:"spec,omitempty"` + Status NamespaceStatus `json:"status,omitempty"` +} +``` + +A *Namespace* name is a DNS compatible subdomain. + +A *Namespace* must exist prior to associating content with it. + +A *Namespace* must not be deleted if there is content associated with it. -1. k8s admin - administers a kubernetes cluster -2. k8s service - k8s daemon operates on behalf of another user (i.e. controller-manager) -2. k8s policy manager - enforces policies imposed on k8s cluster -3. k8s user - uses a kubernetes cluster to schedule pods +To associate a resource with a *Namespace* the following conditions must be satisfied: -User stories: +1. The resource's *Kind* must be registered as having *RESTScopeNamespace* with the server +2. The resource's *TypeMeta.Namespace* field must have a value that references an existing *Namespace* -1. Ability to set immutable namespace to k8s resources -2. Ability to list k8s resource scoped to a namespace -3. 
Restrict a namespace identifier to a DNS-compatible string to support compound naming conventions -4. Ability for a k8s policy manager to enforce a k8s user's access to a set of namespaces -5. Ability to set/unset a default namespace for use by kubecfg client -6. Ability for a k8s service to monitor resource changes across namespaces -7. Ability for a k8s service to list resources across namespaces +The *Name* of a resource associated with a *Namespace* is unique to that *Kind* in that *Namespace*. -## Proposed Design +It is intended to be used in resource URLs; provided by clients at creation time, and encouraged to be +human friendly; intended to facilitate idempotent creation, space-uniqueness of singleton objects, +distinguish distinct entities, and reference particular entities across operations. -### Model Changes +### Authorization -Introduce a new attribute *Namespace* for each resource that must be scoped in a Kubernetes cluster. +A *Namespace* provides an authorization scope for accessing content associated with the *Namespace*. -A *Namespace* is a DNS compatible subdomain. +See [Authorization plugins](../authorization.md) + +### Limit Resource Consumption + +A *Namespace* provides a scope to limit resource consumption. + +A *LimitRange* defines min/max constraints on the amount of resources a single entity can consume in +a *Namespace*. + +See [Admission control: Limit Range](admission_control_limit_range.md) + +A *ResourceQuota* tracks aggregate usage of resources in the *Namespace* and allows cluster operators +to define *Hard* resource usage limits that a *Namespace* may consume. + +See [Admission control: Resource Quota](admission_control_resource_quota.md) + +### Finalizers + +Upon creation of a *Namespace*, the creator may provide a list of *Finalizer* objects. 
```
-// TypeMeta is shared by all objects sent to, or returned from the client
-type TypeMeta struct {
-	Kind string `json:"kind,omitempty"`
-	Uid string `json:"uid,omitempty"`
-	CreationTimestamp util.Time `json:"creationTimestamp,omitempty"`
-	SelfLink string `json:"selfLink,omitempty"`
-	ResourceVersion uint64 `json:"resourceVersion,omitempty"`
-	APIVersion string `json:"apiVersion,omitempty"`
-	Namespace string `json:"namespace,omitempty"`
-	Name string `json:"name,omitempty"`
+type FinalizerName string
+
+// These are internal finalizers to Kubernetes, must be qualified name unless defined here
+const (
+	FinalizerKubernetes FinalizerName = "kubernetes"
+)
+
+// NamespaceSpec describes the attributes on a Namespace
+type NamespaceSpec struct {
+	// Finalizers is an opaque list of values that must be empty to permanently remove object from storage
+	Finalizers []FinalizerName
 }
 ```
-An identifier, *UID*, is unique across time and space intended to distinguish between historical occurences of similar entities.
+A *FinalizerName* is a qualified name.
-A *Name* is unique within a given *Namespace* at a particular time, used in resource URLs; provided by clients at creation time
-and encouraged to be human friendly; intended to facilitate creation idempotence and space-uniqueness of singleton objects, distinguish
-distinct entities, and reference particular entities across operations.
+The API Server enforces that a *Namespace* can be deleted from storage if and only if
+its *Namespace.Spec.Finalizers* is empty.
-As of this writing, the following resources MUST have a *Namespace* and *Name*
+A *finalize* operation is the only mechanism to modify the *Namespace.Spec.Finalizers* field post creation.
-* pod
-* service
-* replicationController
-* endpoint
+Each *Namespace* created has *kubernetes* as an item in its list of initial *Namespace.Spec.Finalizers*
+set by default.
-A *policy* MAY be associated with a *Namespace*.
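The deletion rule above can be sketched in Go (illustrative only; `canDeleteFromStorage` is an invented helper, not actual apiserver code):

```go
package main

import (
	"errors"
	"fmt"
)

// FinalizerName mirrors the type from the spec above.
type FinalizerName string

const FinalizerKubernetes FinalizerName = "kubernetes"

// NamespaceSpec is the subset of the spec relevant to deletion.
type NamespaceSpec struct {
	Finalizers []FinalizerName
}

// canDeleteFromStorage applies the rule: a Namespace may be permanently
// removed from storage if and only if its Spec.Finalizers list is empty.
func canDeleteFromStorage(spec NamespaceSpec) error {
	if len(spec.Finalizers) > 0 {
		return errors.New("namespace has pending finalizers")
	}
	return nil
}

func main() {
	// A freshly created Namespace carries the "kubernetes" finalizer by default.
	fresh := NamespaceSpec{Finalizers: []FinalizerName{FinalizerKubernetes}}
	fmt.Println(canDeleteFromStorage(fresh) != nil)           // true: still finalizing
	fmt.Println(canDeleteFromStorage(NamespaceSpec{}) == nil) // true: safe to remove
}
```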
+### Phases
-If a *policy* has an associated *Namespace*, the resource paths it enforces are scoped to a particular *Namespace*.
+A *Namespace* may exist in the following phases.
-## k8s API server
+```
+type NamespacePhase string
+const(
+	NamespaceActive NamespacePhase = "Active"
+	NamespaceTerminating NamespacePhase = "Terminating"
+)
+
+type NamespaceStatus struct {
+	...
+	Phase NamespacePhase
+}
+```
-In support of namespace isolation, the Kubernetes API server will address resources by the following conventions:
+A *Namespace* is in the **Active** phase if it does not have an *ObjectMeta.DeletionTimestamp*.
-The typical actors for the following requests are the k8s user or the k8s service.
+A *Namespace* is in the **Terminating** phase if it has an *ObjectMeta.DeletionTimestamp*.
-| Action | HTTP Verb | Path | Description |
-| ---- | ---- | ---- | ---- |
-| CREATE | POST | /api/{version}/ns/{ns}/{resourceType}/ | Create instance of {resourceType} in namespace {ns} |
-| GET | GET | /api/{version}/ns/{ns}/{resourceType}/{name} | Get instance of {resourceType} in namespace {ns} with {name} |
-| UPDATE | PUT | /api/{version}/ns/{ns}/{resourceType}/{name} | Update instance of {resourceType} in namespace {ns} with {name} |
-| DELETE | DELETE | /api/{version}/ns/{ns}/{resourceType}/{name} | Delete instance of {resourceType} in namespace {ns} with {name} |
-| LIST | GET | /api/{version}/ns/{ns}/{resourceType} | List instances of {resourceType} in namespace {ns} |
-| WATCH | GET | /api/{version}/watch/ns/{ns}/{resourceType} | Watch for changes to a {resourceType} in namespace {ns} |
+**Active**
-The typical actor for the following requests are the k8s service or k8s admin as enforced by k8s Policy.
+Upon creation, a *Namespace* enters the *Active* phase. This means that content may be associated with
+a namespace, and all normal interactions with the namespace are allowed to occur in the cluster.
-| Action | HTTP Verb | Path | Description | -| ---- | ---- | ---- | ---- | -| WATCH | GET | /api/{version}/watch/{resourceType} | Watch for changes to a {resourceType} across all namespaces | -| LIST | GET | /api/{version}/list/{resourceType} | List instances of {resourceType} across all namespaces | +If a DELETE request occurs for a *Namespace*, the *Namespace.ObjectMeta.DeletionTimestamp* is set +to the current server time. A *namespace controller* observes the change, and sets the *Namespace.Status.Phase* +to *Terminating*. -The legacy API patterns for k8s are an alias to interacting with the *default* namespace as follows. +**Terminating** -| Action | HTTP Verb | Path | Description | -| ---- | ---- | ---- | ---- | -| CREATE | POST | /api/{version}/{resourceType}/ | Create instance of {resourceType} in namespace *default* | -| GET | GET | /api/{version}/{resourceType}/{name} | Get instance of {resourceType} in namespace *default* | -| UPDATE | PUT | /api/{version}/{resourceType}/{name} | Update instance of {resourceType} in namespace *default* | -| DELETE | DELETE | /api/{version}/{resourceType}/{name} | Delete instance of {resourceType} in namespace *default* | +A *namespace controller* watches for *Namespace* objects that have a *Namespace.ObjectMeta.DeletionTimestamp* +value set in order to know when to initiate graceful termination of the *Namespace* associated content that +are known to the cluster. -The k8s API server verifies the *Namespace* on resource creation matches the *{ns}* on the path. +The *namespace controller* enumerates each known resource type in that namespace and deletes it one by one. -The k8s API server will enable efficient mechanisms to filter model resources based on the *Namespace*. This may require -the creation of an index on *Namespace* that could support query by namespace with optional label selectors. 
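The delete-then-finalize flow can be sketched as follows (hypothetical interfaces; the real controller enumerates resource types via the API rather than a supplied map):

```go
package main

import "fmt"

// resourceDeleter stands in for "delete all instances of one resource type
// in the given namespace".
type resourceDeleter func(namespace string) error

// terminateNamespace deletes every known resource type one by one, and only
// after all known content is gone calls finalize, which removes the
// "kubernetes" token from Spec.Finalizers via the finalize subresource.
func terminateNamespace(namespace string, deleters map[string]resourceDeleter, finalize func(string) error) error {
	for kind, del := range deleters {
		if err := del(namespace); err != nil {
			return fmt.Errorf("deleting %s in %s: %v", kind, namespace, err)
		}
	}
	return finalize(namespace)
}

func main() {
	var deleted []string
	deleters := map[string]resourceDeleter{
		"pods":     func(ns string) error { deleted = append(deleted, "pods"); return nil },
		"services": func(ns string) error { deleted = append(deleted, "services"); return nil },
	}
	finalized := false
	if err := terminateNamespace("development", deleters, func(ns string) error { finalized = true; return nil }); err != nil {
		panic(err)
	}
	fmt.Println(len(deleted), finalized) // 2 true
}
```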
+Admission control blocks creation of new resources in that namespace in order to prevent a race-condition +where the controller could believe all of a given resource type had been deleted from the namespace, +when in fact some other rogue client agent had created new objects. Using admission control in this +scenario allows each of registry implementations for the individual objects to not need to take into account Namespace life-cycle. -The k8s API server will associate a resource with a *Namespace* if not populated by the end-user based on the *Namespace* context -of the incoming request. If the *Namespace* of the resource being created, or updated does not match the *Namespace* on the request, -then the k8s API server will reject the request. +Once all objects known to the *namespace controller* have been deleted, the *namespace controller* +executes a *finalize* operation on the namespace that removes the *kubernetes* value from +the *Namespace.Spec.Finalizers* list. -TODO: Update to discuss k8s api server proxy patterns +If the *namespace controller* sees a *Namespace* whose *ObjectMeta.DeletionTimestamp* is set, and +whose *Namespace.Spec.Finalizers* list is empty, it will signal the server to permanently remove +the *Namespace* from storage by sending a final DELETE action to the API server. -## k8s storage +### REST API -A namespace provides a unique identifier space and therefore must be in the storage path of a resource. +To interact with the Namespace API: -In etcd, we want to continue to still support efficient WATCH across namespaces. 
+| Action | HTTP Verb | Path | Description | +| ------ | --------- | ---- | ----------- | +| CREATE | POST | /api/{version}/namespaces | Create a namespace | +| LIST | GET | /api/{version}/namespaces | List all namespaces | +| UPDATE | PUT | /api/{version}/namespaces/{namespace} | Update namespace {namespace} | +| DELETE | DELETE | /api/{version}/namespaces/{namespace} | Delete namespace {namespace} | +| FINALIZE | POST | /api/{version}/namespaces/{namespace}/finalize | Finalize namespace {namespace} | +| WATCH | GET | /api/{version}/watch/namespaces | Watch all namespaces | -Resources that persist content in etcd will have storage paths as follows: +This specification reserves the name *finalize* as a sub-resource to namespace. -/registry/{resourceType}/{resource.Namespace}/{resource.Name} +As a consequence, it is invalid to have a *resourceType* managed by a namespace whose kind is *finalize*. -This enables k8s service to WATCH /registry/{resourceType} for changes across namespace of a particular {resourceType}. 
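The Namespace routes in the table above can be assembled with a small helper (illustrative only; `namespacePath` is an invented function, not part of any client library):

```go
package main

import "fmt"

// namespacePath builds the URL paths from the Namespace API table above:
// the collection, a single namespace, or a subresource such as "finalize".
func namespacePath(version, namespace, subresource string) string {
	path := fmt.Sprintf("/api/%s/namespaces", version)
	if namespace != "" {
		path += "/" + namespace
	}
	if subresource != "" {
		path += "/" + subresource
	}
	return path
}

func main() {
	fmt.Println(namespacePath("v1beta3", "", ""))                    // LIST/CREATE target
	fmt.Println(namespacePath("v1beta3", "development", ""))         // UPDATE/DELETE target
	fmt.Println(namespacePath("v1beta3", "development", "finalize")) // FINALIZE target
}
```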
+To interact with content associated with a Namespace: -Upon scheduling a pod to a particular host, the pod's namespace must be in the key path as follows: +| Action | HTTP Verb | Path | Description | +| ---- | ---- | ---- | ---- | +| CREATE | POST | /api/{version}/namespaces/{namespace}/{resourceType}/ | Create instance of {resourceType} in namespace {namespace} | +| GET | GET | /api/{version}/namespaces/{namespace}/{resourceType}/{name} | Get instance of {resourceType} in namespace {namespace} with {name} | +| UPDATE | PUT | /api/{version}/namespaces/{namespace}/{resourceType}/{name} | Update instance of {resourceType} in namespace {namespace} with {name} | +| DELETE | DELETE | /api/{version}/namespaces/{namespace}/{resourceType}/{name} | Delete instance of {resourceType} in namespace {namespace} with {name} | +| LIST | GET | /api/{version}/namespaces/{namespace}/{resourceType} | List instances of {resourceType} in namespace {namespace} | +| WATCH | GET | /api/{version}/watch/namespaces/{namespace}/{resourceType} | Watch for changes to a {resourceType} in namespace {namespace} | +| WATCH | GET | /api/{version}/watch/{resourceType} | Watch for changes to a {resourceType} across all namespaces | +| LIST | GET | /api/{version}/list/{resourceType} | List instances of {resourceType} across all namespaces | -/host/{host}/pod/{pod.Namespace}/{pod.Name} +The API server verifies the *Namespace* on resource creation matches the *{namespace}* on the path. -## k8s Authorization service +The API server will associate a resource with a *Namespace* if not populated by the end-user based on the *Namespace* context +of the incoming request. If the *Namespace* of the resource being created, or updated does not match the *Namespace* on the request, +then the API server will reject the request. -This design assumes the existence of an authorization service that filters incoming requests to the k8s API Server in order -to enforce user authorization to a particular k8s resource. 
It performs this action by associating the *subject* of a request
-with a *policy* to an associated HTTP path and verb. This design encodes the *namespace* in the resource path in order to enable
-external policy servers to function by resource path alone. If a request is made by an identity that is not allowed by
-policy to the resource, the request is terminated. Otherwise, it is forwarded to the apiserver.
+### Storage
-## k8s controller-manager
+A namespace provides a unique identifier space and therefore must be in the storage path of a resource.
-The controller-manager will provision pods in the same namespace as the associated replicationController.
+In etcd, we want to continue to still support efficient WATCH across namespaces.
-## k8s Kubelet
+Resources that persist content in etcd will have storage paths as follows:
-There is no major change to the kubelet introduced by this proposal.
+/{k8s_storage_prefix}/{resourceType}/{resource.Namespace}/{resource.Name}
-### kubecfg client
+This enables consumers to WATCH /registry/{resourceType} for changes across namespaces for a particular {resourceType}.
-kubecfg supports following:
+### Kubelet
-```
-kubecfg [OPTIONS] ns {namespace}
-```
+The kubelet will register pods it sources from a file or http source with a namespace associated with the
-To set a namespace to use across multiple operations:
+*cluster-id*.
-```
-$ kubecfg ns ns1
-```
+### Example: OpenShift Origin managing a Kubernetes Namespace
-To view the current namespace:
+In this example, we demonstrate how the design allows for agents built on top of
+Kubernetes that manage their own set of resource types associated with a *Namespace*
+to take part in Namespace termination.
-To view the current namespace: +OpenShift creates a Namespace in Kubernetes ``` -$ kubecfg ns -Using namespace ns1 +{ + "apiVersion":"v1beta3", + "kind": "Namespace", + "metadata": { + "name": "development", + }, + "spec": { + "finalizers": ["openshift.com/origin", "kubernetes"], + }, + "status": { + "phase": "Active", + }, + "labels": { + "name": "development" + }, +} ``` -To reset to the default namespace: +OpenShift then goes and creates a set of resources (pods, services, etc) associated +with the "development" namespace. It also creates its own set of resources in its +own storage associated with the "development" namespace unknown to Kubernetes. + +User deletes the Namespace in Kubernetes, and Namespace now has following state: ``` -$ kubecfg ns default +{ + "apiVersion":"v1beta3", + "kind": "Namespace", + "metadata": { + "name": "development", + "deletionTimestamp": "..." + }, + "spec": { + "finalizers": ["openshift.com/origin", "kubernetes"], + }, + "status": { + "phase": "Terminating", + }, + "labels": { + "name": "development" + }, +} ``` -In addition, each kubecfg request may explicitly specify a namespace for the operation via the following OPTION +The Kubernetes *namespace controller* observes the namespace has a *deletionTimestamp* +and begins to terminate all of the content in the namespace that it knows about. Upon +success, it executes a *finalize* action that modifies the *Namespace* by +removing *kubernetes* from the list of finalizers: ---ns +``` +{ + "apiVersion":"v1beta3", + "kind": "Namespace", + "metadata": { + "name": "development", + "deletionTimestamp": "..." + }, + "spec": { + "finalizers": ["openshift.com/origin"], + }, + "status": { + "phase": "Terminating", + }, + "labels": { + "name": "development" + }, +} +``` -When loading resource files specified by the -c OPTION, the kubecfg client will ensure the namespace is set in the -message body to match the client specified default. 
+OpenShift Origin has its own *namespace controller* that is observing cluster state, and +it observes the same namespace had a *deletionTimestamp* assigned to it. It too will go +and purge resources from its own storage that it manages associated with that namespace. +Upon completion, it executes a *finalize* action and removes the reference to "openshift.com/origin" +from the list of finalizers. -If no default namespace is applied, the client will assume the following default namespace: +This results in the following state: -* default +``` +{ + "apiVersion":"v1beta3", + "kind": "Namespace", + "metadata": { + "name": "development", + "deletionTimestamp": "..." + }, + "spec": { + "finalizers": [], + }, + "status": { + "phase": "Terminating", + }, + "labels": { + "name": "development" + }, +} +``` -The kubecfg client would store default namespace information in the same manner it caches authentication information today -as a file on user's file system. +At this point, the Kubernetes *namespace controller* in its sync loop will see that the namespace +has a deletion timestamp and that its list of finalizers is empty. As a result, it knows all +content associated from that namespace has been purged. It performs a final DELETE action +to remove that Namespace from the storage. +At this point, all content associated with that Namespace, and the Namespace itself are gone. \ No newline at end of file -- cgit v1.2.3 From 3fe373e83ae17bf4a84c6632b91be7ad61f7b97b Mon Sep 17 00:00:00 2001 From: Rohit Jnagal Date: Fri, 13 Mar 2015 00:30:32 +0000 Subject: Update vagrant documentation to use get.k8s.io for setup. 
--- developer-guides/vagrant.md | 321 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 321 insertions(+) create mode 100644 developer-guides/vagrant.md diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md new file mode 100644 index 00000000..47236381 --- /dev/null +++ b/developer-guides/vagrant.md @@ -0,0 +1,321 @@ +## Getting started with Vagrant + +Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/develop on your local machine (Linux, Mac OS X). + +### Prerequisites +1. Install latest version >= 1.6.2 of vagrant from http://www.vagrantup.com/downloads.html +2. Install latest version of Virtual Box from https://www.virtualbox.org/wiki/Downloads +3. Get or build a [binary release](../../getting-started-guides/binary_release.md) + +### Setup + +By default, the Vagrant setup will create a single kubernetes-master and 1 kubernetes-minion. Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). To start your local cluster, open a shell and run: + +``` +cd kubernetes + +export KUBERNETES_PROVIDER=vagrant +cluster/kube-up.sh +``` + +The `KUBERNETES_PROVIDER` environment variable tells all of the various cluster management scripts which variant to use. If you forget to set this, the assumption is you are running on Google Compute Engine. + +Vagrant will provision each machine in the cluster with all the necessary components to run Kubernetes. The initial setup can take a few minutes to complete on each machine. + +By default, each VM in the cluster is running Fedora, and all of the Kubernetes services are installed into systemd. 
+ +To access the master or any minion: + +``` +vagrant ssh master +vagrant ssh minion-1 +``` + +If you are running more than one minion, you can access the others by: + +``` +vagrant ssh minion-2 +vagrant ssh minion-3 +``` + +To view the service status and/or logs on the kubernetes-master: +``` +vagrant ssh master +[vagrant@kubernetes-master ~] $ sudo systemctl status kube-apiserver +[vagrant@kubernetes-master ~] $ sudo journalctl -r -u kube-apiserver + +[vagrant@kubernetes-master ~] $ sudo systemctl status kube-controller-manager +[vagrant@kubernetes-master ~] $ sudo journalctl -r -u kube-controller-manager + +[vagrant@kubernetes-master ~] $ sudo systemctl status etcd +[vagrant@kubernetes-master ~] $ sudo systemctl status nginx +``` + +To view the services on any of the kubernetes-minion(s): +``` +vagrant ssh minion-1 +[vagrant@kubernetes-minion-1] $ sudo systemctl status docker +[vagrant@kubernetes-minion-1] $ sudo journalctl -r -u docker +[vagrant@kubernetes-minion-1] $ sudo systemctl status kubelet +[vagrant@kubernetes-minion-1] $ sudo journalctl -r -u kubelet +``` + +### Interacting with your Kubernetes cluster with Vagrant. + +With your Kubernetes cluster up, you can manage the nodes in your cluster with the regular Vagrant commands. + +To push updates to new Kubernetes code after making source changes: +``` +cluster/kube-push.sh +``` + +To stop and then restart the cluster: +``` +vagrant halt +cluster/kube-up.sh +``` + +To destroy the cluster: +``` +vagrant destroy +``` + +Once your Vagrant machines are up and provisioned, the first thing to do is to check that you can use the `kubectl.sh` script. + +You may need to build the binaries first, you can do this with ```make``` + +``` +$ ./cluster/kubectl.sh get minions + +NAME LABELS +10.245.1.4 +10.245.1.5 +10.245.1.3 + +``` + +### Interacting with your Kubernetes cluster with the `kube-*` scripts. 
+ +Alternatively to using the vagrant commands, you can also use the `cluster/kube-*.sh` scripts to interact with the vagrant based provider just like any other hosting platform for kubernetes. + +All of these commands assume you have set `KUBERNETES_PROVIDER` appropriately: + +``` +export KUBERNETES_PROVIDER=vagrant +``` + +Bring up a vagrant cluster + +``` +cluster/kube-up.sh +``` + +Destroy the vagrant cluster + +``` +cluster/kube-down.sh +``` + +Update the vagrant cluster after you make changes (only works when building your own releases locally): + +``` +cluster/kube-push.sh +``` + +Interact with the cluster + +``` +cluster/kubectl.sh +``` + +### Authenticating with your master + +When using the vagrant provider in Kubernetes, the `cluster/kubectl.sh` script will cache your credentials in a `~/.kubernetes_vagrant_auth` file so you will not be prompted for them in the future. + +``` +cat ~/.kubernetes_vagrant_auth +{ "User": "vagrant", + "Password": "vagrant" + "CAFile": "/home/k8s_user/.kubernetes.vagrant.ca.crt", + "CertFile": "/home/k8s_user/.kubecfg.vagrant.crt", + "KeyFile": "/home/k8s_user/.kubecfg.vagrant.key" +} +``` + +You should now be set to use the `cluster/kubectl.sh` script. For example try to list the minions that you have started with: + +``` +cluster/kubectl.sh get minions +``` + +### Running containers + +Your cluster is running, you can list the minions in your cluster: + +``` +$ cluster/kubectl.sh get minions + +NAME LABELS +10.245.2.4 +10.245.2.3 +10.245.2.2 + +``` + +Now start running some containers! + +You can now use any of the cluster/kube-*.sh commands to interact with your VM machines. +Before starting a container there will be no pods, services and replication controllers. 
+ +``` +$ cluster/kubectl.sh get pods +NAME IMAGE(S) HOST LABELS STATUS + +$ cluster/kubectl.sh get services +NAME LABELS SELECTOR IP PORT + +$ cluster/kubectl.sh get replicationControllers +NAME IMAGE(S SELECTOR REPLICAS +``` + +Start a container running nginx with a replication controller and three replicas + +``` +$ cluster/kubectl.sh run-container my-nginx --image=dockerfile/nginx --replicas=3 --port=80 +``` + +When listing the pods, you will see that three containers have been started and are in Waiting state: + +``` +$ cluster/kubectl.sh get pods +NAME IMAGE(S) HOST LABELS STATUS +781191ff-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.4/10.245.2.4 name=myNginx Waiting +7813c8bd-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.2/10.245.2.2 name=myNginx Waiting +78140853-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.3/10.245.2.3 name=myNginx Waiting +``` + +You need to wait for the provisioning to complete, you can monitor the minions by doing: + +``` +$ sudo salt '*minion-1' cmd.run 'docker images' +kubernetes-minion-1: + REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE + 96864a7d2df3 26 hours ago 204.4 MB + google/cadvisor latest e0575e677c50 13 days ago 12.64 MB + kubernetes/pause latest 6c4579af347b 8 weeks ago 239.8 kB +``` + +Once the docker image for nginx has been downloaded, the container will start and you can list it: + +``` +$ sudo salt '*minion-1' cmd.run 'docker ps' +kubernetes-minion-1: + CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES + dbe79bf6e25b dockerfile/nginx:latest "nginx" 21 seconds ago Up 19 seconds k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f + fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b + aa2ee3ed844a google/cadvisor:latest "/usr/bin/cadvisor - 38 minutes 
ago Up 38 minutes k8s--cadvisor.9e90d182--cadvisor_-_agent.file--4626b3a2 + 65a3a926f357 kubernetes/pause:latest "/pause" 39 minutes ago Up 39 minutes 0.0.0.0:4194->8080/tcp k8s--net.c5ba7f0e--cadvisor_-_agent.file--342fd561 +``` + +Going back to listing the pods, services and replicationControllers, you now have: + +``` +$ cluster/kubectl.sh get pods +NAME IMAGE(S) HOST LABELS STATUS +781191ff-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.4/10.245.2.4 name=myNginx Running +7813c8bd-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.2/10.245.2.2 name=myNginx Running +78140853-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.3/10.245.2.3 name=myNginx Running + +$ cluster/kubectl.sh get services +NAME LABELS SELECTOR IP PORT + +$ cluster/kubectl.sh get replicationControllers +NAME IMAGE(S SELECTOR REPLICAS +myNginx dockerfile/nginx name=my-nginx 3 +``` + +We did not start any services, hence there are none listed. But we see three replicas displayed properly. +Check the [guestbook](../../examples/guestbook/README.md) application to learn how to create a service. +You can already play with resizing the replicas with: + +``` +$ cluster/kubectl.sh resize rc my-nginx --replicas=2 +$ cluster/kubectl.sh get pods +NAME IMAGE(S) HOST LABELS STATUS +7813c8bd-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.2/10.245.2.2 name=myNginx Running +78140853-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.3/10.245.2.3 name=myNginx Running +``` + +Congratulations! + +### Testing + +The following will run all of the end-to-end testing scenarios assuming you set your environment in cluster/kube-env.sh + +``` +NUM_MINIONS=3 hack/e2e-test.sh +``` + +### Troubleshooting + +#### I keep downloading the same (large) box all the time! + +By default the Vagrantfile will download the box from S3. 
You can change this (and cache the box locally) by providing an alternate URL when calling `kube-up.sh` + +```bash +export KUBERNETES_BOX_URL=path_of_your_kuber_box +export KUBERNETES_PROVIDER=vagrant +cluster/kube-up.sh +``` + + +#### I just created the cluster, but I am getting authorization errors! + +You probably have an incorrect ~/.kubernetes_vagrant_auth file for the cluster you are attempting to contact. + +``` +rm ~/.kubernetes_vagrant_auth +``` + +After using kubectl.sh make sure that the correct credentials are set: + +``` +cat ~/.kubernetes_vagrant_auth +{ + "User": "vagrant", + "Password": "vagrant" +} +``` + +#### I just created the cluster, but I do not see my container running ! + +If this is your first time creating the cluster, the kubelet on each minion schedules a number of docker pull requests to fetch prerequisite images. This can take some time and as a result may delay your initial pod getting provisioned. + +#### I changed Kubernetes code, but it's not running ! + +Are you sure there was no build error? After running `$ vagrant provision`, scroll up and ensure that each Salt state was completed successfully on each box in the cluster. +It's very likely you see a build error due to an error in your source files! + +#### I have brought Vagrant up but the minions won't validate ! + +Are you sure you built a release first? Did you install `net-tools`? For more clues, login to one of the minions (`vagrant ssh minion-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). + +#### I want to change the number of minions ! + +You can control the number of minions that are instantiated via the environment variable `NUM_MINIONS` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough minions to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single minion. 
You do this, by setting `NUM_MINIONS` to 1 like so: + +``` +export NUM_MINIONS=1 +``` + +#### I want my VMs to have more memory ! + +You can control the memory allotted to virtual machines with the `KUBERNETES_MEMORY` environment variable. +Just set it to the number of megabytes you would like the machines to have. For example: + +``` +export KUBERNETES_MEMORY=2048 +``` + +#### I ran vagrant suspend and nothing works! +```vagrant suspend``` seems to mess up the network. It's not supported at this time. -- cgit v1.2.3 From 8a901730fe7348d9bb207233f51a9713b77791b2 Mon Sep 17 00:00:00 2001 From: Adam Dymitruk Date: Mon, 23 Mar 2015 23:51:46 -0700 Subject: Better wording for clean up. Encouraging squashing by default leads to important history being lost. People new to different git flows may be doing themselves and the project a disservice without knowing. --- faster_reviews.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/faster_reviews.md b/faster_reviews.md index 142ac946..a2d00465 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -120,8 +120,8 @@ your whole PR is pretty trivial, you should instead put your fixups into a new commit and re-push. Your reviewer can then look at that commit on its own - so much faster to review than starting over. -We might still ask you to squash commits at the very end, for the sake of a clean -history. +We might still ask you to clean up your commits at the very end, for the sake +of a more readable history. ## 8. 
KISS, YAGNI, MVP, etc -- cgit v1.2.3 From 1569ae19e6a36a99ef7a70a1b9f1d937deecfee7 Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Tue, 24 Mar 2015 12:01:41 +0100 Subject: Fixed markdown --- service_accounts.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/service_accounts.md b/service_accounts.md index 5d86f244..a3a1bb49 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -1,6 +1,6 @@ #Service Accounts -## Motivation +## Motivation Processes in Pods may need to call the Kubernetes API. For example: - scheduler @@ -20,7 +20,7 @@ They also may interact with services other than the Kubernetes API, such as: ## Design Overview A service account binds together several things: - a *name*, understood by users, and perhaps by peripheral systems, for an identity - - a *principal* that can be authenticated and (authorized)[../authorization.md] + - a *principal* that can be authenticated and [authorized](../authorization.md) - a [security context](./security_contexts.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other capabilities and controls on interaction with the file system and OS. - a set of [secrets](./secrets.md), which a container may use to @@ -60,7 +60,7 @@ This includes a human running `kubectl` on her desktop and a container in a Pod There is already a notion of a username in kubernetes, which is populated into a request context after authentication. However, there is no API object representing a user. While this may evolve, it is expected that in mature installations, -the canonical storage of user identifiers will be handled by a system external to kubernetes. +the canonical storage of user identifiers will be handled by a system external to kubernetes. Kubernetes does not dictate how to divide up the space of user identifier strings. User names can be simple Unix-style short usernames, (e.g. 
`alice`), or may be qualified to allow for federated identity ( @@ -84,7 +84,7 @@ The distinction is useful for a number of reasons: - A Human typically keeps credentials on a machine that is not part of the cluster and so not subject to automatic management. A VM with a role/service-account can have its credentials automatically managed. - the identity of a Pod cannot in general be mapped to a single human. - - If policy allows, it may be created by one human, and then updated by another, and another, until its behavior cannot be attributed to a single human. + - If policy allows, it may be created by one human, and then updated by another, and another, until its behavior cannot be attributed to a single human. **TODO**: consider getting rid of separate serviceAccount object and just rolling its parts into the SecurityContext or Pod Object. @@ -106,7 +106,7 @@ might have some types that do not do anything on apiserver but just get pushed t ### Pods The `PodSpec` is extended to have a `Pods.Spec.ServiceAccountUsername` field. If this is unset, then a default value is chosen. If it is set, then the corresponding value of `Pods.Spec.SecurityContext` is set by the -Service Account Finalizer (see below). +Service Account Finalizer (see below). TBD: how policy limits which users can make pods with which service accounts. @@ -122,7 +122,7 @@ Service Account Finalizer is one place where this can happen (see below). ### Kubelet The kubelet will treat as "not ready to run" (needing a finalizer to act on it) any Pod which has an empty -SecurityContext. +SecurityContext. The kubelet will set a default, restrictive, security context for any pods created from non-Apiserver config sources (http, file). @@ -141,7 +141,7 @@ like this: **TODO**: example of pod with explicit refs. Another way is with the *Service Account Finalizer*, a plugin process which is optional, and which handles -business logic around service accounts. +business logic around service accounts. 
The Service Account Finalizer watches Pods, Namespaces, and ServiceAccount definitions. -- cgit v1.2.3 From 337cdac032efaa215feb2d37215d34b1b4bc77ca Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Tue, 24 Mar 2015 13:00:26 +0100 Subject: Change "/ns" to "/namespaces" in few remaining places. --- persistent-storage.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index 5b84ddd2..5907e11d 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -65,12 +65,12 @@ Users attach their claim to their pod using a new ```PersistentVolumeClaimVolume | Action | HTTP Verb | Path | Description | | ---- | ---- | ---- | ---- | -| CREATE | POST | /api/{version}/ns/{ns}/persistentvolumeclaims/ | Create instance of PersistentVolumeClaim in namespace {ns} | -| GET | GET | /api/{version}/ns/{ns}/persistentvolumeclaims/{name} | Get instance of PersistentVolumeClaim in namespace {ns} with {name} | -| UPDATE | PUT | /api/{version}/ns/{ns}/persistentvolumeclaims/{name} | Update instance of PersistentVolumeClaim in namespace {ns} with {name} | -| DELETE | DELETE | /api/{version}/ns/{ns}/persistentvolumeclaims/{name} | Delete instance of PersistentVolumeClaim in namespace {ns} with {name} | -| LIST | GET | /api/{version}/ns/{ns}/persistentvolumeclaims | List instances of PersistentVolumeClaim in namespace {ns} | -| WATCH | GET | /api/{version}/watch/ns/{ns}/persistentvolumeclaims | Watch for changes to PersistentVolumeClaim in namespace {ns} | +| CREATE | POST | /api/{version}/namespaces/{ns}/persistentvolumeclaims/ | Create instance of PersistentVolumeClaim in namespace {ns} | +| GET | GET | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Get instance of PersistentVolumeClaim in namespace {ns} with {name} | +| UPDATE | PUT | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Update instance of PersistentVolumeClaim in namespace {ns} with {name} | +| DELETE | DELETE | 
/api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Delete instance of PersistentVolumeClaim in namespace {ns} with {name} | +| LIST | GET | /api/{version}/namespaces/{ns}/persistentvolumeclaims | List instances of PersistentVolumeClaim in namespace {ns} | +| WATCH | GET | /api/{version}/watch/namespaces/{ns}/persistentvolumeclaims | Watch for changes to PersistentVolumeClaim in namespace {ns} | -- cgit v1.2.3 From 4d946b3353672a2b27cde2aed92b0ab7abbd7c10 Mon Sep 17 00:00:00 2001 From: Rohit Jnagal Date: Wed, 25 Mar 2015 17:54:23 +0000 Subject: Add a pointer to kubernetes-dev to API changes doc. --- api_changes.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/api_changes.md b/api_changes.md index c1005278..be02e16c 100644 --- a/api_changes.md +++ b/api_changes.md @@ -284,6 +284,12 @@ you'll figure it out. Put `grep` or `ack` to good use. If you added functionality, you should consider documenting it and/or writing an example to illustrate your change. +## Incompatible API changes +If your change is going to be backward incompatible or might be a breaking change for API +consumers, please send an announcement to `kubernetes-dev@googlegroups.com` before +the change gets in. If you are unsure, ask. Also make sure that the change gets documented in +`CHANGELOG.md` for the next release. + ## Adding new REST objects TODO(smarterclayton): write this. -- cgit v1.2.3 From 6e8f790f1c1def2f1cbce19ae29027008cf38b91 Mon Sep 17 00:00:00 2001 From: Mark Maglana Date: Wed, 25 Mar 2015 14:54:16 -0700 Subject: Fix confusing use of "comprise" The word "comprise" means "be composed of" or "contain" so "applications comprised of multiple containers" would mean "applications composed of of multiple containers" or "applications contained of multiple containers" which is confusing. 
I understand that this is nitpicking and that "comprise" has taken on a new meaning, which is the opposite of its original definition, just like how "literally" now means "figuratively" to some people. However, I believe that clarity is of utmost importance in technical documentation, which is why I'm proposing this change. --- access.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/access.md b/access.md index 8a2f1edd..9de4d6c8 100644 --- a/access.md +++ b/access.md @@ -15,7 +15,7 @@ Each of these can act as normal users or attackers. - External Users: People who are accessing applications running on K8s (e.g. a web site served by a webserver running in a container on K8s), but who do not have K8s API access. - K8s Users : People who access the K8s API (e.g. create K8s API objects like Pods) - K8s Project Admins: People who manage access for some K8s Users - - K8s Cluster Admins: People who control the machines, networks, or binaries that comprise a K8s cluster. + - K8s Cluster Admins: People who control the machines, networks, or binaries that make up a K8s cluster. - K8s Admin means K8s Cluster Admins and K8s Project Admins taken together. ### Threats -- cgit v1.2.3 From 636062818feee072bf5eac1636bff2df1f9e4848 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ant=C3=B3nio=20Meireles?= Date: Mon, 30 Mar 2015 14:42:20 +0100 Subject: remove remaining references to containerized cadvisor. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit since GoogleCloudPlatform/kubernetes#5308 got merged, cadvisor facilities are built into the kubelet, so it is time to update the 'screenshots'... 
Signed-off-by: António Meireles --- developer-guides/vagrant.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 47236381..8e439009 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -201,7 +201,6 @@ $ sudo salt '*minion-1' cmd.run 'docker images' kubernetes-minion-1: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE 96864a7d2df3 26 hours ago 204.4 MB - google/cadvisor latest e0575e677c50 13 days ago 12.64 MB kubernetes/pause latest 6c4579af347b 8 weeks ago 239.8 kB ``` @@ -213,8 +212,6 @@ kubernetes-minion-1: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES dbe79bf6e25b dockerfile/nginx:latest "nginx" 21 seconds ago Up 19 seconds k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b - aa2ee3ed844a google/cadvisor:latest "/usr/bin/cadvisor - 38 minutes ago Up 38 minutes k8s--cadvisor.9e90d182--cadvisor_-_agent.file--4626b3a2 - 65a3a926f357 kubernetes/pause:latest "/pause" 39 minutes ago Up 39 minutes 0.0.0.0:4194->8080/tcp k8s--net.c5ba7f0e--cadvisor_-_agent.file--342fd561 ``` Going back to listing the pods, services and replicationControllers, you now have: -- cgit v1.2.3 From d60aa36171ee57c3a2d0b02a8285c5f0e6107e9f Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Fri, 6 Mar 2015 09:54:46 -0800 Subject: Proposed guidelines for new Getting-started-guides. # *** ERROR: *** docs are out of sync between cli and markdown # run hack/run-gendocs.sh > docs/kubectl.md to regenerate # # Your commit will be aborted unless you regenerate docs. 
COMMIT_BLOCKED_ON_GENDOCS --- development.md | 11 +++++ writing-a-getting-started-guide.md | 99 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 110 insertions(+) create mode 100644 writing-a-getting-started-guide.md diff --git a/development.md b/development.md index ef7c7ce8..7972eef6 100644 --- a/development.md +++ b/development.md @@ -227,6 +227,17 @@ go run hack/e2e.go -v -ctl='get events' go run hack/e2e.go -v -ctl='delete pod foobar' ``` +## Conformance testing +End-to-end testing, as described above, is for [development +distributions](../../docs/devel/writing-a-getting-started-guide.md). A conformance test is used on +a [versioned distro](../../docs/devel/writing-a-getting-started-guide.md). + +The conformance test runs a subset of the e2e-tests against a manually-created cluster. It does not +require support for up/push/down and other operations. To run a conformance test, you need to know the +IP of the master for your cluster and the authorization arguments to use. The conformance test is +intended to run against a cluster at a specific binary release of Kubernetes. +See [conformance-test.sh](../../hack/conformance-test.sh). + ## Testing out flaky tests [Instructions here](flaky-tests.md) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md new file mode 100644 index 00000000..7c837351 --- /dev/null +++ b/writing-a-getting-started-guide.md @@ -0,0 +1,99 @@ +# Writing a Getting Started Guide +This page gives some advice for anyone planning to write or update a Getting Started Guide for Kubernetes. +It also gives some guidelines which reviewers should follow when reviewing a pull request for a +guide. + +A Getting Started Guide gives instructions on how to create a Kubernetes cluster on top of a particular +type (or types) of infrastructure. Infrastructure includes: the IaaS provider for VMs; +the node OS; inter-node networking; and node Configuration Management system. 
+A guide refers to scripts, Configuration Management files, and/or binary assets such as RPMs. We call
+the combination of all these things needed to run on a particular type of infrastructure a
+**distro**.
+
+[The Matrix](../../docs/getting-started-guides/README.md) lists the distros. If there is already a guide
+which is similar to the one you have planned, consider improving that one.
+
+
+Distros fall into two categories:
+ - **versioned distros** are tested to work with a particular binary release of Kubernetes. These
+ come in a wide variety, reflecting a wide range of ideas and preferences in how to run a cluster.
+ - **development distros** are tested to work with the latest Kubernetes source code. But there are
+ relatively few of these and the bar is much higher for creating one.
+
+There are different guidelines for each.
+
+## Versioned Distro Guidelines
+These guidelines say *what* to do. See the Rationale section for *why*.
+ - Send us a PR.
+ - Put the instructions in `docs/getting-started-guides/...`. Scripts go there too. This helps devs easily
+ search for uses of flags by guides.
+ - We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your
+ own repo.
+ - Set up a cluster and run the [conformance test](../../docs/devel/conformance-test.md) against it, and report the
+ results in your PR.
+ - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md).
+ - State the binary version of kubernetes that you tested clearly in your Guide doc and in The Matrix.
+ - Even if you are just updating the binary version used, please still do a conformance test.
+ - If it worked before and now fails, you can ask on IRC,
+ check the release notes since your last tested version, or look at git logs for files in other distros
+ that are updated to the new version.
+ - Versioned distros should typically not modify or add code in `cluster/`. That is just scripts for developer
+ distros. 
+ - If a versioned distro has not been updated for many binary releases, it may be dropped from the Matrix.
+
+If you have a cluster partially working, but doing all the above steps seems like too much work,
+we still want to hear from you. We suggest you write a blog post or a Gist, and we will link to it on our wiki page.
+Just file an issue or chat with us on IRC and one of the committers will link to it from the wiki.
+
+## Development Distro Guidelines
+These guidelines say *what* to do. See the Rationale section for *why*.
+ - the main reason to add a new development distro is to support a new IaaS provider (VM and
+ network management). This means implementing a new `pkg/cloudprovider/$IAAS_NAME`.
+ - Development distros should use Saltstack for Configuration Management.
+ - development distros need to support automated cluster creation, deletion, upgrading, etc.
+ This means writing scripts in `cluster/$IAAS_NAME`.
+ - all commits to the tip of this repo must not break any of the development distros
+ - the author of the change is responsible for making the changes necessary on all the cloud-providers if the
+ change affects any of them, and reverting the change if it breaks any of the CIs.
+ - a development distro needs to have an organization which owns it. This organization needs to:
+ - Set up and maintain Continuous Integration that runs e2e frequently (multiple times per day) against the
+ Distro at head, and which notifies all devs of breakage.
+ - Be reasonably available for questions and assisting with
+ refactoring and feature additions that affect code for their IaaS.
+
+## Rationale
+ - We want people to create Kubernetes clusters with whatever IaaS, Node OS,
+ configuration management tools, and so on, which they are familiar with. The
+ guidelines for **versioned distros** are designed for flexibility.
+ - We want developers to be able to work without understanding all the permutations of
+ IaaS, NodeOS, and configuration management. 
The guidelines for **developer distros** are designed
+ for consistency.
+ - We want users to have a uniform experience with Kubernetes whenever they follow instructions anywhere
+ in our Github repository. So, we ask that versioned distros pass a **conformance test** to make sure
+ they really work.
+ - We ask versioned distros to **clearly state a version**. People pulling from Github may
+ expect any instructions there to work at Head, so stuff that has not been tested at Head needs
+ to be called out. We are still changing things really fast, and, while the REST API is versioned,
+ it is not practical at this point to version or limit changes that affect distros. We still change
+ flags at the Kubernetes/Infrastructure interface.
+ - We want to **limit the number of development distros** for several reasons. Developers should
+ only have to change a limited number of places to add a new feature. Also, since we will
+ gate commits on passing CI for all distros, and since end-to-end tests are typically somewhat
+ flaky, it would be highly likely for there to be false positives and CI backlogs with many CI pipelines.
+ - We do not require versioned distros to do **CI** for several reasons. It is a steep
+ learning curve to understand our automated testing scripts. And it is considerable effort
+ to fully automate setup and teardown of a cluster, which is needed for CI. And, not everyone
+ has the time and money to run CI. We do not want to
+ discourage people from writing and sharing guides because of this.
+ - Versioned distro authors are free to run their own CI and let us know if there is breakage, but we
+ will not include them as commit hooks -- there cannot be so many commit checks that it is impossible
+ to pass them all.
+ - We prefer a single Configuration Management tool for development distros. If there were more
+ than one, the core developers would have to learn multiple tools and update config in multiple
+ places. 
**Saltstack** happens to be the one we picked when we started the project. We + welcome versioned distros that use any tool; there are already examples of + CoreOS Fleet, Ansible, and others. + - You can still run code from head or your own branch + if you use another Configuration Management tool -- you just have to do some manual steps + during testing and deployment. + -- cgit v1.2.3 From 73ec8632c4acb601abe0fd66903ce1ceacecf578 Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Fri, 27 Mar 2015 11:15:47 +0100 Subject: Changed merge policy --- collab.md | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/collab.md b/collab.md index f9f12e25..b8781519 100644 --- a/collab.md +++ b/collab.md @@ -6,13 +6,9 @@ Kubernetes is open source, but many of the people working on it do so as their d First and foremost: as a potential contributor, your changes and ideas are welcome at any hour of the day or night, weekdays, weekends, and holidays. Please do not ever hesitate to ask a question or send a PR. -## Timezones and calendars - -For the time being, most of the people working on this project are in the US and on Pacific time. Any times mentioned henceforth will refer to this timezone. Any references to "work days" will refer to the US calendar. - ## Code reviews -All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. But even for maintainers, we want all changes to get at least one review, preferably from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should sit for at least 2 hours to allow for wider review. +All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. 
But even for maintainers, we want all changes to get at least one review, preferably (for non-trivial changes obligatorily) from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should not be committed until relevant parties (e.g. owners of the subsystem affected by the PR) have had a reasonable chance to look at the PR in their local business hours. Most PRs will find reviewers organically. If a maintainer intends to be the primary reviewer of a PR they should set themselves as the assignee on GitHub and say so in a reply to the PR. Only the primary reviewer of a change should actually do the merge, except in rare cases (e.g. they are unavailable in a reasonable timeframe). @@ -28,7 +24,7 @@ Maintainers can assign reviews to other maintainers, when appropriate. The assignee becomes the shepherd for that PR and is responsible for merging the PR once they are satisfied with it or else closing it. The assignee might request reviews from non-maintainers. ## Merge hours -Maintainers will do merges between the hours of 7:00 am Monday and 7:00 pm (19:00h) Friday. PRs that arrive over the weekend or on holidays will only be merged if there is a very good reason for it and if the code review requirements have been met. +Maintainers will do merges of appropriately reviewed-and-approved changes during their local "business hours" (typically 7:00 am Monday to 5:00 pm (17:00h) Friday). PRs that arrive over the weekend or on holidays will only be merged if there is a very good reason for it and if the code review requirements have been met. Concretely this means that nobody should merge changes immediately before going to bed for the night. 
-- cgit v1.2.3 From 2108ead7d3ed6cdcea54aa73e6fb913010d7849c Mon Sep 17 00:00:00 2001 From: Tamer Tas Date: Wed, 1 Apr 2015 00:56:20 +0300 Subject: Fix typo in Secrets --- secrets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/secrets.md b/secrets.md index d47d6092..3c61de68 100644 --- a/secrets.md +++ b/secrets.md @@ -277,7 +277,7 @@ type Secret struct { // representing the arbitrary (possibly non-string) data value here. Data map[string][]byte `json:"data,omitempty"` - // Used to facilitate programatic handling of secret data. + // Used to facilitate programmatic handling of secret data. Type SecretType `json:"type,omitempty"` } -- cgit v1.2.3 From f08e73cb56a68974b3be06c97bce7ec8dab1c786 Mon Sep 17 00:00:00 2001 From: Tamer Tas Date: Wed, 1 Apr 2015 01:18:49 +0300 Subject: Fix typo in Secrets design document --- secrets.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/secrets.md b/secrets.md index 3c61de68..965f6e90 100644 --- a/secrets.md +++ b/secrets.md @@ -21,7 +21,7 @@ Goals of this design: ## Constraints and Assumptions * This design does not prescribe a method for storing secrets; storage of secrets should be - pluggable to accomodate different use-cases + pluggable to accommodate different use-cases * Encryption of secret data and node security are orthogonal concerns * It is assumed that node and master are secure and that compromising their security could also compromise secrets: @@ -375,7 +375,7 @@ a tmpfs file system of that size to store secret data. Rough accounting of spec For use-cases where the Kubelet's behavior is affected by the secrets associated with a pod's `ServiceAccount`, the Kubelet will need to be changed. For example, if secrets of type `docker-reg-auth` affect how the pod's images are pulled, the Kubelet will need to be changed -to accomodate this. Subsequent proposals can address this on a type-by-type basis. +to accommodate this. 
Subsequent proposals can address this on a type-by-type basis. ## Examples -- cgit v1.2.3 From fcd666c840cb67c79ecdd8b0ef5116272644fb48 Mon Sep 17 00:00:00 2001 From: goltermann Date: Wed, 1 Apr 2015 13:00:37 -0700 Subject: Update issues.md Updating priority definitions - open for discussion if there are other opinions. --- issues.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/issues.md b/issues.md index 491dba49..f2db3277 100644 --- a/issues.md +++ b/issues.md @@ -8,14 +8,12 @@ Priorities We will use GitHub issue labels for prioritization. The absence of a priority label means the bug has not been reviewed and prioritized yet. -Priorities are "moment in time" labels, and what is low priority today, could be high priority tomorrow, and vice versa. As we move to v1.0, we may decide certain bugs aren't actually needed yet, or that others really do need to be pulled in. - -Here we define the priorities for up until v1.0. Once the Kubernetes project hits 1.0, we will revisit the scheme and update as appropriate. - Definitions ----------- * P0 - something broken for users, build broken, or critical security issue. Someone must drop everything and work on it. 
-* P1 - must fix for earliest possible OSS binary release (every two weeks) -* P2 - must fix for v1.0 release - will block the release -* P3 - post v1.0 -* untriaged - anything without a Priority/PX label will be considered untriaged \ No newline at end of file +* P1 - must fix for earliest possible binary release (every two weeks) +* P2 - should be fixed in next major relase version +* P3 - default priority for lower importance bugs that we still want to track and plan to fix at some point +* design - priority/design is for issues that are used to track design discussions +* support - priority/support is used for issues tracking user support requests +* untriaged - anything without a priority/X label will be considered untriaged -- cgit v1.2.3 From 149d7ab358aa8c6f190e865daf0cdb846de8b2d0 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Wed, 1 Apr 2015 16:40:27 -0400 Subject: Update design doc for limit range change --- admission_control_limit_range.md | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index e3a56c87..3f2ccd7b 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -25,6 +25,8 @@ type LimitRangeItem struct { Max ResourceList `json:"max,omitempty"` // Min usage constraints on this kind by resource name Min ResourceList `json:"min,omitempty"` + // Default usage constraints on this kind by resource name + Default ResourceList `json:"default,omitempty"` } // LimitRangeSpec defines a min/max usage limit for resources that match on kind @@ -74,6 +76,14 @@ The following min/max limits are imposed: | cpu | Min/Max amount of cpu per pod | | memory | Min/Max amount of memory per pod | +If a resource specifies a default value, it may get applied on the incoming resource. 
For example, if a default +value is provided for container cpu, it is set on the incoming container if and only if the incoming container +does not specify a resource requirements limit field. + +If a resource specifies a min value, it may get applied on the incoming resource. For example, if a min +value is provided for container cpu, it is set on the incoming container if and only if the incoming container does +not specify a resource requirements requests field. + If the incoming object would cause a violation of the enumerated constraints, the request is denied with a set of messages explaining what constraints were the source of the denial. @@ -105,12 +115,12 @@ NAME limits $ kubectl describe limits limits Name: limits -Type Resource Min Max ----- -------- --- --- -Pod memory 1Mi 1Gi -Pod cpu 250m 2 -Container memory 1Mi 1Gi -Container cpu 250m 2 +Type Resource Min Max Default +---- -------- --- --- --- +Pod memory 1Mi 1Gi - +Pod cpu 250m 2 - +Container memory 1Mi 1Gi 1Mi +Container cpu 250m 250m 250m ``` ## Future Enhancements: Define limits for a particular pod or container. -- cgit v1.2.3 From 58542e4f17c95567849639312ac58e822f251853 Mon Sep 17 00:00:00 2001 From: Kris Rousey Date: Wed, 1 Apr 2015 14:49:33 -0700 Subject: Changing the case of API to be consistent with surrounding uses. --- identifiers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/identifiers.md b/identifiers.md index 260c237a..d2e5d5c7 100644 --- a/identifiers.md +++ b/identifiers.md @@ -39,7 +39,7 @@ Name 1. When an object is created via an API, a Name string (a DNS_SUBDOMAIN) must be specified. Name must be non-empty and unique within the apiserver. This enables idempotent and space-unique creation operations. Parts of the system (e.g. replication controller) may join strings (e.g. a base name and a random suffix) to create a unique Name. For situations where generating a name is impractical, some or all objects may support a param to auto-generate a name. 
Generating random names will defeat idempotency. * Examples: "guestbook.user", "backend-x4eb1" -2. When an object is created via an api, a Namespace string (a DNS_SUBDOMAIN? format TBD via #1114) may be specified. Depending on the API receiver, namespaces might be validated (e.g. apiserver might ensure that the namespace actually exists). If a namespace is not specified, one will be assigned by the API receiver. This assignment policy might vary across API receivers (e.g. apiserver might have a default, kubelet might generate something semi-random). +2. When an object is created via an API, a Namespace string (a DNS_SUBDOMAIN? format TBD via #1114) may be specified. Depending on the API receiver, namespaces might be validated (e.g. apiserver might ensure that the namespace actually exists). If a namespace is not specified, one will be assigned by the API receiver. This assignment policy might vary across API receivers (e.g. apiserver might have a default, kubelet might generate something semi-random). * Example: "api.k8s.example.com" 3. Upon acceptance of an object via an API, the object is assigned a UID (a UUID). UID must be non-empty and unique across space and time. -- cgit v1.2.3 From 5d31ce87c823910d4b40a4be65bc54fa267372b6 Mon Sep 17 00:00:00 2001 From: goltermann Date: Wed, 1 Apr 2015 16:37:42 -0700 Subject: Update issues.md --- issues.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/issues.md b/issues.md index f2db3277..51395cae 100644 --- a/issues.md +++ b/issues.md @@ -12,7 +12,7 @@ Definitions ----------- * P0 - something broken for users, build broken, or critical security issue. Someone must drop everything and work on it. 
* P1 - must fix for earliest possible binary release (every two weeks) -* P2 - should be fixed in next major relase version +* P2 - should be fixed in next major release version * P3 - default priority for lower importance bugs that we still want to track and plan to fix at some point * design - priority/design is for issues that are used to track design discussions * support - priority/support is used for issues tracking user support requests -- cgit v1.2.3 From 52f4cee414f94ee5fc58cf943b443094e6773094 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Thu, 2 Apr 2015 12:05:49 -0700 Subject: Add some more clarity around "controversial" or "complex" PRs and merging. --- collab.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/collab.md b/collab.md index b8781519..dd7b8059 100644 --- a/collab.md +++ b/collab.md @@ -28,6 +28,13 @@ Maintainers will do merges of appropriately reviewed-and-approved changes during There may be discussion and even approvals granted outside of the above hours, but merges will generally be deferred. +If a PR is considered complex or controversial, the merge of that PR should be delayed to give all interested parties in all timezones the opportunity to provide feedback. Concretely, this means that such PRs should be held for 24 +hours before merging. Of course "complex" and "controversial" are left to the judgement of the people involved, but we trust that part of being a committer is the judgement required to evaluate such things honestly, and not be +motivated by your desire (or your cube-mate's desire) to get your code merged. Also see "Holds" below; any reviewer can issue a "hold" to indicate that the PR is in fact complicated or complex and deserves further review. + +PRs that are incorrectly judged to be merge-able may be reverted and subjected to re-review if subsequent reviewers believe that they in fact are controversial or complex. 
+ + ## Holds Any maintainer or core contributor who wants to review a PR but does not have time immediately may put a hold on a PR simply by saying so on the PR discussion and offering an ETA measured in single-digit days at most. Any PR that has a hold shall not be merged until the person who requested the hold acks the review, withdraws their hold, or is overruled by a preponderance of maintainers. -- cgit v1.2.3 From 655bbc697f92fe1229534c80a97a56862a4eb440 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ant=C3=B3nio=20Meireles?= Date: Mon, 6 Apr 2015 20:29:32 +0100 Subject: adding release notes guidelines to the (new) releases policy. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit per the ongoing conversation at GoogleCloudPlatform/kubernetes#6213 Signed-off-by: António Meireles --- releasing.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/releasing.md b/releasing.md index 4cdf8827..125355c3 100644 --- a/releasing.md +++ b/releasing.md @@ -150,3 +150,16 @@ not present in Docker `v1.2.0`: (Non-empty output here means the commit is not present on v1.2.0.) ``` +## Release Notes + +No official release should be made final without properly matching release notes. + +For each release, a small summary (a preamble) of the major changes should be made +available, covering both feature improvements/bug fixes and notes about +functional feature changes (if any) relative to the previous released version, so +that updating to it is as obvious and trouble-free as possible. + +After this preamble, all the relevant PRs/issues that went into that +version should be listed and linked, together with a small summary understandable +by mere mortals (in a perfect world the PR/issue title would be enough, but often +it is just too cryptic/geeky/domain-specific for that). 
-- cgit v1.2.3 From 6b437917d47ad0393745a949c5c24918c9dc0cde Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Mon, 9 Mar 2015 19:22:34 -0700 Subject: Cluster Federation RFC. # *** ERROR: *** docs are out of sync between cli and markdown # run hack/run-gendocs.sh > docs/kubectl.md to regenerate # # Your commit will be aborted unless you regenerate docs. COMMIT_BLOCKED_ON_GENDOCS --- federation-high-level-arch.png | Bin 0 -> 31793 bytes federation.md | 425 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 425 insertions(+) create mode 100644 federation-high-level-arch.png create mode 100644 federation.md diff --git a/federation-high-level-arch.png b/federation-high-level-arch.png new file mode 100644 index 00000000..8a416cc1 Binary files /dev/null and b/federation-high-level-arch.png differ diff --git a/federation.md b/federation.md new file mode 100644 index 00000000..6086b13f --- /dev/null +++ b/federation.md @@ -0,0 +1,425 @@ +# Kubernetes Cluster Federation +## (a.k.a. "Ubernetes") + +## Requirements Analysis and Product Proposal + +## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ +_Initial revision: 2015-03-05_ +_Last updated: 2015-03-09_ +This doc: [tinyurl.com/ubernetes](http://tinyurl.com/ubernetes) +Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) + +## Introduction + +Today, each Kubernetes cluster is a relatively self-contained unit, +which typically runs in a single "on-premise" data centre or single +availability zone of a cloud provider (Google's GCE, Amazon's AWS, +etc). + +Several current and potential Kubernetes users and customers have +expressed a keen interest in tying together ("federating") multiple +clusters in some sensible way in order to enable the following kinds +of use cases (intentionally vague): + +1. _"Preferentially run my workloads in my on-premise cluster(s), but + automatically overflow to my cloud-hosted cluster(s) if I run out + of on-premise capacity"_. +1. 
_"Most of my workloads should run in my preferred cloud-hosted + cluster(s), but some are privacy-sensitive, and should be + automatically diverted to run in my secure, on-premise + cluster(s)"_. +1. _"I want to avoid vendor lock-in, so I want my workloads to run + across multiple cloud providers all the time. I change my set of + such cloud providers, and my pricing contracts with them, + periodically"_. +1. _"I want to be immune to any single data centre or cloud + availability zone outage, so I want to spread my service across + multiple such zones (and ideally even across multiple cloud + providers)."_ + +The above use cases are by necessity left imprecisely defined. The +rest of this document explores these use cases and their implications +in further detail, and compares a few alternative high level +approaches to addressing them. The idea of cluster federation has +informally become known as _"Ubernetes"_. + +## Summary/TL;DR + +TBD + +## What exactly is a Kubernetes Cluster? + +A central design concept in Kubernetes is that of a _cluster_. While +loosely speaking, a cluster can be thought of as running in a single +data center, or cloud provider availability zone, a more precise +definition is that each cluster provides: + +1. a single Kubernetes API entry point, +1. a consistent, cluster-wide resource naming scheme +1. a scheduling/container placement domain +1. a service network routing domain +1. (in future) an authentication and authorization model. +1. .... + +The above in turn imply the need for a relatively performant, reliable +and cheap network within each cluster. + +There is also assumed to be some degree of failure correlation across +a cluster, i.e. whole clusters are expected to fail, at least +occasionally (due to cluster-wide power and network failures, natural +disasters etc). 
Clusters are often relatively homogeneous in that all +compute nodes are typically provided by a single cloud provider or +hardware vendor, and connected by a common, unified network fabric. +But these are not hard requirements of Kubernetes. + +Other classes of Kubernetes deployments than the one sketched above +are technically feasible, but come with some challenges of their own, +and are not yet common or explicitly supported. + +More specifically, having a Kubernetes cluster span multiple +well-connected availability zones within a single geographical region +(e.g. US North East, UK, Japan etc) is worthy of further +consideration, in particular because it potentially addresses + +## What use cases require Cluster Federation? + +Let's name a few concrete use cases to aid the discussion: + +## 1. Capacity Overflow + +_"I want to preferentially run my workloads in my on-premise cluster(s), but automatically "overflow" to my cloud-hosted cluster(s) when I run out of on-premise capacity."_ + +This idea is known in some circles as "[cloudbursting](http://searchcloudcomputing.techtarget.com/definition/cloud-bursting)". + +**Clarifying questions:** What is the unit of overflow? Individual + pods? Probably not always. Replication controllers and their + associated sets of pods? Groups of replication controllers + (a.k.a. distributed applications)? How are persistent disks + overflowed? Can the "overflowed" pods communicate with their + brethren and sistren pods and services in the other cluster(s)? + Presumably yes, at higher cost and latency, provided that they use + external service discovery. Is "overflow" enabled only when creating + new workloads/replication controllers, or are existing workloads + dynamically migrated between clusters based on fluctuating available + capacity? If so, what is the desired behaviour, and how is it + achieved? How, if at all, does this relate to quota enforcement + (e.g. 
if we run out of on-premise capacity, can all or only some + quotas transfer to other, potentially more expensive off-premise + capacity?) + +It seems that most of this boils down to: + +1. **location affinity** (pods relative to each other, and to other + stateful services like persistent storage - how is this expressed + and enforced?) +1. **cross-cluster scheduling** (given location affinity constraints + and other scheduling policy, which resources are assigned to which + clusters, and by what?) +1. **cross-cluster service discovery** (how do pods in one cluster + discover and communicate with pods in another cluster?) +1. **cross-cluster migration** (how do compute and storage resources, + and the distributed applications to which they belong, move from + one cluster to another) + +## 2. Sensitive Workloads + +_"I want most of my workloads to run in my preferred cloud-hosted +cluster(s), but some are privacy-sensitive, and should be +automatically diverted to run in my secure, on-premise cluster(s). The +list of privacy-sensitive workloads changes over time, and they're +subject to external auditing."_ + +**Clarifying questions:** What kinds of rules determine which + workloads go where? Is a static mapping from container (or more + typically, replication controller) to cluster maintained and + enforced? If so, is it only enforced on startup, or are things + migrated between clusters when the mappings change? This starts to + look quite similar to "1. Capacity Overflow", and again seems to + boil down to: + +1. location affinity +1. cross-cluster scheduling +1. cross-cluster service discovery +1. cross-cluster migration +with the possible addition of: + ++ cross-cluster monitoring and auditing (which is conveniently deemed + to be outside the scope of this document, for the time being at + least) + +## 3. Vendor lock-in avoidance + +_"My CTO wants us to avoid vendor lock-in, so she wants our workloads +to run across multiple cloud providers at all times. 
She changes our +set of preferred cloud providers and pricing contracts with them +periodically, and doesn't want to have to communicate and manually +enforce these policy changes across the organization every time this +happens. She wants it centrally and automatically enforced, monitored +and audited."_ + +**Clarifying questions:** Again, I think that this can potentially be + reformulated as a Capacity Overflow problem - the fundamental + principles seem to be the same or substantially similar to those + above. + +## 4. "Unavailability Zones" + +_"I want to be immune to any single data centre or cloud availability +zone outage, so I want to spread my service across multiple such zones +(and ideally even across multiple cloud providers), and have my +service remain available even if one of the availability zones or +cloud providers "goes down"_. + +It seems useful to split this into two sub use cases: + +1. Multiple availability zones within a single cloud provider (across + which feature sets like private networks, load balancing, + persistent disks, data snapshots etc are typically consistent and + explicitly designed to inter-operate). +1. Multiple cloud providers (typically with inconsistent feature sets + and more limited interoperability). + +The single cloud provider case might be easier to implement (although +the multi-cloud provider implementation should just work for a single +cloud provider). Propose high-level design catering for both, with +initial implementation targeting single cloud provider only. + +**Clarifying questions:** +**How does global external service discovery work?** In the steady + state, which external clients connect to which clusters? GeoDNS or + similar? What is the tolerable failover latency if a cluster goes + down? 
Maybe something like (make up some numbers, notwithstanding + some buggy DNS resolvers, TTL's, caches etc) ~3 minutes for ~90% of + clients to re-issue DNS lookups and reconnect to a new cluster when + their home cluster fails is good enough for most Kubernetes users + (or at least way better than the status quo), given that these sorts + of failure only happen a small number of times a year? + +**How does dynamic load balancing across clusters work, if at all?** + One simple starting point might be "it doesn't". i.e. if a service + in a cluster is deemed to be "up", it receives as much traffic as is + generated "nearby" (even if it overloads). If the service is deemed + to "be down" in a given cluster, "all" nearby traffic is redirected + to some other cluster within some number of seconds (failover could + be automatic or manual). Failover is essentially binary. An + improvement would be to detect when a service in a cluster reaches + maximum serving capacity, and dynamically divert additional traffic + to other clusters. But how exactly does all of this work, and how + much of it is provided by Kubernetes, as opposed to something else + bolted on top (e.g. external monitoring and manipulation of GeoDNS)? + +**How does this tie in with auto-scaling of services?** More + specifically, if I run my service across _n_ clusters globally, and + one (or more) of them fail, how do I ensure that the remaining _n-1_ + clusters have enough capacity to serve the additional, failed-over + traffic? Either: + +1. I constantly over-provision all clusters by 1/n (potentially expensive), or +1. I "manually" update my replica count configurations in the + remaining clusters by 1/n when the failure occurs, and Kubernetes + takes care of the rest for me, or +1. Auto-scaling (not yet available) in the remaining clusters takes + care of it for me automagically as the additional failed-over + traffic arrives (with some latency). + +Doing nothing (i.e. 
forcing users to choose between 1 and 2 on their +own) is probably an OK starting point. Kubernetes autoscaling can get +us to 3 at some later date. + +Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). + +All of the above (regarding "Unavailability Zones") refers primarily +to already-running user-facing services, and minimizing the impact on +end users of those services becoming unavailable in a given cluster. +What about the people and systems that deploy Kubernetes services +(devops etc)? Should they be automatically shielded from the impact +of the cluster outage? i.e. have their new resource creation requests +automatically diverted to another cluster during the outage? While +this specific requirement seems non-critical (manual fail-over seems +relatively non-arduous, ignoring the user-facing issues above), it +smells a lot like the first three use cases listed above ("Capacity +Overflow, Sensitive Services, Vendor lock-in..."), so if we address +those, we probably get this one free of charge. + +## Core Challenges of Cluster Federation + +As we saw above, a few common challenges fall out of most of the use +cases considered above, namely: + +## Location Affinity + +Can the pods comprising a single distributed application be +partitioned across more than one cluster? More generally, how far +apart, in network terms, can a given client and server within a +distributed application reasonably be? 
A server need not necessarily +be a pod, but could instead be a persistent disk housing data, or some +other stateful network service. What is tolerable is typically +application-dependent, primarily influenced by network bandwidth +consumption, latency requirements and cost sensitivity. + +For simplicity, lets assume that all Kubernetes distributed +applications fall into one of 3 categories with respect to relative +location affinity: + +1. **"Strictly Coupled"**: Those applications that strictly cannot be + partitioned between clusters. They simply fail if they are + partitioned. When scheduled, all pods _must_ be scheduled to the + same cluster. To move them, we need to shut the whole distributed + application down (all pods) in one cluster, possibly move some + data, and then bring up all of the pods in another cluster. To + avoid downtime, we might bring up the replacement cluster and + divert traffic there before turning down the original, but the + principle is much the same. In some cases moving the data might be + prohibitively expensive or time-consuming, in which case these + applications may be effectively _immovable_. +1. **"Strictly Decoupled"**: Those applications that can be + indefinitely partitioned across more than one cluster, to no + disadvantage. An embarrassingly parallel YouTube porn detector, + where each pod repeatedly dequeues a video URL from a remote work + queue, downloads and chews on the video for a few hours, and + arrives at a binary verdict, might be one such example. The pods + derive no benefit from being close to each other, or anything else + (other than the source of YouTube videos, which is assumed to be + equally remote from all clusters in this example). Each pod can be + scheduled independently, in any cluster, and moved at any time. +1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. 
for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). Most small to medium sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster). + +And then there's what I'll call _absolute_ location affinity. Some +applications are required to run in bounded geographical or network +topology locations. The reasons for this are typically +political/legislative (data privacy laws etc), or driven by network +proximity to consumers (or data providers) of the application ("most +of our users are in Western Europe, U.S. West Coast" etc). + +**Proposal:** First tackle Strictly Decoupled applications (which can + be trivially scheduled, partitioned or moved, one pod at a time). + Then tackle Preferentially Coupled applications (which must be + scheduled in totality in a single cluster, and can be moved, but + ultimately in total, and necessarily within some bounded time). + Leave strictly coupled applications to be manually moved between + clusters as required for the foreseeable future. + +## Cross-cluster service discovery + +I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. use DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. +_Aside:_ How do we avoid "tromboning" through an external VIP when DNS +resolves to a public IP on the local cluster? Strictly speaking this +would be an optimization, and probably only matters to high bandwidth, +low latency communications. We could potentially eliminate the +trombone with some kube-proxy magic if necessary. 
More detail to be +added here, but feel free to shoot down the basic DNS idea in the +meantime. + +## Cross-cluster Scheduling + +This is closely related to location affinity above, and also discussed +there. The basic idea is that some controller, logically outside of +the basic Kubernetes control plane of the clusters in question, needs +to be able to: + +1. Receive "global" resource creation requests. +1. Make policy-based decisions as to which cluster(s) should be used + to fulfill each given resource request. In a simple case, the + request is just redirected to one cluster. In a more complex case, + the request is "demultiplexed" into multiple sub-requests, each to + a different cluster. Knowledge of the (albeit approximate) + available capacity in each cluster will be required by the + controller to sanely split the request. Similarly, knowledge of + the properties of the application (Location Affinity class -- + Strictly Coupled, Strictly Decoupled etc, privacy class etc) will + be required. +1. Multiplex the responses from the individual clusters into an + aggregate response. + +## Cross-cluster Migration + +Again, this is closely related to location affinity discussed above, +and is in some sense an extension of Cross-cluster Scheduling. When +certain events occur, it becomes necessary or desirable for the +cluster federation system to proactively move distributed applications +(either in part or in whole) from one cluster to another. Examples of +such events include: + +1. A low capacity event in a cluster (or a cluster failure). +1. A change of scheduling policy ("we no longer use cloud provider X"). +1. A change of resource pricing ("cloud provider Y dropped their prices - let's migrate there"). + +Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters. 
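To make the Strictly Decoupled case concrete, here is a minimal sketch of such a move, one pod at a time, assuming hypothetical per-cluster kubeconfig contexts named `cluster-a` and `cluster-b`, an illustrative `my-worker` replication controller running ten replicas, and the `resize` subcommand shown elsewhere in these docs (none of this is a prescribed federation mechanism):

```
$ kubectl --context=cluster-a resize rc my-worker --replicas=9
$ kubectl --context=cluster-b resize rc my-worker --replicas=1
```

Repeating this, decrementing one side and incrementing the other, migrates the whole application with no coordination beyond the remote work queue the pods already share.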
+For Preferentially Coupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster, within some bounded time period (and possibly within a predefined "maintenance" window). +Strictly Coupled applications (with the exception of those deemed +completely immovable) require the federation system to: + +1. start up an entire replica application in the destination cluster +1. copy persistent data to the new application instance +1. switch traffic across +1. tear down the original application instance It is proposed that +support for automated migration of Strictly Coupled applications be +deferred to a later date. + +## Other Requirements + +These are often left implicit by customers, but are worth calling out explicitly: + +1. Software failure isolation between Kubernetes clusters should be + retained as far as is practically possible. The federation system + should not materially increase the failure correlation across + clusters. For this reason the federation system should ideally be + completely independent of the Kubernetes cluster control software, + and look just like any other Kubernetes API client, with no special + treatment. If the federation system fails catastrophically, the + underlying Kubernetes clusters should remain independently usable. +1. Unified monitoring, alerting and auditing across federated Kubernetes clusters. +1. Unified authentication, authorization and quota management across + clusters (this is in direct conflict with failure isolation above, + so there are some tough trade-offs to be made here). + +## Proposed High-Level Architecture + +TBD: All very hand-wavey still, but some initial thoughts to get the conversation going... 
+ +![image](federation-high-level-arch.png) + +## Ubernetes API + +This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. + ++ Clusters become first class objects, which can be registered, listed, described, deregistered etc via the API. ++ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine). ++ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work). ++ These federated replication controllers (and in fact all the + services comprising the Ubernetes Control Plane) have to run + somewhere. For high availability Ubernetes deployments, these + services may run in a dedicated Kubernetes cluster, not physically + co-located with any of the federated clusters. But for simpler + deployments, they may be run in one of the federated clusters (but + when that cluster goes down, Ubernetes is down, obviously). + +## Policy Engine and Migration/Replication Controllers + +The Policy Engine decides which parts of each application go into each +cluster at any point in time, and stores this desired state in the +Desired Federation State store (an etcd or +similar). Migration/Replication Controllers reconcile this against the +desired states stored in the underlying Kubernetes clusters (by +watching both, and creating or updating the underlying Replication +Controllers and related Services accordingly). + +## Authentication and Authorization + +This should ideally be delegated to some external auth system, shared +by the underlying clusters, to avoid duplication and inconsistency. +Either that, or we end up with multilevel auth. 
Local readonly +eventually consistent auth slaves in each cluster and in Ubernetes +could potentially cache auth, to mitigate an SPOF auth system. + +## Proposed Next Steps + +Identify concrete applications of each use case and configure a proof +of concept service that exercises the use case. For example, cluster +failure tolerance seems popular, so set up an apache frontend with +replicas in each of 3 availability zones with either an Amazon Elastic +Load Balancer or Google Cloud Load Balancer pointing at them? What +does the zookeeper config look like for N=3 across 3 AZs -- and how +does each replica find the other replicas and how do clients find +their primary zookeeper replica? And now how do I do a shared, highly +available redis database? -- cgit v1.2.3 From 2375fb9e51bf693abe62554e0c1ca8c3f0719328 Mon Sep 17 00:00:00 2001 From: Robert Bailey Date: Wed, 15 Apr 2015 20:50:00 -0700 Subject: Add documentation to help new contributors with write access from accidentally pushing upstream. 
--- development.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/development.md b/development.md index 7972eef6..bbd94fef 100644 --- a/development.md +++ b/development.md @@ -256,6 +256,13 @@ git fetch upstream git rebase upstream/master ``` +If you have write access to the main repository, you should modify your git configuration so that +you can't accidentally push to upstream: + +``` +git remote set-url --push upstream no_push +``` + ## Regenerating the CLI documentation ``` -- cgit v1.2.3 From a1a9378784e8171b03faa946bdeb4ee7cbf2f490 Mon Sep 17 00:00:00 2001 From: Ian Miell Date: Thu, 16 Apr 2015 14:24:29 +0100 Subject: Minor pit-nicks --- federation.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/federation.md b/federation.md index 6086b13f..df9f37eb 100644 --- a/federation.md +++ b/federation.md @@ -80,6 +80,7 @@ More specifically, having a Kubernetes cluster span multiple well-connected availability zones within a single geographical region (e.g. US North East, UK, Japan etc) is worthy of further consideration, in particular because it potentially addresses +some of these requirements. ## What use cases require Cluster Federation? @@ -224,7 +225,7 @@ initial implementation targeting single cloud provider only. Doing nothing (i.e. forcing users to choose between 1 and 2 on their own) is probably an OK starting point. Kubernetes autoscaling can get -us to 3 at some later date. +us to three at some later date. Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. 
As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). @@ -258,7 +259,7 @@ application-dependent, primarily influenced by network bandwidth consumption, latency requirements and cost sensitivity. For simplicity, lets assume that all Kubernetes distributed -applications fall into one of 3 categories with respect to relative +applications fall into one of three categories with respect to relative location affinity: 1. **"Strictly Coupled"**: Those applications that strictly cannot be @@ -301,7 +302,7 @@ of our users are in Western Europe, U.S. West Coast" etc). ## Cross-cluster service discovery -I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. use DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. +I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. _Aside:_ How do we avoid "tromboning" through an external VIP when DNS resolves to a public IP on the local cluster? Strictly speaking this would be an optimization, and probably only matters to high bandwidth, @@ -352,8 +353,9 @@ completely immovable) require the federation system to: 1. start up an entire replica application in the destination cluster 1. copy persistent data to the new application instance 1. switch traffic across -1. tear down the original application instance It is proposed that -support for automated migration of Strictly Coupled applications be +1. 
tear down the original application instance + +It is proposed that support for automated migration of Strictly Coupled applications be deferred to a later date. ## Other Requirements @@ -417,7 +419,7 @@ could potentially cache auth, to mitigate an SPOF auth system. Identify concrete applications of each use case and configure a proof of concept service that exercises the use case. For example, cluster failure tolerance seems popular, so set up an apache frontend with -replicas in each of 3 availability zones with either an Amazon Elastic +replicas in each of three availability zones with either an Amazon Elastic Load Balancer or Google Cloud Load Balancer pointing at them? What does the zookeeper config look like for N=3 across 3 AZs -- and how does each replica find the other replicas and how do clients find -- cgit v1.2.3 From 95d56057379a06eb74fe16a4992729eaa844d38f Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 16 Apr 2015 09:11:47 -0700 Subject: Stop using dockerfile/* images As per http://blog.docker.com/2015/03/updates-available-to-popular-repos-update-your-images/ docker has stopped answering dockerfile/redis and dockerfile/nginx. Fix all users in our tree. Sadly this means a lot of published examples are now broken. --- persistent-storage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/persistent-storage.md b/persistent-storage.md index 5907e11d..45ab8d42 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -184,7 +184,7 @@ metadata: name: mypod spec: containers: - - image: dockerfile/nginx + - image: nginx name: myfrontend volumeMounts: - mountPath: "/var/www/html" -- cgit v1.2.3 From 457acee81e7566f1baa3ecc9056032eef7317b5e Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 16 Apr 2015 09:11:47 -0700 Subject: Stop using dockerfile/* images As per http://blog.docker.com/2015/03/updates-available-to-popular-repos-update-your-images/ docker has stopped answering dockerfile/redis and dockerfile/nginx. 
Fix all users in our tree. Sadly this means a lot of published examples are now broken. --- developer-guides/vagrant.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 8e439009..ab0ef274 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -181,7 +181,7 @@ NAME IMAGE(S SELECTOR REPLICAS Start a container running nginx with a replication controller and three replicas ``` -$ cluster/kubectl.sh run-container my-nginx --image=dockerfile/nginx --replicas=3 --port=80 +$ cluster/kubectl.sh run-container my-nginx --image=nginx --replicas=3 --port=80 ``` When listing the pods, you will see that three containers have been started and are in Waiting state: @@ -189,9 +189,9 @@ When listing the pods, you will see that three containers have been started and ``` $ cluster/kubectl.sh get pods NAME IMAGE(S) HOST LABELS STATUS -781191ff-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.4/10.245.2.4 name=myNginx Waiting -7813c8bd-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.2/10.245.2.2 name=myNginx Waiting -78140853-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.3/10.245.2.3 name=myNginx Waiting +781191ff-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.4/10.245.2.4 name=myNginx Waiting +7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Waiting +78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Waiting ``` You need to wait for the provisioning to complete, you can monitor the minions by doing: @@ -210,7 +210,7 @@ Once the docker image for nginx has been downloaded, the container will start an $ sudo salt '*minion-1' cmd.run 'docker ps' kubernetes-minion-1: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES - dbe79bf6e25b dockerfile/nginx:latest "nginx" 21 seconds ago Up 19 seconds 
k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f + dbe79bf6e25b nginx:latest "nginx" 21 seconds ago Up 19 seconds k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b ``` @@ -219,16 +219,16 @@ Going back to listing the pods, services and replicationControllers, you now hav ``` $ cluster/kubectl.sh get pods NAME IMAGE(S) HOST LABELS STATUS -781191ff-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.4/10.245.2.4 name=myNginx Running -7813c8bd-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.2/10.245.2.2 name=myNginx Running -78140853-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.3/10.245.2.3 name=myNginx Running +781191ff-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.4/10.245.2.4 name=myNginx Running +7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running +78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running $ cluster/kubectl.sh get services NAME LABELS SELECTOR IP PORT $ cluster/kubectl.sh get replicationControllers NAME IMAGE(S SELECTOR REPLICAS -myNginx dockerfile/nginx name=my-nginx 3 +myNginx nginx name=my-nginx 3 ``` We did not start any services, hence there are none listed. But we see three replicas displayed properly. 
@@ -239,8 +239,8 @@ You can already play with resizing the replicas with: $ cluster/kubectl.sh resize rc my-nginx --replicas=2 $ cluster/kubectl.sh get pods NAME IMAGE(S) HOST LABELS STATUS -7813c8bd-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.2/10.245.2.2 name=myNginx Running -78140853-3ffe-11e4-9036-0800279696e1 dockerfile/nginx 10.245.2.3/10.245.2.3 name=myNginx Running +7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running +78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running ``` Congratulations! -- cgit v1.2.3 From cabc8404af6b7f84cefb286d161d3ea6204eb7b0 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Thu, 16 Apr 2015 21:41:07 +0000 Subject: Update docs. Add design principles. Fixes #6133. Fixes #4182. # *** ERROR: *** docs are out of sync between cli and markdown # run hack/run-gendocs.sh > docs/kubectl.md to regenerate # # Your commit will be aborted unless you regenerate docs. COMMIT_BLOCKED_ON_GENDOCS --- README.md | 17 +++++++++++++++++ architecture.md | 44 ++++++++++++++++++++++++++++++++++++++++++++ principles.md | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 116 insertions(+) create mode 100644 README.md create mode 100644 architecture.md create mode 100644 principles.md diff --git a/README.md b/README.md new file mode 100644 index 00000000..cda831a4 --- /dev/null +++ b/README.md @@ -0,0 +1,17 @@ +# Kubernetes Design Overview + +Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications. + +Kubernetes establishes robust declarative primitives for maintaining the desired state requested by the user. We see these primitives as the main value added by Kubernetes. Self-healing mechanisms, such as auto-restarting, re-scheduling, and replicating containers require active controllers, not just imperative orchestration. 
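As a sketch of what "declarative" means here (the v1beta1 field names and the `kubectl.sh create` invocation below are illustrative assumptions, not a stable interface):

```sh
# Desired state is declared -- "keep 3 nginx replicas running" -- and
# controllers reconcile observed state toward it; nothing here
# imperatively starts a container. Field names sketch the v1beta1-era
# API and may not match your version.
cat > /tmp/my-nginx-rc.json <<'EOF'
{
  "id": "myNginx",
  "kind": "ReplicationController",
  "apiVersion": "v1beta1",
  "desiredState": {
    "replicas": 3,
    "replicaSelector": {"name": "my-nginx"}
  },
  "labels": {"name": "my-nginx"}
}
EOF
# cluster/kubectl.sh create -f /tmp/my-nginx-rc.json   # hypothetical invocation
```

If a pod dies, the replication controller notices the observed count has drifted below the declared 3 and starts a replacement.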
+ +Kubernetes is primarily targeted at applications composed of multiple containers, such as elastic, distributed micro-services. It is also designed to facilitate migration of non-containerized application stacks to Kubernetes. It therefore includes abstractions for grouping containers in both loosely coupled and tightly coupled formations, and provides ways for containers to find and communicate with each other in relatively familiar ways. + +Kubernetes enables users to ask a cluster to run a set of containers. The system automatically chooses hosts to run those containers on. While Kubernetes's scheduler is currently very simple, we expect it to grow in sophistication over time. Scheduling is a policy-rich, topology-aware, workload-specific function that significantly impacts availability, performance, and capacity. The scheduler needs to take into account individual and collective resource requirements, quality of service requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, deadlines, and so on. Workload-specific requirements will be exposed through the API as necessary. + +Kubernetes is intended to run on a number of cloud providers, as well as on physical hosts. + +A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../availability.md) and [cluster federation proposal](../proposals/federation.md) for more details). + +Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. 
Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner. + +For more about the Kubernetes architecture, see [architecture](architecture.md). diff --git a/architecture.md b/architecture.md new file mode 100644 index 00000000..06a0a0ef --- /dev/null +++ b/architecture.md @@ -0,0 +1,44 @@ +# Kubernetes architecture + +A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable. + +![Architecture Diagram](../architecture.png?raw=true "Architecture overview") + +## The Kubernetes Node + +When looking at the architecture of the system, we'll break it down to services that run on the worker node and services that compose the cluster-level control plane. 
+ +The Kubernetes node has the services necessary to run application containers and be managed from the master systems. + +Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers. + +### Kubelet +The **Kubelet** manages [pods](../pods.md) and their containers, their images, their volumes, etc. + +### Kube-Proxy + +Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../docs/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. + +Service endpoints are currently found via [DNS](../dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy. + +## The Kubernetes Control Plane + +The Kubernetes control plane is split into a set of components. Currently they all run on a single _master_ node, but that is expected to change soon in order to support high-availability clusters. These components work together to provide a unified view of the cluster. + +### etcd + +All persistent master state is stored in an instance of `etcd`. This provides a great way to store configuration data reliably. With `watch` support, coordinating components can be notified very quickly of changes. + +### Kubernetes API Server + +The apiserver serves up the [Kubernetes API](../api.md). It is intended to be a CRUD-y server, with most/all business logic implemented in separate components or in plug-ins. It mainly processes REST operations, validates them, and updates the corresponding objects in `etcd` (and eventually other stores). 
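A sketch of those REST operations (the v1beta1 path prefix and the local insecure port are assumptions; adjust for your deployment):

```sh
# Clients address objects with plain REST paths; the apiserver validates
# the operation and persists the result to etcd.
APISERVER=${APISERVER:-http://localhost:8080}
api_path() { printf '/api/v1beta1/%s' "$1"; }

# Illustrative calls, to be run against a live apiserver:
#   curl -s "$APISERVER$(api_path pods)"                  # list pods
#   curl -s -X DELETE "$APISERVER$(api_path pods/myPod)"  # delete one
echo "$APISERVER$(api_path pods)"
```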
+ +### Scheduler + +The scheduler binds unscheduled pods to nodes via the `/binding` API. The scheduler is pluggable, and we expect to support multiple cluster schedulers and even user-provided schedulers in the future. + +### Kubernetes Controller Manager Server + +All other cluster-level functions are currently performed by the Controller Manager. For instance, `Endpoints` objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable. + +The [`replicationController`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. diff --git a/principles.md b/principles.md new file mode 100644 index 00000000..499b540b --- /dev/null +++ b/principles.md @@ -0,0 +1,55 @@ +# Design Principles + +Principles to follow when extending Kubernetes. + +## API + +See also the [API conventions](../api-conventions.md). + +* All APIs should be declarative. +* API objects should be complementary and composable, not opaque wrappers. +* The control plane should be transparent -- there are no hidden internal APIs. +* The cost of API operations should be proportional to the number of objects intentionally operated upon. Therefore, common filtered lookups must be indexed. Beware of patterns of multiple API calls that would incur quadratic behavior. +* Object status must be 100% reconstructable by observation. Any history kept must be just an optimization and not required for correct operation. +* Cluster-wide invariants are difficult to enforce correctly. Try not to add them. If you must have them, don't enforce them atomically in master components, that is contention-prone and doesn't provide a recovery path in the case of a bug allowing the invariant to be violated. 
Instead, provide a series of checks to reduce the probability of a violation, and make every component involved able to recover from an invariant violation. +* Low-level APIs should be designed for control by higher-level systems. Higher-level APIs should be intent-oriented (think SLOs) rather than implementation-oriented (think control knobs). + +## Control logic + +* Functionality must be *level-based*, meaning the system must operate correctly given the desired state and the current/observed state, regardless of how many intermediate state updates may have been missed. Edge-triggered behavior must be just an optimization. +* Assume an open world: continually verify assumptions and gracefully adapt to external events and/or actors. Example: we allow users to kill pods under control of a replication controller; it just replaces them. +* Do not define comprehensive state machines for objects with behaviors associated with state transitions and/or "assumed" states that cannot be ascertained by observation. +* Don't assume a component's decisions will not be overridden or rejected, nor for the component to always understand why. For example, etcd may reject writes. Kubelet may reject pods. The scheduler may not be able to schedule pods. Retry, but back off and/or make alternative decisions. +* Components should be self-healing. For example, if you must keep some state (e.g., cache) the content needs to be periodically refreshed, so that if an item does get erroneously stored or a deletion event is missed etc, it will be soon fixed, ideally on timescales that are shorter than what will attract attention from humans. +* Component behavior should degrade gracefully. Prioritize actions so that the most important activities can continue to function even when overloaded and/or in states of partial failure. + +## Architecture + +* Only the apiserver should communicate with etcd/store, and not other components (scheduler, kubelet, etc.). 
+* Compromising a single node shouldn't compromise the cluster. +* Components should continue to do what they were last told in the absence of new instructions (e.g., due to network partition or component outage). +* All components should keep all relevant state in memory all the time. The apiserver should write through to etcd/store, other components should write through to the apiserver, and they should watch for updates made by other clients. +* Watch is preferred over polling. + +## Extensibility + +TODO: pluggability + +## Bootstrapping + +* [Self-hosting](https://github.com/GoogleCloudPlatform/kubernetes/issues/246) of all components is a goal. +* Minimize the number of dependencies, particularly those required for steady-state operation. +* Stratify the dependencies that remain via principled layering. +* Break any circular dependencies by converting hard dependencies to soft dependencies. + * Also accept that data from other components from another source, such as local files, which can then be manually populated at bootstrap time and then continuously updated once those other components are available. + * State should be rediscoverable and/or reconstructable. + * Make it easy to run temporary, bootstrap instances of all components in order to create the runtime state needed to run the components in the steady state; use a lock (master election for distributed components, file lock for local components like Kubelet) to coordinate handoff. We call this technique "pivoting". + * Have a solution to restart dead components. For distributed components, replication works well. For local components such as Kubelet, a process manager or even a simple shell loop works. 
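For the last bullet, the "simple shell loop" can be as small as the sketch below (the bounded-restart parameter is added here only so the example can terminate; a real watchdog loops forever, and the kubelet invocation is illustrative):

```sh
# Restart a local component whenever it dies. $2, if given, bounds the
# number of restarts so the sketch can terminate.
run_supervised() {
  cmd=$1; max=${2:-}; tries=0
  while :; do
    $cmd && return 0            # component exited cleanly
    tries=$((tries + 1))
    echo "component exited; restart #$tries" >&2
    if [ -n "$max" ] && [ "$tries" -ge "$max" ]; then return 1; fi
    sleep 1
  done
}
# run_supervised 'kubelet --some-flags'   # illustrative usage
```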
+ +## Availability + +TODO + +## General principles + +* [Eric Raymond's 17 UNIX rules](https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules) -- cgit v1.2.3 From 77e469b2870d2fda54fa2555d63edf5965cc26b8 Mon Sep 17 00:00:00 2001 From: Matt Bogosian Date: Wed, 15 Apr 2015 16:07:50 -0700 Subject: Fix #2741. Add support for alternate Vagrant providers: VMWare Fusion, VMWare Workstation, and Parallels. --- developer-guides/vagrant.md | 115 +++++++++++++++++++++++++------------------- 1 file changed, 66 insertions(+), 49 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index ab0ef274..baf40b97 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -4,42 +4,54 @@ Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/deve ### Prerequisites 1. Install latest version >= 1.6.2 of vagrant from http://www.vagrantup.com/downloads.html -2. Install latest version of Virtual Box from https://www.virtualbox.org/wiki/Downloads +2. Install one of: + 1. The latest version of Virtual Box from https://www.virtualbox.org/wiki/Downloads + 2. [VMWare Fusion](https://www.vmware.com/products/fusion/) version 5 or greater as well as the appropriate [Vagrant VMWare Fusion provider](https://www.vagrantup.com/vmware) + 3. [VMWare Workstation](https://www.vmware.com/products/workstation/) version 9 or greater as well as the [Vagrant VMWare Workstation provider](https://www.vagrantup.com/vmware) + 4. [Parallels Desktop](https://www.parallels.com/products/desktop/) version 9 or greater as well as the [Vagrant Parallels provider](https://parallels.github.io/vagrant-parallels/) 3. Get or build a [binary release](../../getting-started-guides/binary_release.md) ### Setup By default, the Vagrant setup will create a single kubernetes-master and 1 kubernetes-minion. Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). 
To start your local cluster, open a shell and run: -``` +```sh cd kubernetes export KUBERNETES_PROVIDER=vagrant -cluster/kube-up.sh +./cluster/kube-up.sh ``` The `KUBERNETES_PROVIDER` environment variable tells all of the various cluster management scripts which variant to use. If you forget to set this, the assumption is you are running on Google Compute Engine. +If you installed more than one Vagrant provider, Kubernetes will usually pick the appropriate one. However, you can override which one Kubernetes will use by setting the [`VAGRANT_DEFAULT_PROVIDER`](https://docs.vagrantup.com/v2/providers/default.html) environment variable: + +```sh +export VAGRANT_DEFAULT_PROVIDER=parallels +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + Vagrant will provision each machine in the cluster with all the necessary components to run Kubernetes. The initial setup can take a few minutes to complete on each machine. By default, each VM in the cluster is running Fedora, and all of the Kubernetes services are installed into systemd. To access the master or any minion: -``` +```sh vagrant ssh master vagrant ssh minion-1 ``` If you are running more than one minion, you can access the others by: -``` +```sh vagrant ssh minion-2 vagrant ssh minion-3 ``` To view the service status and/or logs on the kubernetes-master: -``` +```sh vagrant ssh master [vagrant@kubernetes-master ~] $ sudo systemctl status kube-apiserver [vagrant@kubernetes-master ~] $ sudo journalctl -r -u kube-apiserver @@ -52,7 +64,7 @@ vagrant ssh master ``` To view the services on any of the kubernetes-minion(s): -``` +```sh vagrant ssh minion-1 [vagrant@kubernetes-minion-1] $ sudo systemctl status docker [vagrant@kubernetes-minion-1] $ sudo journalctl -r -u docker @@ -65,18 +77,18 @@ vagrant ssh minion-1 With your Kubernetes cluster up, you can manage the nodes in your cluster with the regular Vagrant commands. 
To push updates to new Kubernetes code after making source changes: -``` -cluster/kube-push.sh +```sh +./cluster/kube-push.sh ``` To stop and then restart the cluster: -``` +```sh vagrant halt -cluster/kube-up.sh +./cluster/kube-up.sh ``` To destroy the cluster: -``` +```sh vagrant destroy ``` @@ -84,14 +96,13 @@ Once your Vagrant machines are up and provisioned, the first thing to do is to c You may need to build the binaries first, you can do this with ```make``` -``` +```sh $ ./cluster/kubectl.sh get minions NAME LABELS 10.245.1.4 10.245.1.5 10.245.1.3 - ``` ### Interacting with your Kubernetes cluster with the `kube-*` scripts. @@ -100,39 +111,39 @@ Alternatively to using the vagrant commands, you can also use the `cluster/kube- All of these commands assume you have set `KUBERNETES_PROVIDER` appropriately: -``` +```sh export KUBERNETES_PROVIDER=vagrant ``` Bring up a vagrant cluster -``` -cluster/kube-up.sh +```sh +./cluster/kube-up.sh ``` Destroy the vagrant cluster -``` -cluster/kube-down.sh +```sh +./cluster/kube-down.sh ``` Update the vagrant cluster after you make changes (only works when building your own releases locally): -``` -cluster/kube-push.sh +```sh +./cluster/kube-push.sh ``` Interact with the cluster -``` -cluster/kubectl.sh +```sh +./cluster/kubectl.sh ``` ### Authenticating with your master When using the vagrant provider in Kubernetes, the `cluster/kubectl.sh` script will cache your credentials in a `~/.kubernetes_vagrant_auth` file so you will not be prompted for them in the future. -``` +```sh cat ~/.kubernetes_vagrant_auth { "User": "vagrant", "Password": "vagrant" @@ -144,22 +155,21 @@ cat ~/.kubernetes_vagrant_auth You should now be set to use the `cluster/kubectl.sh` script. 
For example try to list the minions that you have started with: -``` -cluster/kubectl.sh get minions +```sh +./cluster/kubectl.sh get minions ``` ### Running containers Your cluster is running, you can list the minions in your cluster: -``` -$ cluster/kubectl.sh get minions +```sh +$ ./cluster/kubectl.sh get minions NAME LABELS 10.245.2.4 10.245.2.3 10.245.2.2 - ``` Now start running some containers! @@ -196,7 +206,7 @@ NAME IMAGE(S) HOST You need to wait for the provisioning to complete, you can monitor the minions by doing: -``` +```sh $ sudo salt '*minion-1' cmd.run 'docker images' kubernetes-minion-1: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE @@ -206,7 +216,7 @@ kubernetes-minion-1: Once the docker image for nginx has been downloaded, the container will start and you can list it: -``` +```sh $ sudo salt '*minion-1' cmd.run 'docker ps' kubernetes-minion-1: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES @@ -235,9 +245,9 @@ We did not start any services, hence there are none listed. But we see three rep Check the [guestbook](../../examples/guestbook/README.md) application to learn how to create a service. You can already play with resizing the replicas with: -``` -$ cluster/kubectl.sh resize rc my-nginx --replicas=2 -$ cluster/kubectl.sh get pods +```sh +$ ./cluster/kubectl.sh resize rc my-nginx --replicas=2 +$ ./cluster/kubectl.sh get pods NAME IMAGE(S) HOST LABELS STATUS 7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running 78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running @@ -247,9 +257,9 @@ Congratulations! 
### Testing -The following will run all of the end-to-end testing scenarios assuming you set your environment in cluster/kube-env.sh +The following will run all of the end-to-end testing scenarios assuming you set your environment in `cluster/kube-env.sh`: -``` +```sh NUM_MINIONS=3 hack/e2e-test.sh ``` @@ -257,26 +267,26 @@ NUM_MINIONS=3 hack/e2e-test.sh #### I keep downloading the same (large) box all the time! -By default the Vagrantfile will download the box from S3. You can change this (and cache the box locally) by providing an alternate URL when calling `kube-up.sh` +By default the Vagrantfile will download the box from S3. You can change this (and cache the box locally) by providing a name and an alternate URL when calling `kube-up.sh` -```bash +```sh +export KUBERNETES_BOX_NAME=choose_your_own_name_for_your_kuber_box export KUBERNETES_BOX_URL=path_of_your_kuber_box export KUBERNETES_PROVIDER=vagrant -cluster/kube-up.sh +./cluster/kube-up.sh ``` - #### I just created the cluster, but I am getting authorization errors! You probably have an incorrect ~/.kubernetes_vagrant_auth file for the cluster you are attempting to contact. -``` +```sh rm ~/.kubernetes_vagrant_auth ``` After using kubectl.sh make sure that the correct credentials are set: -``` +```sh cat ~/.kubernetes_vagrant_auth { "User": "vagrant", @@ -284,35 +294,42 @@ cat ~/.kubernetes_vagrant_auth } ``` -#### I just created the cluster, but I do not see my container running ! +#### I just created the cluster, but I do not see my container running! If this is your first time creating the cluster, the kubelet on each minion schedules a number of docker pull requests to fetch prerequisite images. This can take some time and as a result may delay your initial pod getting provisioned. -#### I changed Kubernetes code, but it's not running ! +#### I changed Kubernetes code, but it's not running! Are you sure there was no build error? 
After running `$ vagrant provision`, scroll up and ensure that each Salt state was completed successfully on each box in the cluster. It's very likely you see a build error due to an error in your source files! -#### I have brought Vagrant up but the minions won't validate ! +#### I have brought Vagrant up but the minions won't validate! Are you sure you built a release first? Did you install `net-tools`? For more clues, log in to one of the minions (`vagrant ssh minion-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). -#### I want to change the number of minions ! +#### I want to change the number of minions! You can control the number of minions that are instantiated via the environment variable `NUM_MINIONS` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough minions to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single minion. You do this by setting `NUM_MINIONS` to 1 like so: -``` +```sh export NUM_MINIONS=1 ``` -#### I want my VMs to have more memory ! +#### I want my VMs to have more memory! You can control the memory allotted to virtual machines with the `KUBERNETES_MEMORY` environment variable. Just set it to the number of megabytes you would like the machines to have. For example: -``` +```sh export KUBERNETES_MEMORY=2048 ``` +If you need more granular control, you can set the amount of memory for the master and minions independently. For example: + +```sh +export KUBERNETES_MASTER_MEMORY=1536 +export KUBERNETES_MINION_MEMORY=2048 +``` + #### I ran vagrant suspend and nothing works! ```vagrant suspend``` seems to mess up the network. It's not supported at this time.
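Pulling the guide's environment knobs together, a customized bring-up might look like this (values are examples only):

```sh
export KUBERNETES_PROVIDER=vagrant   # use the Vagrant variant of the scripts
export NUM_MINIONS=2                 # two minion VMs
export KUBERNETES_MEMORY=2048        # 2048 MB per VM
# ./cluster/kube-up.sh               # then bring the cluster up
echo "provider=$KUBERNETES_PROVIDER minions=$NUM_MINIONS memory=${KUBERNETES_MEMORY}MB"
```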
-- cgit v1.2.3 From 9b39e755ee66af95b9be368805b06316b93ddf5f Mon Sep 17 00:00:00 2001 From: Robert Rati Date: Fri, 17 Apr 2015 14:28:25 -0400 Subject: Proposal for High Availability of Daemons #6993 --- high-availability.md | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) create mode 100644 high-availability.md diff --git a/high-availability.md b/high-availability.md new file mode 100644 index 00000000..afd12a9f --- /dev/null +++ b/high-availability.md @@ -0,0 +1,34 @@ +# High Availability of Daemons in Kubernetes +This document serves as a proposal for high availability of the master daemons in kubernetes. + +## Design Options +1. Hot Standby Daemons: In this scenario, data and state are shared between the two daemons such that an immediate failure in one daemon causes the standby daemon to take over exactly where the failed daemon had left off. This would be an ideal solution for kubernetes; however, it poses a series of challenges in the case of controllers where daemon-state is cached locally and not persisted in a transactional way to a storage facility. As a result, we are **NOT** planning on this approach. + +2. **Cold Standby Daemons**: In this scenario there is only one active daemon acting as the master and additional daemons in a standby mode. Data and state are not shared between the active and standby daemons, so when a failure occurs the standby daemon that becomes the master must determine the current state of the system before resuming functionality. + +3. Stateless load-balanced Daemons: Stateless daemons, such as the apiserver, can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver.
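The cold-standby option relies on an atomic "create if absent" step for taking the lease. A local simulation of the first-come, first-served semantics, using `mkdir` in place of the etcd TTL key the proposal envisions (the daemon names and lock path are illustrative):

```sh
# mkdir is an atomic "create if absent" primitive, standing in for the
# etcd key with a TTL; this demonstrates only the election semantics,
# not the proposed apiserver-mediated design.
LOCK=/tmp/ha-scheduler.lock
rm -rf "$LOCK"

acquire_lease() { mkdir "$LOCK" 2>/dev/null && echo "$1" > "$LOCK/holder"; }
release_lease() { rm -rf "$LOCK"; }

acquire_lease daemon-a && echo "daemon-a is master"
acquire_lease daemon-b || echo "daemon-b stays standby"
release_lease    # master exits (or, with etcd, the TTL expires)
acquire_lease daemon-b && echo "daemon-b takes over"
release_lease
```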
+ + +## Design Discussion Notes on Leader Election +For a very simple example of proposed behavior see: +* https://github.com/rrati/etcd-ha +* go get github.com/rrati/etcd-ha + +In HA, the apiserver will be a gateway to etcd. It will provide an api for becoming master, updating the master lease, and releasing the lease. This api is daemon agnostic, so to become the master the client will need to provide the daemon type and the lease duration when attempting to become master. The apiserver will attempt to create a key in etcd based on the daemon type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that daemon type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master. When updating the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD. + +Leader election is first come, first served. The first daemon of a specific type to request leadership will become the master. All other daemons of that type will fail until the current leader releases the lease or fails to update the lease within the expiration time. On startup, all daemons should attempt to become master. The daemon that succeeds is the master and should perform all functions of that daemon. The daemons that fail to become the master should not perform any tasks and sleep for their lease duration and then attempt to become the master again. + +The daemon that becomes master should create a goroutine to manage the lease. This process should be created with a channel that the main daemon process can use to release the master lease. Otherwise, this process will update the lease and sleep, waiting for the next update time or notification to release the lease.
If there is a failure to update the lease, this process should force the entire daemon to exit. Daemon exit is meant to prevent potential split-brain conditions. Daemon restart is implied in this scenario, by either the init system (systemd), or possible watchdog processes. (See Design Discussion Notes) + +## Options added to daemons with HA functionality +Some command line options would be added to daemons that can do HA: + +* Lease Duration - How long a daemon can be master + +* Number of Missed Lease Updates - How many updates can be missed before the master lease is lost + +## Design Discussion Notes on Scheduler/Controller +Some daemons, such as the controller-manager, may fork numerous goroutines to perform tasks in parallel. Trying to keep track of all these processes and shut them down cleanly is untenable. If a master daemon loses leadership, then the whole daemon should exit with an exit code indicating that the daemon is not the master. The daemon should be restarted by a monitoring system, such as systemd, or a software watchdog. + +## Open Questions: +* Is there a desire to keep track of all nodes for a specific daemon type? -- cgit v1.2.3 From 8e7529849a47e5dc5f7a7b92f501a70474150e89 Mon Sep 17 00:00:00 2001 From: Robert Bailey Date: Mon, 20 Apr 2015 21:33:03 -0700 Subject: Remove old design file (that has been fully implemented). --- isolation_between_nodes_and_master.md | 48 ----------------------------------- 1 file changed, 48 deletions(-) delete mode 100644 isolation_between_nodes_and_master.md diff --git a/isolation_between_nodes_and_master.md b/isolation_between_nodes_and_master.md deleted file mode 100644 index a91927d8..00000000 --- a/isolation_between_nodes_and_master.md +++ /dev/null @@ -1,48 +0,0 @@ -# Design: Limit direct access to etcd from within Kubernetes - -All nodes have effective access of "root" on the entire Kubernetes cluster today because they have access to etcd, the central data store.
The kubelet, the service proxy, and the nodes themselves have a connection to etcd that can be used to read or write any data in the system. In a cluster with many hosts, any container or user that gains the ability to write to the network device that can reach etcd, on any host, also gains that access. - -* The Kubelet and Kube Proxy currently rely on an efficient "wait for changes over HTTP" interface get their current state and avoid missing changes - * This interface is implemented by etcd as the "watch" operation on a given key containing useful data - - -## Options: - -1. Do nothing -2. Introduce an HTTP proxy that limits the ability of nodes to access etcd - 1. Prevent writes of data from the kubelet - 2. Prevent reading data not associated with the client responsibilities - 3. Introduce a security token granting access -3. Introduce an API on the apiserver that returns the data a node Kubelet and Kube Proxy needs - 1. Remove the ability of nodes to access etcd via network configuration - 2. Provide an alternate implementation for the event writing code Kubelet - 3. Implement efficient "watch for changes over HTTP" to offer comparable function with etcd - 4. Ensure that the apiserver can scale at or above the capacity of the etcd system. - 5. Implement authorization scoping for the nodes that limits the data they can view -4. Implement granular access control in etcd - 1. Authenticate HTTP clients with client certificates, tokens, or BASIC auth and authorize them for read only access - 2. Allow read access of certain subpaths based on what the requestor's tokens are - - -## Evaluation: - -Option 1 would be considered unacceptable for deployment in a multi-tenant or security conscious environment. It would be acceptable in a low security deployment where all software is trusted. It would be acceptable in proof of concept environments on a single machine. 
- -Option 2 would require implementing an http proxy that for 2-1 could block POST/PUT/DELETE requests (and potentially HTTP method tunneling parameters accepted by etcd). 2-2 would be more complicated and would require filtering operations based on deep understanding of the etcd API *and* the underlying schema. It would be possible, but involve extra software. - -Option 3 would involve extending the existing apiserver to return pods associated with a given node over an HTTP "watch for changes" mechanism, which is already implemented. Proper security would involve checking that the caller is authorized to access that data - one imagines a per node token, key, or SSL certificate that could be used to authenticate and then authorize access to only the data belonging to that node. The current event publishing mechanism from the kubelet would also need to be replaced with a secure API endpoint or a change to a polling model. The apiserver would also need to be able to function in a horizontally scalable mode by changing or fixing the "operations" queue to work in a stateless, scalable model. In practice, the amount of traffic even a large Kubernetes deployment would drive towards an apiserver would be tens of requests per second (500 hosts, 1 request per host every minute) which is negligible if well implemented. Implementing this would also decouple the data store schema from the nodes, allowing a different data store technology to be added in the future without affecting existing nodes. This would also expose that data to other consumers for their own purposes (monitoring, implementing service discovery). - -Option 4 would involve extending etcd to [support access control](https://github.com/coreos/etcd/issues/91). Administrators would need to authorize nodes to connect to etcd, and expose network routability directly to etcd. 
The mechanism for handling this authentication and authorization would be different from the authorization used by Kubernetes controllers and API clients. It would not be possible to completely replace etcd as a data store without also implementing a new Kubelet config endpoint.
-
-
-## Preferred solution:
-
-Implement the first parts of option 3 - an efficient watch API for the pod, service, and endpoints data for the Kubelet and Kube Proxy. Authorization and authentication are planned in the future - when a solution is available, implement a custom authorization scope that allows API access to be restricted to only the data about a single node or the service endpoint data.
-
-In general, option 4 is desirable in addition to option 3 as a mechanism to further secure the store to infrastructure components that must access it.
-
-
-## Caveats
-
-In all four options, compromise of a host will allow an attacker to imitate that host. For attack vectors that are reproducible from inside containers (privilege escalation), an attacker can distribute himself to other hosts by requesting that new containers be spun up. In scenario 1, the cluster is totally compromised immediately. In 2-1, the attacker can view all information about the cluster, including keys or authorization data defined with pods. In 2-2 and 3, the attacker must still distribute himself in order to get access to a large subset of information, and cannot see other data that is potentially located in etcd like side storage or system configuration. For attack vectors that are not exploits, but instead allow network access to etcd, an attacker in 2-2 has no ability to spread his influence, and is instead restricted to the subset of information on the host. For 3-5, they can do nothing they could not do already (request access to the nodes / services endpoint) because the token is not visible to them on the host.
- -- cgit v1.2.3 From 295ae03fcbeca3bdb94617de2fc020cc388e297f Mon Sep 17 00:00:00 2001 From: Robert Rati Date: Tue, 21 Apr 2015 16:58:45 -0400 Subject: Updated HA proposal based upon comments. #6993 --- high-availability.md | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/high-availability.md b/high-availability.md index afd12a9f..9a2367bd 100644 --- a/high-availability.md +++ b/high-availability.md @@ -1,34 +1,34 @@ -# High Availability of Daemons in Kubernetes -This document serves as a proposal for high availability of the master daemons in kubernetes. +# High Availability of Scheduling and Controller Components in Kubernetes +This document serves as a proposal for high availability of the scheduler and controller components in kubernetes. This proposal is intended to provide a simple High Availability api for kubernertes components only. Extensibility beyond that scope will be subject to other constraints. ## Design Options -1. Hot Standby Daemons: In this scenario, data and state are shared between the two deamons such that an immediate failure in one daemon causes the the standby deamon to take over exactly where the failed daemon had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where daemon-state is cached locally and not persisted in a transactional way to a storage facility. As a result, we are **NOT** planning on this approach. +For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en) -2. **Cold Standby Daemons**: In this scenario there is only one active daemon acting as the master and additional daemons in a standby mode. 
Data and state are not shared between the active and standby daemons, so when a failure occurs the standby daemon that becomes the master must determine the current state of the system before resuming functionality.
+1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby component to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes; however, it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time.
-3. Stateless load-balanced Daemons: Stateless daemons, such as the apiserver, can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants.  This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver.
+2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality.
+3. Active-Active (Load Balanced): Components, such as the apiserver, can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants.
This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver.
 
 ## Design Discussion Notes on Leader Election
-For a very simple example of proposed behavior see:
-* https://github.com/rrati/etcd-ha
-* go get github.com/rrati/etcd-ha
+Implementation References:
+* [zookeeper](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection)
+* [etcd](https://groups.google.com/forum/#!topic/etcd-dev/EbAa4fjypb4)
+* [initialPOC](https://github.com/rrati/etcd-ha)
-In HA, the apiserver will be a gateway to etcd. It will provide an api for becoming master, updating the master lease, and releasing the lease. This api is daemon agnostic, so to become the master the client will need to provide the daemon type and the lease duration when attempting to become master. The apiserver will attempt to create a key in etcd based on the daemon type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that daemon type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master. When updating the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD.
+In HA, the apiserver will provide an api for sets of replicated clients to do master election: become master, update the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attempting to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request.
Failure to create this key means there is already a master of that component type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master. When updating the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD.
-Leader election is first come, first serve. The first daemon of a specific type to request leadership will become the master. All other daemons of that type will fail until the current leader releases the lease or fails to update the lease within the expiration time. On startup, all daemons should attempt to become master. The daemon that succeeds is the master and should perform all functions of that daemon. The daemons that fail to become the master should not perform any tasks and sleep for their lease duration and then attempt to become the master again.
+The first component to request leadership will become the master. All other components of that type will fail until the current leader releases the lease, or fails to update the lease within the expiration time. On startup, all components should attempt to become master. The component that succeeds becomes the master, and should perform all functions of that component. The components that fail to become the master should not perform any tasks and sleep for their lease duration and then attempt to become the master again. A clean shutdown of the leader will cause a release of the lease and a new master will be elected.
-The daemon that becomes master should create a Go routine to manage the lease. This process should be created with a channel that the main daemon process can use to release the master lease. Otherwise, this process will update the lease and sleep, waiting for the next update time or notification to release the lease. If there is a failure to update the lease, this process should force the entire daemon to exit.
Daemon exit is meant to prevent potential split-brain conditions. Daemon restart is implied in this scenario, by either the init system (systemd), or possible watchdog processes. (See Design Discussion Notes) +The component that becomes master should create a thread to manage the lease. This thread should be created with a channel that the main process can use to release the master lease. The master should release the lease in cases of an unrecoverable error and clean shutdown. Otherwise, this process will update the lease and sleep, waiting for the next update time or notification to release the lease. If there is a failure to update the lease, this process should force the entire component to exit. Daemon exit is meant to prevent potential split-brain conditions. Daemon restart is implied in this scenario, by either the init system (systemd), or possible watchdog processes. (See Design Discussion Notes) -## Options added to daemons with HA functionality -Some command line options would be added to daemons that can do HA: +## Options added to components with HA functionality +Some command line options would be added to components that can do HA: -* Lease Duration - How long a daemon can be master +* Lease Duration - How long a component can be master -* Number of Missed Lease Updates - How many updates can be missed before the lease as the master is lost - -## Design Discussion Notes on Scheduler/Controller -Some daemons, such as the controller-manager, may fork numerous go routines to perform tasks in parallel. Trying to keep track of all these processes and shut them down cleanly is untenable. If a master daemon loses leadership then the whole daemon should exit with an exit code indicating that the daemon is not the master. The daemon should be restarted by a monitoring system, such as systemd, or a software watchdog. +## Design Discussion Notes +Some components may run numerous threads in order to perform tasks in parallel. 
Upon losing master status, such components should exit instantly instead of attempting to gracefully shut down such threads. This is to ensure that, in the case there's some propagation delay in informing the threads they should stop, the lame-duck threads won't interfere with the new master. The component should exit with an exit code indicating that the component is not the master. Since all components will be run by systemd or some other monitoring system, this will just result in a restart. ## Open Questions: -* Is there a desire to keep track of all nodes for a specific daemon type? +* Is there a desire to keep track of all nodes for a specific component type? -- cgit v1.2.3 From 747bb0de5dd4f1fdb64962aebeb89c283c79e81b Mon Sep 17 00:00:00 2001 From: caesarxuchao Date: Tue, 21 Apr 2015 21:22:28 -0700 Subject: fix the link to services.md --- architecture.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/architecture.md b/architecture.md index 06a0a0ef..3f021aaf 100644 --- a/architecture.md +++ b/architecture.md @@ -17,7 +17,7 @@ The **Kubelet** manages [pods](../pods.md) and their containers, their images, t ### Kube-Proxy -Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../docs/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. +Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. 
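The round-robin TCP/UDP forwarding that the service proxy performs reduces to a trivial backend picker; as a sketch (illustrative only — the real proxy also manages connection lifecycle and endpoint updates):

```go
package main

import "fmt"

// roundRobin cycles through a service's backend endpoints, as the service
// proxy does when forwarding a stream to a set of backends.
type roundRobin struct {
	backends []string
	next     int
}

// pick returns the next backend in rotation.
func (r *roundRobin) pick() string {
	b := r.backends[r.next%len(r.backends)]
	r.next++
	return b
}

func main() {
	// Hypothetical endpoint addresses for a two-pod service.
	lb := &roundRobin{backends: []string{"10.0.0.1:9376", "10.0.0.2:9376"}}
	fmt.Println(lb.pick()) // 10.0.0.1:9376
	fmt.Println(lb.pick()) // 10.0.0.2:9376
	fmt.Println(lb.pick()) // 10.0.0.1:9376
}
```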
Service endpoints are currently found via [DNS](../dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy.
-- cgit v1.2.3

From d493d1d200568026097e39e8b637523ea4ce3730 Mon Sep 17 00:00:00 2001
From: Robert Rati
Date: Wed, 22 Apr 2015 08:18:17 -0400
Subject: More updates based on feedback. #6993

---
 high-availability.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/high-availability.md b/high-availability.md
index 9a2367bd..2bfa6dc0 100644
--- a/high-availability.md
+++ b/high-availability.md
@@ -1,5 +1,5 @@
 # High Availability of Scheduling and Controller Components in Kubernetes
-This document serves as a proposal for high availability of the scheduler and controller components in kubernetes. This proposal is intended to provide a simple High Availability api for kubernertes components only. Extensibility beyond that scope will be subject to other constraints.
+This document serves as a proposal for high availability of the scheduler and controller components in kubernetes. This proposal is intended to provide a simple High Availability api for kubernetes components with the potential to extend to services running on kubernetes. Those services would be subject to their own constraints.
 
 ## Design Options
 For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en)
@@ -8,7 +8,7 @@ For complete reference see [this](https://www.ibm.com/developerworks/community/b
 2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components.
When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality.
-3. Active-Active (Load Balanced): Components, such as the apiserver, can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver.
+3. Active-Active (Load Balanced): Clients can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver.
 
 ## Design Discussion Notes on Leader Election
 Implementation References:
@@ -16,11 +16,11 @@ Implementation References:
 * [etcd](https://groups.google.com/forum/#!topic/etcd-dev/EbAa4fjypb4)
 * [initialPOC](https://github.com/rrati/etcd-ha)
-In HA, the apiserver will provide an api for sets of replicated clients to do master election: become master, update the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attempting to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that component type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master.
When updating the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD.
+In HA, the apiserver will provide an api for sets of replicated clients to do master election: acquire the lease, renew the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attempting to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that component type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master. Only the current master can renew the lease. When renewing the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD.
 
-The first component to request leadership will become the master. All other components of that type will fail until the current leader releases the lease, or fails to update the lease within the expiration time. On startup, all components should attempt to become master.
The component that succeeds becomes the master, and should perform all functions of that component. The components that fail to become the master should not perform any tasks and sleep for their lease duration and then attempt to become the master again. A clean shutdown of the leader will cause a release of the lease and a new master will be elected. -The component that becomes master should create a thread to manage the lease. This thread should be created with a channel that the main process can use to release the master lease. The master should release the lease in cases of an unrecoverable error and clean shutdown. Otherwise, this process will update the lease and sleep, waiting for the next update time or notification to release the lease. If there is a failure to update the lease, this process should force the entire component to exit. Daemon exit is meant to prevent potential split-brain conditions. Daemon restart is implied in this scenario, by either the init system (systemd), or possible watchdog processes. (See Design Discussion Notes) +The component that becomes master should create a thread to manage the lease. This thread should be created with a channel that the main process can use to release the master lease. The master should release the lease in cases of an unrecoverable error and clean shutdown. Otherwise, this process will renew the lease and sleep, waiting for the next renewal time or notification to release the lease. If there is a failure to renew the lease, this process should force the entire component to exit. Daemon exit is meant to prevent potential split-brain conditions. Daemon restart is implied in this scenario, by either the init system (systemd), or possible watchdog processes. 
(See Design Discussion Notes) ## Options added to components with HA functionality Some command line options would be added to components that can do HA: -- cgit v1.2.3 From 2a117380292929bf29f390e90f8cbe57d3b8fd74 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Fri, 10 Apr 2015 16:11:12 -0700 Subject: Suggest a simple rolling update. --- simple-rolling-update.md | 92 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 92 insertions(+) create mode 100644 simple-rolling-update.md diff --git a/simple-rolling-update.md b/simple-rolling-update.md new file mode 100644 index 00000000..43b086ae --- /dev/null +++ b/simple-rolling-update.md @@ -0,0 +1,92 @@ +## Simple rolling update +This is a lightweight design document for simple rolling update in ```kubectl``` + +Complete execution flow can be found [here](#execution-details). + +### Lightweight rollout +Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1``` + +```kubectl rolling-update rc foo [foo-v2] --image=myimage:v2``` + +If the user doesn't specify a name for the 'next' controller, then the 'next' controller is renamed to +the name of the original controller. + +Obviously there is a race here, where if you kill the client between delete foo, and creating the new version of 'foo' you might be surprised about what is there, but I think that's ok. +See [Recovery](#recovery) below + +If the user does specify a name for the 'next' controller, then the 'next' controller is retained with its existing name, +and the old 'foo' controller is deleted. For the purposes of the rollout, we add a unique-ifying label ```kubernetes.io/deployment``` to both the ```foo``` and ```foo-next``` controllers. +The value of that label is the hash of the complete JSON representation of the```foo-next``` or```foo``` controller. The name of this label can be overridden by the user with the ```--deployment-label-key``` flag. 
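The value of the deployment label is the hash of the controller's complete JSON representation, but the proposal does not pin down the hash function. One plausible sketch (FNV-32a here is an arbitrary illustrative choice, not part of the proposal):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// deploymentHash derives a kubernetes.io/deployment label value from the
// complete JSON representation of a replication controller. The hash
// function is not specified by the proposal; FNV-32a is used purely for
// illustration, and the hex encoding keeps the value label-safe.
func deploymentHash(controllerJSON []byte) string {
	h := fnv.New32a()
	h.Write(controllerJSON)
	return fmt.Sprintf("%08x", h.Sum32())
}

func main() {
	// Hypothetical serialized controllers differing only in image version.
	v1 := []byte(`{"id":"foo","podTemplate":{"image":"myimage:v1"}}`)
	v2 := []byte(`{"id":"foo","podTemplate":{"image":"myimage:v2"}}`)

	fmt.Println(deploymentHash(v1) == deploymentHash(v1)) // deterministic: true
	fmt.Println(deploymentHash(v1) == deploymentHash(v2)) // differs per version: false
}
```

Determinism matters here: re-running the command after a crash must compute the same label value so the partially updated controllers can be recognized.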
+
+#### Recovery
+If a rollout fails or is terminated in the middle, it is important that the user be able to resume the rollout.
+To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replication controller in the ```kubernetes.io/``` annotation namespace:
+ * ```desired-replicas``` The desired number of replicas for this controller (either N or zero)
+ * ```update-partner``` A pointer to the replication controller resource that is the other half of this update (syntax `````` the namespace is assumed to be identical to the namespace of this replication controller.)
+
+Recovery is achieved by issuing the same command again:
+
+```
+kubectl rolling-update rc foo [foo-v2] --image=myimage:v2
+```
+
+Whenever the rolling update command executes, the kubectl client looks for replication controllers called ```foo``` and ```foo-next```. If they exist, an attempt is
+made to roll ```foo``` to ```foo-next```. If ```foo-next``` does not exist, then it is created, and the rollout is a new rollout. If ```foo``` doesn't exist, then
+it is assumed that the rollout is nearly completed, and ```foo-next``` is renamed to ```foo```. Details of the execution flow are given below.
+
+
+### Aborting a rollout
+Abort is assumed to want to reverse a rollout in progress.
+
+```kubectl rolling-update rc foo [foo-v2] --abort```
+
+This is really just semantic sugar for:
+
+```kubectl rolling-update rc foo-v2 foo```
+
+With the added detail that it moves the ```desired-replicas``` annotation from ```foo-v2``` to ```foo```
+
+
+### Execution Details
+
+For the purposes of this example, assume that we are rolling from ```foo``` to ```foo-next``` where the only change is an image update from `v1` to `v2`
+
+If the user doesn't specify a ```foo-next``` name, then it is discovered from the ```update-partner``` annotation on ```foo```.
If that annotation doesn't exist, +then ```foo-next``` is synthesized using the pattern ```-``` + +#### Initialization + * If ```foo``` and ```foo-next``` do not exist: + * Exit, and indicate an error to the user, that the specified controller doesn't exist. + * Goto Rollout + * If ```foo``` exists, but ```foo-next``` does not: + * Create ```foo-next``` populate it with the ```v2``` image, set ```desired-replicas``` to ```foo.Spec.Replicas``` + * Goto Rollout + * If ```foo-next``` exists, but ```foo``` does not: + * Assume that we are in the rename phase. + * Goto Rename + * If both ```foo``` and ```foo-next``` exist: + * Assume that we are in a partial rollout + * If ```foo-next``` is missing the ```desired-replicas``` annotation + * Populate the ```desired-replicas``` annotation to ```foo-next``` using the current size of ```foo``` + * Goto Rollout + +#### Rollout + * While size of ```foo-next``` < ```desired-replicas``` annotation on ```foo-next``` + * increase size of ```foo-next``` + * if size of ```foo``` > 0 + decrease size of ```foo``` + * Goto Rename + +#### Rename + * delete ```foo``` + * create ```foo``` that is identical to ```foo-next``` + * delete ```foo-next``` + +#### Abort + * If ```foo-next``` doesn't exist + * Exit and indicate to the user that they may want to simply do a new rollout with the old version + * If ```foo``` doesn't exist + * Exit and indicate not found to the user + * Otherwise, ```foo-next``` and ```foo``` both exist + * Set ```desired-replicas``` annotation on ```foo``` to match the annotation on ```foo-next``` + * Goto Rollout with ```foo``` and ```foo-next``` trading places. 
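The Rollout step above reduces to a simple resize loop; as a sketch, with plain integers standing in for controller resizes made through the API:

```go
package main

import "fmt"

// rollout grows foo-next toward the desired-replicas annotation one replica
// at a time, shrinking foo in step, mirroring the Rollout section above.
// It also covers resuming a partial rollout: starting from any sizes, it
// converges to foo=0, foo-next=desired.
func rollout(foo, fooNext, desired int) (int, int) {
	for fooNext < desired {
		fooNext++ // increase size of foo-next
		if foo > 0 {
			foo-- // decrease size of foo
		}
	}
	return foo, fooNext
}

func main() {
	foo, fooNext := rollout(3, 0, 3)
	fmt.Println(foo, fooNext) // 0 3
}
```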
-- cgit v1.2.3 From 812a7b9a63daccadcde843df358f7fb6ea9ccb76 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 23 Apr 2015 16:36:27 -0700 Subject: Make docs links go through docs.k8s.io --- networking.md | 2 +- secrets.md | 6 +++--- security.md | 12 ++++++------ 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/networking.md b/networking.md index d90f56b1..d664806f 100644 --- a/networking.md +++ b/networking.md @@ -83,7 +83,7 @@ We want to be able to assign IP addresses externally from Docker ([Docker issue In addition to enabling self-registration with 3rd-party discovery mechanisms, we'd like to setup DDNS automatically ([Issue #146](https://github.com/GoogleCloudPlatform/kubernetes/issues/146)). hostname, $HOSTNAME, etc. should return a name for the pod ([Issue #298](https://github.com/GoogleCloudPlatform/kubernetes/issues/298)), and gethostbyname should be able to resolve names of other pods. Probably we need to set up a DNS resolver to do the latter ([Docker issue #2267](https://github.com/dotcloud/docker/issues/2267)), so that we don't need to keep /etc/hosts files up to date dynamically. -[Service](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md) endpoints are currently found through environment variables. Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_BAR) are supported, and resolve to ports opened by the service proxy. We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. 
Clients should instead use the [service portal IP](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md) (which the above environment variables will resolve to). However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service portal IP in DNS, and for that to become the preferred resolution protocol. +[Service](http://docs.k8s.io/services.md) endpoints are currently found through environment variables. Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_BAR) are supported, and resolve to ports opened by the service proxy. We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. Clients should instead use the [service portal IP](http://docs.k8s.io/services.md) (which the above environment variables will resolve to). However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service portal IP in DNS, and for that to become the preferred resolution protocol. We'd also like to accommodate other load-balancing solutions (e.g., HAProxy), non-load-balanced services ([Issue #260](https://github.com/GoogleCloudPlatform/kubernetes/issues/260)), and other types of groups (worker pools, etc.). 
Providing the ability to Watch a label selector applied to pod addresses would enable efficient monitoring of group membership, which could be directly consumed or synced with a discovery mechanism. Event hooks ([Issue #140](https://github.com/GoogleCloudPlatform/kubernetes/issues/140)) for join/leave events would probably make this even easier. diff --git a/secrets.md b/secrets.md index 965f6e90..e07f271d 100644 --- a/secrets.md +++ b/secrets.md @@ -72,7 +72,7 @@ service would also consume the secrets associated with the MySQL service. ### Use-Case: Secrets associated with service accounts -[Service Accounts](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md) are proposed as a +[Service Accounts](http://docs.k8s.io/design/service_accounts.md) are proposed as a mechanism to decouple capabilities and security contexts from individual human users. A `ServiceAccount` contains references to some number of secrets. A `Pod` can specify that it is associated with a `ServiceAccount`. Secrets should have a `Type` field to allow the Kubelet and @@ -236,7 +236,7 @@ memory overcommit on the node. #### Secret data on the node: isolation -Every pod will have a [security context](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/security_context.md). +Every pod will have a [security context](http://docs.k8s.io/design/security_context.md). Secret data on the node should be isolated according to the security context of the container. The Kubelet volume plugin API will be changed so that a volume plugin receives the security context of a volume along with the volume spec. This will allow volume plugins to implement setting the @@ -248,7 +248,7 @@ Several proposals / upstream patches are notable as background for this proposal 1. [Docker vault proposal](https://github.com/docker/docker/issues/10310) 2. 
[Specification for image/container standardization based on volumes](https://github.com/docker/docker/issues/9277) -3. [Kubernetes service account proposal](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md) +3. [Kubernetes service account proposal](http://docs.k8s.io/design/service_accounts.md) 4. [Secrets proposal for docker (1)](https://github.com/docker/docker/pull/6075) 5. [Secrets proposal for docker (2)](https://github.com/docker/docker/pull/6697) diff --git a/security.md b/security.md index 7bdca440..b446f66c 100644 --- a/security.md +++ b/security.md @@ -63,14 +63,14 @@ Automated process users fall into the following categories: A pod runs in a *security context* under a *service account* that is defined by an administrator or project administrator, and the *secrets* a pod has access to is limited by that *service account*. -1. The API should authenticate and authorize user actions [authn and authz](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/access.md) +1. The API should authenticate and authorize user actions [authn and authz](http://docs.k8s.io/design/access.md) 2. All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and be authorized to perform only the functions they require against the API. 3. Most infrastructure components should use the API as a way of exchanging data and changing the system, and only the API should have access to the underlying data store (etcd) -4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md) +4. 
When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](http://docs.k8s.io/design/service_accounts.md) 1. If the user who started a long-lived process is removed from access to the cluster, the process should be able to continue without interruption 2. If the user who started processes are removed from the cluster, administrators may wish to terminate their processes in bulk 3. When containers run with a service account, the user that created / triggered the service account behavior must be associated with the container's action -5. When container processes run on the cluster, they should run in a [security context](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions. +5. When container processes run on the cluster, they should run in a [security context](http://docs.k8s.io/design/security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions. 1. Administrators should be able to configure the cluster to automatically confine all container processes as a non-root, randomly assigned UID 2. Administrators should be able to ensure that container processes within the same namespace are all assigned the same unix user UID 3. Administrators should be able to limit which developers and project administrators have access to higher privilege actions @@ -79,7 +79,7 @@ A pod runs in a *security context* under a *service account* that is defined by 6. Developers may need to ensure their images work within higher security requirements specified by administrators 7. When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met. 8. 
When application developers want to share filesystem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes -6. Developers should be able to define [secrets](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/secrets.md) that are automatically added to the containers when pods are run +6. Developers should be able to define [secrets](http://docs.k8s.io/design/secrets.md) that are automatically added to the containers when pods are run 1. Secrets are files injected into the container whose values should not be displayed within a pod. Examples: 1. An SSH private key for git cloning remote data 2. A client certificate for accessing a remote system @@ -93,11 +93,11 @@ A pod runs in a *security context* under a *service account* that is defined by ### Related design discussion -* Authorization and authentication https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/access.md +* Authorization and authentication http://docs.k8s.io/design/access.md * Secret distribution via files https://github.com/GoogleCloudPlatform/kubernetes/pull/2030 * Docker secrets https://github.com/docker/docker/pull/6697 * Docker vault https://github.com/docker/docker/issues/10310 -* Service Accounts: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md +* Service Accounts: http://docs.k8s.io/design/service_accounts.md * Secret volumes https://github.com/GoogleCloudPlatform/kubernetes/4126 ## Specific Design Points -- cgit v1.2.3 From 35bb6a1e9891657621ab3ae359e1b2518de98356 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 23 Apr 2015 16:36:27 -0700 Subject: Make docs links go through docs.k8s.io --- autoscaling.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 029a6a82..c1d1578b 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -21,7 +21,7 @@ done automatically based on
statistical analysis and thresholds. * This proposal is for horizontal scaling only. Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072) * `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. The `ReplicationController` responsibilities are -constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#responsibilities-of-the-replication-controller) +constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](http://docs.k8s.io/replication-controller.md#responsibilities-of-the-replication-controller) * Auto-scalers will be loosely coupled with data gathering components in order to allow a wide variety of input sources * Auto-scalable resources will support a resize verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) such that the auto-scaler does not directly manipulate the underlying resource. @@ -42,7 +42,7 @@ applications will expose one or more network endpoints for clients to connect to balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to server traffic for applications. This is the primary, but not sole, source of data for making decisions. -Within Kubernetes a [kube proxy](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md#ips-and-portals) +Within Kubernetes a [kube proxy](http://docs.k8s.io/services.md#ips-and-portals) running on each node directs service requests to the underlying implementation. While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage @@ -225,7 +225,7 @@ or down as appropriate. In the future this may be more configurable. 
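A minimal sketch of the kind of decision such an auto-scaler might make from proxy and load-balancer traffic data; the function name, the target per-pod rate, and the bounds are invented for illustration and are not part of the proposal's API:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas computes how many pods to request via the resize verb,
// given the aggregate request rate observed at the L3/L7 proxies, a target
// per-pod rate, and configured lower/upper bounds. The auto-scaler never
// touches the ReplicationController's pods directly; it only proposes a
// new count.
func desiredReplicas(totalRPS, targetRPSPerPod float64, minReplicas, maxReplicas int) int {
	n := int(math.Ceil(totalRPS / targetRPSPerPod))
	if n < minReplicas {
		return minReplicas
	}
	if n > maxReplicas {
		return maxReplicas
	}
	return n
}

func main() {
	// 950 req/s observed in aggregate; each pod comfortably serves 100 req/s.
	fmt.Println(desiredReplicas(950, 100, 2, 20)) // 10
}
```

The clamping to min/max is the piece that keeps a noisy traffic signal from thrashing the underlying ReplicationController.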
### Interactions with a deployment -In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#rolling-updates) +In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](http://docs.k8s.io/replication-controller.md#rolling-updates) there will be multiple replication controllers, with one scaling up and another scaling down. This means that an auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector` is what provides this ability. By using a selector that spans the entire service the auto-scaler can monitor capacity -- cgit v1.2.3 From 35096d0cf099bdb84ad72eb98ac4b8ec0456aa3c Mon Sep 17 00:00:00 2001 From: Robert Rati Date: Fri, 24 Apr 2015 14:55:42 -0400 Subject: More updates from feedback. #6993 --- high-availability.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/high-availability.md b/high-availability.md index 2bfa6dc0..37a5eb09 100644 --- a/high-availability.md +++ b/high-availability.md @@ -6,7 +6,7 @@ For complete reference see [this](https://www.ibm.com/developerworks/community/b 1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. -2.
**Warm Standby**: In this scenario there is only one active component acting as the master and additional components running by not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. +2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the approach that this proposal will leverage. 3. Active-Active (Load Balanced): Clients can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver. @@ -30,5 +30,11 @@ Some command line options would be added to components that can do HA: ## Design Discussion Notes Some components may run numerous threads in order to perform tasks in parallel. Upon losing master status, such components should exit instantly instead of attempting to gracefully shut down such threads. This is to ensure that, in the case there's some propagation delay in informing the threads they should stop, the lame-duck threads won't interfere with the new master. The component should exit with an exit code indicating that the component is not the master. Since all components will be run by systemd or some other monitoring system, this will just result in a restart.
+There is a short window for a split brain condition because we cannot gate operations at the apiserver. Having the daemons exit shortens this window but does not eliminate it. A proper solution for this problem will be addressed at a later date. The proposed solution is: +1. This requires transaction support in etcd (which is already planned - see [coreos/etcd#2675](https://github.com/coreos/etcd/pull/2675)) +2. Apart from the entry in etcd that is tracking the lease for a given component and is periodically refreshed, we introduce another entry (per component) that is changed only when the master is changing - let's call it "current master" entry (we don't refresh it). +3. Master replica is aware of a version of its "current master" etcd entry. +4. Whenever a master replica is trying to write something, it also attaches a "precondition" for the version of its "current master" entry [the whole transaction cannot succeed if the version of the corresponding "current master" entry in etcd has changed]. This basically guarantees that if we elect the new master, all transactions coming from the old master will fail. + ## Open Questions: * Is there a desire to keep track of all nodes for a specific component type? -- cgit v1.2.3 From eb7b52f95ad2af7378be8ca2ab3bbba8b1a00f34 Mon Sep 17 00:00:00 2001 From: George Kuan Date: Sun, 26 Apr 2015 19:37:14 -0700 Subject: Corrected some typos --- api_changes.md | 4 ++-- collab.md | 2 +- development.md | 2 +- faster_reviews.md | 6 +++--- logging.md | 2 +- profiling.md | 4 ++-- releasing.md | 2 +- writing-a-getting-started-guide.md | 8 ++++---- 8 files changed, 15 insertions(+), 15 deletions(-) diff --git a/api_changes.md b/api_changes.md index be02e16c..6ab86ce0 100644 --- a/api_changes.md +++ b/api_changes.md @@ -243,7 +243,7 @@ The fuzzer can be found in `pkg/api/testing/fuzzer.go`. ## Update the semantic comparisons VERY VERY rarely is this needed, but when it hits, it hurts. In some rare -cases we end up with objects (e.g. 
resource quantites) that have morally +cases we end up with objects (e.g. resource quantities) that have morally equivalent values with different bitwise representations (e.g. value 10 with a base-2 formatter is the same as value 0 with a base-10 formatter). The only way Go knows how to do deep-equality is through field-by-field bitwise comparisons. @@ -278,7 +278,7 @@ At last, your change is done, all unit tests pass, e2e passes, you're done, right? Actually, no. You just changed the API. If you are touching an existing facet of the API, you have to try *really* hard to make sure that *all* the examples and docs are updated. There's no easy way to do this, due -in part ot JSON and YAML silently dropping unknown fields. You're clever - +in part to JSON and YAML silently dropping unknown fields. You're clever - you'll figure it out. Put `grep` or `ack` to good use. If you added functionality, you should consider documenting it and/or writing diff --git a/collab.md b/collab.md index dd7b8059..000fb6ea 100644 --- a/collab.md +++ b/collab.md @@ -29,7 +29,7 @@ Maintainers will do merges of appropriately reviewed-and-approved changes during There may be discussion an even approvals granted outside of the above hours, but merges will generally be deferred. If a PR is considered complex or controversial, the merge of that PR should be delayed to give all interested parties in all timezones the opportunity to provide feedback. Concretely, this means that such PRs should be held for 24 -hours before merging. Of course "complex" and "controversial" are left to the judgement of the people involved, but we trust that part of being a committer is the judgement required to evaluate such things honestly, and not be +hours before merging. 
Of course "complex" and "controversial" are left to the judgment of the people involved, but we trust that part of being a committer is the judgment required to evaluate such things honestly, and not be motivated by your desire (or your cube-mate's desire) to get their code merged. Also see "Holds" below, any reviewer can issue a "hold" to indicate that the PR is in fact complicated or complex and deserves further review. PRs that are incorrectly judged to be merge-able, may be reverted and subject to re-review, if subsequent reviewers believe that they in fact are controversial or complex. diff --git a/development.md b/development.md index bbd94fef..556f7c22 100644 --- a/development.md +++ b/development.md @@ -221,7 +221,7 @@ go run hack/e2e.go -build -pushup -test -down # seeing the output of failed commands. # -ctl can be used to quickly call kubectl against your e2e cluster. Useful for -# cleaning up after a failed test or viewing logs. Use -v to avoid supressing +# cleaning up after a failed test or viewing logs. Use -v to avoid suppressing # kubectl output. go run hack/e2e.go -v -ctl='get events' go run hack/e2e.go -v -ctl='delete pod foobar' diff --git a/faster_reviews.md b/faster_reviews.md index a2d00465..2562879b 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -135,7 +135,7 @@ might need later - but don't implement them now. We understand that it is hard to imagine, but sometimes we make mistakes. It's OK to push back on changes requested during a review. If you have a good reason -for doing something a certain way, you are absolutley allowed to debate the +for doing something a certain way, you are absolutely allowed to debate the merits of a requested change. You might be overruled, but you might also prevail. We're mostly pretty reasonable people. Mostly. 
@@ -151,7 +151,7 @@ things you can do that might help kick a stalled process along: * Ping the assignee (@username) on the PR comment stream asking for an estimate of when they can get to it. - * Ping the assigneed by email (many of us have email addresses that are well + * Ping the assignee by email (many of us have email addresses that are well published or are the same as our GitHub handle @google.com or @redhat.com). If you think you have fixed all the issues in a round of review, and you haven't @@ -171,7 +171,7 @@ explanation. ## Final: Use common sense Obviously, none of these points are hard rules. There is no document that can -take the place of common sense and good taste. Use your best judgement, but put +take the place of common sense and good taste. Use your best judgment, but put a bit of thought into how your work can be made easier to review. If you do these things your PRs will flow much more easily. diff --git a/logging.md b/logging.md index 82b6a0c8..23430474 100644 --- a/logging.md +++ b/logging.md @@ -1,7 +1,7 @@ Logging Conventions =================== -The following conventions for the glog levels to use. [glog](http://godoc.org/github.com/golang/glog) is globally prefered to [log](http://golang.org/pkg/log/) for better runtime control. +The following conventions for the glog levels to use. [glog](http://godoc.org/github.com/golang/glog) is globally preferred to [log](http://golang.org/pkg/log/) for better runtime control. * glog.Errorf() - Always an error * glog.Warningf() - Something unexpected, but probably not an error diff --git a/profiling.md b/profiling.md index 1e14b5c4..03b17766 100644 --- a/profiling.md +++ b/profiling.md @@ -4,7 +4,7 @@ This document explain how to plug in profiler and how to profile Kubernetes serv ## Profiling library -Go comes with inbuilt 'net/http/pprof' profiling library and profiling web service. The way service works is binding debug/pprof/ subtree on a running webserver to the profiler. 
Reading from subpages of debug/pprof returns pprof-formated profiles of the running binary. The output can be processed offline by the tool of choice, or used as an input to handy 'go tool pprof', which can graphically represent the result. +Go comes with inbuilt 'net/http/pprof' profiling library and profiling web service. The way service works is binding debug/pprof/ subtree on a running webserver to the profiler. Reading from subpages of debug/pprof returns pprof-formatted profiles of the running binary. The output can be processed offline by the tool of choice, or used as an input to handy 'go tool pprof', which can graphically represent the result. ## Adding profiling to services to APIserver. @@ -31,4 +31,4 @@ to get 30 sec. CPU profile. ## Contention profiling -To enable contetion profiling you need to add line ```rt.SetBlockProfileRate(1)``` in addition to ```m.mux.HandleFunc(...)``` added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```. +To enable contention profiling you need to add line ```rt.SetBlockProfileRate(1)``` in addition to ```m.mux.HandleFunc(...)``` added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```. diff --git a/releasing.md b/releasing.md index 125355c3..02bb0ca4 100644 --- a/releasing.md +++ b/releasing.md @@ -92,7 +92,7 @@ get` while in fact they do not match `v0.5` (the one that was tagged) exactly. To handle that case, creating a new release should involve creating two adjacent commits where the first of them will set the version to `v0.5` and the second will set it to `v0.5-dev`. 
In that case, even in the presence of merges, there -will be a single comit where the exact `v0.5` version will be used and all +will be a single commit where the exact `v0.5` version will be used and all others around it will either have `v0.4-dev` or `v0.5-dev`. The diagram below illustrates it. diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 7c837351..c1066f06 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -6,7 +6,7 @@ guide. A Getting Started Guide is instructions on how to create a Kubernetes cluster on top of a particular type(s) of infrastructure. Infrastructure includes: the IaaS provider for VMs; the node OS; inter-node networking; and node Configuration Management system. -A guide refers to scripts, Configuration Manangement files, and/or binary assets such as RPMs. We call +A guide refers to scripts, Configuration Management files, and/or binary assets such as RPMs. We call the combination of all these things needed to run on a particular type of infrastructure a **distro**. @@ -39,7 +39,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. that are updated to the new version. - Versioned distros should typically not modify or add code in `cluster/`. That is just scripts for developer distros. - - If a versioned distro has not been updated for many binary releases, it may be dropped frome the Matrix. + - If a versioned distro has not been updated for many binary releases, it may be dropped from the Matrix. If you have a cluster partially working, but doing all the above steps seems like too much work, we still want to hear from you. We suggest you write a blog post or a Gist, and we will link to it on our wiki page. @@ -58,13 +58,13 @@ These guidelines say *what* to do. See the Rationale section for *why*. - a development distro needs to have an organization which owns it. 
This organization needs to: - Setting up and maintaining Continuous Integration that runs e2e frequently (multiple times per day) against the Distro at head, and which notifies all devs of breakage. - - being reasonably available for questions and assiting with + - being reasonably available for questions and assisting with refactoring and feature additions that affect code for their IaaS. ## Rationale - We want people to create Kubernetes clusters with whatever IaaS, Node OS, configuration management tools, and so on, which they are familiar with. The - guidelines for **versioned distros** are designed for flexiblity. + guidelines for **versioned distros** are designed for flexibility. - We want developers to be able to work without understanding all the permutations of IaaS, NodeOS, and configuration management. The guidelines for **developer distros** are designed for consistency. -- cgit v1.2.3 From ef8d5722be698e57886b2c47df2bdddb9d37da9e Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Fri, 24 Apr 2015 18:02:52 -0400 Subject: Add hint re: fuzzer to api changes doc --- api_changes.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/api_changes.md b/api_changes.md index be02e16c..d68a776f 100644 --- a/api_changes.md +++ b/api_changes.md @@ -236,7 +236,9 @@ assumptions. If you have added any fields which need very careful formatting "this slice will always have at least 1 element", you may get an error or even a panic from the `serialization_test`. If so, look at the diff it produces (or the backtrace in case of a panic) and figure out what you forgot. Encode that -into the fuzzer's custom fuzz functions. +into the fuzzer's custom fuzz functions. Hint: if you added defaults for a field, +that field will need to have a custom fuzz function that ensures that the field is +fuzzed to a non-empty value. The fuzzer can be found in `pkg/api/testing/fuzzer.go`.
-- cgit v1.2.3 From 14760aef25239601cc8859694bd464d18404f55c Mon Sep 17 00:00:00 2001 From: markturansky Date: Tue, 14 Apr 2015 17:14:39 -0400 Subject: PersistentVolumeClaimBinder implementation --- persistent-storage.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index 45ab8d42..fb53ad10 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -12,7 +12,7 @@ A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to u One new system component: -`PersistentVolumeManager` is a singleton running in master that manages all PVs in the system, analogous to the node controller. The volume manager watches the API for newly created volumes to manage. The manager also watches for claims by users and binds them to available volumes. +`PersistentVolumeClaimBinder` is a singleton running in master that watches all PersistentVolumeClaims in the system and binds them to the closest matching available PersistentVolume. The volume manager watches the API for newly created volumes to manage. One new volume: @@ -32,7 +32,7 @@ Kubernetes makes no guarantees at runtime that the underlying storage exists or #### Describe available storage -Cluster administrators use the API to manage *PersistentVolumes*. The singleton PersistentVolumeManager watches the Kubernetes API for new volumes and adds them to its internal cache of volumes in the system. All persistent volumes are managed and made available by the volume manager. The manager also watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. +Cluster administrators use the API to manage *PersistentVolumes*. A custom store ```NewPersistentVolumeOrderedIndex``` will index volumes by access modes and sort by storage capacity. 
The ```PersistentVolumeClaimBinder``` watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. PVs are system objects and, thus, have no namespace. @@ -151,7 +151,7 @@ myclaim-1 map[] pending #### Matching and binding - The ```PersistentVolumeManager``` attempts to find an available volume that most closely matches the user's request. If one exists, they are bound by putting a reference on the PV to the PVC. Requests can go unfulfilled if a suitable match is not found. + The ```PersistentVolumeClaimBinder``` attempts to find an available volume that most closely matches the user's request. If one exists, they are bound by putting a reference on the PV to the PVC. Requests can go unfulfilled if a suitable match is not found. ``` @@ -209,6 +209,6 @@ cluster/kubectl.sh delete pvc myclaim-1 ``` -The ```PersistentVolumeManager``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. +The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. -- cgit v1.2.3 From c7f8e8e7f8f037b0e7d94b0e361b75ea5c50676d Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Fri, 17 Apr 2015 14:16:33 +0200 Subject: Improvements to conversions generator. --- api_changes.md | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/api_changes.md b/api_changes.md index 5e648544..6c495c4c 100644 --- a/api_changes.md +++ b/api_changes.md @@ -222,9 +222,24 @@ types, structural change in particular - you must add some logic to convert versioned APIs to and from the internal representation. If you see errors from the `serialization_test`, it may indicate the need for explicit conversions. 
+Performance of conversions very heavily influence performance of apiserver. +Thus, we are auto-generating conversion functions that are much more efficient +than the generic ones (which are based on reflections and thus are highly +inefficient). + The conversion code resides with each versioned API - -`pkg/api//conversion.go`. Unsurprisingly, this also requires you to -add tests to `pkg/api//conversion_test.go`. +`pkg/api//conversion.go`. To regenerate conversion functions: + - run +``` + $ go run cmd/kube-conversion/conversion.go -v -f -n +``` + - replace all conversion functions (convert\* functions) in the above file + with the contents of \ + - replace arguments of `newer.Scheme.AddGeneratedConversionFuncs` + with the contents of \ + +Unsurprisingly, this also requires you to add tests to +`pkg/api//conversion_test.go`. ## Update the fuzzer -- cgit v1.2.3 From e3e627bb63256f96741845b53fdbf9b999243437 Mon Sep 17 00:00:00 2001 From: Robert Rati Date: Wed, 29 Apr 2015 14:50:13 -0400 Subject: More updates based on feedback. #6993 --- high-availability.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/high-availability.md b/high-availability.md index 37a5eb09..04d206fd 100644 --- a/high-availability.md +++ b/high-availability.md @@ -28,13 +28,14 @@ Some command line options would be added to components that can do HA: * Lease Duration - How long a component can be master ## Design Discussion Notes -Some components may run numerous threads in order to perform tasks in parallel. Upon losing master status, such components should exit instantly instead of attempting to gracefully shut down such threads. This is to ensure that, in the case there's some propagation delay in informing the threads they should stop, the lame-duck threads won't interfere with the new master. The component should exit with an exit code indicating that the component is not the master. 
Since all components will be run by systemd or some other monitoring system, this will just result in a restart. +Some components may run numerous threads in order to perform tasks in parallel. Upon losing master status, such components should exit instantly instead of attempting to gracefully shut down such threads. This is to ensure that, in the case there's some propagation delay in informing the threads they should stop, the lame-duck threads won't interfere with the new master. The component should exit with an exit code indicating that the component is not the master. Since all components will be run by systemd or some other monitoring system, this will just result in a restart. -There is a short window for a split brain condition because we cannot gate operations at the apiserver. Having the daemons exit shortens this window but does not eliminate it. A proper solution for this problem will be addressed at a later date. The proposed solution is: +There is a short window after a new master acquires the lease, during which data from the old master might be committed. This is because there is currently no way to condition a write on its source being the master. Having the daemons exit shortens this window but does not eliminate it. A proper solution for this problem will be addressed at a later date. The proposed solution is: 1. This requires transaction support in etcd (which is already planned - see [coreos/etcd#2675](https://github.com/coreos/etcd/pull/2675)) -2. Apart from the entry in etcd that is tracking the lease for a given component and is periodically refreshed, we introduce another entry (per component) that is changed only when the master is changing - let's call it "current master" entry (we don't refresh it). -3. Master replica is aware of a version of its "current master" etcd entry. -4. 
Whenever a master replica is trying to write something, it also attaches a "precondition" for the version of its "current master" entry [the whole transaction cannot succeed if the version of the corresponding "current master" entry in etcd has changed]. This basically guarantees that if we elect the new master, all transactions coming from the old master will fail. +2. The entry in etcd that is tracking the lease for a given component (the "current master" entry) would have as its value the host:port of the lease-holder (as described earlier) and a sequence number. The sequence number is incremented whenever a new master gets the lease. +3. Master replica is aware of the latest sequence number. +4. Whenever master replica sends a mutating operation to the API server, it includes the sequence number. +5. When the API server makes the corresponding write to etcd, it includes it in a transaction that does a compare-and-swap on the "current master" entry (old value == new value == host:port and sequence number from the replica that sent the mutating operation). This basically guarantees that if we elect the new master, all transactions coming from the old master will fail. You can think of this as the master attaching a "precondition" of its belief about who is the latest master. ## Open Questions: * Is there a desire to keep track of all nodes for a specific component type? -- cgit v1.2.3 From 915f099020547e911ee8b7badfc4e130d5cf8da3 Mon Sep 17 00:00:00 2001 From: Marc Tamsky Date: Thu, 30 Apr 2015 22:58:33 -0700 Subject: React to failure by growing the remaining clusters --- federation.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/federation.md b/federation.md index df9f37eb..e261833e 100644 --- a/federation.md +++ b/federation.md @@ -222,10 +222,14 @@ initial implementation targeting single cloud provider only. 1. 
Auto-scaling (not yet available) in the remaining clusters takes care of it for me automagically as the additional failed-over traffic arrives (with some latency). +1. I manually specify "additional resources to be provisioned" per + remaining cluster, possibly proportional to both the remaining functioning resources + and the unavailable resources in the failed cluster(s). + (All the benefits of over-provisioning, without expensive idle resources.) Doing nothing (i.e. forcing users to choose between 1 and 2 on their own) is probably an OK starting point. Kubernetes autoscaling can get -us to three at some later date. +us to 3 at some later date. Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). -- cgit v1.2.3 From eae1999ffc5b3de397887a7f181d3be8fcf34ca3 Mon Sep 17 00:00:00 2001 From: Robert Rati Date: Fri, 1 May 2015 10:02:56 -0400 Subject: Fixed list formatting in the design discussion notes. #6993 --- high-availability.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/high-availability.md b/high-availability.md index 04d206fd..647c9562 100644 --- a/high-availability.md +++ b/high-availability.md @@ -31,10 +31,15 @@ Some command line options would be added to components that can do HA: Some components may run numerous threads in order to perform tasks in parallel. Upon losing master status, such components should exit instantly instead of attempting to gracefully shut down such threads. 
This is to ensure that, in the case there's some propagation delay in informing the threads they should stop, the lame-duck threads won't interfere with the new master. The component should exit with an exit code indicating that the component is not the master. Since all components will be run by systemd or some other monitoring system, this will just result in a restart. There is a short window after a new master acquires the lease, during which data from the old master might be committed. This is because there is currently no way to condition a write on its source being the master. Having the daemons exit shortens this window but does not eliminate it. A proper solution for this problem will be addressed at a later date. The proposed solution is: + 1. This requires transaction support in etcd (which is already planned - see [coreos/etcd#2675](https://github.com/coreos/etcd/pull/2675)) + 2. The entry in etcd that is tracking the lease for a given component (the "current master" entry) would have as its value the host:port of the lease-holder (as described earlier) and a sequence number. The sequence number is incremented whenever a new master gets the lease. + 3. Master replica is aware of the latest sequence number. + 4. Whenever master replica sends a mutating operation to the API server, it includes the sequence number. + 5. When the API server makes the corresponding write to etcd, it includes it in a transaction that does a compare-and-swap on the "current master" entry (old value == new value == host:port and sequence number from the replica that sent the mutating operation). This basically guarantees that if we elect the new master, all transactions coming from the old master will fail. You can think of this as the master attaching a "precondition" of its belief about who is the latest master. 
## Open Questions: -- cgit v1.2.3 From 4e50c7273b2b096b968aeec4f6a3379d33ac5d8d Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Thu, 30 Apr 2015 22:16:59 -0700 Subject: Add a central simple getting started guide with kubernetes guide. Point several getting started guides at this doc. --- simple-rolling-update.md | 1 - 1 file changed, 1 deletion(-) diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 43b086ae..2d2bd826 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -57,7 +57,6 @@ then ```foo-next``` is synthesized using the pattern ```- Date: Mon, 4 May 2015 15:37:07 -0400 Subject: Add step to API changes doc for swagger regen --- api_changes.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/api_changes.md b/api_changes.md index 6c495c4c..f46d2d4e 100644 --- a/api_changes.md +++ b/api_changes.md @@ -301,6 +301,14 @@ you'll figure it out. Put `grep` or `ack` to good use. If you added functionality, you should consider documenting it and/or writing an example to illustrate your change. +Make sure you update the swagger API spec by running: + +```shell +$ hack/update-swagger-spec.sh +``` + +The API spec changes should be in a commit separate from your other changes. 
+ ## Incompatible API changes If your change is going to be backward incompatible or might be a breaking change for API consumers, please send an announcement to `kubernetes-dev@googlegroups.com` before -- cgit v1.2.3 From ff2913cfb3328510a869c29f5ee950d6c650a43f Mon Sep 17 00:00:00 2001 From: Saad Ali Date: Tue, 5 May 2015 18:11:58 -0700 Subject: Fix event doc link --- event_compression.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/event_compression.md b/event_compression.md index 99dda143..2523c859 100644 --- a/event_compression.md +++ b/event_compression.md @@ -74,5 +74,5 @@ This demonstrates what would have been 20 separate entries (indicating schedulin * Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events * PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event - * PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress recurring events in to a single event to optimize etcd storage + * PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4306): Compress recurring events in to a single event to optimize etcd storage * PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map -- cgit v1.2.3 From 24906a08ebbce50f3a4db4b3052f102bffa5bbe7 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Wed, 6 May 2015 10:04:39 -0400 Subject: Fix link to service accounts doc in security context doc --- security_context.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security_context.md b/security_context.md index cd10202e..62e203a5 100644 --- a/security_context.md +++ b/security_context.md @@ -32,7 +32,7 @@ Processes in pods will need to have consistent 
UID/GID/SELinux category labels i * The concept of a security context should not be tied to a particular security mechanism or platform (ie. SELinux, AppArmor) * Applying a different security context to a scope (namespace or pod) requires a solution such as the one proposed for - [service accounts](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). + [service accounts](./service_accounts.md). ## Use Cases -- cgit v1.2.3 From 950547c92a52860fd55c49631b8dcc08403ff5fd Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Thu, 30 Apr 2015 10:28:36 -0700 Subject: Add support for --rollback. --- simple-rolling-update.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 2d2bd826..c5667b44 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -38,7 +38,7 @@ it is assumed that the rollout is nearly completed, and ```foo-next``` is rename ### Aborting a rollout Abort is assumed to want to reverse a rollout in progress. 
-```kubectl rolling-update rc foo [foo-v2] --abort``` +```kubectl rolling-update rc foo [foo-v2] --rollback``` This is really just semantic sugar for: -- cgit v1.2.3 From 6ab2274c588cbbfaf2ff4a1f7551b2e7b2cc5ad8 Mon Sep 17 00:00:00 2001 From: Weiwei Jiang Date: Thu, 7 May 2015 16:10:50 +0800 Subject: Fix wrong link for security context --- service_accounts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/service_accounts.md b/service_accounts.md index a3a1bb49..5eaa0d99 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -21,7 +21,7 @@ They also may interact with services other than the Kubernetes API, such as: A service account binds together several things: - a *name*, understood by users, and perhaps by peripheral systems, for an identity - a *principal* that can be authenticated and [authorized](../authorization.md) - - a [security context](./security_contexts.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other + - a [security context](./security_context.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other capabilities and controls on interaction with the file system and OS. - a set of [secrets](./secrets.md), which a container may use to access various networked resources. -- cgit v1.2.3 From 8b6e9102beb4fc7914000c10e0b0ef99b5245fbf Mon Sep 17 00:00:00 2001 From: Matt Bogosian Date: Thu, 7 May 2015 12:04:31 -0700 Subject: Fix environment variable error in Vagrant docs: `KUBERNETES_MASTER_MINION` -> `KUBERNETES_MINION_MEMORY`. 
--- developer-guides/vagrant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index baf40b97..50c9769a 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -328,7 +328,7 @@ If you need more granular control, you can set the amount of memory for the mast ```sh export KUBERNETES_MASTER_MEMORY=1536 -export KUBERNETES_MASTER_MINION=2048 +export KUBERNETES_MINION_MEMORY=2048 ``` #### I ran vagrant suspend and nothing works! -- cgit v1.2.3 From c7c03fe466e28070d7ecc5194f74a391fcf785a6 Mon Sep 17 00:00:00 2001 From: Paul Weil Date: Fri, 8 May 2015 16:38:28 -0400 Subject: bring doc up to date with actual api types --- security_context.md | 113 +++++++++++++++++++--------------------------------- 1 file changed, 40 insertions(+), 73 deletions(-) diff --git a/security_context.md b/security_context.md index 62e203a5..5f5376b0 100644 --- a/security_context.md +++ b/security_context.md @@ -65,8 +65,8 @@ be addressed with security contexts: ### Overview A *security context* consists of a set of constraints that determine how a container -is secured before getting created and run. It has a 1:1 correspondence to a -[service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). A *security context provider* is passed to the Kubelet so it can have a chance +is secured before getting created and run. A security context resides on the container and represents the runtime parameters that will +be used to create and run the container via container APIs. A *security context provider* is passed to the Kubelet so it can have a chance to mutate Docker API calls in order to apply the security context. It is recommended that this design be implemented in two phases: @@ -88,7 +88,7 @@ type SecurityContextProvider interface { // the container is created. // An error is returned if it's not possible to secure the container as // requested with a security context. 
- ModifyContainerConfig(pod *api.Pod, container *api.Container, config *docker.Config) error + ModifyContainerConfig(pod *api.Pod, container *api.Container, config *docker.Config) // ModifyHostConfig is called before the Docker runContainer call. // The security context provider can make changes to the HostConfig, affecting @@ -103,88 +103,55 @@ If the value of the SecurityContextProvider field on the Kubelet is nil, the kub ### Security Context -A security context has a 1:1 correspondence to a service account and it can be included as -part of the service account resource. Following is an example of an initial implementation: +A security context resides on the container and represents the runtime parameters that will +be used to create and run the container via container APIs. Following is an example of an initial implementation: ```go +type type Container struct { + ... other fields omitted ... + // Optional: SecurityContext defines the security options the pod should be run with + SecurityContext *SecurityContext +} -// SecurityContext specifies the security constraints associated with a service account +// SecurityContext holds security configuration that will be applied to a container. SecurityContext +// contains duplication of some existing fields from the Container resource. These duplicate fields +// will be populated based on the Container configuration if they are not set. Defining them on +// both the Container AND the SecurityContext will result in an error. 
type SecurityContext struct { - // user is the uid to use when running the container - User int - - // AllowPrivileged indicates whether this context allows privileged mode containers - AllowPrivileged bool - - // AllowedVolumeTypes lists the types of volumes that a container can bind - AllowedVolumeTypes []string - - // AddCapabilities is the list of Linux kernel capabilities to add - AddCapabilities []string - - // RemoveCapabilities is the list of Linux kernel capabilities to remove - RemoveCapabilities []string - - // Isolation specifies the type of isolation required for containers - // in this security context - Isolation ContainerIsolationSpec -} + // Capabilities are the capabilities to add/drop when running the container + Capabilities *Capabilities -// ContainerIsolationSpec indicates intent for container isolation -type ContainerIsolationSpec struct { - // Type is the container isolation type (None, Private) - Type ContainerIsolationType - - // FUTURE: IDMapping specifies how users and groups from the host will be mapped - IDMapping *IDMapping -} + // Run the container in privileged mode + Privileged *bool -// ContainerIsolationType is the type of container isolation for a security context -type ContainerIsolationType string + // SELinuxOptions are the labels to be applied to the container + // and volumes + SELinuxOptions *SELinuxOptions -const ( - // ContainerIsolationNone means that no additional consraints are added to - // containers to isolate them from their host - ContainerIsolationNone ContainerIsolationType = "None" - - // ContainerIsolationPrivate means that containers are isolated in process - // and storage from their host and other containers. 
- ContainerIsolationPrivate ContainerIsolationType = "Private" -) - -// IDMapping specifies the requested user and group mappings for containers -// associated with a specific security context -type IDMapping struct { - // SharedUsers is the set of user ranges that must be unique to the entire cluster - SharedUsers []IDMappingRange - - // SharedGroups is the set of group ranges that must be unique to the entire cluster - SharedGroups []IDMappingRange + // RunAsUser is the UID to run the entrypoint of the container process. + RunAsUser *int64 +} - // PrivateUsers are mapped to users on the host node, but are not necessarily - // unique to the entire cluster - PrivateUsers []IDMappingRange +// SELinuxOptions are the labels to be applied to the container. +type SELinuxOptions struct { + // SELinux user label + User string - // PrivateGroups are mapped to groups on the host node, but are not necessarily - // unique to the entire cluster - PrivateGroups []IDMappingRange -} + // SELinux role label + Role string -// IDMappingRange specifies a mapping between container IDs and node IDs -type IDMappingRange struct { - // ContainerID is the starting container UID or GID - ContainerID int + // SELinux type label + Type string - // HostID is the starting host UID or GID - HostID int - - // Length is the length of the UID/GID range - Length int + // SELinux level label. + Level string } - ``` +### Admission +It is up to an admission plugin to determine if the security context is acceptable or not. At the +time of writing, the admission control plugin for security contexts will only allow a context that +has defined capabilities or privileged. Contexts that attempt to define a UID or SELinux options +will be denied by default. In the future the admission plugin will base this decision upon +configurable policies that reside within the [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). 
-#### Security Context Lifecycle - -The lifecycle of a security context will be tied to that of a service account. It is expected that a service account with a default security context will be created for every Kubernetes namespace (without administrator intervention). If resources need to be allocated when creating a security context (for example, assign a range of host uids/gids), a pattern such as [finalizers](https://github.com/GoogleCloudPlatform/kubernetes/issues/3585) can be used before declaring the security context / service account / namespace ready for use. -- cgit v1.2.3 From c92c7a5d8201e1bb2b74af77f0c3980d6b8c750b Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Wed, 13 May 2015 14:36:59 +0200 Subject: Instructions for generating conversions. --- api_changes.md | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/api_changes.md b/api_changes.md index f46d2d4e..8b0a0e56 100644 --- a/api_changes.md +++ b/api_changes.md @@ -227,18 +227,32 @@ Thus, we are auto-generating conversion functions that are much more efficient than the generic ones (which are based on reflections and thus are highly inefficient). -The conversion code resides with each versioned API - -`pkg/api//conversion.go`. To regenerate conversion functions: +The conversion code resides with each versioned API. There are two files: + - `pkg/api//conversion.go` containing manually written conversion + functions + - `pkg/api//conversion_generated.go` containing auto-generated + conversion functions + +Since auto-generated conversion functions are using manually written ones, +those manually written should be named with a defined convention, i.e. a function +converting type X in pkg a to type Y in pkg b, should be named: +`convert_a_X_To_b_Y`. + +Also note that you can (and for efficiency reasons should) use auto-generated +conversion functions when writing your conversion functions. 
+ +Once all the necessary manually written conversions are added, you need to +regenerate auto-generated ones. To regenerate them: - run ``` $ go run cmd/kube-conversion/conversion.go -v -f -n ``` - - replace all conversion functions (convert\* functions) in the above file - with the contents of \ - - replace arguments of `newer.Scheme.AddGeneratedConversionFuncs` - with the contents of \ + - replace all conversion functions (convert\* functions) in the + `pkg/api//conversion_generated.go` with the contents of \ + - replace arguments of `newer.Scheme.AddGeneratedConversionFuncs` in the + `pkg/api//conversion_generated.go` with the contents of \ -Unsurprisingly, this also requires you to add tests to +Unsurprisingly, adding manually written conversion also requires you to add tests to `pkg/api//conversion_test.go`. ## Update the fuzzer -- cgit v1.2.3 From b67f72a3168e3be7368c968d93172a365bd84eb1 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 12 May 2015 21:59:44 -0700 Subject: Switch git hooks to use pre-commit --- development.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/development.md b/development.md index 556f7c22..6d6bdb86 100644 --- a/development.md +++ b/development.md @@ -105,8 +105,7 @@ directory. This will keep you from accidentally committing non-gofmt'd go code. ``` cd kubernetes/.git/hooks/ -ln -s ../../hooks/prepare-commit-msg . -ln -s ../../hooks/commit-msg . +ln -s ../../hooks/pre-commit . ``` ## Unit tests -- cgit v1.2.3 From e1d595ebbd61baebc62f6db3150ee5881e6e71a8 Mon Sep 17 00:00:00 2001 From: Jeff Lowdermilk Date: Thu, 14 May 2015 15:12:45 -0700 Subject: Add ga-beacon analytics to gendocs scripts hack/run-gendocs.sh puts ga-beacon analytics link into all md files, hack/verify-gendocs.sh verifies presence of link. 
--- README.md | 3 +++ api_changes.md | 3 +++ coding-conventions.md | 3 +++ collab.md | 3 +++ developer-guides/vagrant.md | 3 +++ development.md | 3 +++ faster_reviews.md | 3 +++ flaky-tests.md | 3 +++ issues.md | 3 +++ logging.md | 3 +++ profiling.md | 3 +++ pull-requests.md | 3 +++ releasing.md | 3 +++ writing-a-getting-started-guide.md | 3 +++ 14 files changed, 42 insertions(+) diff --git a/README.md b/README.md index bf398e9f..13ccc42d 100644 --- a/README.md +++ b/README.md @@ -19,3 +19,6 @@ Docs in this directory relate to developing Kubernetes. and how the version information gets embedded into the built binaries. * **Profiling Kubernetes** ([profiling.md](profiling.md)): How to plug in go pprof profiler to Kubernetes. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/README.md?pixel)]() diff --git a/api_changes.md b/api_changes.md index 8b0a0e56..c2932215 100644 --- a/api_changes.md +++ b/api_changes.md @@ -332,3 +332,6 @@ the change gets in. If you are unsure, ask. Also make sure that the change gets ## Adding new REST objects TODO(smarterclayton): write this. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api_changes.md?pixel)]() diff --git a/coding-conventions.md b/coding-conventions.md index 3d493803..bdcbb708 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -5,3 +5,6 @@ Coding style advice for contributors - https://github.com/golang/go/wiki/CodeReviewComments - https://gist.github.com/lavalamp/4bd23295a9f32706a48f + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/coding-conventions.md?pixel)]() diff --git a/collab.md b/collab.md index 000fb6ea..293cd6f4 100644 --- a/collab.md +++ b/collab.md @@ -38,3 +38,6 @@ PRs that are incorrectly judged to be merge-able, may be reverted and subject to ## Holds Any maintainer or core contributor who wants to review a PR but does not have time immediately may put a hold on a PR simply by saying so on the PR discussion and offering an ETA measured in single-digit days at most. Any PR that has a hold shall not be merged until the person who requested the hold acks the review, withdraws their hold, or is overruled by a preponderance of maintainers. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/collab.md?pixel)]() diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 50c9769a..f958b124 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -333,3 +333,6 @@ export KUBERNETES_MINION_MEMORY=2048 #### I ran vagrant suspend and nothing works! ```vagrant suspend``` seems to mess up the network. It's not supported at this time. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/developer-guides/vagrant.md?pixel)]() diff --git a/development.md b/development.md index 6d6bdb86..02b513cc 100644 --- a/development.md +++ b/development.md @@ -267,3 +267,6 @@ git remote set-url --push upstream no_push ``` hack/run-gendocs.sh ``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/development.md?pixel)]() diff --git a/faster_reviews.md b/faster_reviews.md index 2562879b..ed890a7f 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -175,3 +175,6 @@ take the place of common sense and good taste. Use your best judgment, but put a bit of thought into how your work can be made easier to review. If you do these things your PRs will flow much more easily. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/faster_reviews.md?pixel)]() diff --git a/flaky-tests.md b/flaky-tests.md index e352e110..56bd2c59 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -64,3 +64,6 @@ Eventually you will have sufficient runs for your purposes. At that point you ca If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller. Happy flake hunting! 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/flaky-tests.md?pixel)]() diff --git a/issues.md b/issues.md index 51395cae..99e1089a 100644 --- a/issues.md +++ b/issues.md @@ -17,3 +17,6 @@ Definitions * design - priority/design is for issues that are used to track design discussions * support - priority/support is used for issues tracking user support requests * untriaged - anything without a priority/X label will be considered untriaged + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]() diff --git a/logging.md b/logging.md index 23430474..331eda97 100644 --- a/logging.md +++ b/logging.md @@ -24,3 +24,6 @@ The following conventions for the glog levels to use. [glog](http://godoc.org/g * Logging in particularly thorny parts of code where you may want to come back later and check it As per the comments, the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4). If you wish to change the log level, you can pass in `-v=X` where X is the desired maximum level to log. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/logging.md?pixel)]() diff --git a/profiling.md b/profiling.md index 03b17766..1dd42095 100644 --- a/profiling.md +++ b/profiling.md @@ -32,3 +32,6 @@ to get 30 sec. CPU profile. ## Contention profiling To enable contention profiling you need to add line ```rt.SetBlockProfileRate(1)``` in addition to ```m.mux.HandleFunc(...)``` added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/profiling.md?pixel)]() diff --git a/pull-requests.md b/pull-requests.md index ed12b839..627bc64e 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -14,3 +14,6 @@ We want to limit the total number of PRs in flight to: * Maintain a clean project * Remove old PRs that would be difficult to rebase as the underlying code has changed over time * Encourage code velocity + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/pull-requests.md?pixel)]() diff --git a/releasing.md b/releasing.md index 02bb0ca4..803e321a 100644 --- a/releasing.md +++ b/releasing.md @@ -163,3 +163,6 @@ After this summary, preamble, all the relevant PRs/issues that got in that version should be listed and linked together with a small summary understandable by plain mortals (in a perfect world PR/issue's title would be enough but often it is just too cryptic/geeky/domain-specific that it isn't). + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/releasing.md?pixel)]() diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index c1066f06..873fafcc 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -97,3 +97,6 @@ These guidelines say *what* to do. See the Rationale section for *why*. if you use another Configuration Management tool -- you just have to do some manual steps during testing and deployment. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/writing-a-getting-started-guide.md?pixel)]() -- cgit v1.2.3 From d3a429a76f5e1852ed437c30a3cdb1607e04f713 Mon Sep 17 00:00:00 2001 From: Jeff Lowdermilk Date: Thu, 14 May 2015 15:12:45 -0700 Subject: Add ga-beacon analytics to gendocs scripts hack/run-gendocs.sh puts ga-beacon analytics link into all md files, hack/verify-gendocs.sh verifies presence of link. 
--- README.md | 3 +++ access.md | 3 +++ admission_control.md | 3 +++ admission_control_limit_range.md | 3 +++ admission_control_resource_quota.md | 3 +++ architecture.md | 3 +++ clustering.md | 3 +++ clustering/README.md | 4 +++- command_execution_port_forwarding.md | 4 +++- event_compression.md | 3 +++ identifiers.md | 3 +++ namespaces.md | 4 +++- networking.md | 3 +++ persistent-storage.md | 3 +++ principles.md | 3 +++ secrets.md | 3 +++ security.md | 3 +++ security_context.md | 3 +++ service_accounts.md | 3 +++ simple-rolling-update.md | 3 +++ 20 files changed, 60 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index cda831a4..b70c5615 100644 --- a/README.md +++ b/README.md @@ -15,3 +15,6 @@ A single Kubernetes cluster is not intended to span multiple availability zones. Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner. 
For more about the Kubernetes architecture, see [architecture](architecture.md). + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/README.md?pixel)]() diff --git a/access.md b/access.md index 9de4d6c8..8fd09703 100644 --- a/access.md +++ b/access.md @@ -246,3 +246,6 @@ Initial implementation: Improvements: - API server does logging instead. - Policies to drop logging for high rate trusted API calls, or by users performing audit or other sensitive functions. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/access.md?pixel)]() diff --git a/admission_control.md b/admission_control.md index 1e1c1e53..749e949e 100644 --- a/admission_control.md +++ b/admission_control.md @@ -77,3 +77,6 @@ will ensure the following: 6. Object is persisted If at any step, there is an error, the request is canceled. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control.md?pixel)]() diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 3f2ccd7b..daddb425 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -130,3 +130,6 @@ In the current proposal, the **LimitRangeItem** matches purely on **LimitRangeIt It is expected we will want to define limits for particular pods or containers by name/uid and label/field selector. To make a **LimitRangeItem** more restrictive, we will intend to add these additional restrictions at a future point in time. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_limit_range.md?pixel)]() diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index ebad0728..b2dfbe85 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -151,3 +151,6 @@ replicationcontrollers 5 20 resourcequotas 1 1 services 3 5 ``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_resource_quota.md?pixel)]() diff --git a/architecture.md b/architecture.md index 3f021aaf..c50cfe0d 100644 --- a/architecture.md +++ b/architecture.md @@ -42,3 +42,6 @@ The scheduler binds unscheduled pods to nodes via the `/binding` API. The schedu All other cluster-level functions are currently performed by the Controller Manager. For instance, `Endpoints` objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable. The [`replicationController`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/architecture.md?pixel)]() diff --git a/clustering.md b/clustering.md index f447ef10..d57d631d 100644 --- a/clustering.md +++ b/clustering.md @@ -58,3 +58,6 @@ This diagram shows dynamic clustering using the bootstrap API endpoint. That API endp This flow has the admin manually approving the kubelet signing requests. This is the `queue` policy defined above. This manual intervention could be replaced by code that can verify the signing requests via other means.
![Dynamic Sequence Diagram](clustering/dynamic.png) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/clustering.md?pixel)]() diff --git a/clustering/README.md b/clustering/README.md index 7e9d79c8..09d2c4e1 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -23,4 +23,6 @@ If you are using boot2docker and get warnings about clock skew (or if things are ## Automatically rebuild on file changes -If you have the fswatch utility installed, you can have it monitor the file system and automatically rebuild when files have changed. Just do a `make watch`. \ No newline at end of file +If you have the fswatch utility installed, you can have it monitor the file system and automatically rebuild when files have changed. Just do a `make watch`. + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/clustering/README.md?pixel)]() diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index 3b9aeec7..3e548d40 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -141,4 +141,6 @@ functionality. We need to make sure that users are not allowed to execute remote commands or do port forwarding to containers they aren't allowed to access. -Additional work is required to ensure that multiple command execution or port forwarding connections from different clients are not able to see each other's data. This can most likely be achieved via SELinux labeling and unique process contexts. \ No newline at end of file +Additional work is required to ensure that multiple command execution or port forwarding connections from different clients are not able to see each other's data. This can most likely be achieved via SELinux labeling and unique process contexts. 
+ +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/command_execution_port_forwarding.md?pixel)]() diff --git a/event_compression.md b/event_compression.md index 2523c859..db0337f0 100644 --- a/event_compression.md +++ b/event_compression.md @@ -76,3 +76,6 @@ This demonstrates what would have been 20 separate entries (indicating schedulin * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event * PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4306): Compress recurring events in to a single event to optimize etcd storage * PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/event_compression.md?pixel)]() diff --git a/identifiers.md b/identifiers.md index d2e5d5c7..b75577c2 100644 --- a/identifiers.md +++ b/identifiers.md @@ -88,3 +88,6 @@ objectives. 1. Each container is started up with enough metadata to distinguish the pod from whence it came. 2. Each attempt to run a container is assigned a UID (a string) that is unique across time. 1. This may correspond to Docker's container ID. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/identifiers.md?pixel)]() diff --git a/namespaces.md b/namespaces.md index 0e89bf56..ade07e11 100644 --- a/namespaces.md +++ b/namespaces.md @@ -332,4 +332,6 @@ has a deletion timestamp and that its list of finalizers is empty. As a result, content associated from that namespace has been purged. It performs a final DELETE action to remove that Namespace from the storage. -At this point, all content associated with that Namespace, and the Namespace itself are gone. 
\ No newline at end of file +At this point, all content associated with that Namespace, and the Namespace itself are gone. + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/namespaces.md?pixel)]() diff --git a/networking.md b/networking.md index d664806f..f351629e 100644 --- a/networking.md +++ b/networking.md @@ -106,3 +106,6 @@ Another approach could be to create a new host interface alias for each pod, if ### IPv6 IPv6 would be a nice option, also, but we can't depend on it yet. Docker support is in progress: [Docker issue #2974](https://github.com/dotcloud/docker/issues/2974), [Docker issue #6923](https://github.com/dotcloud/docker/issues/6923), [Docker issue #6975](https://github.com/dotcloud/docker/issues/6975). Additionally, direct ipv6 assignment to instances doesn't appear to be supported by major cloud providers (e.g., AWS EC2, GCE) yet. We'd happily take pull requests from people running Kubernetes on bare metal, though. :-) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/networking.md?pixel)]() diff --git a/persistent-storage.md b/persistent-storage.md index fb53ad10..b52e6b71 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -212,3 +212,6 @@ cluster/kubectl.sh delete pvc myclaim-1 The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/persistent-storage.md?pixel)]() diff --git a/principles.md b/principles.md index 499b540b..cf8833a4 100644 --- a/principles.md +++ b/principles.md @@ -53,3 +53,6 @@ TODO ## General principles * [Eric Raymond's 17 UNIX rules](https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/principles.md?pixel)]() diff --git a/secrets.md b/secrets.md index e07f271d..119c673a 100644 --- a/secrets.md +++ b/secrets.md @@ -558,3 +558,6 @@ source. Both containers will have the following files present on their filesyst /etc/secret-volume/username /etc/secret-volume/password + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/secrets.md?pixel)]() diff --git a/security.md b/security.md index b446f66c..4c446d10 100644 --- a/security.md +++ b/security.md @@ -115,3 +115,6 @@ Both the Kubelet and Kube Proxy need information related to their specific roles The controller manager for Replication Controllers and other future controllers act on behalf of a user via delegation to perform automated maintenance on Kubernetes resources. Their ability to access or modify resource state should be strictly limited to their intended duties and they should be prevented from accessing information not pertinent to their role. For example, a replication controller needs only to create a copy of a known pod configuration, to determine the running state of an existing pod, or to delete an existing pod that it created - it does not need to know the contents or current state of a pod, nor have access to any data in the pods attached volumes. The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster. 
At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time). + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security.md?pixel)]() diff --git a/security_context.md b/security_context.md index 5f5376b0..fdacb173 100644 --- a/security_context.md +++ b/security_context.md @@ -155,3 +155,6 @@ has defined capabilities or privileged. Contexts that attempt to define a UID o will be denied by default. In the future the admission plugin will base this decision upon configurable policies that reside within the [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security_context.md?pixel)]() diff --git a/service_accounts.md b/service_accounts.md index 5eaa0d99..9e6bc099 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -162,3 +162,6 @@ to services in the same namespace and read-write access to events in that namesp Finally, it may provide an interface to automate creation of new serviceAccounts. In that case, the user may want to GET serviceAccounts to see what has been created. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/service_accounts.md?pixel)]() diff --git a/simple-rolling-update.md b/simple-rolling-update.md index c5667b44..fed1b84f 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -89,3 +89,6 @@ then ```foo-next``` is synthesized using the pattern ```- Date: Thu, 14 May 2015 15:12:45 -0700 Subject: Add ga-beacon analytics to gendocs scripts hack/run-gendocs.sh puts ga-beacon analytics link into all md files, hack/verify-gendocs.sh verifies presence of link. --- autoscaling.md | 3 +++ federation.md | 3 +++ high-availability.md | 3 +++ 3 files changed, 9 insertions(+) diff --git a/autoscaling.md b/autoscaling.md index c1d1578b..a2838743 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -252,3 +252,6 @@ to prevent this, deployment orchestration should notify the auto-scaler that a d temporarily disable negative decrement thresholds until the deployment process is completed. It is more important for an auto-scaler to be able to grow capacity during a deployment than to shrink the number of instances precisely. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/autoscaling.md?pixel)]() diff --git a/federation.md b/federation.md index e261833e..a2d30017 100644 --- a/federation.md +++ b/federation.md @@ -429,3 +429,6 @@ does the zookeeper config look like for N=3 across 3 AZs -- and how does each replica find the other replicas and how do clients find their primary zookeeper replica? And now how do I do a shared, highly available redis database? 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/federation.md?pixel)]() diff --git a/high-availability.md b/high-availability.md index 647c9562..909903a2 100644 --- a/high-availability.md +++ b/high-availability.md @@ -44,3 +44,6 @@ There is a short window after a new master acquires the lease, during which data ## Open Questions: * Is there a desire to keep track of all nodes for a specific component type? + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/high-availability.md?pixel)]() -- cgit v1.2.3 From cf0bda9102a72d6fb78c716882dc280f2394abfe Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Fri, 15 May 2015 15:28:28 -0700 Subject: update docs/devel flaky-tests to v1beta3 --- flaky-tests.md | 42 ++++++++++++++++++------------------------ 1 file changed, 18 insertions(+), 24 deletions(-) diff --git a/flaky-tests.md b/flaky-tests.md index e352e110..a7ea75f8 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -11,33 +11,27 @@ There is a testing image ```brendanburns/flake``` up on the docker hub. 
We will create a replication controller with the following config: ```yaml -id: flakecontroller +apiVersion: v1beta3 kind: ReplicationController -apiVersion: v1beta1 -desiredState: +metadata: + name: flakecontroller +spec: replicas: 24 - replicaSelector: - name: flake - podTemplate: - desiredState: - manifest: - version: v1beta1 - id: "" - volumes: [] - containers: - - name: flake - image: brendanburns/flake - env: - - name: TEST_PACKAGE - value: pkg/tools - - name: REPO_SPEC - value: https://github.com/GoogleCloudPlatform/kubernetes - restartpolicy: {} - labels: - name: flake -labels: - name: flake + template: + metadata: + labels: + name: flake + spec: + containers: + - name: flake + image: brendanburns/flake + env: + - name: TEST_PACKAGE + value: pkg/tools + - name: REPO_SPEC + value: https://github.com/GoogleCloudPlatform/kubernetes ``` +Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default. ```./cluster/kubectl.sh create -f controller.yaml``` -- cgit v1.2.3 From 31b44ff68f996c4004bec2171c2a8448a942005b Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Tue, 28 Apr 2015 18:10:59 -0700 Subject: Add API change suggestions. --- api_changes.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/api_changes.md b/api_changes.md index c2932215..6e29c3f6 100644 --- a/api_changes.md +++ b/api_changes.md @@ -12,7 +12,7 @@ not all API changes will need all of these steps. ## Operational overview -It's important to have a high level understanding of the API system used in +It is important to have a high level understanding of the API system used in Kubernetes in order to navigate the rest of this document. As mentioned above, the internal representation of an API object is decoupled a GET involves a great deal of machinery. The conversion process is logically a "star" with the internal form at the center.
Every versioned API can be converted to the internal form (and vice-versa), but versioned APIs do not convert to other versioned APIs directly. -This sounds like a heavy process, but in reality we don't intend to keep more +This sounds like a heavy process, but in reality we do not intend to keep more than a small number of versions alive at once. While all of the Kubernetes code operates on the internal structures, they are always converted to a versioned form before being written to storage (disk or etcd) or being sent over a wire. Clients should consume and operate on the versioned APIs exclusively. -To demonstrate the general process, let's walk through a (hypothetical) example: +To demonstrate the general process, here is a (hypothetical) example: 1. A user POSTs a `Pod` object to `/api/v7beta1/...` 2. The JSON is unmarshalled into a `v7beta1.Pod` structure @@ -176,6 +176,12 @@ If your change includes new fields for which you will need default values, you need to add cases to `pkg/api//defaults.go`. Of course, since you have added code, you have to add a test: `pkg/api//defaults_test.go`. +Do use pointers to scalars when you need to distinguish between an unset value +and an automatic zero value. For example, +`PodSpec.TerminationGracePeriodSeconds` is defined as `*int64` in the Go type +definition. A zero value means 0 seconds, and a nil value asks the system to +pick a default. + Don't forget to run the tests!
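To make the pointer-vs-zero distinction concrete, here is a minimal sketch — illustrative only; the function name and the 30-second default are assumptions for this example, not the real defaulting code:

```go
package main

import "fmt"

// applyGracePeriodDefault mimics how a defaulting function can treat a
// *int64 field: nil means "unset, so pick a default", while a non-nil
// pointer to 0 means "the user explicitly asked for 0 seconds".
func applyGracePeriodDefault(p *int64) int64 {
	if p == nil {
		return 30 // assumed cluster default, for illustration
	}
	return *p
}

func main() {
	zero := int64(0)
	fmt.Println(applyGracePeriodDefault(nil))   // prints 30: unset, default applied
	fmt.Println(applyGracePeriodDefault(&zero)) // prints 0: explicit zero preserved
}
```

A plain `int64` field could not express this difference, since an omitted field and an explicit `0` both unmarshal to `0`.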
### Edit conversion.go -- cgit v1.2.3 From 31fce228e472c3a8c84d992552329343b731d1c2 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Tue, 12 May 2015 19:48:29 -0400 Subject: Add variable expansion and design doc --- expansion.md | 407 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 407 insertions(+) create mode 100644 expansion.md diff --git a/expansion.md b/expansion.md new file mode 100644 index 00000000..00c32797 --- /dev/null +++ b/expansion.md @@ -0,0 +1,407 @@ +# Variable expansion in pod command, args, and env + +## Abstract + +A proposal for the expansion of environment variables using a simple `$(var)` syntax. + +## Motivation + +It is extremely common for users to need to compose environment variables or pass arguments to +their commands using the values of environment variables. Kubernetes should provide a facility for +the 80% cases in order to decrease coupling and the use of workarounds. + +## Goals + +1. Define the syntax format +2. Define the scoping and ordering of substitutions +3. Define the behavior for unmatched variables +4. Define the behavior for unexpected/malformed input + +## Constraints and Assumptions + +* This design should describe the simplest possible syntax to accomplish the use-cases +* Expansion syntax will not support more complicated shell-like behaviors such as default values + (viz: `$(VARIABLE_NAME:"default")`), inline substitution, etc. + +## Use Cases + +1. As a user, I want to compose new environment variables for a container using a substitution + syntax to reference other variables in the container's environment and service environment + variables +1. As a user, I want to substitute environment variables into a container's command +1. As a user, I want to do the above without requiring the container's image to have a shell +1. As a user, I want to be able to specify a default value for a service variable which may + not exist +1. 
As a user, I want to see an event associated with the pod if an expansion fails (i.e., references + variable names that cannot be expanded) + +### Use Case: Composition of environment variables + +Currently, containers are injected with docker-style environment variables for the services in +their pod's namespace. There are several variables for each service, but users routinely need +to compose URLs based on these variables because there is not a variable for the exact format +they need. Users should be able to build new environment variables with the exact format they need. +Eventually, it should also be possible to turn off the automatic injection of the docker-style +variables into pods and let the users consume the exact information they need via the downward API +and composition. + +#### Expanding expanded variables + +It should be possible to reference a variable which is itself the result of an expansion, if the +referenced variable is declared in the container's environment prior to the one referencing it. +Put another way -- a container's environment is expanded in order, and expanded variables are +available to subsequent expansions. + +### Use Case: Variable expansion in command + +Users frequently need to pass the values of environment variables to a container's command. +Currently, Kubernetes does not perform any expansion of variables. The workaround is to invoke a +shell in the container's command and have the shell perform the substitution, or to write a wrapper +script that sets up the environment and runs the command. This has a number of drawbacks: + +1. Solutions that require a shell are unfriendly to images that do not contain a shell +2. Wrapper scripts make it harder to use images as base images +3. Wrapper scripts increase coupling to Kubernetes + +Users should be able to do the 80% case of variable expansion in command without writing a wrapper +script or adding a shell invocation to their containers' commands.
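The shell-free expansion these use cases call for can be sketched as follows. This is an illustrative, simplified implementation of the `$(var)` syntax specified later in this proposal — the function names here are invented, and the real proposal defines `Expand` and `MappingFuncFor` in a `third_party/golang/expansion` package:

```go
package main

import (
	"fmt"
	"strings"
)

// mappingFor returns a mapping function over the given contexts; an
// unmatched name comes back wrapped in the $(...) syntax, so unresolved
// references appear verbatim in the output, as the proposal specifies.
func mappingFor(contexts ...map[string]string) func(string) string {
	return func(name string) string {
		for _, ctx := range contexts {
			if v, ok := ctx[name]; ok {
				return v
			}
		}
		return "$(" + name + ")"
	}
}

// expand implements the $(var) syntax: $$ escapes to a literal $,
// $(NAME) is resolved via the mapping function, and a malformed
// reference such as "$(NAME" passes through as ordinary characters.
func expand(input string, mapping func(string) string) string {
	var out strings.Builder
	for i := 0; i < len(input); i++ {
		if input[i] == '$' && i+1 < len(input) {
			switch input[i+1] {
			case '$': // escaped operator: $$ emits one literal $
				out.WriteByte('$')
				i++
				continue
			case '(':
				if end := strings.IndexByte(input[i+2:], ')'); end >= 0 {
					out.WriteString(mapping(input[i+2 : i+2+end]))
					i += 2 + end
					continue
				}
			}
		}
		out.WriteByte(input[i])
	}
	return out.String()
}

func main() {
	mapping := mappingFor(map[string]string{"VAR_A": "A"})
	fmt.Println(expand("$(VAR_A)-$$(VAR_A)-$(VAR_DNE)", mapping))
	// prints: A-$(VAR_A)-$(VAR_DNE)
}
```

Note that no shell is involved: the expansion runs in the component that launches the container, so it works for `scratch`-style images as well.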
+ +### Use Case: Images without shells + +The current workaround for variable expansion in a container's command requires the container's +image to have a shell. This is unfriendly to images that do not contain a shell (`scratch` images, +for example). Users should be able to perform the other use-cases in this design without regard to +the content of their images. + +### Use Case: See an event for incomplete expansions + +It is possible that a container with incorrect variable values or command line may continue to run +for a long period of time, and that the end-user would have no visual or obvious warning of the +incorrect configuration. If the kubelet creates an event when an expansion references a variable +that cannot be expanded, it will help users quickly detect problems with expansions. + +## Design Considerations + +### What features should be supported? + +In order to limit complexity, we want to provide the right amount of functionality so that the 80% +cases can be realized and nothing more. We felt that the essentials boiled down to: + +1. Ability to perform direct expansion of variables in a string +2. Ability to specify default values via a prioritized mapping function but without support for + defaults as a syntax-level feature + +### What should the syntax be? + +The exact syntax for variable expansion has a large impact on how users perceive and relate to the +feature. We considered implementing a very restrictive subset of the shell `${var}` syntax. This +syntax is an attractive option on some level, because many people are familiar with it. However, +this syntax also has a large number of lesser known features such as the ability to provide +default values for unset variables, perform inline substitution, etc. + +In the interest of preventing conflation of the expansion feature in Kubernetes with the shell +feature, we chose a different syntax similar to the one in Makefiles, `$(var)`. 
We also chose not +to support the bare `$var` format, since it is not required to implement the required use-cases. + +Nested references, i.e., variable expansion within variable names, are not supported. + +#### How should unmatched references be treated? + +Ideally, it should be extremely clear when a variable reference couldn't be expanded. We decided +the best experience for unmatched variable references would be to have the entire reference, syntax +included, show up in the output. As an example, if the reference `$(VARIABLE_NAME)` cannot be +expanded, then `$(VARIABLE_NAME)` should be present in the output. + +#### Escaping the operator + +Although the `$(var)` syntax does overlap with the `$(command)` form of command substitution +supported by many shells, because unexpanded variables are present verbatim in the output, we +expect this will not present a problem to many users. If there is a collision between a variable +name and command substitution syntax, the syntax can be escaped with the form `$$(VARIABLE_NAME)`, +which will evaluate to `$(VARIABLE_NAME)` whether `VARIABLE_NAME` can be expanded or not. + +## Design + +This design encompasses the variable expansion syntax and specification and the changes needed to +incorporate the expansion feature into the container's environment and command. + +### Syntax and expansion mechanics + +This section describes the expansion syntax, evaluation of variable values, and how unexpected or +malformed inputs are handled. + +#### Syntax + +The inputs to the expansion feature are: + +1. A utf-8 string (the input string) which may contain variable references +2. A function (the mapping function) that maps the name of a variable to the variable's value, of + type `func(string) string` + +Variable references in the input string are indicated exclusively with the syntax +`$()`.
The syntax tokens are: + +- `$`: the operator +- `(`: the reference opener +- `)`: the reference closer + +The operator has no meaning unless accompanied by the reference opener and closer tokens. The +operator can be escaped using `$$`. One literal `$` will be emitted for each `$$` in the input. + +The reference opener and closer characters have no meaning when not part of a variable reference. +If a variable reference is malformed, viz: `$(VARIABLE_NAME` without a closing expression, the +operator and expression opening characters are treated as ordinary characters without special +meanings. + +#### Scope and ordering of substitutions + +The scope in which variable references are expanded is defined by the mapping function. Within the +mapping function, any arbitrary strategy may be used to determine the value of a variable name. +The most basic implementation of a mapping function is to use a `map[string]string` to lookup the +value of a variable. + +In order to support default values for variables like service variables presented by the kubelet, +which may not be bound because the service that provides them does not yet exist, there should be a +mapping function that uses a list of `map[string]string` like: + +```go +func MakeMappingFunc(maps ...map[string]string) func(string) string { + return func(input string) string { + for _, context := range maps { + val, ok := context[input] + if ok { + return val + } + } + + return "" + } +} + +// elsewhere +containerEnv := map[string]string{ + "FOO": "BAR", + "ZOO": "ZAB", + "SERVICE2_HOST": "some-host", +} + +serviceEnv := map[string]string{ + "SERVICE_HOST": "another-host", + "SERVICE_PORT": "8083", +} + +// single-map variation +mapping := MakeMappingFunc(containerEnv) + +// default variables not found in serviceEnv +mappingWithDefaults := MakeMappingFunc(serviceEnv, containerEnv) +``` + +### Implementation changes + +The necessary changes to implement this functionality are: + +1. 
Add a new interface, `ObjectEventRecorder`, which is like the `EventRecorder` interface, but + scoped to a single object, and a function that returns an `ObjectEventRecorder` given an + `ObjectReference` and an `EventRecorder` +2. Introduce `third_party/golang/expansion` package that provides: + 1. An `Expand(string, func(string) string) string` function + 2. A `MappingFuncFor(ObjectEventRecorder, ...map[string]string) string` function +3. Add a new EnvVarSource for expansions and associated tests +4. Make the kubelet expand environment correctly +5. Make the kubelet expand command correctly + +#### Event Recording + +In order to provide an event when an expansion references undefined variables, the mapping function +must be able to create an event. In order to facilitate this, we should create a new interface in +the `api/client/record` package which is similar to `EventRecorder`, but scoped to a single object: + +```go +// ObjectEventRecorder knows how to record events about a single object. +type ObjectEventRecorder interface { + // Event constructs an event from the given information and puts it in the queue for sending. + // 'reason' is the reason this event is generated. 'reason' should be short and unique; it will + // be used to automate handling of events, so imagine people writing switch statements to + // handle them. You want to make that easy. + // 'message' is intended to be human readable. + // + // The resulting event will be created in the same namespace as the reference object. + Event(reason, message string) + + // Eventf is just like Event, but with Sprintf for the message field. + Eventf(reason, messageFmt string, args ...interface{}) + + // PastEventf is just like Eventf, but with an option to specify the event's 'timestamp' field. 
+ PastEventf(timestamp util.Time, reason, messageFmt string, args ...interface{}) +} +``` + +There should also be a function that can construct an `ObjectEventRecorder` from a `runtime.Object` +and an `EventRecorder`: + +```go +type objectRecorderImpl struct { + object runtime.Object + recorder EventRecorder +} + +func (r *objectRecorderImpl) Event(reason, message string) { + r.recorder.Event(r.object, reason, message) +} + +func ObjectEventRecorderFor(object runtime.Object, recorder EventRecorder) ObjectEventRecorder { + return &objectRecorderImpl{object, recorder} +} +``` + +#### Expansion package + +The expansion package should provide two methods: + +```go +// MappingFuncFor returns a mapping function for use with Expand that +// implements the expansion semantics defined in the expansion spec; it +// returns the input string wrapped in the expansion syntax if no mapping +// for the input is found. If no expansion is found for a key, an event +// is raised on the given recorder. +func MappingFuncFor(recorder record.ObjectEventRecorder, context ...map[string]string) func(string) string { + // ... +} + +// Expand replaces variable references in the input string according to +// the expansion spec using the given mapping function to resolve the +// values of variables. +func Expand(input string, mapping func(string) string) string { + // ... +} +``` + +#### Expansion `EnvVarSource` + +In order to avoid changing the existing behavior of the `EnvVar.Value` field, there should be a new +`EnvVarSource` that represents a variable expansion that an env var's value should come from: + +```go +// EnvVarSource represents a source for the value of an EnvVar. +type EnvVarSource struct { + // Other fields omitted + + Expansion *EnvVarExpansion +} + +type EnvVarExpansion struct { + // The input string to be expanded + Expand string +} +``` + +#### Kubelet changes + +The Kubelet should change to: + +1. Correctly expand environment variables with `Expansion` sources +2. 
Correctly expand references in the Command and Args + +### Examples + +#### Inputs and outputs + +These examples are in the context of the mapping: + +| Name | Value | +|-------------|------------| +| `VAR_A` | `"A"` | +| `VAR_B` | `"B"` | +| `VAR_C` | `"C"` | +| `VAR_REF` | `$(VAR_A)` | +| `VAR_EMPTY` | `""` | + +No other variables are defined. + +| Input | Result | +|--------------------------------|----------------------------| +| `"$(VAR_A)"` | `"A"` | +| `"___$(VAR_B)___"` | `"___B___"` | +| `"___$(VAR_C)"` | `"___C"` | +| `"$(VAR_A)-$(VAR_A)"` | `"A-A"` | +| `"$(VAR_A)-1"` | `"A-1"` | +| `"$(VAR_A)_$(VAR_B)_$(VAR_C)"` | `"A_B_C"` | +| `"$$(VAR_B)_$(VAR_A)"` | `"$(VAR_B)_A"` | +| `"$$(VAR_A)_$$(VAR_B)"` | `"$(VAR_A)_$(VAR_B)"` | +| `"f000-$$VAR_A"` | `"f000-$VAR_A"` | +| `"foo\\$(VAR_C)bar"` | `"foo\Cbar"` | +| `"foo\\\\$(VAR_C)bar"` | `"foo\\Cbar"` | +| `"foo\\\\\\\\$(VAR_A)bar"` | `"foo\\\\Abar"` | +| `"$(VAR_A$(VAR_B))"` | `"$(VAR_A$(VAR_B))"` | +| `"$(VAR_A$(VAR_B)"` | `"$(VAR_A$(VAR_B)"` | +| `"$(VAR_REF)"` | `"$(VAR_A)"` | +| `"%%$(VAR_REF)--$(VAR_REF)%%"` | `"%%$(VAR_A)--$(VAR_A)%%"` | +| `"foo$(VAR_EMPTY)bar"` | `"foobar"` | +| `"foo$(VAR_Awhoops!"` | `"foo$(VAR_Awhoops!"` | +| `"f00__(VAR_A)__"` | `"f00__(VAR_A)__"` | +| `"$?_boo_$!"` | `"$?_boo_$!"` | +| `"$VAR_A"` | `"$VAR_A"` | +| `"$(VAR_DNE)"` | `"$(VAR_DNE)"` | +| `"$$$$$$(BIG_MONEY)"` | `"$$$(BIG_MONEY)"` | +| `"$$$$$$(VAR_A)"` | `"$$$(VAR_A)"` | +| `"$$$$$$$(GOOD_ODDS)"` | `"$$$$(GOOD_ODDS)"` | +| `"$$$$$$$(VAR_A)"` | `"$$$A"` | +| `"$VAR_A)"` | `"$VAR_A)"` | +| `"${VAR_A}"` | `"${VAR_A}"` | +| `"$(VAR_B)_______$(A"` | `"B_______$(A"` | +| `"$(VAR_C)_______$("` | `"C_______$("` | +| `"$(VAR_A)foobarzab$"` | `"Afoobarzab$"` | +| `"foo-\\$(VAR_A"` | `"foo-\$(VAR_A"` | +| `"--$($($($($--"` | `"--$($($($($--"` | +| `"$($($($($--foo$("` | `"$($($($($--foo$("` | +| `"foo0--$($($($("` | `"foo0--$($($($("` | +| `"$(foo$$var)` | `$(foo$$var)` | + +#### In a pod: building a URL + +Notice the `$(var)` 
syntax. + +```yaml +apiVersion: v1beta3 +kind: Pod +metadata: + name: expansion-pod +spec: + containers: + - name: test-container + image: gcr.io/google_containers/busybox + command: [ "/bin/sh", "-c", "env" ] + env: + - name: PUBLIC_URL + valueFrom: + expansion: + expand: "http://$(GITSERVER_SERVICE_HOST):$(GITSERVER_SERVICE_PORT)" + restartPolicy: Never +``` + +#### In a pod: building a URL using downward API + +```yaml +apiVersion: v1beta3 +kind: Pod +metadata: + name: expansion-pod +spec: + containers: + - name: test-container + image: gcr.io/google_containers/busybox + command: [ "/bin/sh", "-c", "env" ] + env: + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: "metadata.namespace" + - name: PUBLIC_URL + valueFrom: + expansion: + expand: "http://gitserver.$(POD_NAMESPACE):$(SERVICE_PORT)" + restartPolicy: Never +``` + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/expansion.md?pixel)]() -- cgit v1.2.3 From aeb81528075456697287e110e1efd4558491de57 Mon Sep 17 00:00:00 2001 From: Vishnu Kannan Date: Tue, 12 May 2015 15:13:03 -0700 Subject: Updating namespaces to be DNS labels instead of DNS names. --- namespaces.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/namespaces.md b/namespaces.md index ade07e11..c4a1a90d 100644 --- a/namespaces.md +++ b/namespaces.md @@ -51,7 +51,7 @@ type Namespace struct { } ``` -A *Namespace* name is a DNS compatible subdomain. +A *Namespace* name is a DNS compatible label. A *Namespace* must exist prior to associating content with it. -- cgit v1.2.3 From bb07a8b81e212671a8c398723b7941149e46e952 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 14 May 2015 17:38:08 -0700 Subject: Don't rename api imports in conversions --- api_changes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api_changes.md b/api_changes.md index 6e29c3f6..6eff094b 100644 --- a/api_changes.md +++ b/api_changes.md @@ -255,7 +255,7 @@ regenerate auto-generated ones. 
To regenerate them: ``` - replace all conversion functions (convert\* functions) in the `pkg/api//conversion_generated.go` with the contents of \ - - replace arguments of `newer.Scheme.AddGeneratedConversionFuncs` in the + - replace arguments of `api.Scheme.AddGeneratedConversionFuncs` in the `pkg/api//conversion_generated.go` with the contents of \ Unsurprisingly, adding manually written conversion also requires you to add tests to -- cgit v1.2.3 From 3c173916ea41e7bb03bae1af85eefa4bc027c985 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Tue, 19 May 2015 17:47:03 +0200 Subject: Automatically generate conversions --- api_changes.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/api_changes.md b/api_changes.md index 6eff094b..4627c6df 100644 --- a/api_changes.md +++ b/api_changes.md @@ -251,12 +251,8 @@ Once all the necessary manually written conversions are added, you need to regenerate auto-generated ones. To regenerate them: - run ``` - $ go run cmd/kube-conversion/conversion.go -v -f -n + $ hack/update-generated-conversions.sh ``` - - replace all conversion functions (convert\* functions) in the - `pkg/api//conversion_generated.go` with the contents of \ - - replace arguments of `api.Scheme.AddGeneratedConversionFuncs` in the - `pkg/api//conversion_generated.go` with the contents of \ Unsurprisingly, adding manually written conversion also requires you to add tests to `pkg/api//conversion_test.go`. 
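
A manually written conversion of the kind referenced above can be sketched as follows. The `Widget` types and function names here are hypothetical stand-ins, not actual Kubernetes API structs; real conversion functions operate on the generated API types and typically also receive a `conversion.Scope` argument, but the shape of the work — covering renames and other changes the generator cannot infer — is the same:

```go
package main

import "fmt"

// Hypothetical internal and versioned representations of the same API object.
type internalWidget struct {
	Name     string
	Replicas int
}

type v1beta3Widget struct {
	ID       string // field renamed between API versions
	Replicas int
}

// A manually written conversion covers what a generator cannot infer,
// such as the Name -> ID rename.
func convertInternalToV1beta3(in *internalWidget, out *v1beta3Widget) error {
	out.ID = in.Name
	out.Replicas = in.Replicas
	return nil
}

func convertV1beta3ToInternal(in *v1beta3Widget, out *internalWidget) error {
	out.Name = in.ID
	out.Replicas = in.Replicas
	return nil
}

func main() {
	in := internalWidget{Name: "frontend", Replicas: 3}

	// Convert to the versioned form...
	var versioned v1beta3Widget
	if err := convertInternalToV1beta3(&in, &versioned); err != nil {
		panic(err)
	}

	// ...and back. The matching test in conversion_test.go is typically a
	// round trip: converting there and back must reproduce the original.
	var roundTripped internalWidget
	if err := convertV1beta3ToInternal(&versioned, &roundTripped); err != nil {
		panic(err)
	}
	fmt.Println(roundTripped == in) // prints "true"
}
```

The round-trip check in `main` is exactly what the accompanying test asserts: conversion in both directions must be lossless.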
-- cgit v1.2.3 From c817b2f96f0640e57d9fb5209e152a9dfefae11d Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 20 May 2015 17:17:01 -0700 Subject: in docs, update replicationController to replicationcontroller --- developer-guides/vagrant.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index f958b124..d0c07f3f 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -184,7 +184,7 @@ NAME IMAGE(S) HOST LABELS STATUS $ cluster/kubectl.sh get services NAME LABELS SELECTOR IP PORT -$ cluster/kubectl.sh get replicationControllers +$ cluster/kubectl.sh get replicationcontrollers NAME IMAGE(S SELECTOR REPLICAS ``` @@ -224,7 +224,7 @@ kubernetes-minion-1: fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b ``` -Going back to listing the pods, services and replicationControllers, you now have: +Going back to listing the pods, services and replicationcontrollers, you now have: ``` $ cluster/kubectl.sh get pods @@ -236,7 +236,7 @@ NAME IMAGE(S) HOST $ cluster/kubectl.sh get services NAME LABELS SELECTOR IP PORT -$ cluster/kubectl.sh get replicationControllers +$ cluster/kubectl.sh get replicationcontrollers NAME IMAGE(S SELECTOR REPLICAS myNginx nginx name=my-nginx 3 ``` -- cgit v1.2.3 From 93f791e943a103efca378ca82fcfca1cada7f3e7 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 20 May 2015 17:17:01 -0700 Subject: in docs, update replicationController to replicationcontroller --- architecture.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/architecture.md b/architecture.md index c50cfe0d..ebfb4964 100644 --- a/architecture.md +++ b/architecture.md @@ -41,7 +41,7 @@ The scheduler binds unscheduled pods to nodes via the `/binding` API. 
The schedu All other cluster-level functions are currently performed by the Controller Manager. For instance, `Endpoints` objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable. -The [`replicationController`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. +The [`replicationcontroller`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/architecture.md?pixel)]() -- cgit v1.2.3 From b4aee3255a668603c1f1b37dac0997e4815cb491 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 20 May 2015 16:54:53 -0700 Subject: in docs, update "minions" to "nodes" --- access.md | 4 ++-- security.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/access.md b/access.md index 8fd09703..647ce552 100644 --- a/access.md +++ b/access.md @@ -65,7 +65,7 @@ Cluster in Large organization: Org-run cluster: - organization that runs K8s master components is same as the org that runs apps on K8s. - - Minions may be on-premises VMs or physical machines; Cloud VMs; or a mix. + - Nodes may be on-premises VMs or physical machines; Cloud VMs; or a mix. Hosted cluster: - Offering K8s API as a service, or offering a Paas or Saas built on K8s @@ -223,7 +223,7 @@ Initially: Improvements: - allow one namespace to charge the quota for one or more other namespaces. This would be controlled by a policy which allows changing a billing_namespace= label on an object. - allow quota to be set by namespace owners for (namespace x label) combinations (e.g. 
let "webserver" namespace use 100 cores, but to prevent accidents, don't allow "webserver" namespace and "instance=test" to use more than 10 cores.
-- tools to help write consistent quota config files based on number of minions, historical namespace usages, QoS needs, etc.
+- tools to help write consistent quota config files based on number of nodes, historical namespace usages, QoS needs, etc.
- way for K8s Cluster Admin to incrementally adjust Quota objects.

Simple profile:

diff --git a/security.md b/security.md
index 4c446d10..26d543c9 100644
--- a/security.md
+++ b/security.md
@@ -104,7 +104,7 @@ A pod runs in a *security context* under a *service account* that is defined by

### TODO: authorization, authentication

-### Isolate the data store from the minions and supporting infrastructure
+### Isolate the data store from the nodes and supporting infrastructure

Access to the central data store (etcd) in Kubernetes allows an attacker to run arbitrary containers on hosts, to gain access to any protected information stored in either volumes or in pods (such as access tokens or shared secrets provided as environment variables), to intercept and redirect traffic from running services by inserting middlemen, or to simply delete the entire history of the cluster.

@@ -114,7 +114,7 @@ Both the Kubelet and Kube Proxy need information related to their specific roles

The controller manager for Replication Controllers and other future controllers act on behalf of a user via delegation to perform automated maintenance on Kubernetes resources. Their ability to access or modify resource state should be strictly limited to their intended duties and they should be prevented from accessing information not pertinent to their role. 
For example, a replication controller needs only to create a copy of a known pod configuration, to determine the running state of an existing pod, or to delete an existing pod that it created - it does not need to know the contents or current state of a pod, nor have access to any data in the pods attached volumes. -The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a minion in the cluster. At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time). +The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a node in the cluster. At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time). 
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security.md?pixel)]() -- cgit v1.2.3 From 5ee2d2ea4d7222bc11a3b667d2d6b6a586ee42e4 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Thu, 21 May 2015 11:05:25 -0700 Subject: update docs/design/secrets.md to v1beta3 --- secrets.md | 229 ++++++++++++++++++++++++++++++++----------------------------- 1 file changed, 120 insertions(+), 109 deletions(-) diff --git a/secrets.md b/secrets.md index 119c673a..5f8cb501 100644 --- a/secrets.md +++ b/secrets.md @@ -389,12 +389,14 @@ To create a pod that uses an ssh key stored as a secret, we first need to create ```json { - "apiVersion": "v1beta2", "kind": "Secret", - "id": "ssh-key-secret", + "apiVersion": "v1beta3", + "metadata": { + "name": "ssh-key-secret" + }, "data": { - "id-rsa.pub": "dmFsdWUtMQ0K", - "id-rsa": "dmFsdWUtMg0KDQo=" + "id-rsa": "dmFsdWUtMg0KDQo=", + "id-rsa.pub": "dmFsdWUtMQ0K" } } ``` @@ -407,38 +409,36 @@ Now we can create a pod which references the secret with the ssh key and consume ```json { - "id": "secret-test-pod", "kind": "Pod", - "apiVersion":"v1beta2", - "labels": { - "name": "secret-test" + "apiVersion": "v1beta3", + "metadata": { + "name": "secret-test-pod", + "labels": { + "name": "secret-test" + } }, - "desiredState": { - "manifest": { - "version": "v1beta1", - "id": "secret-test-pod", - "containers": [{ + "spec": { + "volumes": [ + { + "name": "secret-volume", + "secret": { + "secretName": "ssh-key-secret" + } + } + ], + "containers": [ + { "name": "ssh-test-container", "image": "mySshImage", - "volumeMounts": [{ - "name": "secret-volume", - "mountPath": "/etc/secret-volume", - "readOnly": true - }] - }], - "volumes": [{ - "name": "secret-volume", - "source": { - "secret": { - "target": { - "kind": "Secret", - "namespace": "example", - "name": "ssh-key-secret" - } + "volumeMounts": [ + { + "name": "secret-volume", + "readOnly": true, + "mountPath": "/etc/secret-volume" } - } - }] - } + ] + } + ] } } ``` @@ 
-452,105 +452,116 @@ The container is then free to use the secret data to establish an ssh connection ### Use-Case: Pods with pod / test credentials -Let's compare examples where a pod consumes a secret containing prod credentials and another pod -consumes a secret with test environment credentials. +This example illustrates a pod which consumes a secret containing prod +credentials and another pod which consumes a secret with test environment +credentials. The secrets: ```json -[{ - "apiVersion": "v1beta2", - "kind": "Secret", - "id": "prod-db-secret", - "data": { - "username": "dmFsdWUtMQ0K", - "password": "dmFsdWUtMg0KDQo=" - } -}, { - "apiVersion": "v1beta2", - "kind": "Secret", - "id": "test-db-secret", - "data": { - "username": "dmFsdWUtMQ0K", - "password": "dmFsdWUtMg0KDQo=" - } -}] + "apiVersion": "v1beta3", + "kind": "List", + "items": + [{ + "kind": "Secret", + "apiVersion": "v1beta3", + "metadata": { + "name": "prod-db-secret" + }, + "data": { + "password": "dmFsdWUtMg0KDQo=", + "username": "dmFsdWUtMQ0K" + } + }, + { + "kind": "Secret", + "apiVersion": "v1beta3", + "metadata": { + "name": "test-db-secret" + }, + "data": { + "password": "dmFsdWUtMg0KDQo=", + "username": "dmFsdWUtMQ0K" + } + }] +} ``` The pods: ```json -[{ - "id": "prod-db-client-pod", - "kind": "Pod", - "apiVersion":"v1beta2", - "labels": { - "name": "prod-db-client" - }, - "desiredState": { - "manifest": { - "version": "v1beta1", - "id": "prod-db-pod", - "containers": [{ - "name": "db-client-container", - "image": "myClientImage", - "volumeMounts": [{ +{ + "apiVersion": "v1beta3", + "kind": "List", + "items": + [{ + "kind": "Pod", + "apiVersion": "v1beta3", + "metadata": { + "name": "prod-db-client-pod", + "labels": { + "name": "prod-db-client" + } + }, + "spec": { + "volumes": [ + { "name": "secret-volume", - "mountPath": "/etc/secret-volume", - "readOnly": true - }] - }], - "volumes": [{ - "name": "secret-volume", - "source": { "secret": { - "target": { - "kind": "Secret", - 
"namespace": "example", - "name": "prod-db-secret" - } + "secretName": "prod-db-secret" } } - }] + ], + "containers": [ + { + "name": "db-client-container", + "image": "myClientImage", + "volumeMounts": [ + { + "name": "secret-volume", + "readOnly": true, + "mountPath": "/etc/secret-volume" + } + ] + } + ] } - } -}, -{ - "id": "test-db-client-pod", - "kind": "Pod", - "apiVersion":"v1beta2", - "labels": { - "name": "test-db-client" }, - "desiredState": { - "manifest": { - "version": "v1beta1", - "id": "test-db-pod", - "containers": [{ - "name": "db-client-container", - "image": "myClientImage", - "volumeMounts": [{ + { + "kind": "Pod", + "apiVersion": "v1beta3", + "metadata": { + "name": "test-db-client-pod", + "labels": { + "name": "test-db-client" + } + }, + "spec": { + "volumes": [ + { "name": "secret-volume", - "mountPath": "/etc/secret-volume", - "readOnly": true - }] - }], - "volumes": [{ - "name": "secret-volume", - "source": { "secret": { - "target": { - "kind": "Secret", - "namespace": "example", - "name": "test-db-secret" - } + "secretName": "test-db-secret" } } - }] + ], + "containers": [ + { + "name": "db-client-container", + "image": "myClientImage", + "volumeMounts": [ + { + "name": "secret-volume", + "readOnly": true, + "mountPath": "/etc/secret-volume" + } + ] + } + ] } - } -}] + }] +} ``` The specs for the two pods differ only in the value of the object referred to by the secret volume -- cgit v1.2.3 From 7934ee41659f70d1f5b309abc907693988a523eb Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Fri, 22 May 2015 18:21:03 -0400 Subject: Make kubelet expand var refs in cmd, args, env --- expansion.md | 35 ++++++++++------------------------- 1 file changed, 10 insertions(+), 25 deletions(-) diff --git a/expansion.md b/expansion.md index 00c32797..d15f2501 100644 --- a/expansion.md +++ b/expansion.md @@ -207,9 +207,8 @@ The necessary changes to implement this functionality are: 2. Introduce `third_party/golang/expansion` package that provides: 1. 
An `Expand(string, func(string) string) string` function 2. A `MappingFuncFor(ObjectEventRecorder, ...map[string]string) string` function -3. Add a new EnvVarSource for expansions and associated tests -4. Make the kubelet expand environment correctly -5. Make the kubelet expand command correctly +3. Make the kubelet expand environment correctly +4. Make the kubelet expand command correctly #### Event Recording @@ -277,31 +276,17 @@ func Expand(input string, mapping func(string) string) string { } ``` -#### Expansion `EnvVarSource` - -In order to avoid changing the existing behavior of the `EnvVar.Value` field, there should be a new -`EnvVarSource` that represents a variable expansion that an env var's value should come from: - -```go -// EnvVarSource represents a source for the value of an EnvVar. -type EnvVarSource struct { - // Other fields omitted - - Expansion *EnvVarExpansion -} - -type EnvVarExpansion struct { - // The input string to be expanded - Expand string -} -``` - #### Kubelet changes -The Kubelet should change to: +The Kubelet should be made to correctly expand variables references in a container's environment, +command, and args. Changes will need to be made to: -1. Correctly expand environment variables with `Expansion` sources -2. Correctly expand references in the Command and Args +1. The `makeEnvironmentVariables` function in the kubelet; this is used by + `GenerateRunContainerOptions`, which is used by both the docker and rkt container runtimes +2. The docker manager `setEntrypointAndCommand` func has to be changed to perform variable + expansion +3. 
The rkt runtime should be made to support expansion in command and args when support for it is + implemented ### Examples -- cgit v1.2.3 From bd70869deb13da3d7193141bc85d50591d349aa4 Mon Sep 17 00:00:00 2001 From: Anastasis Andronidis Date: Thu, 21 May 2015 22:53:10 +0200 Subject: rename run-container to run in kubectl --- developer-guides/vagrant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index d0c07f3f..31ad79f1 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -191,7 +191,7 @@ NAME IMAGE(S SELECTOR REPLICAS Start a container running nginx with a replication controller and three replicas ``` -$ cluster/kubectl.sh run-container my-nginx --image=nginx --replicas=3 --port=80 +$ cluster/kubectl.sh run my-nginx --image=nginx --replicas=3 --port=80 ``` When listing the pods, you will see that three containers have been started and are in Waiting state: -- cgit v1.2.3 From 4636961f5a4077462e01f7f4514d852801081c74 Mon Sep 17 00:00:00 2001 From: Anastasis Andronidis Date: Thu, 21 May 2015 23:10:25 +0200 Subject: rename resize to scale --- developer-guides/vagrant.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 31ad79f1..e51b7187 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -243,10 +243,10 @@ myNginx nginx name=my-nginx 3 We did not start any services, hence there are none listed. But we see three replicas displayed properly. Check the [guestbook](../../examples/guestbook/README.md) application to learn how to create a service. 
-You can already play with resizing the replicas with: +You can already play with scaling the replicas with: ```sh -$ ./cluster/kubectl.sh resize rc my-nginx --replicas=2 +$ ./cluster/kubectl.sh scale rc my-nginx --replicas=2 $ ./cluster/kubectl.sh get pods NAME IMAGE(S) HOST LABELS STATUS 7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running -- cgit v1.2.3 From 68ae206a68d52babea39f37b7c9b5a8b5ea0523e Mon Sep 17 00:00:00 2001 From: Anastasis Andronidis Date: Thu, 21 May 2015 23:10:25 +0200 Subject: rename resize to scale --- autoscaling.md | 184 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 92 insertions(+), 92 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index a2838743..29d20c82 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -1,18 +1,18 @@ ## Abstract -Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the +Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the number of pods deployed within the system automatically. ## Motivation -Applications experience peaks and valleys in usage. In order to respond to increases and decreases in load, administrators -scale their applications by adding computing resources. In the cloud computing environment this can be +Applications experience peaks and valleys in usage. In order to respond to increases and decreases in load, administrators +scale their applications by adding computing resources. In the cloud computing environment this can be done automatically based on statistical analysis and thresholds. 
### Goals * Provide a concrete proposal for implementing auto-scaling pods within Kubernetes -* Implementation proposal should be in line with current discussions in existing issues: - * Resize verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) +* Implementation proposal should be in line with current discussions in existing issues: + * Scale verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) @@ -20,45 +20,45 @@ done automatically based on statistical analysis and thresholds. ## Constraints and Assumptions * This proposal is for horizontal scaling only. Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072) -* `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. The `ReplicationController` responsibilities are +* `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. 
The `ReplicationController` responsibilities are constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](http://docs.k8s.io/replication-controller.md#responsibilities-of-the-replication-controller) * Auto-scalers will be loosely coupled with data gathering components in order to allow a wide variety of input sources -* Auto-scalable resources will support a resize verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) +* Auto-scalable resources will support a scale verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) such that the auto-scaler does not directly manipulate the underlying resource. -* Initially, most thresholds will be set by application administrators. It should be possible for an autoscaler to be +* Initially, most thresholds will be set by application administrators. It should be possible for an autoscaler to be written later that sets thresholds automatically based on past behavior (CPU used vs incoming requests). 
-* The auto-scaler must be aware of user defined actions so it does not override them unintentionally (for instance someone +* The auto-scaler must be aware of user defined actions so it does not override them unintentionally (for instance someone explicitly setting the replica count to 0 should mean that the auto-scaler does not try to scale the application up) * It should be possible to write and deploy a custom auto-scaler without modifying existing auto-scalers -* Auto-scalers must be able to monitor multiple replication controllers while only targeting a single scalable -object (for now a ReplicationController, but in the future it could be a job or any resource that implements resize) +* Auto-scalers must be able to monitor multiple replication controllers while only targeting a single scalable +object (for now a ReplicationController, but in the future it could be a job or any resource that implements scale) ## Use Cases ### Scaling based on traffic -The current, most obvious, use case is scaling an application based on network traffic like requests per second. Most -applications will expose one or more network endpoints for clients to connect to. Many of those endpoints will be load -balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to +The current, most obvious, use case is scaling an application based on network traffic like requests per second. Most +applications will expose one or more network endpoints for clients to connect to. Many of those endpoints will be load +balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to server traffic for applications. This is the primary, but not sole, source of data for making decisions. 
-Within Kubernetes a [kube proxy](http://docs.k8s.io/services.md#ips-and-portals) +Within Kubernetes a [kube proxy](http://docs.k8s.io/services.md#ips-and-portals) running on each node directs service requests to the underlying implementation. -While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage -traffic to backends. OpenShift, for instance, adds a "route" resource for defining external to internal traffic flow. -The "routers" are HAProxy or Apache load balancers that aggregate many different services and pods and can serve as a +While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage +traffic to backends. OpenShift, for instance, adds a "route" resource for defining external to internal traffic flow. +The "routers" are HAProxy or Apache load balancers that aggregate many different services and pods and can serve as a data source for the number of backends. ### Scaling based on predictive analysis -Scaling may also occur based on predictions of system state like anticipated load, historical data, etc. Hand in hand +Scaling may also occur based on predictions of system state like anticipated load, historical data, etc. Hand in hand with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically. ### Scaling based on arbitrary data Administrators may wish to scale the application based on any number of arbitrary data points such as job execution time or -duration of active sessions. There are any number of reasons an administrator may wish to increase or decrease capacity which +duration of active sessions. There are any number of reasons an administrator may wish to increase or decrease capacity which means the auto-scaler must be a configurable, extensible component. 
## Specification @@ -68,23 +68,23 @@ In order to facilitate talking about auto-scaling the following definitions are * `ReplicationController` - the first building block of auto scaling. Pods are deployed and scaled by a `ReplicationController`. * kube proxy - The proxy handles internal inter-pod traffic, an example of a data source to drive an auto-scaler * L3/L7 proxies - A routing layer handling outside to inside traffic requests, an example of a data source to drive an auto-scaler -* auto-scaler - scales replicas up and down by using the `resize` endpoint provided by scalable resources (`ReplicationController`) +* auto-scaler - scales replicas up and down by using the `scale` endpoint provided by scalable resources (`ReplicationController`) ### Auto-Scaler -The Auto-Scaler is a state reconciler responsible for checking data against configured scaling thresholds -and calling the `resize` endpoint to change the number of replicas. The scaler will -use a client/cache implementation to receive watch data from the data aggregators and respond to them by -scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to the +The Auto-Scaler is a state reconciler responsible for checking data against configured scaling thresholds +and calling the `scale` endpoint to change the number of replicas. The scaler will +use a client/cache implementation to receive watch data from the data aggregators and respond to them by +scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to the namespace just as a `ReplicationController` or `Service`. Since an auto-scaler is a durable object it is best represented as a resource. ```go //The auto scaler interface - type AutoScalerInterface interface { - //ScaleApplication adjusts a resource's replica count. Calls resize endpoint. + type AutoScalerInterface interface { + //ScaleApplication adjusts a resource's replica count. 
Calls scale endpoint. //Args to this are based on what the endpoint //can support. See https://github.com/GoogleCloudPlatform/kubernetes/issues/1629 ScaleApplication(num int) error @@ -95,162 +95,162 @@ Since an auto-scaler is a durable object it is best represented as a resource. TypeMeta //common construct ObjectMeta - - //Spec defines the configuration options that drive the behavior for this auto-scaler + + //Spec defines the configuration options that drive the behavior for this auto-scaler Spec AutoScalerSpec - + //Status defines the current status of this auto-scaler. - Status AutoScalerStatus + Status AutoScalerStatus } - + type AutoScalerSpec struct { //AutoScaleThresholds holds a collection of AutoScaleThresholds that drive the auto scaler AutoScaleThresholds []AutoScaleThreshold - + //Enabled turns auto scaling on or off - Enabled boolean - + Enabled boolean + //MaxAutoScaleCount defines the max replicas that the auto scaler can use. //This value must be greater than 0 and >= MinAutoScaleCount MaxAutoScaleCount int - - //MinAutoScaleCount defines the minimum number replicas that the auto scaler can reduce to, - //0 means that the application is allowed to idle - MinAutoScaleCount int - - //TargetSelector provides the resizeable target(s). Right now this is a ReplicationController - //in the future it could be a job or any resource that implements resize. + + //MinAutoScaleCount defines the minimum number replicas that the auto scaler can reduce to, + //0 means that the application is allowed to idle + MinAutoScaleCount int + + //TargetSelector provides the scalable target(s). Right now this is a ReplicationController + //in the future it could be a job or any resource that implements scale. TargetSelector map[string]string - - //MonitorSelector defines a set of capacity that the auto-scaler is monitoring + + //MonitorSelector defines a set of capacity that the auto-scaler is monitoring //(replication controllers). 
Monitored objects are used by thresholds to examine //statistics. Example: get statistic X for object Y to see if threshold is passed MonitorSelector map[string]string } - + type AutoScalerStatus struct { // TODO: open for discussion on what meaningful information can be reported in the status // The status may return the replica count here but we may want more information // such as if the count reflects a threshold being passed - } - - + } + + //AutoScaleThresholdInterface abstracts the data analysis from the auto-scaler - //example: scale by 1 (Increment) when RequestsPerSecond (Type) pass + //example: scale by 1 (Increment) when RequestsPerSecond (Type) pass //comparison (Comparison) of 50 (Value) for 30 seconds (Duration) type AutoScaleThresholdInterface interface { //called by the auto-scaler to determine if this threshold is met or not ShouldScale() boolean } - - + + //AutoScaleThreshold is a single statistic used to drive the auto-scaler in scaling decisions type AutoScaleThreshold struct { // Type is the type of threshold being used, intention or value Type AutoScaleThresholdType - + // ValueConfig holds the config for value based thresholds ValueConfig AutoScaleValueThresholdConfig - + // IntentionConfig holds the config for intention based thresholds - IntentionConfig AutoScaleIntentionThresholdConfig - } - + IntentionConfig AutoScaleIntentionThresholdConfig + } + // AutoScaleIntentionThresholdConfig holds configuration for intention based thresholds - // a intention based threshold defines no increment, the scaler will adjust by 1 accordingly - // and maintain once the intention is reached. Also, no selector is defined, the intention - // should dictate the selector used for statistics. Same for duration although we + // a intention based threshold defines no increment, the scaler will adjust by 1 accordingly + // and maintain once the intention is reached. Also, no selector is defined, the intention + // should dictate the selector used for statistics. 
Same for duration although we // may want a configurable duration later so intentions are more customizable. type AutoScaleIntentionThresholdConfig struct { // Intent is the lexicon of what intention is requested Intent AutoScaleIntentionType - - // Value is intention dependent in terms of above, below, equal and represents + + // Value is intention dependent in terms of above, below, equal and represents // the value to check against Value float } - + // AutoScaleValueThresholdConfig holds configuration for value based thresholds type AutoScaleValueThresholdConfig struct { - //Increment determines how the auot-scaler should scale up or down (positive number to + //Increment determines how the auot-scaler should scale up or down (positive number to //scale up based on this threshold negative number to scale down by this threshold) Increment int //Selector represents the retrieval mechanism for a statistic value from statistics //storage. Once statistics are better defined the retrieval mechanism may change. - //Ultimately, the selector returns a representation of a statistic that can be + //Ultimately, the selector returns a representation of a statistic that can be //compared against the threshold value. - Selector map[string]string + Selector map[string]string //Duration is the time lapse after which this threshold is considered passed Duration time.Duration - //Value is the number at which, after the duration is passed, this threshold is considered + //Value is the number at which, after the duration is passed, this threshold is considered //to be triggered Value float //Comparison component to be applied to the value. 
Comparison string } - + // AutoScaleThresholdType is either intention based or value based - type AutoScaleThresholdType string - - // AutoScaleIntentionType is a lexicon for intentions such as "cpu-utilization", + type AutoScaleThresholdType string + + // AutoScaleIntentionType is a lexicon for intentions such as "cpu-utilization", // "max-rps-per-endpoint" type AutoScaleIntentionType string ``` - -#### Boundary Definitions + +#### Boundary Definitions The `AutoScaleThreshold` definitions provide the boundaries for the auto-scaler. By defining comparisons that form a range -along with positive and negative increments you may define bi-directional scaling. For example the upper bound may be -specified as "when requests per second rise above 50 for 30 seconds scale the application up by 1" and a lower bound may +along with positive and negative increments you may define bi-directional scaling. For example the upper bound may be +specified as "when requests per second rise above 50 for 30 seconds scale the application up by 1" and a lower bound may be specified as "when requests per second fall below 25 for 30 seconds scale the application down by 1 (implemented by using -1)". ### Data Aggregator -This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing +This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing time series statistics. -Data aggregation is opaque to the the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds` -that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation -must feed a common data structure to ease the development of `AutoScaleThreshold`s but it does not matter to the +Data aggregation is opaque to the the auto-scaler resource. 
The auto-scaler is configured to use `AutoScaleThresholds` +that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation +must feed a common data structure to ease the development of `AutoScaleThreshold`s but it does not matter to the auto-scaler whether this occurs in a push or pull implementation, whether or not the data is stored at a granular level, -or what algorithm is used to determine the final statistics value. Ultimately, the auto-scaler only requires that a statistic +or what algorithm is used to determine the final statistics value. Ultimately, the auto-scaler only requires that a statistic resolves to a value that can be checked against a configured threshold. Of note: If the statistics gathering mechanisms can be initialized with a registry other components storing statistics can potentially piggyback on this registry. ### Multi-target Scaling Policy -If multiple resizable targets satisfy the `TargetSelector` criteria the auto-scaler should be configurable as to which -target(s) are resized. To begin with, if multiple targets are found the auto-scaler will scale the largest target up +If multiple scalable targets satisfy the `TargetSelector` criteria the auto-scaler should be configurable as to which +target(s) are scaled. To begin with, if multiple targets are found the auto-scaler will scale the largest target up or down as appropriate. In the future this may be more configurable. ### Interactions with a deployment In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](http://docs.k8s.io/replication-controller.md#rolling-updates) -there will be multiple replication controllers, with one scaling up and another scaling down. This means that an -auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. 
`AutoScalerSpec.MonitorSelector` -is what provides this ability. By using a selector that spans the entire service the auto-scaler can monitor capacity -of multiple replication controllers and check that capacity against the `AutoScalerSpec.MaxAutoScaleCount` and +there will be multiple replication controllers, with one scaling up and another scaling down. This means that an +auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector` +is what provides this ability. By using a selector that spans the entire service the auto-scaler can monitor capacity +of multiple replication controllers and check that capacity against the `AutoScalerSpec.MaxAutoScaleCount` and `AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`. In the course of a deployment it is up to the deployment orchestration to decide how to manage the labels -on the replication controllers if it needs to ensure that only specific replication controllers are targeted by +on the replication controllers if it needs to ensure that only specific replication controllers are targeted by the auto-scaler. By default, the auto-scaler will scale the largest replication controller that meets the target label selector criteria. - + During deployment orchestration the auto-scaler may be making decisions to scale its target up or down. In order to prevent the scaler from fighting with a deployment process that is scaling one replication controller up and scaling another one -down the deployment process must assume that the current replica count may be changed by objects other than itself and +down the deployment process must assume that the current replica count may be changed by objects other than itself and account for this in the scale up or down process. Therefore, the deployment process may no longer target an exact number of instances to be deployed. 
It must be satisfied that the replica count for the deployment meets or exceeds the number of requested instances. -Auto-scaling down in a deployment scenario is a special case. In order for the deployment to complete successfully the +Auto-scaling down in a deployment scenario is a special case. In order for the deployment to complete successfully the deployment orchestration must ensure that the desired number of instances that are supposed to be deployed has been met. If the auto-scaler is trying to scale the application down (due to no traffic, or other statistics) then the deployment process and auto-scaler are fighting to increase and decrease the count of the targeted replication controller. In order -to prevent this, deployment orchestration should notify the auto-scaler that a deployment is occurring. This will -temporarily disable negative decrement thresholds until the deployment process is completed. It is more important for -an auto-scaler to be able to grow capacity during a deployment than to shrink the number of instances precisely. +to prevent this, deployment orchestration should notify the auto-scaler that a deployment is occurring. This will +temporarily disable negative decrement thresholds until the deployment process is completed. It is more important for +an auto-scaler to be able to grow capacity during a deployment than to shrink the number of instances precisely. -- cgit v1.2.3 From a5e20a975cbae982e5f8fd3960fd9b0680b9e5d3 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Thu, 28 May 2015 17:41:42 +0200 Subject: Update instructions on conversions. --- api_changes.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/api_changes.md b/api_changes.md index 4627c6df..17278c6e 100644 --- a/api_changes.md +++ b/api_changes.md @@ -254,6 +254,12 @@ regenerate auto-generated ones. 
To regenerate them: $ hack/update-generated-conversions.sh ``` +If running the above script is impossible due to compile errors, the easiest +workaround is to comment out the code causing errors and let the script to +regenerate it. If the auto-generated conversion methods are not used by the +manually-written ones, it's fine to just remove the whole file and let the +generator to create it from scratch. + Unsurprisingly, adding manually written conversion also requires you to add tests to `pkg/api//conversion_test.go`. -- cgit v1.2.3 From b04be7e742cd98482e332a1caa3d5b71bbbcf636 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 23 May 2015 13:41:11 -0700 Subject: Rename 'portal IP' to 'cluster IP' most everywhere This covers obvious transforms, but not --portal_net, $PORTAL_NET and similar. --- networking.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/networking.md b/networking.md index f351629e..cd2bd0c5 100644 --- a/networking.md +++ b/networking.md @@ -83,7 +83,7 @@ We want to be able to assign IP addresses externally from Docker ([Docker issue In addition to enabling self-registration with 3rd-party discovery mechanisms, we'd like to setup DDNS automatically ([Issue #146](https://github.com/GoogleCloudPlatform/kubernetes/issues/146)). hostname, $HOSTNAME, etc. should return a name for the pod ([Issue #298](https://github.com/GoogleCloudPlatform/kubernetes/issues/298)), and gethostbyname should be able to resolve names of other pods. Probably we need to set up a DNS resolver to do the latter ([Docker issue #2267](https://github.com/dotcloud/docker/issues/2267)), so that we don't need to keep /etc/hosts files up to date dynamically. -[Service](http://docs.k8s.io/services.md) endpoints are currently found through environment variables. 
Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_BAR) are supported, and resolve to ports opened by the service proxy. We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. Clients should instead use the [service portal IP](http://docs.k8s.io/services.md) (which the above environment variables will resolve to). However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service portal IP in DNS, and for that to become the preferred resolution protocol. +[Service](http://docs.k8s.io/services.md) endpoints are currently found through environment variables. Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_BAR) are supported, and resolve to ports opened by the service proxy. We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. Clients should instead use the [service IP](http://docs.k8s.io/services.md) (which the above environment variables will resolve to). 
However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service's IP in DNS, and for that to become the preferred resolution protocol. We'd also like to accommodate other load-balancing solutions (e.g., HAProxy), non-load-balanced services ([Issue #260](https://github.com/GoogleCloudPlatform/kubernetes/issues/260)), and other types of groups (worker pools, etc.). Providing the ability to Watch a label selector applied to pod addresses would enable efficient monitoring of group membership, which could be directly consumed or synced with a discovery mechanism. Event hooks ([Issue #140](https://github.com/GoogleCloudPlatform/kubernetes/issues/140)) for join/leave events would probably make this even easier. -- cgit v1.2.3 From f21879b3d766d7c89983e429f3ee068bf73854d5 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 23 May 2015 13:41:11 -0700 Subject: Rename 'portal IP' to 'cluster IP' most everywhere This covers obvious transforms, but not --portal_net, $PORTAL_NET and similar. --- autoscaling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/autoscaling.md b/autoscaling.md index 29d20c82..31374448 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -42,7 +42,7 @@ applications will expose one or more network endpoints for clients to connect to balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to server traffic for applications. This is the primary, but not sole, source of data for making decisions. -Within Kubernetes a [kube proxy](http://docs.k8s.io/services.md#ips-and-portals) +Within Kubernetes a [kube proxy](http://docs.k8s.io/services.md#ips-and-vips) running on each node directs service requests to the underlying implementation. 
While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage -- cgit v1.2.3 From f2a6d63ddaf19110bf33d32e24710831a1bd9938 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Fri, 29 May 2015 01:00:36 -0400 Subject: Corrections to examples in expansion docs --- expansion.md | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/expansion.md b/expansion.md index d15f2501..b3ef161b 100644 --- a/expansion.md +++ b/expansion.md @@ -359,9 +359,7 @@ spec: command: [ "/bin/sh", "-c", "env" ] env: - name: PUBLIC_URL - valueFrom: - expansion: - expand: "http://$(GITSERVER_SERVICE_HOST):$(GITSERVER_SERVICE_PORT)" + value: "http://$(GITSERVER_SERVICE_HOST):$(GITSERVER_SERVICE_PORT)" restartPolicy: Never ``` @@ -383,9 +381,7 @@ spec: fieldRef: fieldPath: "metadata.namespace" - name: PUBLIC_URL - valueFrom: - expansion: - expand: "http://gitserver.$(POD_NAMESPACE):$(SERVICE_PORT)" + value: "http://gitserver.$(POD_NAMESPACE):$(SERVICE_PORT)" restartPolicy: Never ``` -- cgit v1.2.3 From 4bdc5177692b9c9b946c039aad134a8d7fbda5e0 Mon Sep 17 00:00:00 2001 From: Ben McCann Date: Mon, 1 Jun 2015 20:10:45 -0700 Subject: Document how a secrets server like Vault or Keywhiz might fit into Kubernetes --- secrets.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/secrets.md b/secrets.md index 5f8cb501..e96b0d89 100644 --- a/secrets.md +++ b/secrets.md @@ -148,7 +148,8 @@ have different preferences for the central store of secret data. Some possibili 1. An etcd collection alongside the storage for other API resources 2. A collocated [HSM](http://en.wikipedia.org/wiki/Hardware_security_module) -3. An external datastore such as an external etcd, RDBMS, etc. +3. A secrets server like [Vault](https://www.vaultproject.io/) or [Keywhiz](https://square.github.io/keywhiz/) +4. An external datastore such as an external etcd, RDBMS, etc. 
#### Size limit for secrets -- cgit v1.2.3 From bd8e7d842472ad28f3b56e6a425939ad48a1274d Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Thu, 28 May 2015 17:21:32 -0700 Subject: Explain that file-based pods cannot use secrets. --- secrets.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/secrets.md b/secrets.md index 5f8cb501..cbf93ee2 100644 --- a/secrets.md +++ b/secrets.md @@ -1,4 +1,3 @@ -# Secret Distribution ## Abstract @@ -184,6 +183,11 @@ For now, we will not implement validations around these limits. Cluster operato much node storage is allocated to secrets. It will be the operator's responsibility to ensure that the allocated storage is sufficient for the workload scheduled onto a node. +For now, kubelets will only attach secrets to api-sourced pods, and not file- or http-sourced +ones. Doing so would: + - confuse the secrets admission controller in the case of mirror pods. + - create an apiserver-liveness dependency -- avoiding this dependency is a main reason to use non-api-source pods. + ### Use-Case: Kubelet read of secrets for node The use-case where the kubelet reads secrets has several additional requirements: -- cgit v1.2.3 From 1351801078e0cfac27bfdbfacc431a43de88b94f Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Thu, 4 Jun 2015 21:32:29 +0000 Subject: Fix broken links in the vagrant developer guide. --- developer-guides/vagrant.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index e51b7187..332ac3d5 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -9,7 +9,7 @@ Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/deve 2. [VMWare Fusion](https://www.vmware.com/products/fusion/) version 5 or greater as well as the appropriate [Vagrant VMWare Fusion provider](https://www.vagrantup.com/vmware) 3. 
[VMWare Workstation](https://www.vmware.com/products/workstation/) version 9 or greater as well as the [Vagrant VMWare Workstation provider](https://www.vagrantup.com/vmware) 4. [Parallels Desktop](https://www.parallels.com/products/desktop/) version 9 or greater as well as the [Vagrant Parallels provider](https://parallels.github.io/vagrant-parallels/) -3. Get or build a [binary release](../../getting-started-guides/binary_release.md) +3. Get or build a [binary release](/docs/getting-started-guides/binary_release.md) ### Setup @@ -242,7 +242,7 @@ myNginx nginx name=my-nginx 3 ``` We did not start any services, hence there are none listed. But we see three replicas displayed properly. -Check the [guestbook](../../examples/guestbook/README.md) application to learn how to create a service. +Check the [guestbook](/examples/guestbook/README.md) application to learn how to create a service. You can already play with scaling the replicas with: ```sh -- cgit v1.2.3 From 1bb3ed53eeed2d0cc493f7974b09e4168c35ad9f Mon Sep 17 00:00:00 2001 From: Scott Konzem Date: Fri, 5 Jun 2015 11:35:17 -0400 Subject: Fix misspellings in documentation --- secrets.md | 2 +- service_accounts.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/secrets.md b/secrets.md index cbf93ee2..0bacb8d4 100644 --- a/secrets.md +++ b/secrets.md @@ -122,7 +122,7 @@ We should consider what the best way to allow this is; there are a few different 3. Give secrets attributes that allow the user to express that the secret should be presented to the container as an environment variable. The container's environment would contain the - desired values and the software in the container could use them without accomodation the + desired values and the software in the container could use them without accommodation the command or setup script. For our initial work, we will treat all secrets as files to narrow the problem space. 
There will diff --git a/service_accounts.md b/service_accounts.md index 9e6bc099..72a10207 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -149,7 +149,7 @@ First, if it finds pods which have a `Pod.Spec.ServiceAccountUsername` but no `P then it copies in the referenced securityContext and secrets references for the corresponding `serviceAccount`. Second, if ServiceAccount definitions change, it may take some actions. -**TODO**: decide what actions it takes when a serviceAccount defintion changes. Does it stop pods, or just +**TODO**: decide what actions it takes when a serviceAccount definition changes. Does it stop pods, or just allow someone to list ones that out out of spec? In general, people may want to customize this? Third, if a new namespace is created, it may create a new serviceAccount for that namespace. This may include -- cgit v1.2.3 From a4e70adc69cdcafe328666420f941c995c233173 Mon Sep 17 00:00:00 2001 From: Scott Konzem Date: Fri, 5 Jun 2015 11:35:17 -0400 Subject: Fix misspellings in documentation --- high-availability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/high-availability.md b/high-availability.md index 909903a2..f8d88e6b 100644 --- a/high-availability.md +++ b/high-availability.md @@ -4,7 +4,7 @@ This document serves as a proposal for high availability of the scheduler and co ## Design Options For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en) -1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the the standby deamon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. 
This would also introduce additional load on the apiserver, which is not desireable. As a result, we are **NOT** planning on this approach at this time. +1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the the standby deamon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. 2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the apprach that this proposal will leverage. -- cgit v1.2.3 From db372e1f640d4b903c228d6e0d536f55206a2bd2 Mon Sep 17 00:00:00 2001 From: Kris Rousey Date: Fri, 5 Jun 2015 12:47:15 -0700 Subject: Updating docs/ to v1 --- expansion.md | 4 ++-- namespaces.md | 8 ++++---- persistent-storage.md | 6 +++--- secrets.md | 16 ++++++++-------- 4 files changed, 17 insertions(+), 17 deletions(-) diff --git a/expansion.md b/expansion.md index b3ef161b..f4c85e8d 100644 --- a/expansion.md +++ b/expansion.md @@ -348,7 +348,7 @@ No other variables are defined. Notice the `$(var)` syntax. 
```yaml -apiVersion: v1beta3 +apiVersion: v1 kind: Pod metadata: name: expansion-pod @@ -366,7 +366,7 @@ spec: #### In a pod: building a URL using downward API ```yaml -apiVersion: v1beta3 +apiVersion: v1 kind: Pod metadata: name: expansion-pod diff --git a/namespaces.md b/namespaces.md index c4a1a90d..0fef2bed 100644 --- a/namespaces.md +++ b/namespaces.md @@ -231,7 +231,7 @@ OpenShift creates a Namespace in Kubernetes ``` { - "apiVersion":"v1beta3", + "apiVersion":"v1", "kind": "Namespace", "metadata": { "name": "development", @@ -256,7 +256,7 @@ User deletes the Namespace in Kubernetes, and Namespace now has following state: ``` { - "apiVersion":"v1beta3", + "apiVersion":"v1", "kind": "Namespace", "metadata": { "name": "development", @@ -281,7 +281,7 @@ removing *kubernetes* from the list of finalizers: ``` { - "apiVersion":"v1beta3", + "apiVersion":"v1", "kind": "Namespace", "metadata": { "name": "development", @@ -309,7 +309,7 @@ This results in the following state: ``` { - "apiVersion":"v1beta3", + "apiVersion":"v1", "kind": "Namespace", "metadata": { "name": "development", diff --git a/persistent-storage.md b/persistent-storage.md index b52e6b71..21a5650d 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -98,7 +98,7 @@ An administrator provisions storage by posting PVs to the API. Various way to a POST: kind: PersistentVolume -apiVersion: v1beta3 +apiVersion: v1 metadata: name: pv0001 spec: @@ -128,7 +128,7 @@ The user must be within a namespace to create PVCs. POST: kind: PersistentVolumeClaim -apiVersion: v1beta3 +apiVersion: v1 metadata: name: myclaim-1 spec: @@ -179,7 +179,7 @@ The claim holder owns the claim and its data for as long as the claim exists. 
T POST: kind: Pod -apiVersion: v1beta3 +apiVersion: v1 metadata: name: mypod spec: diff --git a/secrets.md b/secrets.md index 0bacb8d4..4d74e68b 100644 --- a/secrets.md +++ b/secrets.md @@ -394,7 +394,7 @@ To create a pod that uses an ssh key stored as a secret, we first need to create ```json { "kind": "Secret", - "apiVersion": "v1beta3", + "apiVersion": "v1", "metadata": { "name": "ssh-key-secret" }, @@ -414,7 +414,7 @@ Now we can create a pod which references the secret with the ssh key and consume ```json { "kind": "Pod", - "apiVersion": "v1beta3", + "apiVersion": "v1", "metadata": { "name": "secret-test-pod", "labels": { @@ -464,12 +464,12 @@ The secrets: ```json { - "apiVersion": "v1beta3", + "apiVersion": "v1", "kind": "List", "items": [{ "kind": "Secret", - "apiVersion": "v1beta3", + "apiVersion": "v1", "metadata": { "name": "prod-db-secret" }, @@ -480,7 +480,7 @@ The secrets: }, { "kind": "Secret", - "apiVersion": "v1beta3", + "apiVersion": "v1", "metadata": { "name": "test-db-secret" }, @@ -496,12 +496,12 @@ The pods: ```json { - "apiVersion": "v1beta3", + "apiVersion": "v1", "kind": "List", "items": [{ "kind": "Pod", - "apiVersion": "v1beta3", + "apiVersion": "v1", "metadata": { "name": "prod-db-client-pod", "labels": { @@ -534,7 +534,7 @@ The pods: }, { "kind": "Pod", - "apiVersion": "v1beta3", + "apiVersion": "v1", "metadata": { "name": "test-db-client-pod", "labels": { -- cgit v1.2.3 From 28951be8bb0fcaa9af59f0cad444c56ae7ecda21 Mon Sep 17 00:00:00 2001 From: Kris Rousey Date: Fri, 5 Jun 2015 12:47:15 -0700 Subject: Updating docs/ to v1 --- flaky-tests.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flaky-tests.md b/flaky-tests.md index 7870517f..5eb09ec9 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -11,7 +11,7 @@ There is a testing image ```brendanburns/flake``` up on the docker hub. 
We will Create a replication controller with the following config: ```yaml -apiVersion: v1beta3 +apiVersion: v1 kind: ReplicationController metadata: name: flakecontroller -- cgit v1.2.3 From 867825849dbdd7105fd037239a64397bf2c8d969 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Fri, 5 Jun 2015 14:50:11 -0700 Subject: Purge cluster/kubectl.sh from nearly all docs. Mark cluster/kubectl.sh as deprecated. --- persistent-storage.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index 21a5650d..3729f30e 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -110,7 +110,7 @@ spec: -------------------------------------------------- -cluster/kubectl.sh get pv +kubectl get pv NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM pv0001 map[] 10737418240 RWO Pending @@ -140,7 +140,7 @@ spec: -------------------------------------------------- -cluster/kubectl.sh get pvc +kubectl get pvc NAME LABELS STATUS VOLUME @@ -155,13 +155,13 @@ myclaim-1 map[] pending ``` -cluster/kubectl.sh get pv +kubectl get pv NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM pv0001 map[] 10737418240 RWO Bound myclaim-1 / f4b3d283-c0ef-11e4-8be4-80e6500a981e -cluster/kubectl.sh get pvc +kubectl get pvc NAME LABELS STATUS VOLUME myclaim-1 map[] Bound b16e91d6-c0ef-11e4-8be4-80e6500a981e @@ -205,7 +205,7 @@ When a claim holder is finished with their data, they can delete their claim. ``` -cluster/kubectl.sh delete pvc myclaim-1 +kubectl delete pvc myclaim-1 ``` -- cgit v1.2.3 From 2f18beac68176d99d4137a59faee0e653571ff63 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Fri, 5 Jun 2015 14:50:11 -0700 Subject: Purge cluster/kubectl.sh from nearly all docs. Mark cluster/kubectl.sh as deprecated. 
--- flaky-tests.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/flaky-tests.md b/flaky-tests.md index 5eb09ec9..da5549c8 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -33,7 +33,9 @@ spec: ``` Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default. -```./cluster/kubectl.sh create -f controller.yaml``` +``` +kubectl create -f controller.yaml +``` This will spin up 24 instances of the test. They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test. You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently. @@ -52,7 +54,7 @@ grep "Exited ([^0])" output.txt Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running: ```sh -./cluster/kubectl.sh stop replicationcontroller flakecontroller +kubectl stop replicationcontroller flakecontroller ``` If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller. -- cgit v1.2.3 From 9e09c1101a2e808dba17e0a56741a812b4d4cbf4 Mon Sep 17 00:00:00 2001 From: Jeffrey Paine Date: Mon, 8 Jun 2015 16:32:28 -0400 Subject: Consolidate git setup documentation. 
Closes #9091 --- development.md | 79 +++++++++++++++++++++++++++++++++---------------- git_workflow.png | Bin 0 -> 90004 bytes 2 files changed, 48 insertions(+), 31 deletions(-) create mode 100644 git_workflow.png diff --git a/development.md b/development.md index 02b513cc..2e540bcb 100644 --- a/development.md +++ b/development.md @@ -8,23 +8,62 @@ Official releases are built in Docker containers. Details are [here](../../buil Kubernetes is written in the [Go](http://golang.org) programming language. If you haven't set up a Go development environment, please follow [these instructions](http://golang.org/doc/code.html) to install the go tool and set up GOPATH. Ensure your version of Go is at least 1.3. -## Clone kubernetes into GOPATH -We highly recommend to put kubernetes' code into your GOPATH. For example, the following commands will download kubernetes' code under the current user's GOPATH (Assuming there's only one directory in GOPATH.): +## Git Setup -We highly recommend to put kubernetes' code into your GOPATH. For example, the following commands will download kubernetes' code under the current user's GOPATH (Assuming there's only one directory in GOPATH.): +Below, we outline one of the more common git workflows that core developers use. Other git workflows are also valid. + +### Visual overview +![Git workflow](git_workflow.png) + +### Fork the main repository + +1. Go to https://github.com/GoogleCloudPlatform/kubernetes +2. Click the "Fork" button (at the top right) + +### Clone your fork + +The commands below require that you have $GOPATH set ([$GOPATH docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put kubernetes' code into your GOPATH. Note: the commands below will not work if there is more than one directory in your `$GOPATH`. 
``` -$ echo $GOPATH -/home/user/goproj $ mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ $ cd $GOPATH/src/github.com/GoogleCloudPlatform/ -$ git clone https://github.com/GoogleCloudPlatform/kubernetes.git +# Replace "$YOUR_GITHUB_USERNAME" below with your github username +$ git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git +$ cd kubernetes +$ git remote add upstream 'https://github.com/GoogleCloudPlatform/kubernetes.git' +``` + +### Create a branch and make changes + +``` +$ git checkout -b myfeature +# Make your code changes +``` + +### Keeping your development fork in sync + +``` +$ git fetch upstream +$ git rebase upstream/master +``` + +Note: If you have write access to the main repository at github.com/GoogleCloudPlatform/kubernetes, you should modify your git configuration so that you can't accidentally push to upstream: + +``` +git remote set-url --push upstream no_push ``` -The commands above will not work if there are more than one directory in ``$GOPATH``. +### Committing changes to your fork + +``` +$ git commit +$ git push -f origin myfeature +``` + +### Creating a pull request +1. Visit http://github.com/$YOUR_GITHUB_USERNAME/kubernetes +2. Click the "Compare and pull request" button next to your "myfeature" branch. -If you plan to do development, read about the -[Kubernetes Github Flow](https://docs.google.com/presentation/d/1HVxKSnvlc2WJJq8b9KCYtact5ZRrzDzkWgKEfm0QO_o/pub?start=false&loop=false&delayms=3000), -and then clone your own fork of Kubernetes as described there. ## godep and dependency management 
## Testing out flaky tests [Instructions here](flaky-tests.md) -## Keeping your development fork in sync - -One time after cloning your forked repo: - -``` -git remote add upstream https://github.com/GoogleCloudPlatform/kubernetes.git -``` - -Then each time you want to sync to upstream: - -``` -git fetch upstream -git rebase upstream/master -``` - -If you have write access to the main repository, you should modify your git configuration so that -you can't accidentally push to upstream: - -``` -git remote set-url --push upstream no_push -``` - ## Regenerating the CLI documentation ``` diff --git a/git_workflow.png b/git_workflow.png new file mode 100644 index 00000000..e3bd70da Binary files /dev/null and b/git_workflow.png differ -- cgit v1.2.3 From a407b64a3d2be8e3ddca9192609c72e92b64a6a9 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Thu, 11 Jun 2015 01:11:44 -0400 Subject: Copy edits for spelling errors and typos Signed-off-by: Ed Costello --- clustering.md | 2 +- event_compression.md | 2 +- expansion.md | 4 ++-- security.md | 4 ++-- service_accounts.md | 2 +- simple-rolling-update.md | 2 +- 6 files changed, 8 insertions(+), 8 deletions(-) diff --git a/clustering.md b/clustering.md index d57d631d..4cef06f8 100644 --- a/clustering.md +++ b/clustering.md @@ -38,7 +38,7 @@ The proposed solution will provide a range of options for setting up and maintai The building blocks of an easier solution: -* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will explicitly idenitfy the trust chain (the set of trusted CAs) as opposed to trusting the system CAs. We will also use client certificates for all AuthN. +* **Move to TLS** We will move to using TLS for all intra-cluster communication. We will explicitly identify the trust chain (the set of trusted CAs) as opposed to trusting the system CAs. We will also use client certificates for all AuthN. 
* [optional] **API driven CA** Optionally, we will run a CA in the master that will mint certificates for the nodes/kubelets. There will be pluggable policies that will automatically approve certificate requests here as appropriate. * **CA approval policy** This is a pluggable policy object that can automatically approve CA signing requests. Stock policies will include `always-reject`, `queue` and `insecure-always-approve`. With `queue` there would be an API for evaluating and accepting/rejecting requests. Cloud providers could implement a policy here that verifies other out of band information and automatically approves/rejects based on other external factors. * **Scoped Kubelet Accounts** These accounts are per-minion and (optionally) give a minion permission to register itself. diff --git a/event_compression.md b/event_compression.md index db0337f0..74aba66f 100644 --- a/event_compression.md +++ b/event_compression.md @@ -25,7 +25,7 @@ Instead of a single Timestamp, each event object [contains](https://github.com/G Each binary that generates events: * Maintains a historical record of previously generated events: - * Implmented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client/record/events_cache.go). + * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client/record/events_cache.go). 
* The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following events fields are used to construct a unique key for an event: * ```event.Source.Component``` * ```event.Source.Host``` diff --git a/expansion.md b/expansion.md index f4c85e8d..8b31526a 100644 --- a/expansion.md +++ b/expansion.md @@ -55,7 +55,7 @@ available to subsequent expansions. ### Use Case: Variable expansion in command Users frequently need to pass the values of environment variables to a container's command. -Currently, Kubernetes does not perform any expansion of varibles. The workaround is to invoke a +Currently, Kubernetes does not perform any expansion of variables. The workaround is to invoke a shell in the container's command and have the shell perform the substitution, or to write a wrapper script that sets up the environment and runs the command. This has a number of drawbacks: @@ -116,7 +116,7 @@ expanded, then `$(VARIABLE_NAME)` should be present in the output. Although the `$(var)` syntax does overlap with the `$(command)` form of command substitution supported by many shells, because unexpanded variables are present verbatim in the output, we -expect this will not present a problem to many users. If there is a collision between a varible +expect this will not present a problem to many users. If there is a collision between a variable name and command substitution syntax, the syntax can be escaped with the form `$$(VARIABLE_NAME)`, which will evaluate to `$(VARIABLE_NAME)` whether `VARIABLE_NAME` can be expanded or not. diff --git a/security.md b/security.md index 26d543c9..6ea611b7 100644 --- a/security.md +++ b/security.md @@ -22,13 +22,13 @@ While Kubernetes today is not primarily a multi-tenant system, the long term evo We define "user" as a unique identity accessing the Kubernetes API server, which may be a human or an automated process. Human users fall into the following categories: -1. 
k8s admin - administers a kubernetes cluster and has access to the undelying components of the system +1. k8s admin - administers a kubernetes cluster and has access to the underlying components of the system 2. k8s project administrator - administrates the security of a small subset of the cluster 3. k8s developer - launches pods on a kubernetes cluster and consumes cluster resources Automated process users fall into the following categories: -1. k8s container user - a user that processes running inside a container (on the cluster) can use to access other cluster resources indepedent of the human users attached to a project +1. k8s container user - a user that processes running inside a container (on the cluster) can use to access other cluster resources independent of the human users attached to a project 2. k8s infrastructure user - the user that kubernetes infrastructure components use to perform cluster functions with clearly defined roles diff --git a/service_accounts.md b/service_accounts.md index 72a10207..e87e8e6c 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -13,7 +13,7 @@ Processes in Pods may need to call the Kubernetes API. For example: They also may interact with services other than the Kubernetes API, such as: - an image repository, such as docker -- both when the images are pulled to start the containers, and for writing images in the case of pods that generate images. - - accessing other cloud services, such as blob storage, in the context of a larged, integrated, cloud offering (hosted + - accessing other cloud services, such as blob storage, in the context of a large, integrated, cloud offering (hosted or private). 
- accessing files in an NFS volume attached to the pod diff --git a/simple-rolling-update.md b/simple-rolling-update.md index fed1b84f..e5b47d98 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -22,7 +22,7 @@ The value of that label is the hash of the complete JSON representation of the`` If a rollout fails or is terminated in the middle, it is important that the user be able to resume the roll out. To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replicaController in the ```kubernetes.io/``` annotation namespace: * ```desired-replicas``` The desired number of replicas for this controller (either N or zero) - * ```update-partner``` A pointer to the replicaiton controller resource that is the other half of this update (syntax `````` the namespace is assumed to be identical to the namespace of this replication controller.) + * ```update-partner``` A pointer to the replication controller resource that is the other half of this update (syntax `````` the namespace is assumed to be identical to the namespace of this replication controller.) Recovery is achieved by issuing the same command again: -- cgit v1.2.3 From 2c9669befd4fe4580bc77bc6f3c40236a07bc651 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Thu, 11 Jun 2015 01:11:44 -0400 Subject: Copy edits for spelling errors and typos Signed-off-by: Ed Costello --- collab.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/collab.md b/collab.md index 293cd6f4..b424f502 100644 --- a/collab.md +++ b/collab.md @@ -8,7 +8,7 @@ First and foremost: as a potential contributor, your changes and ideas are welco ## Code reviews -All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. But even for maintainers, we want all changes to get at least one review, preferably (for non-trivial changes obligately) from someone who knows the areas the change touches. 
For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should not be committed until relevant parties (e.g. owners of the subsystem affected by the PR) have had a reasonable chance to look at PR in their local business hours. +All changes must be code reviewed. For non-maintainers this is obvious, since you can't commit anyway. But even for maintainers, we want all changes to get at least one review, preferably (for non-trivial changes obligatorily) from someone who knows the areas the change touches. For non-trivial changes we may want two reviewers. The primary reviewer will make this decision and nominate a second reviewer, if needed. Except for trivial changes, PRs should not be committed until relevant parties (e.g. owners of the subsystem affected by the PR) have had a reasonable chance to look at the PR in their local business hours. Most PRs will find reviewers organically. If a maintainer intends to be the primary reviewer of a PR they should set themselves as the assignee on GitHub and say so in a reply to the PR. Only the primary reviewer of a change should actually do the merge, except in rare cases (e.g. they are unavailable in a reasonable timeframe). 
-- cgit v1.2.3 From c136464397bedc2fe29c515cbe171944d708de82 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Thu, 11 Jun 2015 01:11:44 -0400 Subject: Copy edits for spelling errors and typos Signed-off-by: Ed Costello --- high-availability.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/high-availability.md b/high-availability.md index f8d88e6b..60ccfce6 100644 --- a/high-availability.md +++ b/high-availability.md @@ -4,9 +4,9 @@ This document serves as a proposal for high availability of the scheduler and co ## Design Options For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en) -1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the the standby deamon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. +1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. -2. 
**Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the apprach that this proposal will leverage. +2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the approach that this proposal will leverage. 3. Active-Active (Load Balanced): Clients can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver. @@ -16,7 +16,7 @@ Implementation References: * [etcd](https://groups.google.com/forum/#!topic/etcd-dev/EbAa4fjypb4) * [initialPOC](https://github.com/rrati/etcd-ha) -In HA, the apiserver will provide an api for sets of replicated clients to do master election: acquire the lease, renew the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attemping to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. 
This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that component type, and the error from etcd will propigate to the client. Successfully creating the key means the client making the request is the master. Only the current master can renew the lease. When renewing the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD. +In HA, the apiserver will provide an api for sets of replicated clients to do master election: acquire the lease, renew the lease, and release the lease. This api is component agnostic, so a client will need to provide the component type and the lease duration when attempting to become master. The lease duration should be tuned per component. The apiserver will attempt to create a key in etcd based on the component type that contains the client's hostname/ip and port information. This key will be created with a ttl from the lease duration provided in the request. Failure to create this key means there is already a master of that component type, and the error from etcd will propagate to the client. Successfully creating the key means the client making the request is the master. Only the current master can renew the lease. When renewing the lease, the apiserver will update the existing key with a new ttl. The location in etcd for the HA keys is TBD. The first component to request leadership will become the master. All other components of that type will fail until the current leader releases the lease, or fails to renew the lease within the expiration time. On startup, all components should attempt to become master. The component that succeeds becomes the master, and should perform all functions of that component. The components that fail to become the master should not perform any tasks and sleep for their lease duration and then attempt to become the master again. 
A clean shutdown of the leader will cause a release of the lease and a new master will be elected. -- cgit v1.2.3 From 16355903a3e2954988791e55864edfdf2d82fd5d Mon Sep 17 00:00:00 2001 From: Marek Biskup Date: Wed, 17 Jun 2015 12:36:19 +0200 Subject: double dash replaced by html mdash --- networking.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/networking.md b/networking.md index cd2bd0c5..66234e6b 100644 --- a/networking.md +++ b/networking.md @@ -10,11 +10,11 @@ With the IP-per-pod model, all user containers within a pod behave as if they ar In addition to avoiding the aforementioned problems with dynamic port allocation, this approach reduces friction for applications moving from the world of uncontainerized apps on physical or virtual hosts to containers within pods. People running application stacks together on the same host have already figured out how to make ports not conflict (e.g., by configuring them through environment variables) and have arranged for clients to find them. -The approach does reduce isolation between containers within a pod -- ports could conflict, and there couldn't be private ports across containers within a pod, but applications requiring their own port spaces could just run as separate pods and processes requiring private communication could run within the same container. Besides, the premise of pods is that containers within a pod share some resources (volumes, cpu, ram, etc.) and therefore expect and tolerate reduced isolation. Additionally, the user can control what containers belong to the same pod whereas, in general, they don't control what pods land together on a host. +The approach does reduce isolation between containers within a pod — ports could conflict, and there couldn't be private ports across containers within a pod, but applications requiring their own port spaces could just run as separate pods and processes requiring private communication could run within the same container. 
Besides, the premise of pods is that containers within a pod share some resources (volumes, cpu, ram, etc.) and therefore expect and tolerate reduced isolation. Additionally, the user can control what containers belong to the same pod whereas, in general, they don't control what pods land together on a host. -When any container calls SIOCGIFADDR, it sees the IP that any peer container would see them coming from -- each pod has its own IP address that other pods can know. By making IP addresses and ports the same within and outside the containers and pods, we create a NAT-less, flat address space. "ip addr show" should work as expected. This would enable all existing naming/discovery mechanisms to work out of the box, including self-registration mechanisms and applications that distribute IP addresses. (We should test that with etcd and perhaps one other option, such as Eureka (used by Acme Air) or Consul.) We should be optimizing for inter-pod network communication. Within a pod, containers are more likely to use communication through volumes (e.g., tmpfs) or IPC. +When any container calls SIOCGIFADDR, it sees the IP that any peer container would see them coming from — each pod has its own IP address that other pods can know. By making IP addresses and ports the same within and outside the containers and pods, we create a NAT-less, flat address space. "ip addr show" should work as expected. This would enable all existing naming/discovery mechanisms to work out of the box, including self-registration mechanisms and applications that distribute IP addresses. (We should test that with etcd and perhaps one other option, such as Eureka (used by Acme Air) or Consul.) We should be optimizing for inter-pod network communication. Within a pod, containers are more likely to use communication through volumes (e.g., tmpfs) or IPC. -This is different from the standard Docker model. 
In that mode, each container gets an IP in the 172-dot space and would only see that 172-dot address from SIOCGIFADDR. If these containers connect to another container the peer would see the connect coming from a different IP than the container itself knows. In short - you can never self-register anything from a container, because a container can not be reached on its private IP. +This is different from the standard Docker model. In that mode, each container gets an IP in the 172-dot space and would only see that 172-dot address from SIOCGIFADDR. If these containers connect to another container the peer would see the connect coming from a different IP than the container itself knows. In short — you can never self-register anything from a container, because a container can not be reached on its private IP. An alternative we considered was an additional layer of addressing: pod-centric IP per container. Each container would have its own local IP address, visible only within that pod. This would perhaps make it easier for containerized applications to move from physical/virtual hosts to pods, but would be more complex to implement (e.g., requiring a bridge per pod, split-horizon/VP DNS) and to reason about, due to the additional layer of address translation, and would break self-registration and IP distribution mechanisms. @@ -53,7 +53,7 @@ GCE itself does not know anything about these IPs, though. These are not externally routable, though, so containers that need to communicate with the outside world need to use host networking. To set up an external IP that forwards to the VM, it will only forward to the VM's primary IP (which is assigned to no pod). So we use docker's -p flag to map published ports to the main interface. This has the side effect of disallowing two pods from exposing the same port. (More discussion on this in [Issue #390](https://github.com/GoogleCloudPlatform/kubernetes/issues/390).) 
-We create a container to use for the pod network namespace -- a single loopback device and a single veth device. All the user's containers get their network namespaces from this pod networking container. +We create a container to use for the pod network namespace — a single loopback device and a single veth device. All the user's containers get their network namespaces from this pod networking container. Docker allocates IP addresses from a bridge we create on each node, using its “container” networking mode. @@ -89,7 +89,7 @@ We'd also like to accommodate other load-balancing solutions (e.g., HAProxy), no ### External routability -We want traffic between containers to use the pod IP addresses across nodes. Say we have Node A with a container IP space of 10.244.1.0/24 and Node B with a container IP space of 10.244.2.0/24. And we have Container A1 at 10.244.1.1 and Container B1 at 10.244.2.1. We want Container A1 to talk to Container B1 directly with no NAT. B1 should see the "source" in the IP packets of 10.244.1.1 -- not the "primary" host IP for Node A. That means that we want to turn off NAT for traffic between containers (and also between VMs and containers). +We want traffic between containers to use the pod IP addresses across nodes. Say we have Node A with a container IP space of 10.244.1.0/24 and Node B with a container IP space of 10.244.2.0/24. And we have Container A1 at 10.244.1.1 and Container B1 at 10.244.2.1. We want Container A1 to talk to Container B1 directly with no NAT. B1 should see the "source" in the IP packets of 10.244.1.1 — not the "primary" host IP for Node A. That means that we want to turn off NAT for traffic between containers (and also between VMs and containers). We'd also like to make pods directly routable from the external internet. However, we can't yet support the extra container IPs that we've provisioned talking to the internet directly. So, we don't map external IPs to the container IPs. 
Instead, we solve that problem by having traffic that isn't to the internal network (! 10.0.0.0/8) get NATed through the primary host IP address so that it can get 1:1 NATed by the GCE networking when talking to the internet. Similarly, incoming traffic from the internet has to get NATed/proxied through the host IP. -- cgit v1.2.3 From 9e35c48d4abfa4b1bae2b4ed3a81047d6604985e Mon Sep 17 00:00:00 2001 From: RichieEscarez Date: Tue, 16 Jun 2015 14:48:51 -0700 Subject: Qualified all references to "controller" so that references to "replication controller" are clear. fixes #9404 Also ran hacks/run-gendocs.sh --- access.md | 2 +- service_accounts.md | 2 +- simple-rolling-update.md | 14 +++++++------- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/access.md b/access.md index 647ce552..dd64784e 100644 --- a/access.md +++ b/access.md @@ -193,7 +193,7 @@ K8s authorization should: - Allow for a range of maturity levels, from single-user for those test driving the system, to integration with existing to enterprise authorization systems. - Allow for centralized management of users and policies. In some organizations, this will mean that the definition of users and access policies needs to reside on a system other than k8s and encompass other web services (such as a storage service). - Allow processes running in K8s Pods to take on identity, and to allow narrow scoping of permissions for those identities in order to limit damage from software faults. -- Have Authorization Policies exposed as API objects so that a single config file can create or delete Pods, Controllers, Services, and the identities and policies for those Pods and Controllers. +- Have Authorization Policies exposed as API objects so that a single config file can create or delete Pods, Replication Controllers, Services, and the identities and policies for those Pods and Replication Controllers. 
- Be separate as much as practical from Authentication, to allow Authentication methods to change over time and space, without impacting Authorization policies. K8s will implement a relatively simple diff --git a/service_accounts.md b/service_accounts.md index e87e8e6c..63c12a30 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -5,7 +5,7 @@ Processes in Pods may need to call the Kubernetes API. For example: - scheduler - replication controller - - minion controller + - node controller - a map-reduce type framework which has a controller that then tries to make a dynamically determined number of workers and watch them - continuous build and push system - monitoring system diff --git a/simple-rolling-update.md b/simple-rolling-update.md index e5b47d98..0208b609 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -8,20 +8,20 @@ Assume that we have a current replication controller named ```foo``` and it is r ```kubectl rolling-update rc foo [foo-v2] --image=myimage:v2``` -If the user doesn't specify a name for the 'next' controller, then the 'next' controller is renamed to -the name of the original controller. +If the user doesn't specify a name for the 'next' replication controller, then the 'next' replication controller is renamed to +the name of the original replication controller. Obviously there is a race here, where if you kill the client between delete foo, and creating the new version of 'foo' you might be surprised about what is there, but I think that's ok. See [Recovery](#recovery) below -If the user does specify a name for the 'next' controller, then the 'next' controller is retained with its existing name, -and the old 'foo' controller is deleted. For the purposes of the rollout, we add a unique-ifying label ```kubernetes.io/deployment``` to both the ```foo``` and ```foo-next``` controllers. -The value of that label is the hash of the complete JSON representation of the```foo-next``` or```foo``` controller. 
The name of this label can be overridden by the user with the ```--deployment-label-key``` flag. +If the user does specify a name for the 'next' replication controller, then the 'next' replication controller is retained with its existing name, +and the old 'foo' replication controller is deleted. For the purposes of the rollout, we add a unique-ifying label ```kubernetes.io/deployment``` to both the ```foo``` and ```foo-next``` replication controllers. +The value of that label is the hash of the complete JSON representation of the ```foo-next``` or ```foo``` replication controller. The name of this label can be overridden by the user with the ```--deployment-label-key``` flag. #### Recovery If a rollout fails or is terminated in the middle, it is important that the user be able to resume the roll out. -To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replicaController in the ```kubernetes.io/``` annotation namespace: - * ```desired-replicas``` The desired number of replicas for this controller (either N or zero) +To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replication controller in the ```kubernetes.io/``` annotation namespace: + * ```desired-replicas``` The desired number of replicas for this replication controller (either N or zero) * ```update-partner``` A pointer to the replication controller resource that is the other half of this update (syntax `````` the namespace is assumed to be identical to the namespace of this replication controller.) Recovery is achieved by issuing the same command again: -- cgit v1.2.3 From 98ebb76f76a840d96487141106b13bfa071ed94f Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Thu, 18 Jun 2015 00:14:27 +0000 Subject: Add devel doc laying out the steps to add new metrics to the code base. 
--- instrumentation.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) create mode 100644 instrumentation.md diff --git a/instrumentation.md b/instrumentation.md new file mode 100644 index 00000000..b52480d2 --- /dev/null +++ b/instrumentation.md @@ -0,0 +1,36 @@ +Instrumenting Kubernetes with a new metric +=================== + +The following is a step-by-step guide for adding a new metric to the Kubernetes code base. + +We use the Prometheus monitoring system's golang client library for instrumenting our code. Once you've picked out a file that you want to add a metric to, you should: + +1. Import "github.com/prometheus/client_golang/prometheus". + +2. Create a top-level var to define the metric. For this, you have to: + 1. Pick the type of metric. Use a Gauge for things you want to set to a particular value, a Counter for things you want to increment, or a Histogram or Summary for histograms/distributions of values (typically for latency). Histograms are better if you're going to aggregate the values across jobs, while summaries are better if you just want the job to give you a useful summary of the values. + 2. Give the metric a name and description. + 3. Pick whether you want to distinguish different categories of things using labels on the metric. If so, add "Vec" to the name of the type of metric you want and add a slice of the label names to the definition. + + https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53 + https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L31 + +3. Register the metric so that prometheus will know to export it. 
+ + https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L74 + https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78 + +4. Use the metric by calling the appropriate method for your metric type (Set, Inc/Add, or Observe, respectively for Gauge, Counter, or Histogram/Summary), first calling WithLabelValues if your metric has any labels + + https://github.com/GoogleCloudPlatform/kubernetes/blob/3ce7fe8310ff081dbbd3d95490193e1d5250d2c9/pkg/kubelet/kubelet.go#L1384 + https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87 + + +These are the metric type definitions if you're curious to learn about them or need more information: +https://github.com/prometheus/client_golang/blob/master/prometheus/gauge.go +https://github.com/prometheus/client_golang/blob/master/prometheus/counter.go +https://github.com/prometheus/client_golang/blob/master/prometheus/histogram.go +https://github.com/prometheus/client_golang/blob/master/prometheus/summary.go + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/instrumentation.md?pixel)]() -- cgit v1.2.3 From dab63f2280e28641cc0b0890c919106059243cbf Mon Sep 17 00:00:00 2001 From: Marek Biskup Date: Fri, 19 Jun 2015 17:41:12 +0200 Subject: add links to unlinked documents; move making-release-notes.md to docs/devel --- README.md | 2 ++ making-release-notes.md | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) create mode 100644 making-release-notes.md diff --git a/README.md b/README.md index 13ccc42d..3ee8a244 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,8 @@ Docs in this directory relate to developing Kubernetes. * **Development Guide** ([development.md](development.md)): Setting up your environment tests. 
+* **Making release notes** ([making-release-notes.md](making-release-notes.md)): Generating release notes for a new release. + + * **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. Here's how to run your tests many times. diff --git a/making-release-notes.md b/making-release-notes.md new file mode 100644 index 00000000..823bff64 --- /dev/null +++ b/making-release-notes.md @@ -0,0 +1,33 @@ +## Making release notes +This documents the process for making release notes for a release. + +### 1) Note the PR number of the previous release +Find the PR that was merged with the previous release. Remember this number. +_TODO_: Figure out a way to record this somewhere to save the next release engineer time. + +### 2) Build the release-notes tool +```bash +${KUBERNETES_ROOT}/build/make-release-notes.sh +``` + +### 3) Trim the release notes +This generates a list of the entire set of PRs merged since the last release. It is likely long, +and many PRs aren't worth mentioning. + +Open up ```candidate-notes.md``` in your favorite editor. + +Remove, regroup, and organize to your heart's content. + + +### 4) Update CHANGELOG.md +With the final markdown all set, cut and paste it to the top of ```CHANGELOG.md```. + +### 5) Update the Release page + * Switch to the [releases](https://github.com/GoogleCloudPlatform/kubernetes/releases) page. + * Open up the release you are working on. + * Cut and paste the final markdown from above into the release notes. + * Press Save.
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/making-release-notes.md?pixel)]() -- cgit v1.2.3 From 7cf9d2ca9006732cfb199e48d011c2d520461ec9 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Fri, 19 Jun 2015 09:59:27 -0700 Subject: fix master precommit hook --- making-release-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/making-release-notes.md b/making-release-notes.md index 823bff64..ffccf6d3 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -30,4 +30,4 @@ With the final markdown all set, cut and paste it to the top of ```CHANGELOG.md` -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/making-release-notes.md?pixel)]() +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/making-release-notes.md?pixel)]() -- cgit v1.2.3 From add7066dad3a40e2b8f6891e5dd2cb1943e4bb6c Mon Sep 17 00:00:00 2001 From: goltermann Date: Tue, 23 Jun 2015 11:46:19 -0700 Subject: Add PR merge policy for RC. Link to ok-to-merge label --- pull-requests.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/pull-requests.md b/pull-requests.md index 627bc64e..1b5c30e6 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -15,5 +15,17 @@ We want to limit the total number of PRs in flight to: * Remove old PRs that would be difficult to rebase as the underlying code has changed over time * Encourage code velocity +RC to v1.0 Pull Requests +------------------------ + +Between the first RC build (~6/22) and v1.0, we will adopt a higher bar for PR merges. For v1.0 to be a stable release, we need to ensure that any fixes going in are very well tested and have a low risk of breaking anything. Refactors and complex changes will be rejected in favor of more strategic and smaller workarounds. + +These PRs require: +* A risk assessment by the code author in the PR. 
This should outline which parts of the code are being touched, the risk of regression, and the complexity of the code. +* Two LGTMs from experienced reviewers. + +Once those requirements are met, they will be labeled [ok-to-merge](https://github.com/GoogleCloudPlatform/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+label%3Aok-to-merge) and can be merged. + +These restrictions will be relaxed after v1.0 is released. [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/pull-requests.md?pixel)]() -- cgit v1.2.3 From 076fe1da6660e66243622d03d4d719ba3be35914 Mon Sep 17 00:00:00 2001 From: Marek Biskup Date: Thu, 25 Jun 2015 08:36:44 +0200 Subject: add missing document links --- README.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/README.md b/README.md index 3ee8a244..dc2909ff 100644 --- a/README.md +++ b/README.md @@ -22,5 +22,13 @@ Docs in this directory relate to developing Kubernetes. * **Profiling Kubernetes** ([profiling.md](profiling.md)): How to plug in go pprof profiler to Kubernetes. +* **Instrumenting Kubernetes with a new metric** + ([instrumentation.md](instrumentation.md)): How to add a new metric to the + Kubernetes code base. + +* **Coding Conventions** ([coding-conventions.md](coding-conventions.md)): + Coding style advice for contributors. + +* **Faster PR reviews** ([faster_reviews.md](faster_reviews.md)): How to get faster PR reviews.
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/README.md?pixel)]() -- cgit v1.2.3 From 51c7094856a53f8eb96204a4f5b6ee56815ac73d Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Thu, 25 Jun 2015 19:15:59 -0700 Subject: Update federation.md --- federation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/federation.md b/federation.md index a2d30017..748f8110 100644 --- a/federation.md +++ b/federation.md @@ -6,7 +6,7 @@ ## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ _Initial revision: 2015-03-05_ _Last updated: 2015-03-09_ -This doc: [tinyurl.com/ubernetes](http://tinyurl.com/ubernetes) +This doc: [tinyurl.com/ubernetes](http://tinyurl.com/ubernetesv2) Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) ## Introduction -- cgit v1.2.3 From 0b9ca955f4ee89ed39f1d8215ec850ac7bf7bbd0 Mon Sep 17 00:00:00 2001 From: Salvatore Dario Minonne Date: Fri, 26 Jun 2015 09:44:28 +0200 Subject: Adding IANA_SVC_NAME definition to docs/design/identifiers.md --- identifiers.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/identifiers.md b/identifiers.md index b75577c2..23b976d3 100644 --- a/identifiers.md +++ b/identifiers.md @@ -20,6 +20,8 @@ Name [rfc4122](http://www.ietf.org/rfc/rfc4122.txt) universally unique identifier (UUID) : A 128 bit generated value that is extremely unlikely to collide across time and space and requires no central coordination +[rfc6335](https://tools.ietf.org/rfc/rfc6335.txt) port name (IANA_SVC_NAME) +: An alphanumeric (a-z, and 0-9) string, with a maximum length of 15 characters, with the '-' character allowed anywhere except the first or the last character or adjacent to another '-' character; it must contain at least one (a-z) character ## Objectives for names and UIDs -- cgit v1.2.3 From 9ed56207140ce5ba468e9db71f7eaaf97789b871 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Fri, 26 Jun 2015 14:42:48 -0700 Subject: add documentation and script
on how to get recent and "nightly" builds --- README.md | 2 ++ getting-builds.md | 24 ++++++++++++++++++++++++ 2 files changed, 26 insertions(+) create mode 100644 getting-builds.md diff --git a/README.md b/README.md index dc2909ff..5957902f 100644 --- a/README.md +++ b/README.md @@ -31,4 +31,6 @@ Docs in this directory relate to developing Kubernetes. * **Faster PR reviews** ([faster_reviews.md](faster_reviews.md)): How to get faster PR reviews. +* **Getting Recent Builds** ([getting-builds.md](getting-builds.md)): How to get recent builds including the latest builds to pass CI. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/README.md?pixel)]() diff --git a/getting-builds.md b/getting-builds.md new file mode 100644 index 00000000..dbad8f3a --- /dev/null +++ b/getting-builds.md @@ -0,0 +1,24 @@ +# Getting Kubernetes Builds + +You can use [hack/get-build.sh](../../hack/get-build.sh), or use it as a reference for how to get the most recent builds with curl. With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our ci and gce e2e tests (essentially a nightly build). + +``` +usage: + ./hack/get-build.sh [stable|release|latest|latest-green] + + stable: latest stable version + release: latest release candidate + latest: latest ci build + latest-green: latest ci build to pass gce e2e +``` + +You can also use the gsutil tool to explore the Google Cloud Storage release bucket.
Here are some examples: +``` +gsutil cat gs://kubernetes-release/ci/latest.txt # output the latest ci version number +gsutil cat gs://kubernetes-release/ci/latest-green.txt # output the latest ci version number that passed gce e2e +gsutil ls gs://kubernetes-release/ci/v0.20.0-29-g29a55cc/ # list the contents of a ci release +gsutil ls gs://kubernetes-release/release # list all official releases and rcs +``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/getting-builds.md?pixel)]() -- cgit v1.2.3 From 7c21cef64b9ea25e0a160c0e7384c3af4ccdd258 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 30 Jun 2015 00:51:16 -0700 Subject: Initial design doc for scheduler. --- scheduler.md | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 scheduler.md diff --git a/scheduler.md b/scheduler.md new file mode 100644 index 00000000..e2a9f35d --- /dev/null +++ b/scheduler.md @@ -0,0 +1,50 @@ + +# The Kubernetes Scheduler + +The Kubernetes scheduler runs as a process alongside the other master +components such as the API server. Its interface to the API server is to watch +for Pods with an empty PodSpec.NodeName, and for each Pod, it posts a Binding +indicating where the Pod should be scheduled. + +## The scheduling process + +The scheduler tries to find a node for each Pod, one at a time, as it notices +these Pods via watch. There are three steps. First, it applies a set of "predicates" that filter out +inappropriate nodes. For example, if the PodSpec specifies resource limits, then the scheduler +will filter out nodes that don't have at least that many resources available (computed +as the capacity of the node minus the sum of the resource limits of the containers that +are already running on the node). Second, it applies a set of "priority functions" +that rank the nodes that weren't filtered out by the predicate check.
For example, +it tries to spread Pods across nodes while at the same time favoring the least-loaded +nodes (where "load" here is the sum of the resource limits of the containers running on the node, +divided by the node's capacity). +Finally, the node with the highest priority is chosen +(or, if there are multiple such nodes, then one of them is chosen at random). The code +for this main scheduling loop is in the function `Schedule()` in +[plugin/pkg/scheduler/generic_scheduler.go](../../plugin/pkg/scheduler/generic_scheduler.go) + +## Scheduler extensibility + +The scheduler is extensible: the cluster administrator can choose which of the pre-defined +scheduling policies to apply, and can add new ones. The built-in predicates and priorities are +defined in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go) and +[plugin/pkg/scheduler/algorithm/priorities/priorities.go](../../plugin/pkg/scheduler/algorithm/priorities/priorities.go), respectively. +The policies that are applied when scheduling can be chosen in one of two ways. Normally, +the policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in +[plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). +However, the choice of policies +can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON +file specifying which scheduling policies to use. See +[examples/scheduler-policy-config.json](../../examples/scheduler-policy-config.json) for an example +config file. (Note that the config file format is versioned; the API is defined in +[plugin/pkg/scheduler/api/](../../plugin/pkg/scheduler/api/)). +Thus, to add a new scheduling policy, you should modify predicates.go or priorities.go, +and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file.
+ +## Exploring the code + +If you want to get a global picture of how the scheduler works, you can start in +[plugin/cmd/kube-scheduler/app/server.go](../../plugin/cmd/kube-scheduler/app/server.go) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/scheduler.md?pixel)]() -- cgit v1.2.3 From 38a8d7b84534e5c162bb4b1f4a9ce05ab32b132c Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 1 Jul 2015 14:58:08 -0700 Subject: Fix incorrect doc link in Federation.md --- federation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/federation.md b/federation.md index 748f8110..efdd726a 100644 --- a/federation.md +++ b/federation.md @@ -6,7 +6,7 @@ ## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ _Initial revision: 2015-03-05_ _Last updated: 2015-03-09_ -This doc: [tinyurl.com/ubernetes](http://tinyurl.com/ubernetesv2) +This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) ## Introduction -- cgit v1.2.3 From 667782f84caa0247eb227350810b414a2d84c3e0 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Tue, 30 Jun 2015 13:27:31 -0700 Subject: Add user-oriented compute resource doc. Adds docs/compute_resources.md with user-oriented explanation of compute resources. Reveals detail gradually and includes examples and troubleshooting. Examples are tested. Moves design-focused docs/resources.md to docs/design/resources.md. Updates links to that. 
--- access.md | 2 +- resources.md | 216 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 217 insertions(+), 1 deletion(-) create mode 100644 resources.md diff --git a/access.md b/access.md index dd64784e..72ca969c 100644 --- a/access.md +++ b/access.md @@ -212,7 +212,7 @@ Policy objects may be applicable only to a single namespace or to all namespaces ## Accounting -The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources.md](/docs/resources.md)). +The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources design doc](resources.md)). Initially: - a `quota` object is immutable. diff --git a/resources.md b/resources.md new file mode 100644 index 00000000..17bb5c18 --- /dev/null +++ b/resources.md @@ -0,0 +1,216 @@ +**Note: this is a design doc, which describes features that have not been completely implemented. +User documentation of the current state is [here](../resources.md). The tracking issue for +implementation of this model is +[#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and +cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in +milli-cores.** + +# The Kubernetes resource model + +To do good pod placement, Kubernetes needs to know how big pods are, as well as the sizes of the nodes onto which they are being placed. The definition of "how big" is given by the Kubernetes resource model — the subject of this document. 
+ +The resource model aims to be: +* simple, for common cases; +* extensible, to accommodate future growth; +* regular, with few special cases; and +* precise, to avoid misunderstandings and promote pod portability. + +## The resource model +A Kubernetes _resource_ is something that can be requested by, allocated to, or consumed by a pod or container. Examples include memory (RAM), CPU, disk-time, and network bandwidth. + +Once resources on a node have been allocated to one pod, they should not be allocated to another until that pod is removed or exits. This means that Kubernetes schedulers should ensure that the sum of the resources allocated (requested and granted) to its pods never exceeds the usable capacity of the node. Testing whether a pod will fit on a node is called _feasibility checking_. + +Note that the resource model currently prohibits over-committing resources; we will want to relax that restriction later. + +### Resource types + +All resources have a _type_ that is identified by their _typename_ (a string, e.g., "memory"). Several resource types are predefined by Kubernetes (a full list is below), although only two will be supported at first: CPU and memory. Users and system administrators can define their own resource types if they wish (e.g., Hadoop slots). + +A fully-qualified resource typename is constructed from a DNS-style _subdomain_, followed by a slash `/`, followed by a name. +* The subdomain must conform to [RFC 1123](http://www.ietf.org/rfc/rfc1123.txt) (e.g., `kubernetes.io`, `example.com`). +* The name must be not more than 63 characters, consisting of upper- or lower-case alphanumeric characters, with the `-`, `_`, and `.` characters allowed anywhere except the first or last character. +* As a shorthand, any resource typename that does not start with a subdomain and a slash will automatically be prefixed with the built-in Kubernetes _namespace_, `kubernetes.io/` in order to fully-qualify it. 
This namespace is reserved for code in the open source Kubernetes repository; as a result, all user typenames MUST be fully qualified, and cannot be created in this namespace. + +Some example typenames include `memory` (which will be fully-qualified as `kubernetes.io/memory`), and `example.com/Shiny_New-Resource.Type`. + +For future reference, note that some resources, such as CPU and network bandwidth, are _compressible_, which means that their usage can potentially be throttled in a relatively benign manner. All other resources are _incompressible_, which means that any attempt to throttle them is likely to cause grief. This distinction will be important if a Kubernetes implementation supports over-committing of resources. + +### Resource quantities + +Initially, all Kubernetes resource types are _quantitative_, and have an associated _unit_ for quantities of the associated resource (e.g., bytes for memory, bytes per second for bandwidth, instances for software licences). The units will always be a resource type's natural base units (e.g., bytes, not MB), to avoid confusion between binary and decimal multipliers and the underlying unit multiplier (e.g., is memory measured in MiB, MB, or GB?). + +Resource quantities can be added and subtracted: for example, a node has a fixed quantity of each resource type that can be allocated to pods/containers; once such an allocation has been made, the allocated resources cannot be made available to other pods/containers without over-committing the resources. + +To make life easier for people, quantities can be represented externally as unadorned integers, or as fixed-point integers with one of these SI suffixes (E, P, T, G, M, K, m) or their power-of-two equivalents (Ei, Pi, Ti, Gi, Mi, Ki). For example, the following represent roughly the same value: 128974848, "129e6", "129M", "123Mi". Small quantities can be represented directly as decimals (e.g., 0.3), or using milli-units (e.g., "300m").
+ * "Externally" means in user interfaces, reports, graphs, and in JSON or YAML resource specifications that might be generated or read by people. + * Case is significant: "m" and "M" are not the same, so "k" is not a valid SI suffix. There are no power-of-two equivalents for SI suffixes that represent multipliers less than 1. + * These conventions only apply to resource quantities, not arbitrary values. + +Internally (i.e., everywhere else), Kubernetes will represent resource quantities as integers so it can avoid problems with rounding errors, and will not use strings to represent numeric values. To achieve this, quantities that naturally have fractional parts (e.g., CPU seconds/second) will be scaled to integral numbers of milli-units (e.g., milli-CPUs) as soon as they are read in. Internal APIs, data structures, and protobufs will use these scaled integer units. Raw measurement data such as usage may still need to be tracked and calculated using floating point values, but internally they should be rescaled to avoid some values being in milli-units and some not. + * Note that reading in a resource quantity and writing it out again may change the way its values are represented, and truncate precision (e.g., 1.0001 may become 1.000), so comparison and difference operations (e.g., by an updater) must be done on the internal representations. + * Avoiding milli-units in external representations has advantages for people who will use Kubernetes, but runs the risk of developers forgetting to rescale or accidentally using floating-point representations. That seems like the right choice. We will try to reduce the risk by providing libraries that automatically do the quantization for JSON/YAML inputs. 
+ +### Resource specifications + +Both users and a number of system components, such as schedulers, (horizontal) auto-scalers, (vertical) auto-sizers, load balancers, and worker-pool managers, need to reason about resource requirements of workloads, resource capacities of nodes, and resource usage. Kubernetes separates specifications of *desired state*, aka the Spec, from representations of *current state*, aka the Status. Resource requirements and total node capacity fall into the specification category, while resource usage, characterizations derived from usage (e.g., maximum usage, histograms), and other resource demand signals (e.g., CPU load) clearly fall into the status category and are discussed in the Appendix for now. + +Resource requirements for a container or pod should have the following form: +``` +resourceRequirementSpec: [ + request: [ cpu: 2.5, memory: "40Mi" ], + limit: [ cpu: 4.0, memory: "99Mi" ], +] +``` +Where: +* _request_ [optional]: the amount of resources being requested, or that were requested and have been allocated. Scheduler algorithms will use these quantities to test feasibility (whether a pod will fit onto a node). If a container (or pod) tries to use more resources than its _request_, any associated SLOs are voided — e.g., the program it is running may be throttled (compressible resource types), or the attempt may be denied. If _request_ is omitted for a container, it defaults to _limit_ if that is explicitly specified, otherwise to an implementation-defined value; this will always be 0 for a user-defined resource type. If _request_ is omitted for a pod, it defaults to the sum of the (explicit or implicit) _request_ values for the containers it encloses. + +* _limit_ [optional]: an upper bound or cap on the maximum amount of resources that will be made available to a container or pod; if a container or pod uses more resources than its _limit_, it may be terminated.
The _limit_ defaults to "unbounded"; in practice, this probably means the capacity of an enclosing container, pod, or node, but may result in non-deterministic behavior, especially for memory. + +Total capacity for a node should have a similar structure: +``` +resourceCapacitySpec: [ + total: [ cpu: 12, memory: "128Gi" ] +] +``` +Where: +* _total_: the total allocatable resources of a node. Initially, the resources at a given scope will bound the resources of the sum of inner scopes. + +#### Notes + + * It is an error to specify the same resource type more than once in each list. + + * It is an error for the _request_ or _limit_ values for a pod to be less than the sum of the (explicit or defaulted) values for the containers it encloses. (We may relax this later.) + + * If multiple pods are running on the same node and attempting to use more resources than they have requested, the result is implementation-defined. For example: unallocated or unused resources might be spread equally across claimants, or the assignment might be weighted by the size of the original request, or as a function of limits, or priority, or the phase of the moon, perhaps modulated by the direction of the tide. Thus, although it's not mandatory to provide a _request_, it's probably a good idea. (Note that the _request_ could be filled in by an automated system that is observing actual usage and/or historical data.) + + * Internally, the Kubernetes master can decide the defaulting behavior and the kubelet implementation may expect an absolute specification. For example, if the master decided that "the default is unbounded", it would pass 2^64 to the kubelet. + + + +## Kubernetes-defined resource types +The following resource types are predefined ("reserved") by Kubernetes in the `kubernetes.io` namespace, and so cannot be used for user-defined resources.
Note that the syntax of all resource types in the resource spec is deliberately similar, but some resource types (e.g., CPU) may receive significantly more support than simply tracking quantities in the schedulers and/or the Kubelet. + +### Processor cycles + * Name: `cpu` (or `kubernetes.io/cpu`) + * Units: Kubernetes Compute Unit seconds/second (i.e., CPU cores normalized to a canonical "Kubernetes CPU") + * Internal representation: milli-KCUs + * Compressible? yes + * Qualities: this is a placeholder for the kind of thing that may be supported in the future — see [#147](https://github.com/GoogleCloudPlatform/kubernetes/issues/147) + * [future] `schedulingLatency`: as per lmctfy + * [future] `cpuConversionFactor`: property of a node: the speed of a CPU core on the node's processor divided by the speed of the canonical Kubernetes CPU (a floating point value; default = 1.0). + +To reduce performance portability problems for pods, and to avoid worst-case provisioning behavior, the units of CPU will be normalized to a canonical "Kubernetes Compute Unit" (KCU, pronounced ˈko͝oko͞o), which will roughly be equivalent to a single CPU hyperthreaded core for some recent x86 processor. The normalization may be implementation-defined, although some reasonable defaults will be provided in the open-source Kubernetes code. + +Note that requesting 2 KCU won't guarantee that precisely 2 physical cores will be allocated — control of aspects like this will be handled by resource _qualities_ (a future feature). + + +### Memory + * Name: `memory` (or `kubernetes.io/memory`) + * Units: bytes + * Compressible? no (at least initially) + +The precise meaning of "memory" is implementation dependent, but the basic idea is to rely on the underlying `memcg` mechanisms, support, and definitions. + +Note that most people will want to use power-of-two suffixes (Mi, Gi) for memory quantities +rather than decimal ones: "64MiB" rather than "64MB".
+ + +## Resource metadata +A resource type may have an associated read-only ResourceType structure that contains metadata about the type. For example: +``` +resourceTypes: [ + "kubernetes.io/memory": [ + isCompressible: false, ... + ] + "kubernetes.io/cpu": [ + isCompressible: true, internalScaleExponent: 3, ... + ] + "kubernetes.io/disk-space": [ ... ] +] +``` + +Kubernetes will provide ResourceType metadata for its predefined types. If no resource metadata can be found for a resource type, Kubernetes will assume that it is a quantified, incompressible resource that is not specified in milli-units, and has no default value. + +The defined properties are as follows: + +| field name | type | contents | +| ---------- | ---- | -------- | +| name | string, required | the typename, as a fully-qualified string (e.g., `kubernetes.io/cpu`) | +| internalScaleExponent | int, default=0 | external values are multiplied by 10 to this power for internal storage (e.g., 3 for milli-units) | +| units | string, required | format: `unit* [per unit+]` (e.g., `second`, `byte per second`). An empty unit field means "dimensionless". | +| isCompressible | bool, default=false | true if the resource type is compressible | +| defaultRequest | string, default=none | in the same format as a user-supplied value | +| _[future]_ quantization | number, default=1 | smallest granularity of allocation: requests may be rounded up to a multiple of this unit; implementation-defined unit (e.g., the page size for RAM). | + + +# Appendix: future extensions + +The following are planned future extensions to the resource model, included here to encourage comments.
+ +## Usage data + +Because resource usage and related metrics change continuously, need to be tracked over time (i.e., historically), can be characterized in a variety of ways, and are fairly voluminous, we will not include usage in core API objects, such as [Pods](pods.md) and Nodes, but will provide separate APIs for accessing and managing that data. See the Appendix for possible representations of usage data, but the representation we'll use is TBD. + +Singleton values for observed and predicted future usage will rapidly prove inadequate, so we will support the following structure for extended usage information: + +``` +resourceStatus: [ + usage: [ cpu: <CPU-info>, memory: <memory-info> ], + maxusage: [ cpu: <CPU-info>, memory: <memory-info> ], + predicted: [ cpu: <CPU-info>, memory: <memory-info> ], +] +``` + +where a `<CPU-info>` or `<memory-info>` structure looks like this: +``` +{ + mean: <value> # arithmetic mean + max: <value> # maximum value + min: <value> # minimum value + count: <value> # number of data points + percentiles: [ # map from %iles to values + "10": <10th-percentile-value>, + "50": <50th-percentile-value>, + "99": <99th-percentile-value>, + "99.9": <99.9th-percentile-value>, + ... + ] + } +``` +All parts of this structure are optional, although we strongly encourage including quantities for 50, 90, 95, 99, 99.5, and 99.9 percentiles. _[In practice, it will be important to include additional info such as the length of the time window over which the averages are calculated, the confidence level, and information-quality metrics such as the number of dropped or discarded data points.]_ + +## Future resource types + +### _[future] Network bandwidth_ + * Name: "network-bandwidth" (or `kubernetes.io/network-bandwidth`) + * Units: bytes per second + * Compressible? yes + +### _[future] Network operations_ + * Name: "network-iops" (or `kubernetes.io/network-iops`) + * Units: operations (messages) per second + * Compressible? yes + +### _[future] Storage space_ + * Name: "storage-space" (or `kubernetes.io/storage-space`) + * Units: bytes + * Compressible?
no + +The amount of secondary storage space available to a container. The main target is local disk drives and SSDs, although this could also be used to qualify remotely-mounted volumes. Specifying whether a resource is a raw disk, an SSD, a disk array, or a file system fronting any of these, is left for future work. + +### _[future] Storage time_ + * Name: storage-time (or `kubernetes.io/storage-time`) + * Units: seconds per second of disk time + * Internal representation: milli-units + * Compressible? yes + +This is the amount of time a container spends accessing disk, including actuator and transfer time. A standard disk drive provides 1.0 diskTime seconds per second. + +### _[future] Storage operations_ + * Name: "storage-iops" (or `kubernetes.io/storage-iops`) + * Units: operations per second + * Compressible? yes + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/resources.md?pixel)]() -- cgit v1.2.3 From b2a3f3fbbed6e3fc57151e8016e9de02782f822b Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 6 Jul 2015 15:58:00 -0700 Subject: De-dup,overhaul networking docs --- networking.md | 247 +++++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 157 insertions(+), 90 deletions(-) diff --git a/networking.md b/networking.md index 66234e6b..8bf03437 100644 --- a/networking.md +++ b/networking.md @@ -1,107 +1,174 @@ # Networking -## Model and motivation - -Kubernetes deviates from the default Docker networking model. The goal is for each pod to have an IP in a flat shared networking namespace that has full communication with other physical computers and containers across the network. IP-per-pod creates a clean, backward-compatible model where pods can be treated much like VMs or physical hosts from the perspectives of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration. 
- -OTOH, dynamic port allocation requires supporting both static ports (e.g., for externally accessible services) and dynamically allocated ports, requires partitioning centrally allocated and locally acquired dynamic ports, complicates scheduling (since ports are a scarce resource), is inconvenient for users, complicates application configuration, is plagued by port conflicts and reuse and exhaustion, requires non-standard approaches to naming (e.g., etcd rather than DNS), requires proxies and/or redirection for programs using standard naming/addressing mechanisms (e.g., web browsers), requires watching and cache invalidation for address/port changes for instances in addition to watching group membership changes, and obstructs container/pod migration (e.g., using CRIU). NAT introduces additional complexity by fragmenting the addressing space, which breaks self-registration mechanisms, among other problems. - -With the IP-per-pod model, all user containers within a pod behave as if they are on the same host with regard to networking. They can all reach each other’s ports on localhost. Ports which are published to the host interface are done so in the normal Docker way. All containers in all pods can talk to all other containers in all other pods by their 10-dot addresses. - -In addition to avoiding the aforementioned problems with dynamic port allocation, this approach reduces friction for applications moving from the world of uncontainerized apps on physical or virtual hosts to containers within pods. People running application stacks together on the same host have already figured out how to make ports not conflict (e.g., by configuring them through environment variables) and have arranged for clients to find them. 
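The "configure ports through environment variables" convention mentioned above can be sketched in a few lines; the `PORT` variable name and the default are assumptions for this example, not anything Kubernetes mandates:

```python
import os

# Sketch only: co-hosted apps avoid port conflicts by taking their serving
# port from the environment instead of hard-coding it.
def serving_port(default=8080):
    # The "PORT" variable name is an illustrative convention.
    return int(os.environ.get("PORT", default))

print(serving_port())            # 8080, unless PORT is already set
os.environ["PORT"] = "9090"
print(serving_port())            # 9090
```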
- -The approach does reduce isolation between containers within a pod — ports could conflict, and there couldn't be private ports across containers within a pod, but applications requiring their own port spaces could just run as separate pods and processes requiring private communication could run within the same container. Besides, the premise of pods is that containers within a pod share some resources (volumes, cpu, ram, etc.) and therefore expect and tolerate reduced isolation. Additionally, the user can control what containers belong to the same pod whereas, in general, they don't control what pods land together on a host. - -When any container calls SIOCGIFADDR, it sees the IP that any peer container would see them coming from — each pod has its own IP address that other pods can know. By making IP addresses and ports the same within and outside the containers and pods, we create a NAT-less, flat address space. "ip addr show" should work as expected. This would enable all existing naming/discovery mechanisms to work out of the box, including self-registration mechanisms and applications that distribute IP addresses. (We should test that with etcd and perhaps one other option, such as Eureka (used by Acme Air) or Consul.) We should be optimizing for inter-pod network communication. Within a pod, containers are more likely to use communication through volumes (e.g., tmpfs) or IPC. - -This is different from the standard Docker model. In that mode, each container gets an IP in the 172-dot space and would only see that 172-dot address from SIOCGIFADDR. If these containers connect to another container the peer would see the connect coming from a different IP than the container itself knows. In short — you can never self-register anything from a container, because a container can not be reached on its private IP. - -An alternative we considered was an additional layer of addressing: pod-centric IP per container. 
Each container would have its own local IP address, visible only within that pod. This would perhaps make it easier for containerized applications to move from physical/virtual hosts to pods, but would be more complex to implement (e.g., requiring a bridge per pod, split-horizon/VP DNS) and to reason about, due to the additional layer of address translation, and would break self-registration and IP distribution mechanisms. - -## Current implementation - -For the Google Compute Engine cluster configuration scripts, [advanced routing](https://developers.google.com/compute/docs/networking#routing) is set up so that each VM has an extra 256 IP addresses that get routed to it. This is in addition to the 'main' IP address assigned to the VM that is NAT-ed for Internet access. The networking bridge (called `cbr0` to differentiate it from `docker0`) is set up outside of Docker proper and only does NAT for egress network traffic that isn't aimed at the virtual network. - -Ports mapped in from the 'main IP' (and hence the internet if the right firewall rules are set up) are proxied in user mode by Docker. In the future, this should be done with `iptables` by either the Kubelet or Docker: [Issue #15](https://github.com/GoogleCloudPlatform/kubernetes/issues/15). - -We start Docker with: - DOCKER_OPTS="--bridge cbr0 --iptables=false" - -We set up this bridge on each node with SaltStack, in [container_bridge.py](cluster/saltbase/salt/_states/container_bridge.py). - - cbr0: - container_bridge.ensure: - - cidr: {{ grains['cbr-cidr'] }} - ... - grains: - roles: - - kubernetes-pool - cbr-cidr: $MINION_IP_RANGE - -We make these addresses routable in GCE: - - gcloud compute routes add "${MINION_NAMES[$i]}" \ - --project "${PROJECT}" \ - --destination-range "${MINION_IP_RANGES[$i]}" \ - --network "${NETWORK}" \ - --next-hop-instance "${MINION_NAMES[$i]}" \ - --next-hop-instance-zone "${ZONE}" & - -The minion IP ranges are /24s in the 10-dot space. 
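The per-minion /24 scheme described above can be sketched with Python's `ipaddress` module; the 10.244.0.0/16 parent range and the minion names are assumptions for illustration:

```python
import ipaddress

# Sketch only: carve one /24 per minion out of a 10-dot parent range,
# mirroring the per-VM advanced-routing rules set up above.
def allocate_ranges(parent_cidr, minions):
    subnets = ipaddress.ip_network(parent_cidr).subnets(new_prefix=24)
    return {name: str(cidr) for name, cidr in zip(minions, subnets)}

ranges = allocate_ranges("10.244.0.0/16", ["minion-1", "minion-2", "minion-3"])
for name, cidr in ranges.items():
    # Each route effectively says: traffic for this /24 goes to this VM.
    print(name, "->", cidr)
# minion-1 -> 10.244.0.0/24
# minion-2 -> 10.244.1.0/24
# minion-3 -> 10.244.2.0/24
```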
+There are 4 distinct networking problems to solve: +1. Highly-coupled container-to-container communications +2. Pod-to-Pod communications +3. Pod-to-Service communications +4. External-to-internal communications -GCE itself does not know anything about these IPs, though. - -These are not externally routable, though, so containers that need to communicate with the outside world need to use host networking. An external IP that is set up to forward to the VM will only forward to the VM's primary IP (which is assigned to no pod). So we use docker's -p flag to map published ports to the main interface. This has the side effect of disallowing two pods from exposing the same port. (More discussion on this in [Issue #390](https://github.com/GoogleCloudPlatform/kubernetes/issues/390).) - -We create a container to use for the pod network namespace — a single loopback device and a single veth device. All the user's containers get their network namespaces from this pod networking container. - -Docker allocates IP addresses from a bridge we create on each node, using its “container” networking mode. - -1. Create a normal (in the networking sense) container which uses a minimal image and runs a command that blocks forever. This is not a user-defined container, and gets a special well-known name. - - creates a new network namespace (netns) and loopback device - - creates a new pair of veth devices and binds them to the netns - - auto-assigns an IP from docker’s IP range - -2. Create the user containers and specify the name of the pod infra container as their “POD” argument. Docker finds the PID of the command running in the pod infra container and attaches to the netns and ipcns of that PID. +## Model and motivation -### Other networking implementation examples -With the primary aim of providing the IP-per-pod model, other implementations exist to serve the purpose outside of GCE.
+Kubernetes deviates from the default Docker networking model (though as of +Docker 1.8 their network plugins are getting closer). The goal is for each pod +to have an IP in a flat shared networking namespace that has full communication +with other physical computers and containers across the network. IP-per-pod +creates a clean, backward-compatible model where pods can be treated much like +VMs or physical hosts from the perspectives of port allocation, networking, +naming, service discovery, load balancing, application configuration, and +migration. + +Dynamic port allocation, on the other hand, requires supporting both static +ports (e.g., for externally accessible services) and dynamically allocated +ports, requires partitioning centrally allocated and locally acquired dynamic +ports, complicates scheduling (since ports are a scarce resource), is +inconvenient for users, complicates application configuration, is plagued by +port conflicts and reuse and exhaustion, requires non-standard approaches to +naming (e.g. consul or etcd rather than DNS), requires proxies and/or +redirection for programs using standard naming/addressing mechanisms (e.g. web +browsers), requires watching and cache invalidation for address/port changes +for instances in addition to watching group membership changes, and obstructs +container/pod migration (e.g. using CRIU). NAT introduces additional complexity +by fragmenting the addressing space, which breaks self-registration mechanisms, +among other problems. + +## Container to container + +All containers within a pod behave as if they are on the same host with regard +to networking. They can all reach each other’s ports on localhost. This offers +simplicity (static ports known a priori), security (ports bound to localhost +are visible within the pod but never outside it), and performance. This also +reduces friction for applications moving from the world of uncontainerized apps +on physical or virtual hosts.
People running application stacks together on +the same host have already figured out how to make ports not conflict and have +arranged for clients to find them. + +The approach does reduce isolation between containers within a pod — +ports could conflict, and there can be no container-private ports, but these +seem to be relatively minor issues with plausible future workarounds. Besides, +the premise of pods is that containers within a pod share some resources +(volumes, cpu, ram, etc.) and therefore expect and tolerate reduced isolation. +Additionally, the user can control what containers belong to the same pod +whereas, in general, they don't control what pods land together on a host. + +## Pod to pod + +Because every pod gets a "real" (not machine-private) IP address, pods can +communicate without proxies or translations. They can use well-known port +numbers and can avoid the use of higher-level service discovery systems like +DNS-SD, Consul, or Etcd. + +When any container calls ioctl(SIOCGIFADDR) (get the address of an interface), +it sees the same IP that any peer container would see them coming from — +each pod has its own IP address that other pods can know. By making IP addresses +and ports the same both inside and outside the pods, we create a NAT-less, flat +address space. Running "ip addr show" should work as expected. This would enable +all existing naming/discovery mechanisms to work out of the box, including +self-registration mechanisms and applications that distribute IP addresses. We +should be optimizing for inter-pod network communication. Within a pod, +containers are more likely to use communication through volumes (e.g., tmpfs) or +IPC. + +This is different from the standard Docker model. In that mode, each container +gets an IP in the 172-dot space and would only see that 172-dot address from +SIOCGIFADDR.
If these containers connect to another container the peer would see +the connect coming from a different IP than the container itself knows. In short +— you can never self-register anything from a container, because a +container can not be reached on its private IP. + +An alternative we considered was an additional layer of addressing: pod-centric +IP per container. Each container would have its own local IP address, visible +only within that pod. This would perhaps make it easier for containerized +applications to move from physical/virtual hosts to pods, but would be more +complex to implement (e.g., requiring a bridge per pod, split-horizon/VP DNS) +and to reason about, due to the additional layer of address translation, and +would break self-registration and IP distribution mechanisms. + +Like Docker, ports can still be published to the host node's interface(s), but +the need for this is radically diminished. + +## Implementation + +For the Google Compute Engine cluster configuration scripts, we use [advanced +routing rules](https://developers.google.com/compute/docs/networking#routing) +and ip-forwarding-enabled VMs so that each VM has an extra 256 IP addresses that +get routed to it. This is in addition to the 'main' IP address assigned to the +VM that is NAT-ed for Internet access. The container bridge (called `cbr0` to +differentiate it from `docker0`) is set up outside of Docker proper. + +Example of GCE's advanced routing rules: + +``` +gcloud compute routes add "${MINION_NAMES[$i]}" \ + --project "${PROJECT}" \ + --destination-range "${MINION_IP_RANGES[$i]}" \ + --network "${NETWORK}" \ + --next-hop-instance "${MINION_NAMES[$i]}" \ + --next-hop-instance-zone "${ZONE}" & +``` + +GCE itself does not know anything about these IPs, though. This means that when +a pod tries to egress beyond GCE's project the packets must be SNAT'ed +(masqueraded) to the VM's IP, which GCE recognizes and allows. 
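Because of this flat, NAT-less address space, a process can discover the address its peers will see (and self-register it) with nothing more than a connected UDP socket. A hedged Python sketch; the probe address is arbitrary and no packets are actually sent:

```python
import socket

def own_ip(probe_addr=("10.0.0.1", 53)):
    """Return the IP this host would use to reach probe_addr.

    Connecting a UDP socket sends no traffic; it only asks the kernel to
    pick a route, and the chosen source address is exactly the one peers
    would see us coming from.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(probe_addr)
        return s.getsockname()[0]
    finally:
        s.close()

# In an IP-per-pod world this is the pod IP, directly usable for
# self-registration in etcd, Consul, etc.
print(own_ip())
```

Under the 172-dot Docker model this trick returns an address peers cannot reach, which is precisely the self-registration problem described above.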
+ +### Other implementations + +With the primary aim of providing the IP-per-pod model, other implementations exist +to serve the purpose outside of GCE. - [OpenVSwitch with GRE/VxLAN](../ovs-networking.md) - [Flannel](https://github.com/coreos/flannel#flannel) + - [L2 networks](http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/) + ("With Linux Bridge devices" section) + - [Weave](https://github.com/zettio/weave) is yet another way to build an + overlay network, primarily aiming at Docker integration. + - [Calico](https://github.com/Metaswitch/calico) uses BGP to enable real + container IPs. + +## Pod to service + +The [service](../services.md) abstraction provides a way to group pods under a +common access policy (e.g. load-balanced). The implementation of this creates a +virtual IP which clients can access and which is transparently proxied to the +pods in a Service. Each node runs a kube-proxy process which programs +`iptables` rules to trap access to service IPs and redirect them to the correct +backends. This provides a highly-available load-balancing solution with low +performance overhead by balancing client traffic from a node on that same node. + +## External to internal + +So far the discussion has been about how to access a pod or service from within +the cluster. Accessing a pod from outside the cluster is a bit more tricky. We +want to offer highly-available, high-performance load balancing to target +Kubernetes Services. Most public cloud providers are simply not flexible enough +yet. + +The way this is generally implemented is to set up external load balancers (e.g. +GCE's ForwardingRules or AWS's ELB) which target all nodes in a cluster. When +traffic arrives at a node it is recognized as being part of a particular Service +and routed to an appropriate backend Pod. This does mean that some traffic will +get double-bounced on the network. Once cloud providers have better offerings +we can take advantage of those.
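The pod-to-service flow above can be caricatured in a few lines: a stable virtual IP fronts a changing set of backend pods, and the per-node proxy spreads traffic across them. This sketch (all addresses invented) shows only the load-balancing decision, not the actual iptables programming:

```python
import itertools

class ServiceProxy:
    """Toy stand-in for kube-proxy's backend selection."""

    def __init__(self, virtual_ip, endpoints):
        self.virtual_ip = virtual_ip
        self._rr = itertools.cycle(endpoints)

    def pick_backend(self):
        # Real kube-proxy traps traffic to virtual_ip via iptables and
        # redirects it; here we simply rotate through the endpoints.
        return next(self._rr)

svc = ServiceProxy("10.0.0.10", ["10.244.1.5:8080", "10.244.2.7:8080"])
print([svc.pick_backend() for _ in range(3)])
# ['10.244.1.5:8080', '10.244.2.7:8080', '10.244.1.5:8080']
```

Because the decision is made on the same node the client traffic originates from, there is no extra network hop for in-cluster access.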
## Challenges and future work ### Docker API -Right now, docker inspect doesn't show the networking configuration of the containers, since they derive it from another container. That information should be exposed somehow. +Right now, docker inspect doesn't show the networking configuration of the +containers, since they derive it from another container. That information should +be exposed somehow. ### External IP assignment -We want to be able to assign IP addresses externally from Docker ([Docker issue #6743](https://github.com/dotcloud/docker/issues/6743)) so that we don't need to statically allocate fixed-size IP ranges to each node, so that IP addresses can be made stable across pod infra container restarts ([Docker issue #2801](https://github.com/dotcloud/docker/issues/2801)), and to facilitate pod migration. Right now, if the pod infra container dies, all the user containers must be stopped and restarted because the netns of the pod infra container will change on restart, and any subsequent user container restart will join that new netns, thereby not being able to see its peers. Additionally, a change in IP address would encounter DNS caching/TTL problems. External IP assignment would also simplify DNS support (see below). - -### Naming, discovery, and load balancing - -In addition to enabling self-registration with 3rd-party discovery mechanisms, we'd like to set up DDNS automatically ([Issue #146](https://github.com/GoogleCloudPlatform/kubernetes/issues/146)). hostname, $HOSTNAME, etc. should return a name for the pod ([Issue #298](https://github.com/GoogleCloudPlatform/kubernetes/issues/298)), and gethostbyname should be able to resolve names of other pods. Probably we need to set up a DNS resolver to do the latter ([Docker issue #2267](https://github.com/dotcloud/docker/issues/2267)), so that we don't need to keep /etc/hosts files up to date dynamically.
- -[Service](http://docs.k8s.io/services.md) endpoints are currently found through environment variables. Both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) variables and kubernetes-specific variables ({NAME}_SERVICE_HOST and {NAME}_SERVICE_BAR) are supported, and resolve to ports opened by the service proxy. We don't actually use [the Docker ambassador pattern](https://docs.docker.com/articles/ambassador_pattern_linking/) to link containers because we don't require applications to identify all clients at configuration time, yet. While services today are managed by the service proxy, this is an implementation detail that applications should not rely on. Clients should instead use the [service IP](http://docs.k8s.io/services.md) (which the above environment variables will resolve to). However, a flat service namespace doesn't scale and environment variables don't permit dynamic updates, which complicates service deployment by imposing implicit ordering constraints. We intend to register each service's IP in DNS, and for that to become the preferred resolution protocol. - -We'd also like to accommodate other load-balancing solutions (e.g., HAProxy), non-load-balanced services ([Issue #260](https://github.com/GoogleCloudPlatform/kubernetes/issues/260)), and other types of groups (worker pools, etc.). Providing the ability to Watch a label selector applied to pod addresses would enable efficient monitoring of group membership, which could be directly consumed or synced with a discovery mechanism. Event hooks ([Issue #140](https://github.com/GoogleCloudPlatform/kubernetes/issues/140)) for join/leave events would probably make this even easier. - -### External routability - -We want traffic between containers to use the pod IP addresses across nodes. Say we have Node A with a container IP space of 10.244.1.0/24 and Node B with a container IP space of 10.244.2.0/24. And we have Container A1 at 10.244.1.1 and Container B1 at 10.244.2.1. 
We want Container A1 to talk to Container B1 directly with no NAT. B1 should see the "source" in the IP packets of 10.244.1.1 — not the "primary" host IP for Node A. That means that we want to turn off NAT for traffic between containers (and also between VMs and containers). - -We'd also like to make pods directly routable from the external internet. However, we can't yet support the extra container IPs that we've provisioned talking to the internet directly. So, we don't map external IPs to the container IPs. Instead, we solve that problem by having traffic that isn't to the internal network (! 10.0.0.0/8) get NATed through the primary host IP address so that it can get 1:1 NATed by the GCE networking when talking to the internet. Similarly, incoming traffic from the internet has to get NATed/proxied through the host IP. - -So we end up with 3 cases: - -1. Container -> Container or Container <-> VM. These should use 10. addresses directly and there should be no NAT. - -2. Container -> Internet. These have to get mapped to the primary host IP so that GCE knows how to egress that traffic. There are actually 2 layers of NAT here: Container IP -> Internal Host IP -> External Host IP. The first level happens in the guest with iptables and the second happens as part of GCE networking. The first one (Container IP -> internal host IP) does dynamic port allocation while the second maps ports 1:1. - -3. Internet -> Container. This also has to go through the primary host IP and also has 2 levels of NAT, ideally. However, the path currently is a proxy with (External Host IP -> Internal Host IP -> Docker) -> (Docker -> Container IP). Once [issue #15](https://github.com/GoogleCloudPlatform/kubernetes/issues/15) is closed, it should be External Host IP -> Internal Host IP -> Container IP. But to get that second arrow we have to set up the port forwarding iptables rules per mapped port.
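The egress half of the three cases above amounts to a routing decision keyed on the destination address. A hedged Python sketch of that classification; the 10.0.0.0/8 internal range comes from the text, while the policy names are made up for the example:

```python
import ipaddress

# The "! 10.0.0.0/8" test from the text: 10-dot traffic stays un-NATed.
INTERNAL = ipaddress.ip_network("10.0.0.0/8")

def egress_policy(dst_ip):
    """Classify outbound traffic from a container, per the cases above."""
    if ipaddress.ip_address(dst_ip) in INTERNAL:
        return "direct"            # case 1: container <-> container/VM, no NAT
    return "snat-via-host-ip"      # case 2: container -> internet

print(egress_policy("10.244.2.1"))   # direct
print(egress_policy("8.8.8.8"))      # snat-via-host-ip
```

Case 3 (internet -> container) is the inbound mirror image and goes through the host IP as described in the text.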
- -Another approach could be to create a new host interface alias for each pod, if we had a way to route an external IP to it. This would eliminate the scheduling constraints resulting from using the host's IP address. +We want to be able to assign IP addresses externally from Docker +[#6743](https://github.com/dotcloud/docker/issues/6743) so that we don't need +to statically allocate fixed-size IP ranges to each node, so that IP addresses +can be made stable across pod infra container restarts +([#2801](https://github.com/dotcloud/docker/issues/2801)), and to facilitate +pod migration. Right now, if the pod infra container dies, all the user +containers must be stopped and restarted because the netns of the pod infra +container will change on restart, and any subsequent user container restart +will join that new netns, thereby not being able to see its peers. +Additionally, a change in IP address would encounter DNS caching/TTL problems. +External IP assignment would also simplify DNS support (see below). ### IPv6 -- cgit v1.2.3 From ca9ef4abe107ab2c3b7f763bdf49aeff8f2a3d0c Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 7 Jul 2015 13:06:19 -0700 Subject: Move scheduler overview from docs/design/ to docs/devel/ --- scheduler.md | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 scheduler.md diff --git a/scheduler.md b/scheduler.md new file mode 100644 index 00000000..ac01e6db --- /dev/null +++ b/scheduler.md @@ -0,0 +1,50 @@ + +# The Kubernetes Scheduler + +The Kubernetes scheduler runs as a process alongside the other master +components such as the API server. Its interface to the API server is to watch +for Pods with an empty PodSpec.NodeName, and for each Pod, it posts a Binding +indicating where the Pod should be scheduled. + +## The scheduling process + +The scheduler tries to find a node for each Pod, one at a time, as it notices +these Pods via watch. There are three steps. 
First it applies a set of "predicates" that filter out +inappropriate nodes. For example, if the PodSpec specifies resource limits, then the scheduler +will filter out nodes that don't have at least that many resources available (computed +as the capacity of the node minus the sum of the resource limits of the containers that +are already running on the node). Second, it applies a set of "priority functions" +that rank the nodes that weren't filtered out by the predicate check. For example, +it tries to spread Pods across nodes while at the same time favoring the least-loaded +nodes (where "load" here is the sum of the resource limits of the containers running on the node, +divided by the node's capacity). +Finally, the node with the highest priority is chosen +(or, if there are multiple such nodes, then one of them is chosen at random). The code +for this main scheduling loop is in the function `Schedule()` in +[plugin/pkg/scheduler/generic_scheduler.go](../../plugin/pkg/scheduler/generic_scheduler.go) + +## Scheduler extensibility + +The scheduler is extensible: the cluster administrator can choose which of the pre-defined +scheduling policies to apply, and can add new ones. The built-in predicates and priorities are +defined in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go) and +[plugin/pkg/scheduler/algorithm/priorities/priorities.go](../../plugin/pkg/scheduler/algorithm/priorities/priorities.go), respectively. +The policies that are applied when scheduling can be chosen in one of two ways. Normally, +the policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in +[plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go).
+However, the choice of policies +can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON +file specifying which scheduling policies to use. See +[examples/scheduler-policy-config.json](../../examples/scheduler-policy-config.json) for an example +config file. (Note that the config file format is versioned; the API is defined in +[plugin/pkg/scheduler/api/](../../plugin/pkg/scheduler/api/)). +Thus to add a new scheduling policy, you should modify predicates.go or priorities.go, +and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. + +## Exploring the code + +If you want to get a global picture of how the scheduler works, you can start in +[plugin/cmd/kube-scheduler/app/server.go](../../plugin/cmd/kube-scheduler/app/server.go) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler.md?pixel)]() -- cgit v1.2.3 From b4354021c3968c7fb46996e48e540750a246fdfb Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 7 Jul 2015 13:06:19 -0700 Subject: Move scheduler overview from docs/design/ to docs/devel/ --- scheduler.md | 50 -------------------------------------------------- 1 file changed, 50 deletions(-) delete mode 100644 scheduler.md diff --git a/scheduler.md b/scheduler.md deleted file mode 100644 index e2a9f35d..00000000 --- a/scheduler.md +++ /dev/null @@ -1,50 +0,0 @@ - -# The Kubernetes Scheduler - -The Kubernetes scheduler runs as a process alongside the other master -components such as the API server. Its interface to the API server is to watch -for Pods with an empty PodSpec.NodeName, and for each Pod, it posts a Binding -indicating where the Pod should be scheduled. - -## The scheduling process - -The scheduler tries to find a node for each Pod, one at a time, as it notices -these Pods via watch. There are three steps. First it applies a set of "predicates" that filter out -inappropriate nodes. 
For example, if the PodSpec specifies resource limits, then the scheduler -will filter out nodes that don't have at least that many resources available (computed -as the capacity of the node minus the sum of the resource limits of the containers that -are already running on the node). Second, it applies a set of "priority functions" -that rank the nodes that weren't filtered out by the predicate check. For example, -it tries to spread Pods across nodes while at the same time favoring the least-loaded -nodes (where "load" here is the sum of the resource limits of the containers running on the node, -divided by the node's capacity). -Finally, the node with the highest priority is chosen -(or, if there are multiple such nodes, then one of them is chosen at random). The code -for this main scheduling loop is in the function `Schedule()` in -[plugin/pkg/scheduler/generic_scheduler.go](../../plugin/pkg/scheduler/generic_scheduler.go) - -## Scheduler extensibility - -The scheduler is extensible: the cluster administrator can choose which of the pre-defined -scheduling policies to apply, and can add new ones. The built-in predicates and priorities are -defined in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go) and -[plugin/pkg/scheduler/algorithm/priorities/priorities.go](../../plugin/pkg/scheduler/algorithm/priorities/priorities.go), respectively. -The policies that are applied when scheduling can be chosen in one of two ways. Normally, -the policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in -[plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). -However, the choice of policies -can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON -file specifying which scheduling policies to use.
See -[examples/scheduler-policy-config.json](../../examples/scheduler-policy-config.json) for an example -config file. (Note that the config file format is versioned; the API is defined in -[plugin/pkg/scheduler/api/](../../plugin/pkg/scheduler/api/)). -Thus to add a new scheduling policy, you should modify predicates.go or priorities.go, -and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. - -## Exploring the code - -If you want to get a global picture of how the scheduler works, you can start in -[plugin/cmd/kube-scheduler/app/server.go](../../plugin/cmd/kube-scheduler/app/server.go) - - -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/scheduler.md?pixel)]() -- cgit v1.2.3 From 1b3281d5d27c495145931aaebdd034c79b55717f Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 2 Jul 2015 09:42:49 -0700 Subject: Make docs links be relative so we can version them --- secrets.md | 6 +++--- security.md | 20 ++++++++++---------- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/secrets.md b/secrets.md index 423ce529..d91a950a 100644 --- a/secrets.md +++ b/secrets.md @@ -71,7 +71,7 @@ service would also consume the secrets associated with the MySQL service. ### Use-Case: Secrets associated with service accounts -[Service Accounts](http://docs.k8s.io/design/service_accounts.md) are proposed as a +[Service Accounts](./service_accounts.md) are proposed as a mechanism to decouple capabilities and security contexts from individual human users. A `ServiceAccount` contains references to some number of secrets. A `Pod` can specify that it is associated with a `ServiceAccount`. Secrets should have a `Type` field to allow the Kubelet and @@ -241,7 +241,7 @@ memory overcommit on the node. #### Secret data on the node: isolation -Every pod will have a [security context](http://docs.k8s.io/design/security_context.md). 
+Every pod will have a [security context](./security_context.md). Secret data on the node should be isolated according to the security context of the container. The Kubelet volume plugin API will be changed so that a volume plugin receives the security context of a volume along with the volume spec. This will allow volume plugins to implement setting the @@ -253,7 +253,7 @@ Several proposals / upstream patches are notable as background for this proposal 1. [Docker vault proposal](https://github.com/docker/docker/issues/10310) 2. [Specification for image/container standardization based on volumes](https://github.com/docker/docker/issues/9277) -3. [Kubernetes service account proposal](http://docs.k8s.io/design/service_accounts.md) +3. [Kubernetes service account proposal](./service_accounts.md) 4. [Secrets proposal for docker (1)](https://github.com/docker/docker/pull/6075) 5. [Secrets proposal for docker (2)](https://github.com/docker/docker/pull/6697) diff --git a/security.md b/security.md index 6ea611b7..733f6818 100644 --- a/security.md +++ b/security.md @@ -63,14 +63,14 @@ Automated process users fall into the following categories: A pod runs in a *security context* under a *service account* that is defined by an administrator or project administrator, and the *secrets* a pod has access to is limited by that *service account*. -1. The API should authenticate and authorize user actions [authn and authz](http://docs.k8s.io/design/access.md) +1. The API should authenticate and authorize user actions [authn and authz](./access.md) 2. All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and be authorized to perform only the functions they require against the API. 3. Most infrastructure components should use the API as a way of exchanging data and changing the system, and only the API should have access to the underlying data store (etcd) -4. 
When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](http://docs.k8s.io/design/service_accounts.md)
+4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](./service_accounts.md)
 1. If the user who started a long-lived process is removed from access to the cluster, the process should be able to continue without interruption
 2. If the user who started processes is removed from the cluster, administrators may wish to terminate their processes in bulk
 3. When containers run with a service account, the user that created / triggered the service account behavior must be associated with the container's action
-5. When container processes run on the cluster, they should run in a [security context](http://docs.k8s.io/design/security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions.
+5. When container processes run on the cluster, they should run in a [security context](./security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions.
 1. Administrators should be able to configure the cluster to automatically confine all container processes as a non-root, randomly assigned UID
 2. Administrators should be able to ensure that container processes within the same namespace are all assigned the same unix user UID
 3. Administrators should be able to limit which developers and project administrators have access to higher privilege actions
@@ -79,7 +79,7 @@ A pod runs in a *security context* under a *service account* that is defined by
 6. Developers may need to ensure their images work within higher security requirements specified by administrators
 7.
When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met.
 8. When application developers want to share filesystem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes
-6. Developers should be able to define [secrets](http://docs.k8s.io/design/secrets.md) that are automatically added to the containers when pods are run
+6. Developers should be able to define [secrets](./secrets.md) that are automatically added to the containers when pods are run
 1. Secrets are files injected into the container whose values should not be displayed within a pod. Examples:
 1. An SSH private key for git cloning remote data
 2. A client certificate for accessing a remote system
@@ -93,12 +93,12 @@ A pod runs in a *security context* under a *service account* that is defined by

 ### Related design discussion

-* Authorization and authentication http://docs.k8s.io/design/access.md
-* Secret distribution via files https://github.com/GoogleCloudPlatform/kubernetes/pull/2030
-* Docker secrets https://github.com/docker/docker/pull/6697
-* Docker vault https://github.com/docker/docker/issues/10310
-* Service Accounts: http://docs.k8s.io/design/service_accounts.md
-* Secret volumes https://github.com/GoogleCloudPlatform/kubernetes/4126
+* [Authorization and authentication](./access.md)
+* [Secret distribution via files](https://github.com/GoogleCloudPlatform/kubernetes/pull/2030)
+* [Docker secrets](https://github.com/docker/docker/pull/6697)
+* [Docker vault](https://github.com/docker/docker/issues/10310)
+* [Service Accounts](./service_accounts.md)
+* [Secret volumes](https://github.com/GoogleCloudPlatform/kubernetes/pull/4126)

 ## Specific Design Points
-- cgit v1.2.3 From ded4807c0df92ee0a20ec44bbc398a428b54736a Mon Sep 17 00:00:00 2001
From: Tim Hockin
Date: Thu, 2 Jul 2015 09:42:49 -0700
Subject: Make docs links be relative so we can version them
---
 autoscaling.md | 6 +++---
 1 file
changed, 3 insertions(+), 3 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 31374448..3acaf298 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -21,7 +21,7 @@ done automatically based on statistical analysis and thresholds. * This proposal is for horizontal scaling only. Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072) * `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. The `ReplicationController` responsibilities are -constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](http://docs.k8s.io/replication-controller.md#responsibilities-of-the-replication-controller) +constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](../replication-controller.md#responsibilities-of-the-replication-controller) * Auto-scalers will be loosely coupled with data gathering components in order to allow a wide variety of input sources * Auto-scalable resources will support a scale verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) such that the auto-scaler does not directly manipulate the underlying resource. @@ -42,7 +42,7 @@ applications will expose one or more network endpoints for clients to connect to balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to server traffic for applications. This is the primary, but not sole, source of data for making decisions. -Within Kubernetes a [kube proxy](http://docs.k8s.io/services.md#ips-and-vips) +Within Kubernetes a [kube proxy](../services.md#ips-and-vips) running on each node directs service requests to the underlying implementation. While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage @@ -225,7 +225,7 @@ or down as appropriate. 
In the future this may be more configurable.

 ### Interactions with a deployment

-In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](http://docs.k8s.io/replication-controller.md#rolling-updates)
+In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](../replication-controller.md#rolling-updates)
 there will be multiple replication controllers, with one scaling up and another scaling down. This means that an auto-scaler must be
 aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector`
 is what provides this ability. By using a selector that spans the entire service the auto-scaler can monitor capacity
-- cgit v1.2.3 From 2b8e318ccafb353dc06bb7066ccb8671591bbaba Mon Sep 17 00:00:00 2001
From: Alex Mohr
Date: Tue, 7 Jul 2015 16:29:18 -0700
Subject: Update release notes tool and documentation
---
 making-release-notes.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/making-release-notes.md b/making-release-notes.md
index ffccf6d3..5d08ac50 100644
--- a/making-release-notes.md
+++ b/making-release-notes.md
@@ -2,17 +2,21 @@ This documents the process for making release notes for a release.

 ### 1) Note the PR number of the previous release

-Find the PR that was merged with the previous release. Remember this number
+Find the most-recent PR that was merged with the previous .0 release. Remember this
 as $LASTPR.

 _TODO_: Figure out a way to record this somewhere to save the next release engineer time.

-### 2) Build the release-notes tool
+Find the most-recent PR that was merged with the current .0 release. Remember this as $CURRENTPR.
+ +### 2) Run the release-notes tool ```bash -${KUBERNETES_ROOT}/build/make-release-notes.sh +${KUBERNETES_ROOT}/build/make-release-notes.sh $LASTPR $CURRENTPR ``` ### 3) Trim the release notes -This generates a list of the entire set of PRs merged since the last release. It is likely long -and many PRs aren't worth mentioning. +This generates a list of the entire set of PRs merged since the last minor +release. It is likely long and many PRs aren't worth mentioning. If any of the +PRs were cherrypicked into patches on the last minor release, you should exclude +them from the current release's notes. Open up ```candidate-notes.md``` in your favorite editor. -- cgit v1.2.3 From 8c28498ca08ac4cd76ea1d23992836dec63581f6 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Tue, 7 Jul 2015 18:02:21 -0700 Subject: Update kubectl get command in docs/devel/ --- developer-guides/vagrant.md | 70 +++++++++++++++++++++++---------------------- 1 file changed, 36 insertions(+), 34 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 332ac3d5..d8d7a1ec 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -36,14 +36,14 @@ Vagrant will provision each machine in the cluster with all the necessary compon By default, each VM in the cluster is running Fedora, and all of the Kubernetes services are installed into systemd. 
-To access the master or any minion:
+To access the master or any node:

 ```sh
 vagrant ssh master
 vagrant ssh minion-1
 ```

-If you are running more than one minion, you can access the others by:
+If you are running more than one node, you can access the others by:

 ```sh
 vagrant ssh minion-2
@@ -97,12 +97,12 @@ Once your Vagrant machines are up and provisioned, the first thing to do is to c
 You may need to build the binaries first, you can do this with ```make```

 ```sh
-$ ./cluster/kubectl.sh get minions
+$ ./cluster/kubectl.sh get nodes

-NAME LABELS
-10.245.1.4
-10.245.1.5
-10.245.1.3
+NAME LABELS STATUS
+kubernetes-minion-0whl kubernetes.io/hostname=kubernetes-minion-0whl Ready
+kubernetes-minion-4jdf kubernetes.io/hostname=kubernetes-minion-4jdf Ready
+kubernetes-minion-epbe kubernetes.io/hostname=kubernetes-minion-epbe Ready
 ```

 ### Interacting with your Kubernetes cluster with the `kube-*` scripts.
@@ -153,23 +153,23 @@ cat ~/.kubernetes_vagrant_auth
 }
 ```

-You should now be set to use the `cluster/kubectl.sh` script. For example try to list the minions that you have started with:
+You should now be set to use the `cluster/kubectl.sh` script. For example, try to list the nodes that you have started with:

 ```sh
-./cluster/kubectl.sh get minions
+./cluster/kubectl.sh get nodes
 ```

 ### Running containers

-Your cluster is running, you can list the minions in your cluster:
+Your cluster is running, and you can list the nodes in your cluster:

 ```sh
-$ ./cluster/kubectl.sh get minions
+$ ./cluster/kubectl.sh get nodes

-NAME LABELS
-10.245.2.4
-10.245.2.3
-10.245.2.2
+NAME LABELS STATUS
+kubernetes-minion-0whl kubernetes.io/hostname=kubernetes-minion-0whl Ready
+kubernetes-minion-4jdf kubernetes.io/hostname=kubernetes-minion-4jdf Ready
+kubernetes-minion-epbe kubernetes.io/hostname=kubernetes-minion-epbe Ready
 ```

 Now start running some containers!
@@ -179,29 +179,31 @@ Before starting a container there will be no pods, services and replication cont ``` $ cluster/kubectl.sh get pods -NAME IMAGE(S) HOST LABELS STATUS +NAME READY STATUS RESTARTS AGE $ cluster/kubectl.sh get services -NAME LABELS SELECTOR IP PORT +NAME LABELS SELECTOR IP(S) PORT(S) -$ cluster/kubectl.sh get replicationcontrollers -NAME IMAGE(S SELECTOR REPLICAS +$ cluster/kubectl.sh get rc +CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS ``` Start a container running nginx with a replication controller and three replicas ``` $ cluster/kubectl.sh run my-nginx --image=nginx --replicas=3 --port=80 +CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS +my-nginx my-nginx nginx run=my-nginx 3 ``` When listing the pods, you will see that three containers have been started and are in Waiting state: ``` $ cluster/kubectl.sh get pods -NAME IMAGE(S) HOST LABELS STATUS -781191ff-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.4/10.245.2.4 name=myNginx Waiting -7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Waiting -78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Waiting +NAME READY STATUS RESTARTS AGE +my-nginx-389da 1/1 Waiting 0 33s +my-nginx-kqdjk 1/1 Waiting 0 33s +my-nginx-nyj3x 1/1 Waiting 0 33s ``` You need to wait for the provisioning to complete, you can monitor the minions by doing: @@ -228,17 +230,17 @@ Going back to listing the pods, services and replicationcontrollers, you now hav ``` $ cluster/kubectl.sh get pods -NAME IMAGE(S) HOST LABELS STATUS -781191ff-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.4/10.245.2.4 name=myNginx Running -7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running -78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running +NAME READY STATUS RESTARTS AGE +my-nginx-389da 1/1 Running 0 33s +my-nginx-kqdjk 1/1 Running 0 33s +my-nginx-nyj3x 1/1 Running 0 33s $ cluster/kubectl.sh get services -NAME LABELS 
SELECTOR IP PORT +NAME LABELS SELECTOR IP(S) PORT(S) -$ cluster/kubectl.sh get replicationcontrollers -NAME IMAGE(S SELECTOR REPLICAS -myNginx nginx name=my-nginx 3 +$ cluster/kubectl.sh get rc +NAME IMAGE(S) SELECTOR REPLICAS +my-nginx nginx run=my-nginx 3 ``` We did not start any services, hence there are none listed. But we see three replicas displayed properly. @@ -248,9 +250,9 @@ You can already play with scaling the replicas with: ```sh $ ./cluster/kubectl.sh scale rc my-nginx --replicas=2 $ ./cluster/kubectl.sh get pods -NAME IMAGE(S) HOST LABELS STATUS -7813c8bd-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.2/10.245.2.2 name=myNginx Running -78140853-3ffe-11e4-9036-0800279696e1 nginx 10.245.2.3/10.245.2.3 name=myNginx Running +NAME READY STATUS RESTARTS AGE +my-nginx-kqdjk 1/1 Running 0 13m +my-nginx-nyj3x 1/1 Running 0 13m ``` Congratulations! -- cgit v1.2.3 From 47df5ae18f4f16f0909a1299bb1d4a599dc63879 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Wed, 8 Jul 2015 13:19:38 -0700 Subject: Update kubectl output in doc --- persistent-storage.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/persistent-storage.md b/persistent-storage.md index 3729f30e..8e7c6765 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -112,7 +112,7 @@ spec: kubectl get pv -NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM +NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON pv0001 map[] 10737418240 RWO Pending @@ -157,7 +157,7 @@ myclaim-1 map[] pending kubectl get pv -NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM +NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON pv0001 map[] 10737418240 RWO Bound myclaim-1 / f4b3d283-c0ef-11e4-8be4-80e6500a981e -- cgit v1.2.3 From 4cc2b50a4497051324ceacf0d1fd7acb92d274d4 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Wed, 8 Jul 2015 16:26:20 -0400 Subject: Change remaining instances of hostDir in docs to hostPath --- security.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security.md 
b/security.md index 6ea611b7..1f43772e 100644 --- a/security.md +++ b/security.md @@ -54,7 +54,7 @@ Automated process users fall into the following categories: * are less focused on application security. Focused on operating system security. * protect the node from bad actors in containers, and properly-configured innocent containers from bad actors in other containers. * comfortable reasoning about the security properties of a system at the level of detail of Linux Capabilities, SELinux, AppArmor, etc. - * decides who can use which Linux Capabilities, run privileged containers, use hostDir, etc. + * decides who can use which Linux Capabilities, run privileged containers, use hostPath, etc. * e.g. a team that manages Ceph or a mysql server might be trusted to have raw access to storage devices in some organizations, but teams that develop the applications at higher layers would not. -- cgit v1.2.3 From 2fe55a7351c1beb2e07ed9ab470500737d08527f Mon Sep 17 00:00:00 2001 From: Zach Loafman Date: Thu, 2 Jul 2015 08:04:24 -0700 Subject: Update releasing.md with Kubernetes release process This updates releasing.md with actual instructions on how to cut a release, leaving the theory section of that document alone. Along the way, I streamlined tiny bits of the existing process as I was describing them. The instructions are possibly pedantic, but should be executable by anyone at this point, versus taking someone versant in the dark arts. Relies on #10910. Fixes #1883. --- releasing.md | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 135 insertions(+), 3 deletions(-) diff --git a/releasing.md b/releasing.md index 803e321a..b621c526 100644 --- a/releasing.md +++ b/releasing.md @@ -1,7 +1,138 @@ # Releasing Kubernetes -This document explains how to create a Kubernetes release (as in version) and -how the version information gets embedded into the built binaries. +This document explains how to cut a release, and the theory behind it. 
If you
+just want to cut a release and move on with your life, you can stop reading
+after the first section.
+
+## How to cut a Kubernetes release
+
+Regardless of whether you are cutting a major or minor version, cutting a
+release breaks down into four pieces:
+
+1. Selecting release components.
+1. Tagging and merging the release in Git.
+1. Building and pushing the binaries.
+1. Writing release notes.
+
+You should progress in this strict order.
+
+### Building a New Major/Minor Version (`vX.Y.0`)
+
+#### Selecting Release Components
+
+When cutting a major/minor release, your first job is to find the branch
+point. We cut `vX.Y.0` releases directly from `master`, which is also the
+branch that we have the most continuous validation on. Go first to [the main GCE
+Jenkins end-to-end job](http://go/k8s-test/job/kubernetes-e2e-gce) and next to [the
+Critical Builds page](http://go/k8s-test/view/Critical%20Builds) and hopefully find a
+recent Git hash that looks stable across at least `kubernetes-e2e-gce` and
+`kubernetes-e2e-gke-ci`. First, glance through builds and look for nice solid
+rows of green builds, and then check temporally with the other Critical Builds
+to make sure they're solid around then as well. Once you find some greens, you
+can find the Git hash for a build by looking at the "Console Log", then look for
+`githash=`. You should see a line like:
+
+```
++ githash=v0.20.2-322-g974377b
+```
+
+Because Jenkins builds frequently, if you're looking between jobs
+(e.g. `kubernetes-e2e-gke-ci` and `kubernetes-e2e-gce`), there may be no single
+`githash` that's been run on both jobs. In that case, take a green
+`kubernetes-e2e-gce` build (but please check that it corresponds to a temporally
+similar build that's green on `kubernetes-e2e-gke-ci`). Lastly, if you're having
+trouble understanding why the GKE continuous integration clusters are failing
+and you're trying to cut a release, don't hesitate to contact the GKE
+oncall.
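If it helps, here is a rough sketch of pulling that `githash` value out of a saved console log with standard shell tools. This is only an illustration: the `console.log` filename is hypothetical, and the sketch simulates a single line of the log so it is self-contained.

```shell
# Hypothetical: console.log is a saved copy of the Jenkins "Console Log".
# Simulate one line of it here so the sketch runs on its own:
echo '+ githash=v0.20.2-322-g974377b' > console.log

# Pull out the value after "githash=":
githash=$(grep -o 'githash=[^ ]*' console.log | head -n 1 | cut -d= -f2)
echo "$githash"   # v0.20.2-322-g974377b
```

The extracted value is what you would then export as `BRANCHPOINT`.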
+ +Before proceeding to the next step: +``` +export BRANCHPOINT=v0.20.2-322-g974377b +``` +Where `v0.20.2-322-g974377b` is the git hash you decided on. This will become +our (retroactive) branch point. + +#### Branching, Tagging and Merging +Do the following: + +1. `export VER=x.y` (e.g. `0.20` for v0.20) +1. cd to the base of the repo +1. `git fetch upstream && git checkout -b release-${VER} ${BRANCHPOINT}` (you did set `${BRANCHPOINT}`, right?) +1. Make sure you don't have any files you care about littering your repo (they + better be checked in or outside the repo, or the next step will delete them). +1. `make clean && git reset --hard HEAD && git clean -xdf` +1. `make` (TBD: you really shouldn't have to do this, but the swagger output step requires it right now) +1. `./build/mark-new-version.sh v${VER}.0` to mark the new release and get further + instructions. This creates a series of commits on the branch you're working + on (`release-${VER}`), including forking our documentation for the release, + the release version commit (which is then tagged), and the post-release + version commit. +1. Follow the instructions given to you by that script. They are canon for the + remainder of the Git process. If you don't understand something in that + process, please ask! + +**TODO**: how to fix tags, etc., if you have to shift the release branchpoint. + +#### Building and Pushing Binaries + +In your git repo (you still have `${VER}` set from above right?): + +1. `git checkout upstream/master && build/build-official-release.sh v${VER}.0` (the `build-official-release.sh` script is version agnostic, so it's best to run it off `master` directly). +1. Follow the instructions given to you by that script. +1. At this point, you've done all the Git bits, you've got all the binary bits pushed, and you've got the template for the release started on GitHub. 
+
+#### Writing Release Notes
+
+[This helpful guide](making-release-notes.md) describes how to write release
+notes for a major/minor release. In the release template on GitHub, leave the
+last PR number that the tool finds for the `.0` release, so the next releaser
+doesn't have to hunt.
+
+### Building a New Patch Release (`vX.Y.Z` for `Z > 0`)
+
+#### Selecting Release Components
+
+We cut `vX.Y.Z` releases from the `release-vX.Y` branch after all cherry picks
+to the branch have been resolved. You should ensure all outstanding cherry picks
+have been reviewed and merged and the branch validated on Jenkins (validation
+TBD). See the [Cherry Picks](cherry-picks.md) document for more information on how to
+manage cherry picks prior to cutting the release.
+
+#### Tagging and Merging
+
+Do the following (you still have `${VER}` set and you're still working on the
+`release-${VER}` branch, right?):
+
+1. `export PATCH=Z` where `Z` is the patch level of `vX.Y.Z`
+1. `make` (TBD: you really shouldn't have to do this, but the swagger output step requires it right now)
+1. `./build/mark-new-version.sh v${VER}.${PATCH}` to mark the new release and get further
+   instructions. This creates a series of commits on the branch you're working
+   on (`release-${VER}`), including forking our documentation for the release,
+   the release version commit (which is then tagged), and the post-release
+   version commit.
+1. Follow the instructions given to you by that script. They are canon for the
+   remainder of the Git process. If you don't understand something in that
+   process, please ask!
+
+**TODO**: how to fix tags, etc., if the release is changed.
+
+#### Building and Pushing Binaries
+
+In your git repo (you still have `${VER}` and `${PATCH}` set from above, right?):
+
+1. `git checkout upstream/master && build/build-official-release.sh
+   v${VER}.${PATCH}` (the `build-official-release.sh` script is version
+   agnostic, so it's best to run it off `master` directly).
+1.
Follow the instructions given to you by that script. At this point, you've
+   done all the Git bits, you've got all the binary bits pushed, and you've got
+   the template for the release started on GitHub.
+
+#### Writing Release Notes
+
+Release notes for a patch release are relatively fast to produce: `git log release-${VER}`
+(If you followed the procedure in the first section, all the cherry-picks will
+have the pull request number in the commit log). Unless there's some reason not
+to, just include all the PRs back to the last release.

 ## Origin of the Sources
@@ -116,7 +247,8 @@
 We then send PR 100 with both commits in it.

 Once the PR is accepted, we can use `git tag -a` to create an annotated tag
 *pointing to the one commit* that has `v0.5` in `pkg/version/base.go` and push
 it to GitHub. (Unfortunately GitHub tags/releases are not annotated tags, so
-this needs to be done from a git client and pushed to GitHub using SSH.)
+this needs to be done from a git client and pushed to GitHub using SSH or
+HTTPS.)

 ## Parallel Commits
-- cgit v1.2.3 From 4b1d27f1ee0de2e4fdbd83286aea851ad5b29a4c Mon Sep 17 00:00:00 2001
From: Zach Loafman
Date: Thu, 9 Jul 2015 14:24:02 -0700
Subject: Add a short doc on cherry picks
---
 cherry-picks.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 cherry-picks.md

diff --git a/cherry-picks.md b/cherry-picks.md
new file mode 100644
index 00000000..b6669110
--- /dev/null
+++ b/cherry-picks.md
@@ -0,0 +1,32 @@
+# Overview
+
+This document explains how cherry picks are managed on release branches within the
+Kubernetes projects.
+
+## Propose a Cherry Pick
+
+Any contributor can propose a cherry pick of any pull request, like so:
+
+```
+hack/cherry_pick_pull.sh 98765 upstream/release-3.14
+```
+
+This will walk you through the steps to propose an automated cherry pick of pull
+ #98765 for remote branch `upstream/release-3.14`.
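Because each automated cherry pick lands as a merge commit that mentions the original PR number, you can sketch a quick inventory of what has been cherry-picked onto a release branch. This is only a sketch: the log lines below are simulated rather than real `git log --oneline` output, and the commit-subject format is an assumption.

```shell
# Simulated `git log --oneline release-0.21` output (hypothetical commits):
cat > log.txt <<'EOF'
abc1234 Merge pull request #98765 from automated-cherry-pick-of-#98765
def5678 Kubernetes version v0.21.2
9abcdef Merge pull request #98012 from automated-cherry-pick-of-#98012
EOF

# List the unique PR numbers that appear in the log:
grep -o '#[0-9]*' log.txt | sort -u
```

Against a real checkout you would pipe `git log --oneline` into the same filter instead of using the simulated file.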
+
+## Cherry Pick Review
+
+Cherry pick pull requests are reviewed differently than normal pull requests. In
+particular, they may be self-merged by the release branch owner without fanfare,
+in cases where the release branch owner knows the cherry pick was already
+requested - this should not be the norm, but it may happen.
+
+[Contributor License Agreements](../../CONTRIBUTING.md) are considered implicit
+for all code within cherry-pick pull requests, ***unless there is a large
+conflict***.
+
+## Searching for Cherry Picks
+
+Now that we've structured cherry picks as PRs, searching for all cherry-picks
+against a release is a GitHub query; for example,
+[this query is all of the v0.21.x cherry-picks](https://github.com/GoogleCloudPlatform/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr+%22automated+cherry+pick%22+base%3Arelease-0.21)
-- cgit v1.2.3 From 3e5d853c22dc580e4c8c75616f5654c3ca10fe6e Mon Sep 17 00:00:00 2001
From: jiangyaoguo
Date: Wed, 8 Jul 2015 01:37:40 +0800
Subject: change get minions cmd in docs
---
 development.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/development.md b/development.md
index 2e540bcb..07b61c47 100644
--- a/development.md
+++ b/development.md
@@ -205,7 +205,7 @@ hack/test-integration.sh

 ## End-to-End tests

-You can run an end-to-end test which will bring up a master and two minions, perform some tests, and then tear everything down. Make sure you have followed the getting started steps for your chosen cloud platform (which might involve changing the `KUBERNETES_PROVIDER` environment variable to something other than "gce".
+You can run an end-to-end test which will bring up a master and two nodes, perform some tests, and then tear everything down. Make sure you have followed the getting started steps for your chosen cloud platform (which might involve changing the `KUBERNETES_PROVIDER` environment variable to something other than "gce").
``` cd kubernetes hack/e2e-test.sh -- cgit v1.2.3 From 7c1abe54bef9502d91d4b929497cc2c6d1a85c08 Mon Sep 17 00:00:00 2001 From: jiangyaoguo Date: Wed, 8 Jul 2015 01:37:40 +0800 Subject: change get minions cmd in docs --- clustering.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clustering.md b/clustering.md index 4cef06f8..442cb4b6 100644 --- a/clustering.md +++ b/clustering.md @@ -41,7 +41,7 @@ The building blocks of an easier solution: * **Move to TLS** We will move to using TLS for all intra-cluster communication. We will explicitly identify the trust chain (the set of trusted CAs) as opposed to trusting the system CAs. We will also use client certificates for all AuthN. * [optional] **API driven CA** Optionally, we will run a CA in the master that will mint certificates for the nodes/kubelets. There will be pluggable policies that will automatically approve certificate requests here as appropriate. * **CA approval policy** This is a pluggable policy object that can automatically approve CA signing requests. Stock policies will include `always-reject`, `queue` and `insecure-always-approve`. With `queue` there would be an API for evaluating and accepting/rejecting requests. Cloud providers could implement a policy here that verifies other out of band information and automatically approves/rejects based on other external factors. -* **Scoped Kubelet Accounts** These accounts are per-minion and (optionally) give a minion permission to register itself. +* **Scoped Kubelet Accounts** These accounts are per-node and (optionally) give a node permission to register itself. * To start with, we'd have the kubelets generate a cert/account in the form of `kubelet:`. To start we would then hard code policy such that we give that particular account appropriate permissions. Over time, we can make the policy engine more generic. 
* [optional] **Bootstrap API endpoint** This is a helper service hosted outside of the Kubernetes cluster that helps with initial discovery of the master.
-- cgit v1.2.3


From 75be32d08ec39a3ac3a5c9d450bd946e96077934 Mon Sep 17 00:00:00 2001
From: dingh
Date: Wed, 8 Jul 2015 16:34:07 +0800
Subject: Create schedule_algorithm file

This document explains briefly the schedule algorithm of Kubernetes and can be complementary to scheduler.md.
---
 scheduler_algorithm.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 scheduler_algorithm.md

diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md
new file mode 100644
index 00000000..dbd0d7cd
--- /dev/null
+++ b/scheduler_algorithm.md
@@ -0,0 +1,36 @@
+# Scheduler Algorithm in Kubernetes
+
For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [docs/devel/scheduler.md](../../docs/devel/scheduler.md). This document explains the algorithm used to select a node for the Pod. Choosing a destination node for a Pod takes two steps: the first step is filtering all the nodes and the second is ranking the remaining nodes to find the best fit for the Pod.
+
+## Filtering the nodes
The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource limits of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase, so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including:
+
- `NoDiskConflict`: Evaluate whether a Pod can fit given the volumes it requests and those that are already mounted.
+- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of limits of all Pods on the node.
+- `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node.
+- `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field.
+- `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field.
+- `CheckNodeLabelPresence`: Check whether all the specified labels exist on a node, regardless of their values.
+
The details of the above predicates can be found in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go).
+
+## Ranking the nodes
+
The filtered nodes are considered suitable to host the Pod, and often more than one node remains. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score ranging from 0 to 10, with 10 representing "most preferred" and 0 "least preferred". Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores.
For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2`, with weighting factors `weight1` and `weight2` respectively; then the final score of some NodeA is:
+
+ finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2)
+
+After the scores of all nodes are calculated, the node with the highest score is chosen as the host of the Pod. If multiple nodes share the highest score, a random one among them is chosen.
+
+Currently, Kubernetes scheduler provides some practical priority functions, including:
+
+- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of limits of all Pods already on the node - limit of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption.
+- `CalculateNodeLabelPriority`: Prefer nodes that have the specified label.
+- `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed.
+- `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node.
+- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
+
+The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](../../plugin/pkg/scheduler/algorithm/priorities). Kubernetes uses some, but not all, of these priority functions by default.
You can see which ones are used by default in [/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [docs/devel/scheduler.md](../../docs/devel/scheduler.md) for how to customize). + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler_algorithm.md?pixel)]() -- cgit v1.2.3 From 581e4f7b0f6d64d15046061ccdd5addba6dc96c3 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 9 Jul 2015 18:02:10 -0700 Subject: Auto-fixed docs --- developer-guides/vagrant.md | 4 ++-- releasing.md | 2 +- scheduler.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index d8d7a1ec..a561b446 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -9,7 +9,7 @@ Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/deve 2. [VMWare Fusion](https://www.vmware.com/products/fusion/) version 5 or greater as well as the appropriate [Vagrant VMWare Fusion provider](https://www.vagrantup.com/vmware) 3. [VMWare Workstation](https://www.vmware.com/products/workstation/) version 9 or greater as well as the [Vagrant VMWare Workstation provider](https://www.vagrantup.com/vmware) 4. [Parallels Desktop](https://www.parallels.com/products/desktop/) version 9 or greater as well as the [Vagrant Parallels provider](https://parallels.github.io/vagrant-parallels/) -3. Get or build a [binary release](/docs/getting-started-guides/binary_release.md) +3. Get or build a [binary release](../../../docs/getting-started-guides/binary_release.md) ### Setup @@ -244,7 +244,7 @@ my-nginx nginx run=my-nginx 3 ``` We did not start any services, hence there are none listed. But we see three replicas displayed properly. 
-Check the [guestbook](/examples/guestbook/README.md) application to learn how to create a service. +Check the [guestbook](../../../examples/guestbook/README.md) application to learn how to create a service. You can already play with scaling the replicas with: ```sh diff --git a/releasing.md b/releasing.md index 803e321a..fe765244 100644 --- a/releasing.md +++ b/releasing.md @@ -97,7 +97,7 @@ others around it will either have `v0.4-dev` or `v0.5-dev`. The diagram below illustrates it. -![Diagram of git commits involved in the release](./releasing.png) +![Diagram of git commits involved in the release](releasing.png) After working on `v0.4-dev` and merging PR 99 we decide it is time to release `v0.5`. So we start a new branch, create one commit to update diff --git a/scheduler.md b/scheduler.md index ac01e6db..de05b014 100644 --- a/scheduler.md +++ b/scheduler.md @@ -37,7 +37,7 @@ can be overridden by passing the command-line flag `--policy-config-file` to the file specifying which scheduling policies to use. See [examples/scheduler-policy-config.json](../../examples/scheduler-policy-config.json) for an example config file. (Note that the config file format is versioned; the API is defined in -[plugin/pkg/scheduler/api/](../../plugin/pkg/scheduler/api/)). +[plugin/pkg/scheduler/api](../../plugin/pkg/scheduler/api/)). Thus to add a new scheduling policy, you should modify predicates.go or priorities.go, and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. 
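To make the predicate idea concrete, here is a minimal, self-contained sketch of the filtering phase. The `Node` and `Pod` types and the `FitPredicate` signature are simplified stand-ins invented for this example; the real signatures live in predicates.go:

```go
package main

import "fmt"

// Node and Pod are simplified stand-ins for the real API types.
type Node struct {
	Name             string
	FreeCPU, FreeMem int
}

type Pod struct {
	CPULimit, MemLimit int
}

// FitPredicate reports whether a pod fits on a node. Hypothetical shape.
type FitPredicate func(pod Pod, node Node) bool

// podFitsResources mirrors the PodFitsResources idea: the Pod fits only if
// the node's free CPU and memory cover the Pod's limits.
func podFitsResources(pod Pod, node Node) bool {
	return node.FreeCPU >= pod.CPULimit && node.FreeMem >= pod.MemLimit
}

// filterNodes applies every registered predicate; a node survives the
// filtering phase only if all predicates pass.
func filterNodes(pod Pod, nodes []Node, predicates []FitPredicate) []Node {
	var fit []Node
	for _, n := range nodes {
		ok := true
		for _, p := range predicates {
			if !p(pod, n) {
				ok = false
				break
			}
		}
		if ok {
			fit = append(fit, n)
		}
	}
	return fit
}

func main() {
	pod := Pod{CPULimit: 2, MemLimit: 4}
	nodes := []Node{{"n1", 1, 8}, {"n2", 4, 8}}
	for _, n := range filterNodes(pod, nodes, []FitPredicate{podFitsResources}) {
		fmt.Println(n.Name) // prints "n2": only n2 has enough free CPU
	}
}
```

Registering a new predicate in this sketch is just appending another `FitPredicate` to the slice, which is roughly the role `defaultPredicates()` plays for the real scheduler.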
-- cgit v1.2.3 From 66f367dcbb7a54e35c176b9737419e729b9eabea Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 9 Jul 2015 18:02:10 -0700 Subject: Auto-fixed docs --- access.md | 4 ++-- resources.md | 2 +- secrets.md | 6 +++--- security.md | 12 ++++++------ security_context.md | 2 +- service_accounts.md | 4 ++-- 6 files changed, 15 insertions(+), 15 deletions(-) diff --git a/access.md b/access.md index 72ca969c..85b9c8ec 100644 --- a/access.md +++ b/access.md @@ -141,7 +141,7 @@ Improvements: ###Namespaces K8s will have a have a `namespace` API object. It is similar to a Google Compute Engine `project`. It provides a namespace for objects created by a group of people co-operating together, preventing name collisions with non-cooperating groups. It also serves as a reference point for authorization policies. -Namespaces are described in [namespace.md](namespaces.md). +Namespaces are described in [namespaces.md](namespaces.md). In the Enterprise Profile: - a `userAccount` may have permission to access several `namespace`s. @@ -151,7 +151,7 @@ In the Simple Profile: Namespaces versus userAccount vs Labels: - `userAccount`s are intended for audit logging (both name and UID should be logged), and to define who has access to `namespace`s. -- `labels` (see [docs/labels.md](/docs/labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities. +- `labels` (see [docs/labels.md](../../docs/labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities. - `namespace`s prevent name collisions between uncoordinated groups of people, and provide a place to attach common policies for co-operating groups of people. 
diff --git a/resources.md b/resources.md index 17bb5c18..8c29a1f6 100644 --- a/resources.md +++ b/resources.md @@ -149,7 +149,7 @@ The following are planned future extensions to the resource model, included here ## Usage data -Because resource usage and related metrics change continuously, need to be tracked over time (i.e., historically), can be characterized in a variety of ways, and are fairly voluminous, we will not include usage in core API objects, such as [Pods](pods.md) and Nodes, but will provide separate APIs for accessing and managing that data. See the Appendix for possible representations of usage data, but the representation we'll use is TBD. +Because resource usage and related metrics change continuously, need to be tracked over time (i.e., historically), can be characterized in a variety of ways, and are fairly voluminous, we will not include usage in core API objects, such as [Pods](../pods.md) and Nodes, but will provide separate APIs for accessing and managing that data. See the Appendix for possible representations of usage data, but the representation we'll use is TBD. Singleton values for observed and predicted future usage will rapidly prove inadequate, so we will support the following structure for extended usage information: diff --git a/secrets.md b/secrets.md index d91a950a..979c07f0 100644 --- a/secrets.md +++ b/secrets.md @@ -71,7 +71,7 @@ service would also consume the secrets associated with the MySQL service. ### Use-Case: Secrets associated with service accounts -[Service Accounts](./service_accounts.md) are proposed as a +[Service Accounts](service_accounts.md) are proposed as a mechanism to decouple capabilities and security contexts from individual human users. A `ServiceAccount` contains references to some number of secrets. A `Pod` can specify that it is associated with a `ServiceAccount`. Secrets should have a `Type` field to allow the Kubelet and @@ -241,7 +241,7 @@ memory overcommit on the node. 
#### Secret data on the node: isolation -Every pod will have a [security context](./security_context.md). +Every pod will have a [security context](security_context.md). Secret data on the node should be isolated according to the security context of the container. The Kubelet volume plugin API will be changed so that a volume plugin receives the security context of a volume along with the volume spec. This will allow volume plugins to implement setting the @@ -253,7 +253,7 @@ Several proposals / upstream patches are notable as background for this proposal 1. [Docker vault proposal](https://github.com/docker/docker/issues/10310) 2. [Specification for image/container standardization based on volumes](https://github.com/docker/docker/issues/9277) -3. [Kubernetes service account proposal](./service_accounts.md) +3. [Kubernetes service account proposal](service_accounts.md) 4. [Secrets proposal for docker (1)](https://github.com/docker/docker/pull/6075) 5. [Secrets proposal for docker (2)](https://github.com/docker/docker/pull/6697) diff --git a/security.md b/security.md index c8f9bec7..4ea7d755 100644 --- a/security.md +++ b/security.md @@ -63,14 +63,14 @@ Automated process users fall into the following categories: A pod runs in a *security context* under a *service account* that is defined by an administrator or project administrator, and the *secrets* a pod has access to is limited by that *service account*. -1. The API should authenticate and authorize user actions [authn and authz](./access.md) +1. The API should authenticate and authorize user actions [authn and authz](access.md) 2. All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and be authorized to perform only the functions they require against the API. 3. 
Most infrastructure components should use the API as a way of exchanging data and changing the system, and only the API should have access to the underlying data store (etcd) -4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](./service_accounts.md) +4. When containers run on the cluster and need to talk to other containers or the API server, they should be identified and authorized clearly as an autonomous process via a [service account](service_accounts.md) 1. If the user who started a long-lived process is removed from access to the cluster, the process should be able to continue without interruption 2. If the user who started processes are removed from the cluster, administrators may wish to terminate their processes in bulk 3. When containers run with a service account, the user that created / triggered the service account behavior must be associated with the container's action -5. When container processes run on the cluster, they should run in a [security context](./security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions. +5. When container processes run on the cluster, they should run in a [security context](security_context.md) that isolates those processes via Linux user security, user namespaces, and permissions. 1. Administrators should be able to configure the cluster to automatically confine all container processes as a non-root, randomly assigned UID 2. Administrators should be able to ensure that container processes within the same namespace are all assigned the same unix user UID 3. Administrators should be able to limit which developers and project administrators have access to higher privilege actions @@ -79,7 +79,7 @@ A pod runs in a *security context* under a *service account* that is defined by 6. 
Developers may need to ensure their images work within higher security requirements specified by administrators 7. When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met. 8. When application developers want to share filesytem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes -6. Developers should be able to define [secrets](./secrets.md) that are automatically added to the containers when pods are run +6. Developers should be able to define [secrets](secrets.md) that are automatically added to the containers when pods are run 1. Secrets are files injected into the container whose values should not be displayed within a pod. Examples: 1. An SSH private key for git cloning remote data 2. A client certificate for accessing a remote system @@ -93,11 +93,11 @@ A pod runs in a *security context* under a *service account* that is defined by ### Related design discussion -* [Authorization and authentication](./access.md) +* [Authorization and authentication](access.md) * [Secret distribution via files](https://github.com/GoogleCloudPlatform/kubernetes/pull/2030) * [Docker secrets](https://github.com/docker/docker/pull/6697) * [Docker vault](https://github.com/docker/docker/issues/10310) -* [Service Accounts:](./service_accounts.md) +* [Service Accounts:](service_accounts.md) * [Secret volumes](https://github.com/GoogleCloudPlatform/kubernetes/pull/4126) ## Specific Design Points diff --git a/security_context.md b/security_context.md index fdacb173..61641297 100644 --- a/security_context.md +++ b/security_context.md @@ -32,7 +32,7 @@ Processes in pods will need to have consistent UID/GID/SELinux category labels i * The concept of a security context should not be tied to a particular security mechanism or platform (ie. 
SELinux, AppArmor) * Applying a different security context to a scope (namespace or pod) requires a solution such as the one proposed for - [service accounts](./service_accounts.md). + [service accounts](service_accounts.md). ## Use Cases diff --git a/service_accounts.md b/service_accounts.md index 63c12a30..bd10336f 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -21,9 +21,9 @@ They also may interact with services other than the Kubernetes API, such as: A service account binds together several things: - a *name*, understood by users, and perhaps by peripheral systems, for an identity - a *principal* that can be authenticated and [authorized](../authorization.md) - - a [security context](./security_context.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other + - a [security context](security_context.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other capabilities and controls on interaction with the file system and OS. - - a set of [secrets](./secrets.md), which a container may use to + - a set of [secrets](secrets.md), which a container may use to access various networked resources. ## Design Discussion -- cgit v1.2.3 From 7b06b56cdbb460df3dfda0db38c6219af4df0207 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 9 Jul 2015 18:31:29 -0700 Subject: manual fixes --- writing-a-getting-started-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 873fafcc..d7452c09 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -29,7 +29,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. search for uses of flags by guides. - We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your own repo. 
- - Setup a cluster and run the [conformance test](../../docs/devel/conformance-test.md) against it, and report the + - Setup a cluster and run the [conformance test](../../docs/devel/development.md#conformance-testing) against it, and report the results in your PR. - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md). - State the binary version of kubernetes that you tested clearly in your Guide doc and in The Matrix. -- cgit v1.2.3 From c36dd173e4ae2fee9d20fa198d118244f681f6b3 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 9 Jul 2015 18:31:29 -0700 Subject: manual fixes --- resources.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/resources.md b/resources.md index 8c29a1f6..bb3c05e9 100644 --- a/resources.md +++ b/resources.md @@ -1,5 +1,5 @@ **Note: this is a design doc, which describes features that have not been completely implemented. -User documentation of the current state is [here](../resources.md). The tracking issue for +User documentation of the current state is [here](../compute_resources.md). The tracking issue for implementation of this model is [#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in -- cgit v1.2.3 From 3a38ce4217962abf1ebd37e08ea51c5c857de70e Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Fri, 10 Jul 2015 12:51:35 -0700 Subject: fix verify gendocs --- cherry-picks.md | 3 +++ scheduler_algorithm.md | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/cherry-picks.md b/cherry-picks.md index b6669110..5fbada99 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -30,3 +30,6 @@ conflict***. 
Now that we've structured cherry picks as PRs, searching for all cherry-picks against a release is a GitHub query: For example, [this query is all of the v0.21.x cherry-picks](https://github.com/GoogleCloudPlatform/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr+%22automated+cherry+pick%22+base%3Arelease-0.21) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/cherry-picks.md?pixel)]() diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index dbd0d7cd..f353a4ed 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -30,7 +30,7 @@ Currently, Kubernetes scheduler provides some practical priority functions, incl - `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. - `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label. -The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](../../plugin/pkg/scheduler/algorithm/priorities). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [docs/devel/scheduler.md](../../docs/devel/scheduler.md) for how to customize). +The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](../../plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). 
Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [docs/devel/scheduler.md](../../docs/devel/scheduler.md) for how to customize). [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler_algorithm.md?pixel)]() -- cgit v1.2.3 From 92e08e130d859bb0a7dad654534906a2a92ed4a3 Mon Sep 17 00:00:00 2001 From: Zach Loafman Date: Fri, 10 Jul 2015 18:43:12 -0700 Subject: Fix patch release instructions Somewhere in the last round of editing, I compressed the patch release instructions after the release validation steps went in. They no longer made sense because they assume some variables are set from the previous step that you don't have set. Set them. These instructions are now begging to be refactored between the patch and normal releases, but I won't do that here. --- releasing.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/releasing.md b/releasing.md index 6533858c..9e9fcaf7 100644 --- a/releasing.md +++ b/releasing.md @@ -100,10 +100,13 @@ manage cherry picks prior to cutting the release. #### Tagging and Merging -Do the following (you still have `${VER}` set and you're still working on the -`release-${VER}` branch, right?): - +1. `export VER=x.y` (e.g. `0.20` for v0.20) 1. `export PATCH=Z` where `Z` is the patch level of `vX.Y.Z` +1. cd to the base of the repo +1. `git fetch upstream && git checkout -b upstream/release-${VER}` +1. Make sure you don't have any files you care about littering your repo (they + better be checked in or outside the repo, or the next step will delete them). +1. `make clean && git reset --hard HEAD && git clean -xdf` 1. `make` (TBD: you really shouldn't have to do this, but the swagger output step requires it right now) 1. `./build/mark-new-version.sh v${VER}.${PATCH}` to mark the new release and get further instructions. 
This creates a series of commits on the branch you're working -- cgit v1.2.3 From ee97e734b55c5f605944bcc4e324bdc09bd4c476 Mon Sep 17 00:00:00 2001 From: Akshay Aurora Date: Sun, 12 Jul 2015 03:49:01 +0530 Subject: Fix formatting in networking.md --- networking.md | 1 + 1 file changed, 1 insertion(+) diff --git a/networking.md b/networking.md index 8bf03437..af64ed8d 100644 --- a/networking.md +++ b/networking.md @@ -1,6 +1,7 @@ # Networking There are 4 distinct networking problems to solve: + 1. Highly-coupled container-to-container communications 2. Pod-to-Pod communications 3. Pod-to-Service communications -- cgit v1.2.3 From b4deb49a719e9d5c7ece5c930dec4ff225409466 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Sun, 12 Jul 2015 22:03:06 -0400 Subject: Copy edits for typos --- networking.md | 2 +- security.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/networking.md b/networking.md index af64ed8d..210d10e5 100644 --- a/networking.md +++ b/networking.md @@ -128,7 +128,7 @@ to serve the purpose outside of GCE. The [service](../services.md) abstraction provides a way to group pods under a common access policy (e.g. load-balanced). The implementation of this creates a -virtual IP which clients can access and which is transparantly proxied to the +virtual IP which clients can access and which is transparently proxied to the pods in a Service. Each node runs a kube-proxy process which programs `iptables` rules to trap access to service IPs and redirect them to the correct backends. This provides a highly-available load-balancing solution with low diff --git a/security.md b/security.md index 4ea7d755..c2fd092e 100644 --- a/security.md +++ b/security.md @@ -78,7 +78,7 @@ A pod runs in a *security context* under a *service account* that is defined by 5. Developers should be able to run their own images or images from the community and expect those images to run correctly 6. 
Developers may need to ensure their images work within higher security requirements specified by administrators 7. When available, Linux kernel user namespaces can be used to ensure 5.2 and 5.4 are met. - 8. When application developers want to share filesytem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes + 8. When application developers want to share filesystem data via distributed filesystems, the Unix user ids on those filesystems must be consistent across different container processes 6. Developers should be able to define [secrets](secrets.md) that are automatically added to the containers when pods are run 1. Secrets are files injected into the container whose values should not be displayed within a pod. Examples: 1. An SSH private key for git cloning remote data -- cgit v1.2.3 From a284d4cf980e30a237d170cd77b8f50c8b251c3b Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Sun, 12 Jul 2015 22:03:06 -0400 Subject: Copy edits for typos --- releasing.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/releasing.md b/releasing.md index 6533858c..9cec89e0 100644 --- a/releasing.md +++ b/releasing.md @@ -21,7 +21,7 @@ You should progress in this strict order. #### Selecting Release Components When cutting a major/minor release, your first job is to find the branch -point. We cut `vX.Y.0` releases directly from `master`, which is also the the +point. We cut `vX.Y.0` releases directly from `master`, which is also the branch that we have most continuous validation on. 
Go first to [the main GCE Jenkins end-to-end job](http://go/k8s-test/job/kubernetes-e2e-gce) and next to [the Critical Builds page](http://go/k8s-test/view/Critical%20Builds) and hopefully find a @@ -42,7 +42,7 @@ Because Jenkins builds frequently, if you're looking between jobs `kubernetes-e2e-gce` build (but please check that it corresponds to a temporally similar build that's green on `kubernetes-e2e-gke-ci`). Lastly, if you're having trouble understanding why the GKE continuous integration clusters are failing -and you're trying to cut a release, don't hesistate to contact the GKE +and you're trying to cut a release, don't hesitate to contact the GKE oncall. Before proceeding to the next step: -- cgit v1.2.3 From a087970567c98d58292b7c3b8413d2788737d021 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Sun, 12 Jul 2015 22:03:06 -0400 Subject: Copy edits for typos --- autoscaling.md | 2 +- high-availability.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 3acaf298..b767e132 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -208,7 +208,7 @@ be specified as "when requests per second fall below 25 for 30 seconds scale the This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing time series statistics. -Data aggregation is opaque to the the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds` +Data aggregation is opaque to the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds` that know how to work with the underlying data in order to know if an application must be scaled up or down. 
Data aggregation must feed a common data structure to ease the development of `AutoScaleThreshold`s but it does not matter to the auto-scaler whether this occurs in a push or pull implementation, whether or not the data is stored at a granular level, diff --git a/high-availability.md b/high-availability.md index 60ccfce6..ece47395 100644 --- a/high-availability.md +++ b/high-availability.md @@ -4,7 +4,7 @@ This document serves as a proposal for high availability of the scheduler and co ## Design Options For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en) -1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. +1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. 2. 
**Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the approach that this proposal will leverage. -- cgit v1.2.3 From a43748bb04d34c4916fe138a2e8a72e9d2f5914f Mon Sep 17 00:00:00 2001 From: Marek Biskup Date: Mon, 13 Jul 2015 15:09:26 +0200 Subject: kubectl-rolling-update-doc-fix --- simple-rolling-update.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 0208b609..fb21c096 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -6,7 +6,7 @@ Complete execution flow can be found [here](#execution-details). ### Lightweight rollout Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1``` -```kubectl rolling-update rc foo [foo-v2] --image=myimage:v2``` +```kubectl rolling-update foo [foo-v2] --image=myimage:v2``` If the user doesn't specify a name for the 'next' replication controller, then the 'next' replication controller is renamed to the name of the original replication controller. 
@@ -27,7 +27,7 @@ To facilitate recovery in the case of a crash of the updating process itself, we Recovery is achieved by issuing the same command again: ``` -kubectl rolling-update rc foo [foo-v2] --image=myimage:v2 +kubectl rolling-update foo [foo-v2] --image=myimage:v2 ``` Whenever the rolling update command executes, the kubectl client looks for replication controllers called ```foo``` and ```foo-next```, if they exist, an attempt is @@ -38,11 +38,11 @@ it is assumed that the rollout is nearly completed, and ```foo-next``` is rename ### Aborting a rollout Abort is assumed to want to reverse a rollout in progress. -```kubectl rolling-update rc foo [foo-v2] --rollback``` +```kubectl rolling-update foo [foo-v2] --rollback``` This is really just semantic sugar for: -```kubectl rolling-update rc foo-v2 foo``` +```kubectl rolling-update foo-v2 foo``` With the added detail that it moves the ```desired-replicas``` annotation from ```foo-v2``` to ```foo``` -- cgit v1.2.3 From eed049cf8d255fd787d10565e8700ba5cd296750 Mon Sep 17 00:00:00 2001 From: Zach Loafman Date: Sat, 11 Jul 2015 17:00:20 -0700 Subject: hack/cherry_pick_pull.sh: Allow multiple pulls Reorder the arguments to allow for multiple pulls at the end: hack/cherry_pick_pull.sh ... This solves some common A-then-immediate-A' cases that appear frequently on head. (There's a workaround, but it's a hack.) Updates the documentation. --- cherry-picks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cherry-picks.md b/cherry-picks.md index 5fbada99..2708db93 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -8,7 +8,7 @@ Kubernetes projects. 
Any contributor can propose a cherry pick of any pull request, like so: ``` -hack/cherry_pick_pull.sh 98765 upstream/release-3.14 +hack/cherry_pick_pull.sh upstream/release-3.14 98765 ``` This will walk you through the steps to propose an automated cherry pick of pull -- cgit v1.2.3 From 0fc797704e628d863d0154a599e191dadfb3ce67 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Mon, 13 Jul 2015 10:11:07 -0400 Subject: Copy edits to remove doubled words --- service_accounts.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/service_accounts.md b/service_accounts.md index bd10336f..c6ceb6b2 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -90,7 +90,7 @@ The distinction is useful for a number of reasons: Pod Object. The `secrets` field is a list of references to /secret objects that an process started as that service account should -have access to to be able to assert that role. +have access to be able to assert that role. The secrets are not inline with the serviceAccount object. This way, most or all users can have permission to `GET /serviceAccounts` so they can remind themselves what serviceAccounts are available for use. @@ -150,7 +150,7 @@ then it copies in the referenced securityContext and secrets references for the Second, if ServiceAccount definitions change, it may take some actions. **TODO**: decide what actions it takes when a serviceAccount definition changes. Does it stop pods, or just -allow someone to list ones that out out of spec? In general, people may want to customize this? +allow someone to list ones that are out of spec? In general, people may want to customize this? Third, if a new namespace is created, it may create a new serviceAccount for that namespace. This may include a new username (e.g. 
`NAMESPACE-default-service-account@serviceaccounts.$CLUSTERID.kubernetes.io`), a new -- cgit v1.2.3 From 5b891f610132f670f1a2bc4cbdbf53ef05180c25 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Mon, 13 Jul 2015 10:11:07 -0400 Subject: Copy edits to remove doubled words --- api_changes.md | 2 +- scheduler_algorithm.md | 2 +- writing-a-getting-started-guide.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/api_changes.md b/api_changes.md index 17278c6e..de073677 100644 --- a/api_changes.md +++ b/api_changes.md @@ -177,7 +177,7 @@ need to add cases to `pkg/api//defaults.go`. Of course, since you have added code, you have to add a test: `pkg/api//defaults_test.go`. Do use pointers to scalars when you need to distinguish between an unset value -and an an automatic zero value. For example, +and an automatic zero value. For example, `PodSpec.TerminationGracePeriodSeconds` is defined as `*int64` the go type definition. A zero value means 0 seconds, and a nil value asks the system to pick a default. diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index f353a4ed..2d239f2b 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -16,7 +16,7 @@ The details of the above predicates can be found in [plugin/pkg/scheduler/algori ## Ranking the nodes -The filtered nodes are considered suitable to host the Pod, and it is often that there are more than one nodes remaining. Kubernetes prioritizes the remaining nodes to to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score which scales from 0-10 with 10 representing for "most preferred" and 0 for "least preferred". Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores. 
For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2` with weighting factors `weight1` and `weight2` respectively, the final score of some NodeA is: +The filtered nodes are considered suitable to host the Pod, and it is often that there are more than one nodes remaining. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score which scales from 0-10 with 10 representing for "most preferred" and 0 for "least preferred". Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2` with weighting factors `weight1` and `weight2` respectively, the final score of some NodeA is: finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index d7452c09..40852361 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -62,7 +62,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. refactoring and feature additions that affect code for their IaaS. ## Rationale - - We want want people to create Kubernetes clusters with whatever IaaS, Node OS, + - We want people to create Kubernetes clusters with whatever IaaS, Node OS, configuration management tools, and so on, which they are familiar with. The guidelines for **versioned distros** are designed for flexibility. - We want developers to be able to work without understanding all the permutations of @@ -81,7 +81,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. 
gate commits on passing CI for all distros, and since end-to-end tests are typically somewhat flaky, it would be highly likely for there to be false positives and CI backlogs with many CI pipelines. - We do not require versioned distros to do **CI** for several reasons. It is a steep - learning curve to understand our our automated testing scripts. And it is considerable effort + learning curve to understand our automated testing scripts. And it is considerable effort to fully automate setup and teardown of a cluster, which is needed for CI. And, not everyone has the time and money to run CI. We do not want to discourage people from writing and sharing guides because of this. -- cgit v1.2.3 From d4ee5006858aec1fa1fecff18bfda3dfeeb162ff Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 11 Jul 2015 21:04:52 -0700 Subject: Run gendocs and munges --- README.md | 14 ++++++++++++++ access.md | 14 ++++++++++++++ admission_control.md | 14 ++++++++++++++ admission_control_limit_range.md | 14 ++++++++++++++ admission_control_resource_quota.md | 14 ++++++++++++++ architecture.md | 14 ++++++++++++++ clustering.md | 14 ++++++++++++++ clustering/README.md | 14 ++++++++++++++ command_execution_port_forwarding.md | 14 ++++++++++++++ event_compression.md | 14 ++++++++++++++ expansion.md | 14 ++++++++++++++ identifiers.md | 14 ++++++++++++++ namespaces.md | 14 ++++++++++++++ networking.md | 14 ++++++++++++++ persistent-storage.md | 14 ++++++++++++++ principles.md | 14 ++++++++++++++ resources.md | 14 ++++++++++++++ secrets.md | 14 ++++++++++++++ security.md | 14 ++++++++++++++ security_context.md | 14 ++++++++++++++ service_accounts.md | 14 ++++++++++++++ simple-rolling-update.md | 14 ++++++++++++++ 22 files changed, 308 insertions(+) diff --git a/README.md b/README.md index b70c5615..66265b99 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Kubernetes Design Overview Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications. diff --git a/access.md b/access.md index 85b9c8ec..98bf2bdf 100644 --- a/access.md +++ b/access.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # K8s Identity and Access Management Sketch This document suggests a direction for identity and access management in the Kubernetes system. diff --git a/admission_control.md b/admission_control.md index 749e949e..4094156b 100644 --- a/admission_control.md +++ b/admission_control.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Kubernetes Proposal - Admission Control **Related PR:** diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index daddb425..c1914478 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Admission control plugin: LimitRanger ## Background diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index b2dfbe85..cd9282df 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Admission control plugin: ResourceQuota ## Background diff --git a/architecture.md b/architecture.md index ebfb4964..6c82896e 100644 --- a/architecture.md +++ b/architecture.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Kubernetes architecture A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable. diff --git a/clustering.md b/clustering.md index 4cef06f8..f88157aa 100644 --- a/clustering.md +++ b/clustering.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Clustering in Kubernetes diff --git a/clustering/README.md b/clustering/README.md index 09d2c4e1..dfd55e96 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + This directory contains diagrams for the clustering design doc. This depends on the `seqdiag` [utility](http://blockdiag.com/en/seqdiag/index.html). Assuming you have a non-borked python install, this should be installable with diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index 3e548d40..056814e7 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Container Command Execution & Port Forwarding in Kubernetes ## Abstract diff --git a/event_compression.md b/event_compression.md index 74aba66f..4178393c 100644 --- a/event_compression.md +++ b/event_compression.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Kubernetes Event Compression This document captures the design of event compression. diff --git a/expansion.md b/expansion.md index 8b31526a..01a774cb 100644 --- a/expansion.md +++ b/expansion.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Variable expansion in pod command, args, and env ## Abstract diff --git a/identifiers.md b/identifiers.md index 23b976d3..e192b1ed 100644 --- a/identifiers.md +++ b/identifiers.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Identifiers and Names in Kubernetes A summarization of the goals and recommendations for identifiers in Kubernetes. Described in [GitHub issue #199](https://github.com/GoogleCloudPlatform/kubernetes/issues/199). diff --git a/namespaces.md b/namespaces.md index 0fef2bed..547d040b 100644 --- a/namespaces.md +++ b/namespaces.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Namespaces ## Abstract diff --git a/networking.md b/networking.md index 210d10e5..5a4a5835 100644 --- a/networking.md +++ b/networking.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Networking There are 4 distinct networking problems to solve: diff --git a/persistent-storage.md b/persistent-storage.md index 8e7c6765..9cc92b42 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Persistent Storage This document proposes a model for managing persistent, cluster-scoped storage for applications requiring long lived data. diff --git a/principles.md b/principles.md index cf8833a4..e1bd97da 100644 --- a/principles.md +++ b/principles.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Design Principles Principles to follow when extending Kubernetes. diff --git a/resources.md b/resources.md index bb3c05e9..9539bed2 100644 --- a/resources.md +++ b/resources.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + **Note: this is a design doc, which describes features that have not been completely implemented. User documentation of the current state is [here](../compute_resources.md). The tracking issue for implementation of this model is diff --git a/secrets.md b/secrets.md index 979c07f0..a6d2591f 100644 --- a/secrets.md +++ b/secrets.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + ## Abstract diff --git a/security.md b/security.md index c2fd092e..1d1373d2 100644 --- a/security.md +++ b/security.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Security in Kubernetes Kubernetes should define a reasonable set of security best practices that allows processes to be isolated from each other, from the cluster infrastructure, and which preserves important boundaries between those who manage the cluster, and those who use the cluster. diff --git a/security_context.md b/security_context.md index 61641297..cbf525a8 100644 --- a/security_context.md +++ b/security_context.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Security Contexts ## Abstract A security context is a set of constraints that are applied to a container in order to achieve the following goals (from [security design](security.md)): diff --git a/service_accounts.md b/service_accounts.md index c6ceb6b2..896bd68e 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + #Service Accounts ## Motivation diff --git a/simple-rolling-update.md b/simple-rolling-update.md index fb21c096..45005353 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + ## Simple rolling update This is a lightweight design document for simple rolling update in ```kubectl``` -- cgit v1.2.3 From 6d684974a6fb198fc27c570c7f5f9684b8aea5f4 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 11 Jul 2015 21:04:52 -0700 Subject: Run gendocs and munges --- autoscaling.md | 14 ++++++++++++++ federation.md | 14 ++++++++++++++ high-availability.md | 14 ++++++++++++++ 3 files changed, 42 insertions(+) diff --git a/autoscaling.md b/autoscaling.md index b767e132..bd8244ab 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + ## Abstract Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the number of pods deployed within the system automatically. diff --git a/federation.md b/federation.md index efdd726a..a8e9813b 100644 --- a/federation.md +++ b/federation.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + #Kubernetes Cluster Federation ##(a.k.a. "Ubernetes") diff --git a/high-availability.md b/high-availability.md index ece47395..e7f77288 100644 --- a/high-availability.md +++ b/high-availability.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # High Availability of Scheduling and Controller Components in Kubernetes This document serves as a proposal for high availability of the scheduler and controller components in kubernetes. This proposal is intended to provide a simple High Availability api for kubernetes components with the potential to extend to services running on kubernetes. Those services would be subject to their own constraints. -- cgit v1.2.3 From 01bb3613a48b76cfb0354376aedc1cfb2077bf1b Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 11 Jul 2015 21:04:52 -0700 Subject: Run gendocs and munges --- README.md | 14 ++++++++++++++ api_changes.md | 14 ++++++++++++++ cherry-picks.md | 14 ++++++++++++++ coding-conventions.md | 14 ++++++++++++++ collab.md | 14 ++++++++++++++ developer-guides/vagrant.md | 14 ++++++++++++++ development.md | 14 ++++++++++++++ faster_reviews.md | 14 ++++++++++++++ flaky-tests.md | 14 ++++++++++++++ getting-builds.md | 14 ++++++++++++++ instrumentation.md | 14 ++++++++++++++ issues.md | 14 ++++++++++++++ logging.md | 14 ++++++++++++++ making-release-notes.md | 14 ++++++++++++++ profiling.md | 14 ++++++++++++++ pull-requests.md | 14 ++++++++++++++ releasing.md | 14 ++++++++++++++ scheduler.md | 14 ++++++++++++++ scheduler_algorithm.md | 14 ++++++++++++++ writing-a-getting-started-guide.md | 14 ++++++++++++++ 20 files changed, 280 insertions(+) diff --git a/README.md b/README.md index 5957902f..26eb7ced 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Developing Kubernetes Docs in this directory relate to developing Kubernetes. diff --git a/api_changes.md b/api_changes.md index de073677..3ad1847d 100644 --- a/api_changes.md +++ b/api_changes.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # So you want to change the API? The Kubernetes API has two major components - the internal structures and diff --git a/cherry-picks.md b/cherry-picks.md index 2708db93..03f2ebb5 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Overview This document explains cherry picks are managed on release branches within the diff --git a/coding-conventions.md b/coding-conventions.md index bdcbb708..e61398ee 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + Coding style advice for contributors - Bash - https://google-styleguide.googlecode.com/svn/trunk/shell.xml diff --git a/collab.md b/collab.md index b424f502..dc12537d 100644 --- a/collab.md +++ b/collab.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # On Collaborative Development Kubernetes is open source, but many of the people working on it do so as their day job. In order to avoid forcing people to be "at work" effectively 24/7, we want to establish some semi-formal protocols around development. Hopefully these rules make things go more smoothly. If you find that this is not the case, please complain loudly. diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index a561b446..1edf07a6 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + ## Getting started with Vagrant Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/develop on your local machine (Linux, Mac OS X). diff --git a/development.md b/development.md index 2e540bcb..37a4478a 100644 --- a/development.md +++ b/development.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Development Guide # Releases and Official Builds diff --git a/faster_reviews.md b/faster_reviews.md index ed890a7f..99e60fb1 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # How to get faster PR reviews Most of what is written here is not at all specific to Kubernetes, but it bears diff --git a/flaky-tests.md b/flaky-tests.md index da5549c8..ee93bf19 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Hunting flaky tests in Kubernetes Sometimes unit tests are flaky. This means that due to (usually) race conditions, they will occasionally fail, even though most of the time they pass. diff --git a/getting-builds.md b/getting-builds.md index dbad8f3a..5a1a4dde 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -1,3 +1,17 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 # Getting Kubernetes Builds
 
 You can use [hack/get-build.sh](../../hack/get-build.sh) to or use as a reference on how to get the most recent builds with curl. With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our ci and gce e2e tests (essentially a nightly build).
diff --git a/instrumentation.md b/instrumentation.md
index b52480d2..762d1980 100644
--- a/instrumentation.md
+++ b/instrumentation.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 Instrumenting Kubernetes with a new metric
 ===================
 
diff --git a/issues.md b/issues.md
index 99e1089a..62444185 100644
--- a/issues.md
+++ b/issues.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 GitHub Issues for the Kubernetes Project
 ========================================
 
diff --git a/logging.md b/logging.md
index 331eda97..1ca18718 100644
--- a/logging.md
+++ b/logging.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 Logging Conventions
 ===================
 
diff --git a/making-release-notes.md b/making-release-notes.md
index 5d08ac50..0dfbeebe 100644
--- a/making-release-notes.md
+++ b/making-release-notes.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 ## Making release notes
 
 This documents the process for making release notes for a release.
diff --git a/profiling.md b/profiling.md
index 1dd42095..51635424 100644
--- a/profiling.md
+++ b/profiling.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 # Profiling Kubernetes
 
 This document explain how to plug in profiler and how to profile Kubernetes services.
diff --git a/pull-requests.md b/pull-requests.md
index 1b5c30e6..e82d2d00 100644
--- a/pull-requests.md
+++ b/pull-requests.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 Pull Request Process
 ====================
 
diff --git a/releasing.md b/releasing.md
index 9cec89e0..a83f6677 100644
--- a/releasing.md
+++ b/releasing.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 # Releasing Kubernetes
 
 This document explains how to cut a release, and the theory behind it. If you
diff --git a/scheduler.md b/scheduler.md
index de05b014..3e1ae0e1 100644
--- a/scheduler.md
+++ b/scheduler.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 # The Kubernetes Scheduler
 
 
diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md
index 2d239f2b..96789422 100644
--- a/scheduler_algorithm.md
+++ b/scheduler_algorithm.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+
+Documentation for specific releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+
+
+
 # Scheduler Algorithm in Kubernetes
 
 For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [docs/devel/scheduler.md](../../docs/devel/scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod.
diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md
index 40852361..7b94d9a3 100644
--- a/writing-a-getting-started-guide.md
+++ b/writing-a-getting-started-guide.md
@@ -1,3 +1,17 @@
+
+
+
+
+

*** PLEASE NOTE: This document applies to the HEAD of the source
+tree only. If you are using a released version of Kubernetes, you almost
+certainly want the docs that go with that version.

+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + # Writing a Getting Started Guide This page gives some advice for anyone planning to write or update a Getting Started Guide for Kubernetes. It also gives some guidelines which reviewers should follow when reviewing a pull request for a -- cgit v1.2.3 From 8601b6ff40148c7be7a02a4a70ccfd1d9e231c33 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Mon, 13 Jul 2015 11:11:34 -0700 Subject: Remove colon from end of doc heading. --- secrets.md | 2 +- security.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/secrets.md b/secrets.md index a6d2591f..c1643a9d 100644 --- a/secrets.md +++ b/secrets.md @@ -261,7 +261,7 @@ Kubelet volume plugin API will be changed so that a volume plugin receives the s a volume along with the volume spec. This will allow volume plugins to implement setting the security context of volumes they manage. -## Community work: +## Community work Several proposals / upstream patches are notable as background for this proposal: diff --git a/security.md b/security.md index 1d1373d2..90dc3237 100644 --- a/security.md +++ b/security.md @@ -32,7 +32,7 @@ While Kubernetes today is not primarily a multi-tenant system, the long term evo ## Use cases -### Roles: +### Roles We define "user" as a unique identity accessing the Kubernetes API server, which may be a human or an automated process. Human users fall into the following categories: @@ -46,7 +46,7 @@ Automated process users fall into the following categories: 2. k8s infrastructure user - the user that kubernetes infrastructure components use to perform cluster functions with clearly defined roles -### Description of roles: +### Description of roles * Developers: * write pod specs. -- cgit v1.2.3 From 15b47283e8e299be1b160048ed7e10192dca5991 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Mon, 13 Jul 2015 11:11:34 -0700 Subject: Remove colon from end of doc heading. 
--- high-availability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/high-availability.md b/high-availability.md index e7f77288..4525f709 100644 --- a/high-availability.md +++ b/high-availability.md @@ -56,7 +56,7 @@ There is a short window after a new master acquires the lease, during which data 5. When the API server makes the corresponding write to etcd, it includes it in a transaction that does a compare-and-swap on the "current master" entry (old value == new value == host:port and sequence number from the replica that sent the mutating operation). This basically guarantees that if we elect the new master, all transactions coming from the old master will fail. You can think of this as the master attaching a "precondition" of its belief about who is the latest master. -## Open Questions: +## Open Questions * Is there a desire to keep track of all nodes for a specific component type? -- cgit v1.2.3 From 37813afc4bc36b2f617cdac0233e1d02b45352eb Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sun, 12 Jul 2015 21:15:58 -0700 Subject: Change 'minion' to 'node' in docs --- developer-guides/vagrant.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 1edf07a6..1316e26b 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -27,7 +27,7 @@ Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/deve ### Setup -By default, the Vagrant setup will create a single kubernetes-master and 1 kubernetes-minion. Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). To start your local cluster, open a shell and run: +By default, the Vagrant setup will create a single master VM (called kubernetes-master) and one node (called kubernetes-minion-1). Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). 
To start your local cluster, open a shell and run: ```sh cd kubernetes @@ -77,7 +77,7 @@ vagrant ssh master [vagrant@kubernetes-master ~] $ sudo systemctl status nginx ``` -To view the services on any of the kubernetes-minion(s): +To view the services on any of the nodes: ```sh vagrant ssh minion-1 [vagrant@kubernetes-minion-1] $ sudo systemctl status docker @@ -312,20 +312,20 @@ cat ~/.kubernetes_vagrant_auth #### I just created the cluster, but I do not see my container running! -If this is your first time creating the cluster, the kubelet on each minion schedules a number of docker pull requests to fetch prerequisite images. This can take some time and as a result may delay your initial pod getting provisioned. +If this is your first time creating the cluster, the kubelet on each node schedules a number of docker pull requests to fetch prerequisite images. This can take some time and as a result may delay your initial pod getting provisioned. #### I changed Kubernetes code, but it's not running! Are you sure there was no build error? After running `$ vagrant provision`, scroll up and ensure that each Salt state was completed successfully on each box in the cluster. It's very likely you see a build error due to an error in your source files! -#### I have brought Vagrant up but the minions won't validate! +#### I have brought Vagrant up but the nodes won't validate! -Are you sure you built a release first? Did you install `net-tools`? For more clues, login to one of the minions (`vagrant ssh minion-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). +Are you sure you built a release first? Did you install `net-tools`? For more clues, login to one of the nodes (`vagrant ssh minion-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). -#### I want to change the number of minions! +#### I want to change the number of nodes! 
-You can control the number of minions that are instantiated via the environment variable `NUM_MINIONS` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough minions to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single minion. You do this, by setting `NUM_MINIONS` to 1 like so: +You can control the number of nodes that are instantiated via the environment variable `NUM_MINIONS` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough nodes to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single node. You do this, by setting `NUM_MINIONS` to 1 like so: ```sh export NUM_MINIONS=1 @@ -340,7 +340,7 @@ Just set it to the number of megabytes you would like the machines to have. For export KUBERNETES_MEMORY=2048 ``` -If you need more granular control, you can set the amount of memory for the master and minions independently. For example: +If you need more granular control, you can set the amount of memory for the master and nodes independently. 
For example: ```sh export KUBERNETES_MASTER_MEMORY=1536 -- cgit v1.2.3 From d3293eb75835fbdb3f50dde82e513eff752ca82d Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Mon, 13 Jul 2015 17:13:09 -0700 Subject: Apply mungedocs changes --- README.md | 2 ++ access.md | 2 ++ admission_control.md | 2 ++ admission_control_limit_range.md | 2 ++ admission_control_resource_quota.md | 2 ++ architecture.md | 2 ++ clustering.md | 2 ++ clustering/README.md | 3 +++ command_execution_port_forwarding.md | 3 +++ event_compression.md | 2 ++ expansion.md | 3 +++ identifiers.md | 2 ++ namespaces.md | 3 +++ networking.md | 2 ++ persistent-storage.md | 2 ++ principles.md | 2 ++ resources.md | 2 ++ secrets.md | 2 ++ security.md | 2 ++ security_context.md | 3 ++- service_accounts.md | 3 ++- simple-rolling-update.md | 2 ++ 22 files changed, 48 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 66265b99..5a5b0497 100644 --- a/README.md +++ b/README.md @@ -31,4 +31,6 @@ Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS p For more about the Kubernetes architecture, see [architecture](architecture.md). + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/README.md?pixel)]() + diff --git a/access.md b/access.md index 98bf2bdf..912f93aa 100644 --- a/access.md +++ b/access.md @@ -262,4 +262,6 @@ Improvements: - Policies to drop logging for high rate trusted API calls, or by users performing audit or other sensitive functions. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/access.md?pixel)]() + diff --git a/admission_control.md b/admission_control.md index 4094156b..5870a601 100644 --- a/admission_control.md +++ b/admission_control.md @@ -93,4 +93,6 @@ will ensure the following: If at any step, there is an error, the request is canceled. 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control.md?pixel)]() + diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index c1914478..e5363cea 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -146,4 +146,6 @@ It is expected we will want to define limits for particular pods or containers b To make a **LimitRangeItem** more restrictive, we will intend to add these additional restrictions at a future point in time. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_limit_range.md?pixel)]() + diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index cd9282df..754e5a00 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -167,4 +167,6 @@ services 3 5 ``` + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_resource_quota.md?pixel)]() + diff --git a/architecture.md b/architecture.md index 6c82896e..71d606a1 100644 --- a/architecture.md +++ b/architecture.md @@ -58,4 +58,6 @@ All other cluster-level functions are currently performed by the Controller Mana The [`replicationcontroller`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/architecture.md?pixel)]() + diff --git a/clustering.md b/clustering.md index 95ff3ccc..3e9972ce 100644 --- a/clustering.md +++ b/clustering.md @@ -74,4 +74,6 @@ This flow has the admin manually approving the kubelet signing requests. 
This i ![Dynamic Sequence Diagram](clustering/dynamic.png) + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/clustering.md?pixel)]() + diff --git a/clustering/README.md b/clustering/README.md index dfd55e96..07dcc7b3 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -39,4 +39,7 @@ If you are using boot2docker and get warnings about clock skew (or if things are If you have the fswatch utility installed, you can have it monitor the file system and automatically rebuild when files have changed. Just do a `make watch`. + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/clustering/README.md?pixel)]() + diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index 056814e7..7d110c3f 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -157,4 +157,7 @@ access. Additional work is required to ensure that multiple command execution or port forwarding connections from different clients are not able to see each other's data. This can most likely be achieved via SELinux labeling and unique process contexts. 
+ + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/command_execution_port_forwarding.md?pixel)]() + diff --git a/event_compression.md b/event_compression.md index 4178393c..40dc9e52 100644 --- a/event_compression.md +++ b/event_compression.md @@ -92,4 +92,6 @@ This demonstrates what would have been 20 separate entries (indicating schedulin * PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/event_compression.md?pixel)]() + diff --git a/expansion.md b/expansion.md index 01a774cb..4f4511ce 100644 --- a/expansion.md +++ b/expansion.md @@ -399,4 +399,7 @@ spec: restartPolicy: Never ``` + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/expansion.md?pixel)]() + diff --git a/identifiers.md b/identifiers.md index e192b1ed..49068cc8 100644 --- a/identifiers.md +++ b/identifiers.md @@ -106,4 +106,6 @@ objectives. 1. This may correspond to Docker's container ID. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/identifiers.md?pixel)]() + diff --git a/namespaces.md b/namespaces.md index 547d040b..cd8b5280 100644 --- a/namespaces.md +++ b/namespaces.md @@ -348,4 +348,7 @@ to remove that Namespace from the storage. At this point, all content associated with that Namespace, and the Namespace itself are gone. + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/namespaces.md?pixel)]() + diff --git a/networking.md b/networking.md index 5a4a5835..35248a71 100644 --- a/networking.md +++ b/networking.md @@ -190,4 +190,6 @@ External IP assignment would also simplify DNS support (see below). IPv6 would be a nice option, also, but we can't depend on it yet. 
Docker support is in progress: [Docker issue #2974](https://github.com/dotcloud/docker/issues/2974), [Docker issue #6923](https://github.com/dotcloud/docker/issues/6923), [Docker issue #6975](https://github.com/dotcloud/docker/issues/6975). Additionally, direct ipv6 assignment to instances doesn't appear to be supported by major cloud providers (e.g., AWS EC2, GCE) yet. We'd happily take pull requests from people running Kubernetes on bare metal, though. :-) + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/networking.md?pixel)]() + diff --git a/persistent-storage.md b/persistent-storage.md index 9cc92b42..585cd281 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -228,4 +228,6 @@ The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/persistent-storage.md?pixel)]() + diff --git a/principles.md b/principles.md index e1bd97da..5071e89d 100644 --- a/principles.md +++ b/principles.md @@ -69,4 +69,6 @@ TODO * [Eric Raymond's 17 UNIX rules](https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules) + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/principles.md?pixel)]() + diff --git a/resources.md b/resources.md index 9539bed2..229e9b76 100644 --- a/resources.md +++ b/resources.md @@ -227,4 +227,6 @@ This is the amount of time a container spends accessing disk, including actuator * Compressible? yes + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/resources.md?pixel)]() + diff --git a/secrets.md b/secrets.md index c1643a9d..2fdee537 100644 --- a/secrets.md +++ b/secrets.md @@ -590,4 +590,6 @@ source. 
Both containers will have the following files present on their filesyst /etc/secret-volume/password + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/secrets.md?pixel)]() + diff --git a/security.md b/security.md index 90dc3237..bbb735eb 100644 --- a/security.md +++ b/security.md @@ -131,4 +131,6 @@ The controller manager for Replication Controllers and other future controllers The Kubernetes pod scheduler is responsible for reading data from the pod to fit it onto a node in the cluster. At a minimum, it needs access to view the ID of a pod (to craft the binding), its current state, any resource information necessary to identify placement, and other data relevant to concerns like anti-affinity, zone or region preference, or custom logic. It does not need the ability to modify pods or see other resources, only to create bindings. It should not need the ability to delete bindings unless the scheduler takes control of relocating components on failed hosts (which could be implemented by a separate component that can delete bindings but not create them). The scheduler may need read access to user or project-container information to determine preferential location (underspecified at this time). + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security.md?pixel)]() + diff --git a/security_context.md b/security_context.md index cbf525a8..ad83a6bd 100644 --- a/security_context.md +++ b/security_context.md @@ -170,5 +170,6 @@ will be denied by default. In the future the admission plugin will base this de configurable policies that reside within the [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). 
- + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/security_context.md?pixel)]() + diff --git a/service_accounts.md b/service_accounts.md index 896bd68e..61237853 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -177,5 +177,6 @@ Finally, it may provide an interface to automate creation of new serviceAccounts to GET serviceAccounts to see what has been created. - + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/service_accounts.md?pixel)]() + diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 45005353..0f2fe9e6 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -105,4 +105,6 @@ then ```foo-next``` is synthesized using the pattern ```- [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/simple-rolling-update.md?pixel)]() + -- cgit v1.2.3 From d85ea8aa273639ff82bdc5dfe4f137fbf48b8cca Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Mon, 13 Jul 2015 17:13:09 -0700 Subject: Apply mungedocs changes --- autoscaling.md | 3 ++- federation.md | 2 ++ high-availability.md | 2 ++ 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/autoscaling.md b/autoscaling.md index bd8244ab..e56a2256 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -267,5 +267,6 @@ temporarily disable negative decrement thresholds until the deployment process i an auto-scaler to be able to grow capacity during a deployment than to shrink the number of instances precisely. - + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/autoscaling.md?pixel)]() + diff --git a/federation.md b/federation.md index a8e9813b..e61163db 100644 --- a/federation.md +++ b/federation.md @@ -445,4 +445,6 @@ their primary zookeeper replica? And now how do I do a shared, highly available redis database? 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/federation.md?pixel)]() + diff --git a/high-availability.md b/high-availability.md index 4525f709..ee03b28e 100644 --- a/high-availability.md +++ b/high-availability.md @@ -60,4 +60,6 @@ There is a short window after a new master acquires the lease, during which data * Is there a desire to keep track of all nodes for a specific component type? + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/high-availability.md?pixel)]() + -- cgit v1.2.3 From b8889dc9532b5b58504d8b0ab52df2d5c386e449 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Mon, 13 Jul 2015 17:13:09 -0700 Subject: Apply mungedocs changes --- README.md | 3 +++ api_changes.md | 2 ++ cherry-picks.md | 2 ++ coding-conventions.md | 3 ++- collab.md | 2 ++ developer-guides/vagrant.md | 2 ++ development.md | 6 ++++-- faster_reviews.md | 3 ++- flaky-tests.md | 2 ++ getting-builds.md | 2 ++ instrumentation.md | 2 ++ issues.md | 2 ++ logging.md | 2 ++ making-release-notes.md | 3 ++- profiling.md | 2 ++ pull-requests.md | 3 +++ releasing.md | 2 ++ scheduler.md | 2 ++ scheduler_algorithm.md | 6 ++++-- writing-a-getting-started-guide.md | 4 +++- 20 files changed, 47 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 26eb7ced..6ce86769 100644 --- a/README.md +++ b/README.md @@ -47,4 +47,7 @@ Docs in this directory relate to developing Kubernetes. * **Getting Recent Builds** ([getting-builds.md](getting-builds.md)): How to get recent builds including the latest builds to pass CI. + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/README.md?pixel)]() + diff --git a/api_changes.md b/api_changes.md index 3ad1847d..3a0c1991 100644 --- a/api_changes.md +++ b/api_changes.md @@ -356,4 +356,6 @@ the change gets in. If you are unsure, ask. Also make sure that the change gets TODO(smarterclayton): write this. 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api_changes.md?pixel)]() + diff --git a/cherry-picks.md b/cherry-picks.md index 03f2ebb5..04811f0b 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -46,4 +46,6 @@ against a release is a GitHub query: For example, [this query is all of the v0.21.x cherry-picks](https://github.com/GoogleCloudPlatform/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr+%22automated+cherry+pick%22+base%3Arelease-0.21) + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/cherry-picks.md?pixel)]() + diff --git a/coding-conventions.md b/coding-conventions.md index e61398ee..54d9aaa6 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -20,5 +20,6 @@ Coding style advice for contributors - https://gist.github.com/lavalamp/4bd23295a9f32706a48f - + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/coding-conventions.md?pixel)]() + diff --git a/collab.md b/collab.md index dc12537d..d212012f 100644 --- a/collab.md +++ b/collab.md @@ -54,4 +54,6 @@ PRs that are incorrectly judged to be merge-able, may be reverted and subject to Any maintainer or core contributor who wants to review a PR but does not have time immediately may put a hold on a PR simply by saying so on the PR discussion and offering an ETA measured in single-digit days at most. Any PR that has a hold shall not be merged until the person who requested the hold acks the review, withdraws their hold, or is overruled by a preponderance of maintainers. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/collab.md?pixel)]() + diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 1316e26b..1b716648 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -351,4 +351,6 @@ export KUBERNETES_MINION_MEMORY=2048 ```vagrant suspend``` seems to mess up the network. It's not supported at this time. 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/developer-guides/vagrant.md?pixel)]() + diff --git a/development.md b/development.md index 157f49d6..ba9b9897 100644 --- a/development.md +++ b/development.md @@ -281,8 +281,8 @@ go run hack/e2e.go -v -ctl='delete pod foobar' ## Conformance testing End-to-end testing, as described above, is for [development -distributions](../../docs/devel/writing-a-getting-started-guide.md). A conformance test is used on -a [versioned distro](../../docs/devel/writing-a-getting-started-guide.md). +distributions](writing-a-getting-started-guide.md). A conformance test is used on +a [versioned distro](writing-a-getting-started-guide.md). The conformance test runs a subset of the e2e-tests against a manually-created cluster. It does not require support for up/push/down and other operations. To run a conformance test, you need to know the @@ -300,4 +300,6 @@ hack/run-gendocs.sh ``` + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/development.md?pixel)]() + diff --git a/faster_reviews.md b/faster_reviews.md index 99e60fb1..eb3b25e9 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -190,5 +190,6 @@ a bit of thought into how your work can be made easier to review. If you do these things your PRs will flow much more easily. - + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/faster_reviews.md?pixel)]() + diff --git a/flaky-tests.md b/flaky-tests.md index ee93bf19..d26fc406 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -76,4 +76,6 @@ If you do a final check for flakes with ```docker ps -a```, ignore tasks that ex Happy flake hunting! 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/flaky-tests.md?pixel)]() + diff --git a/getting-builds.md b/getting-builds.md index 5a1a4dde..770d486c 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -35,4 +35,6 @@ gsutil ls gs://kubernetes-release/release # list all official re ``` + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/getting-builds.md?pixel)]() + diff --git a/instrumentation.md b/instrumentation.md index 762d1980..22cd38e1 100644 --- a/instrumentation.md +++ b/instrumentation.md @@ -47,4 +47,6 @@ https://github.com/prometheus/client_golang/blob/master/prometheus/histogram.go https://github.com/prometheus/client_golang/blob/master/prometheus/summary.go + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/instrumentation.md?pixel)]() + diff --git a/issues.md b/issues.md index 62444185..d4d1d132 100644 --- a/issues.md +++ b/issues.md @@ -33,4 +33,6 @@ Definitions * untriaged - anything without a priority/X label will be considered untriaged + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]() + diff --git a/logging.md b/logging.md index 1ca18718..bf2bd5c8 100644 --- a/logging.md +++ b/logging.md @@ -40,4 +40,6 @@ The following conventions for the glog levels to use. [glog](http://godoc.org/g As per the comments, the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4). If you wish to change the log level, you can pass in `-v=X` where X is the desired maximum level to log. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/logging.md?pixel)]() + diff --git a/making-release-notes.md b/making-release-notes.md index 0dfbeebe..877c1364 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -47,5 +47,6 @@ With the final markdown all set, cut and paste it to the top of ```CHANGELOG.md` * Press Save. 
- + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/making-release-notes.md?pixel)]() + diff --git a/profiling.md b/profiling.md index 51635424..41737414 100644 --- a/profiling.md +++ b/profiling.md @@ -48,4 +48,6 @@ to get 30 sec. CPU profile. To enable contention profiling you need to add line ```rt.SetBlockProfileRate(1)``` in addition to ```m.mux.HandleFunc(...)``` added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/profiling.md?pixel)]() + diff --git a/pull-requests.md b/pull-requests.md index e82d2d00..1c6bbe5f 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -42,4 +42,7 @@ Once those requirements are met, they will be labeled [ok-to-merge](https://gith These restrictions will be relaxed after v1.0 is released. + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/pull-requests.md?pixel)]() + diff --git a/releasing.md b/releasing.md index 29e685cf..5cdbde2f 100644 --- a/releasing.md +++ b/releasing.md @@ -314,4 +314,6 @@ by plain mortals (in a perfect world PR/issue's title would be enough but often it is just too cryptic/geeky/domain-specific that it isn't). 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/releasing.md?pixel)]() + diff --git a/scheduler.md b/scheduler.md index 3e1ae0e1..d9fccefc 100644 --- a/scheduler.md +++ b/scheduler.md @@ -61,4 +61,6 @@ If you want to get a global picture of how the scheduler works, you can start in [plugin/cmd/kube-scheduler/app/server.go](../../plugin/cmd/kube-scheduler/app/server.go) + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler.md?pixel)]() + diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index 96789422..119b0c86 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -14,7 +14,7 @@ certainly want the docs that go with that version. # Scheduler Algorithm in Kubernetes -For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [docs/devel/scheduler.md](../../docs/devel/scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. +For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. ## Filtering the nodes The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. 
For example, if the free resource on a node (measured by the capacity minus the sum of the resource limits of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: @@ -44,7 +44,9 @@ Currently, Kubernetes scheduler provides some practical priority functions, incl - `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. - `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label. -The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](../../plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [docs/devel/scheduler.md](../../docs/devel/scheduler.md) for how to customize). +The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](../../plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [scheduler.md](scheduler.md) for how to customize). 
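The two steps the scheduler doc describes above — filtering out nodes that cannot fit the Pod, then ranking the survivors to find the best fit — can be sketched in Go. This is an illustrative sketch only: the `node` struct, `nodeFits`, and the free-capacity score below are invented for this example and are not the scheduler's real types, predicates, or priority functions.

```go
package main

import "fmt"

// node is a toy model: total CPU capacity and the sum of the CPU limits
// of all Pods already running on the node, both in millicores.
type node struct {
	name     string
	capacity int64
	used     int64
}

// Predicate (filtering) phase: a node fits only if its free capacity
// (capacity minus the sum of existing limits) covers the Pod's request.
func nodeFits(n node, requested int64) bool {
	return n.capacity-n.used >= requested
}

// Priority (ranking) phase: a 0-10 score favoring the node with the
// most free capacity remaining.
func freeCapacityScore(n node) int64 {
	return (n.capacity - n.used) * 10 / n.capacity
}

// schedule filters the nodes, then picks the highest-scoring survivor.
func schedule(nodes []node, requested int64) (string, bool) {
	bestName, bestScore, found := "", int64(-1), false
	for _, n := range nodes {
		if !nodeFits(n, requested) {
			continue // filtered out: never reaches the ranking phase
		}
		if s := freeCapacityScore(n); s > bestScore {
			bestName, bestScore, found = n.name, s, true
		}
	}
	return bestName, found
}

func main() {
	nodes := []node{
		{"node-a", 4000, 3900}, // only 100m free: filtered for a 200m Pod
		{"node-b", 4000, 1000},
		{"node-c", 4000, 3000},
	}
	name, ok := schedule(nodes, 200)
	fmt.Println(name, ok) // node-b true
}
```

The real priority functions listed above are each assigned a weight and summed; the single score here stands in for that weighted combination.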
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler_algorithm.md?pixel)]() + diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 7b94d9a3..dec4d9c9 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -43,7 +43,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. search for uses of flags by guides. - We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your own repo. - - Setup a cluster and run the [conformance test](../../docs/devel/development.md#conformance-testing) against it, and report the + - Setup a cluster and run the [conformance test](development.md#conformance-testing) against it, and report the results in your PR. - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md). - State the binary version of kubernetes that you tested clearly in your Guide doc and in The Matrix. @@ -113,4 +113,6 @@ These guidelines say *what* to do. See the Rationale section for *why*. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/writing-a-getting-started-guide.md?pixel)]() + -- cgit v1.2.3 From 9b2fc6d4e38015acae5713a75e7e9e0ea07bb549 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 9 Jul 2015 13:33:48 -0700 Subject: move admin related docs into docs/admin --- README.md | 2 +- architecture.md | 2 +- namespaces.md | 2 +- networking.md | 2 +- service_accounts.md | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 5a5b0497..2a7c153c 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,7 @@ Kubernetes enables users to ask a cluster to run a set of containers. The system Kubernetes is intended to run on a number of cloud providers, as well as on physical hosts. -A single Kubernetes cluster is not intended to span multiple availability zones. 
Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../availability.md) and [cluster federation proposal](../proposals/federation.md) for more details). +A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../admin/availability.md) and [cluster federation proposal](../proposals/federation.md) for more details). Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner. 
diff --git a/architecture.md b/architecture.md index 71d606a1..22d61b27 100644 --- a/architecture.md +++ b/architecture.md @@ -33,7 +33,7 @@ The **Kubelet** manages [pods](../pods.md) and their containers, their images, t Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. -Service endpoints are currently found via [DNS](../dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy. +Service endpoints are currently found via [DNS](../admin/dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy. ## The Kubernetes Control Plane diff --git a/namespaces.md b/namespaces.md index cd8b5280..b33b8c4a 100644 --- a/namespaces.md +++ b/namespaces.md @@ -86,7 +86,7 @@ distinguish distinct entities, and reference particular entities across operatio A *Namespace* provides an authorization scope for accessing content associated with the *Namespace*. 
-See [Authorization plugins](../authorization.md) +See [Authorization plugins](../admin/authorization.md) ### Limit Resource Consumption diff --git a/networking.md b/networking.md index 35248a71..1ebc3d47 100644 --- a/networking.md +++ b/networking.md @@ -129,7 +129,7 @@ a pod tries to egress beyond GCE's project the packets must be SNAT'ed With the primary aim of providing IP-per-pod-model, other implementations exist to serve the purpose outside of GCE. - - [OpenVSwitch with GRE/VxLAN](../ovs-networking.md) + - [OpenVSwitch with GRE/VxLAN](../admin/ovs-networking.md) - [Flannel](https://github.com/coreos/flannel#flannel) - [L2 networks](http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/) ("With Linux Bridge devices" section) diff --git a/service_accounts.md b/service_accounts.md index 61237853..3b9e6ed9 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -34,7 +34,7 @@ They also may interact with services other than the Kubernetes API, such as: ## Design Overview A service account binds together several things: - a *name*, understood by users, and perhaps by peripheral systems, for an identity - - a *principal* that can be authenticated and [authorized](../authorization.md) + - a *principal* that can be authenticated and [authorized](../admin/authorization.md) - a [security context](security_context.md), which defines the Linux Capabilities, User IDs, Groups IDs, and other capabilities and controls on interaction with the file system and OS. 
- a set of [secrets](secrets.md), which a container may use to -- cgit v1.2.3 From 8e5a970d432f598962d97ecd9d6ce4b07d8f79bc Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Fri, 10 Jul 2015 12:39:25 -0700 Subject: standardize on - instead of _ in file names --- resources.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/resources.md b/resources.md index 229e9b76..437aac09 100644 --- a/resources.md +++ b/resources.md @@ -13,7 +13,7 @@ certainly want the docs that go with that version. **Note: this is a design doc, which describes features that have not been completely implemented. -User documentation of the current state is [here](../compute_resources.md). The tracking issue for +User documentation of the current state is [here](../compute-resources.md). The tracking issue for implementation of this model is [#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in -- cgit v1.2.3 From 6cf8654ed142562a5b7f0c7a947fd06925439046 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Mon, 13 Jul 2015 16:25:16 -0700 Subject: Move versioning.md to design/ -- not user-focused. --- versioning.md | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) create mode 100644 versioning.md diff --git a/versioning.md b/versioning.md new file mode 100644 index 00000000..4d17a939 --- /dev/null +++ b/versioning.md @@ -0,0 +1,64 @@ + + + + +
*** PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.
+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + + + + +# Kubernetes API and Release Versioning + +Legend: + +* **Kube <major>.<minor>.<patch>** refers to the version of Kubernetes that is released. This versions all components: apiserver, kubelet, kubectl, etc. +* **API vX[betaY]** refers to the version of the HTTP API. + +## Release Timeline + +### Minor version timeline + +* Kube 1.0.0 +* Kube 1.0.x: We create a 1.0-patch branch and backport critical bugs and security issues to it. Patch releases occur as needed. +* Kube 1.1-alpha1: Cut from HEAD, smoke tested and released two weeks after Kube 1.0's release. Roughly every two weeks a new alpha is released from HEAD. The timeline is flexible; for example, if there is a critical bugfix, a new alpha can be released ahead of schedule. (This applies to the beta and rc releases as well.) +* Kube 1.1-beta1: When HEAD is feature complete, we create a 1.1-snapshot branch and release it as a beta. (The 1.1-snapshot branch may be created earlier if something that definitely won't be in 1.1 needs to be merged to HEAD.) This should occur 6-8 weeks after Kube 1.0. Development continues at HEAD and only fixes are backported to 1.1-snapshot. +* Kube 1.1-rc1: Released from 1.1-snapshot when it is considered stable and ready for testing. Most users should be able to upgrade to this version in production. +* Kube 1.1: Final release. Should occur between 3 and 4 months after 1.0. + +### Major version timeline + +There is no mandated timeline for major versions. They only occur when we need to start the clock on deprecating features. A given major version should be the latest major version for at least one year from its original release date. + +## Release versions as related to API versions + +Here is an example major release cycle: + +* **Kube 1.0 should have API v1 without v1beta\* API versions** + * The last version of Kube before 1.0 (e.g. 
0.14 or whatever it is) will have the stable v1 API. This enables you to migrate all your objects off of the beta API versions of the API and allows us to remove those beta API versions in Kube 1.0 with no effect. There will be tooling to help you detect and migrate any v1beta\* data versions or calls to v1 before you do the upgrade. +* **Kube 1.x may have API v2beta*** + * The first incarnation of a new (backwards-incompatible) API in HEAD is v2beta1. By default this will be unregistered in apiserver, so it can change freely. Once it is available by default in apiserver (which may not happen for several minor releases), it cannot change ever again because we serialize objects in versioned form, and we always need to be able to deserialize any objects that are saved in etcd, even between alpha versions. If further changes to v2beta1 need to be made, v2beta2 is created, and so on, in subsequent 1.x versions. +* **Kube 1.y (where y is the last version of the 1.x series) must have final API v2** + * Before Kube 2.0 is cut, API v2 must be released in 1.x. This enables two things: (1) users can upgrade to API v2 when running Kube 1.x and then switch over to Kube 2.x transparently, and (2) in the Kube 2.0 release itself we can cleanup and remove all API v2beta\* versions because no one should have v2beta\* objects left in their database. As mentioned above, tooling will exist to make sure there are no calls or references to a given API version anywhere inside someone's kube installation before someone upgrades. + * Kube 2.0 must include the v1 API, but Kube 3.0 must include the v2 API only. It *may* include the v1 API as well if the burden is not high - this will be determined on a per-major-version basis. + +## Rationale for API v2 being complete before v2.0's release + +It may seem a bit strange to complete the v2 API before v2.0 is released, but *adding* a v2 API is not a breaking change. 
*Removing* the v2beta\* APIs *is* a breaking change, which is what necessitates the major version bump. There are other ways to do this, but having the major release be the fresh start of that release's API without the baggage of its beta versions seems most intuitive out of the available options. + +# Upgrades + +* Users can upgrade from any Kube 1.x release to any other Kube 1.x release as a rolling upgrade across their cluster. (Rolling upgrade means being able to upgrade the master first, then one node at a time. See #4855 for details.) +* No hard breaking changes over version boundaries. + * For example, if a user is at Kube 1.x, we may require them to upgrade to Kube 1.x+y before upgrading to Kube 2.x. In others words, an upgrade across major versions (e.g. Kube 1.x to Kube 2.x) should effectively be a no-op and as graceful as an upgrade from Kube 1.x to Kube 1.x+1. But you can require someone to go from 1.x to 1.x+y before they go to 2.x. + +There is a separate question of how to track the capabilities of a kubelet to facilitate rolling upgrades. That is not addressed here. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/versioning.md?pixel)]() + -- cgit v1.2.3 From 53eee3533f5d877d5dbbca69f7171784dacc7fbb Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Tue, 14 Jul 2015 09:37:37 -0700 Subject: automated link fixes --- access.md | 2 +- architecture.md | 6 +++--- networking.md | 2 +- resources.md | 4 ++-- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/access.md b/access.md index 912f93aa..6192792d 100644 --- a/access.md +++ b/access.md @@ -165,7 +165,7 @@ In the Simple Profile: Namespaces versus userAccount vs Labels: - `userAccount`s are intended for audit logging (both name and UID should be logged), and to define who has access to `namespace`s. 
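The `vX[betaY]` naming that versioning.md builds on implies a total order within each major version: every beta precedes the final release (v1 < v2beta1 < v2beta2 < v2). A minimal Go sketch of that ordering — the `apiVersion` type and the `parse`/`less` functions are invented for illustration, not taken from the apiserver:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// apiVersion models the vX[betaY] scheme: a major number plus an
// optional beta number (beta == 0 means the final, stable version).
type apiVersion struct {
	major int
	beta  int
}

var versionRE = regexp.MustCompile(`^v([0-9]+)(?:beta([0-9]+))?$`)

func parse(s string) (apiVersion, error) {
	m := versionRE.FindStringSubmatch(s)
	if m == nil {
		return apiVersion{}, fmt.Errorf("malformed API version %q", s)
	}
	major, _ := strconv.Atoi(m[1])
	beta := 0
	if m[2] != "" {
		beta, _ = strconv.Atoi(m[2])
	}
	return apiVersion{major: major, beta: beta}, nil
}

// less reports whether a precedes b: lower majors come first, and
// within one major every beta precedes the final version.
func (a apiVersion) less(b apiVersion) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if (a.beta == 0) != (b.beta == 0) {
		return a.beta != 0 // a is the beta, so it comes first
	}
	return a.beta < b.beta
}

func main() {
	order := []string{"v1", "v2beta1", "v2beta2", "v2"}
	for i := 0; i+1 < len(order); i++ {
		a, _ := parse(order[i])
		b, _ := parse(order[i+1])
		fmt.Printf("%s < %s: %v\n", order[i], order[i+1], a.less(b))
	}
}
```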
-- `labels` (see [docs/labels.md](../../docs/labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities. +- `labels` (see [docs/user-guide/labels.md](../../docs/user-guide/labels.md)) should be used to distinguish pods, users, and other objects that cooperate towards a common goal but are different in some way, such as version, or responsibilities. - `namespace`s prevent name collisions between uncoordinated groups of people, and provide a place to attach common policies for co-operating groups of people. diff --git a/architecture.md b/architecture.md index 22d61b27..d2f9d942 100644 --- a/architecture.md +++ b/architecture.md @@ -27,11 +27,11 @@ The Kubernetes node has the services necessary to run application containers and Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers. ### Kubelet -The **Kubelet** manages [pods](../pods.md) and their containers, their images, their volumes, etc. +The **Kubelet** manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc. ### Kube-Proxy -Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. +Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../user-guide/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. 
Service endpoints are currently found via [DNS](../admin/dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy. @@ -55,7 +55,7 @@ The scheduler binds unscheduled pods to nodes via the `/binding` API. The schedu All other cluster-level functions are currently performed by the Controller Manager. For instance, `Endpoints` objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable. -The [`replicationcontroller`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. +The [`replicationcontroller`](../user-guide/replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../user-guide/pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented. diff --git a/networking.md b/networking.md index 1ebc3d47..c13daa1b 100644 --- a/networking.md +++ b/networking.md @@ -140,7 +140,7 @@ to serve the purpose outside of GCE. ## Pod to service -The [service](../services.md) abstraction provides a way to group pods under a +The [service](../user-guide/services.md) abstraction provides a way to group pods under a common access policy (e.g. load-balanced). The implementation of this creates a virtual IP which clients can access and which is transparently proxied to the pods in a Service. 
Each node runs a kube-proxy process which programs diff --git a/resources.md b/resources.md index 437aac09..fb147fa5 100644 --- a/resources.md +++ b/resources.md @@ -13,7 +13,7 @@ certainly want the docs that go with that version. **Note: this is a design doc, which describes features that have not been completely implemented. -User documentation of the current state is [here](../compute-resources.md). The tracking issue for +User documentation of the current state is [here](../user-guide/compute-resources.md). The tracking issue for implementation of this model is [#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in @@ -163,7 +163,7 @@ The following are planned future extensions to the resource model, included here ## Usage data -Because resource usage and related metrics change continuously, need to be tracked over time (i.e., historically), can be characterized in a variety of ways, and are fairly voluminous, we will not include usage in core API objects, such as [Pods](../pods.md) and Nodes, but will provide separate APIs for accessing and managing that data. See the Appendix for possible representations of usage data, but the representation we'll use is TBD. +Because resource usage and related metrics change continuously, need to be tracked over time (i.e., historically), can be characterized in a variety of ways, and are fairly voluminous, we will not include usage in core API objects, such as [Pods](../user-guide/pods.md) and Nodes, but will provide separate APIs for accessing and managing that data. See the Appendix for possible representations of usage data, but the representation we'll use is TBD. 
Singleton values for observed and predicted future usage will rapidly prove inadequate, so we will support the following structure for extended usage information: -- cgit v1.2.3 From 3ef32f954665ada9c6030f377fa3bb871af72b18 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Tue, 14 Jul 2015 09:37:37 -0700 Subject: automated link fixes --- autoscaling.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index e56a2256..9b6e83b2 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -35,7 +35,7 @@ done automatically based on statistical analysis and thresholds. * This proposal is for horizontal scaling only. Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072) * `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. The `ReplicationController` responsibilities are -constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](../replication-controller.md#responsibilities-of-the-replication-controller) +constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](../user-guide/replication-controller.md#responsibilities-of-the-replication-controller) * Auto-scalers will be loosely coupled with data gathering components in order to allow a wide variety of input sources * Auto-scalable resources will support a scale verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) such that the auto-scaler does not directly manipulate the underlying resource. @@ -56,7 +56,7 @@ applications will expose one or more network endpoints for clients to connect to balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client to server traffic for applications. This is the primary, but not sole, source of data for making decisions. 
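Turning that proxy and load-balancer traffic data into a scaling decision usually reduces to mapping an observed aggregate metric onto a replica count against a per-replica target, bounded so a noisy signal cannot scale a service to zero or without limit. A sketch under those assumptions — `desiredReplicas` and its parameters are hypothetical names for illustration, not part of the proposal's API:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas converts an observed aggregate load (e.g. requests/sec
// reported by proxies and load balancers) into a replica count, given a
// per-replica target. min and max bound the decision so a noisy metric
// cannot scale the service to zero or without limit.
func desiredReplicas(observedLoad, targetPerReplica float64, min, max int) int {
	n := int(math.Ceil(observedLoad / targetPerReplica))
	if n < min {
		n = min
	}
	if n > max {
		n = max
	}
	return n
}

func main() {
	// 950 req/s observed; each replica comfortably serves 100 req/s.
	fmt.Println(desiredReplicas(950, 100, 2, 20)) // 10
	// Load collapses overnight; the floor keeps two replicas running.
	fmt.Println(desiredReplicas(30, 100, 2, 20)) // 2
}
```

A real auto-scaler would smooth the metric over a window before applying a formula like this, which is where the statistical analysis mentioned above comes in.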
-Within Kubernetes a [kube proxy](../services.md#ips-and-vips) +Within Kubernetes a [kube proxy](../user-guide/services.md#ips-and-vips) running on each node directs service requests to the underlying implementation. While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage @@ -239,7 +239,7 @@ or down as appropriate. In the future this may be more configurable. ### Interactions with a deployment -In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](../replication-controller.md#rolling-updates) +In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](../user-guide/replication-controller.md#rolling-updates) there will be multiple replication controllers, with one scaling up and another scaling down. This means that an auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector` is what provides this ability. 
By using a selector that spans the entire service the auto-scaler can monitor capacity -- cgit v1.2.3 From be001476aac7c7b13cf50083bfc46a27d9d8f08c Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 13 Jul 2015 15:15:35 -0700 Subject: Run gendocs --- README.md | 10 +++++++++- access.md | 10 +++++++++- admission_control.md | 10 +++++++++- admission_control_limit_range.md | 10 +++++++++- admission_control_resource_quota.md | 10 +++++++++- architecture.md | 10 +++++++++- clustering.md | 10 +++++++++- clustering/README.md | 10 +++++++++- command_execution_port_forwarding.md | 10 +++++++++- event_compression.md | 10 +++++++++- expansion.md | 10 +++++++++- identifiers.md | 10 +++++++++- namespaces.md | 10 +++++++++- networking.md | 10 +++++++++- persistent-storage.md | 10 +++++++++- principles.md | 10 +++++++++- resources.md | 10 +++++++++- secrets.md | 10 +++++++++- security.md | 10 +++++++++- security_context.md | 10 +++++++++- service_accounts.md | 10 +++++++++- simple-rolling-update.md | 10 +++++++++- versioning.md | 10 +++++++++- 23 files changed, 207 insertions(+), 23 deletions(-) diff --git a/README.md b/README.md index 2a7c153c..8d98c34a 100644 --- a/README.md +++ b/README.md @@ -2,13 +2,21 @@ -
*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +
PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.
Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/access.md b/access.md index 6192792d..d8060025 100644 --- a/access.md +++ b/access.md @@ -2,13 +2,21 @@ -
*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +
PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.
Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/admission_control.md b/admission_control.md index 5870a601..ac488e6f 100644 --- a/admission_control.md +++ b/admission_control.md @@ -2,13 +2,21 @@ -
*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +
PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.
Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index e5363cea..b95f87a5 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 754e5a00..22825e8d 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/architecture.md b/architecture.md index d2f9d942..5202147f 100644 --- a/architecture.md +++ b/architecture.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/clustering.md b/clustering.md index 3e9972ce..a2ea1139 100644 --- a/clustering.md +++ b/clustering.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/clustering/README.md b/clustering/README.md index 07dcc7b3..3e390f37 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index 7d110c3f..fc06c5d3 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/event_compression.md b/event_compression.md index 40dc9e52..d8984e13 100644 --- a/event_compression.md +++ b/event_compression.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/expansion.md b/expansion.md index 4f4511ce..eb2a78b5 100644 --- a/expansion.md +++ b/expansion.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/identifiers.md b/identifiers.md index 49068cc8..daadc90f 100644 --- a/identifiers.md +++ b/identifiers.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/namespaces.md b/namespaces.md index b33b8c4a..ff2ceb91 100644 --- a/namespaces.md +++ b/namespaces.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/networking.md b/networking.md index c13daa1b..235f8f19 100644 --- a/networking.md +++ b/networking.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/persistent-storage.md b/persistent-storage.md index 585cd281..5a18fa55 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/principles.md b/principles.md index 5071e89d..af831a07 100644 --- a/principles.md +++ b/principles.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/resources.md b/resources.md index fb147fa5..70420ec2 100644 --- a/resources.md +++ b/resources.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/secrets.md b/secrets.md index 2fdee537..d0728044 100644 --- a/secrets.md +++ b/secrets.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/security.md b/security.md index bbb735eb..4f9ed395 100644 --- a/security.md +++ b/security.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/security_context.md b/security_context.md index ad83a6bd..ca12db00 100644 --- a/security_context.md +++ b/security_context.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/service_accounts.md b/service_accounts.md index 3b9e6ed9..e877b880 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 0f2fe9e6..b8473682 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/versioning.md b/versioning.md index 4d17a939..fff6cbd7 100644 --- a/versioning.md +++ b/versioning.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
-- 
cgit v1.2.3


From c8cc5f5d4a33e2c77e99580849f93c54a2fd1d11 Mon Sep 17 00:00:00 2001
From: Tim Hockin
Date: Mon, 13 Jul 2015 15:15:35 -0700
Subject: Run gendocs

---
 README.md                          | 10 +++++++++-
 api_changes.md                     | 10 +++++++++-
 cherry-picks.md                    | 10 +++++++++-
 coding-conventions.md              | 10 +++++++++-
 collab.md                          | 10 +++++++++-
 developer-guides/vagrant.md        | 10 +++++++++-
 development.md                     | 10 +++++++++-
 faster_reviews.md                  | 10 +++++++++-
 flaky-tests.md                     | 10 +++++++++-
 getting-builds.md                  | 10 +++++++++-
 instrumentation.md                 | 10 +++++++++-
 issues.md                          | 10 +++++++++-
 logging.md                         | 10 +++++++++-
 making-release-notes.md            | 10 +++++++++-
 profiling.md                       | 10 +++++++++-
 pull-requests.md                   | 10 +++++++++-
 releasing.md                       | 10 +++++++++-
 scheduler.md                       | 10 +++++++++-
 scheduler_algorithm.md             | 10 +++++++++-
 writing-a-getting-started-guide.md | 10 +++++++++-
 20 files changed, 180 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 6ce86769..505e7f34 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/api_changes.md b/api_changes.md
index 3a0c1991..d132adf3 100644
--- a/api_changes.md
+++ b/api_changes.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/cherry-picks.md b/cherry-picks.md
index 04811f0b..0453102f 100644
--- a/cherry-picks.md
+++ b/cherry-picks.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/coding-conventions.md b/coding-conventions.md
index 54d9aaa6..030b3448 100644
--- a/coding-conventions.md
+++ b/coding-conventions.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/collab.md b/collab.md
index d212012f..e5fbf24d 100644
--- a/collab.md
+++ b/collab.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md
index 1b716648..5234e88a 100644
--- a/developer-guides/vagrant.md
+++ b/developer-guides/vagrant.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/development.md b/development.md
index ba9b9897..435aac3a 100644
--- a/development.md
+++ b/development.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/faster_reviews.md b/faster_reviews.md
index eb3b25e9..8879075e 100644
--- a/faster_reviews.md
+++ b/faster_reviews.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/flaky-tests.md b/flaky-tests.md
index d26fc406..fe5af939 100644
--- a/flaky-tests.md
+++ b/flaky-tests.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/getting-builds.md b/getting-builds.md
index 770d486c..53193e84 100644
--- a/getting-builds.md
+++ b/getting-builds.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/instrumentation.md b/instrumentation.md
index 22cd38e1..39a9d922 100644
--- a/instrumentation.md
+++ b/instrumentation.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/issues.md b/issues.md
index d4d1d132..e73dcb1d 100644
--- a/issues.md
+++ b/issues.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/logging.md b/logging.md
index bf2bd5c8..68fd98f9 100644
--- a/logging.md
+++ b/logging.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/making-release-notes.md b/making-release-notes.md
index 877c1364..482c05a1 100644
--- a/making-release-notes.md
+++ b/making-release-notes.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/profiling.md b/profiling.md
index 41737414..7eadfbbe 100644
--- a/profiling.md
+++ b/profiling.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/pull-requests.md b/pull-requests.md
index 1c6bbe5f..cf325823 100644
--- a/pull-requests.md
+++ b/pull-requests.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/releasing.md b/releasing.md
index 5cdbde2f..3de00293 100644
--- a/releasing.md
+++ b/releasing.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/scheduler.md b/scheduler.md
index d9fccefc..3617a1dd 100644
--- a/scheduler.md
+++ b/scheduler.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md
index 119b0c86..d5ab280a 100644
--- a/scheduler_algorithm.md
+++ b/scheduler_algorithm.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+
 PLEASE NOTE: This document applies to the HEAD of the source
 tree only. If you are using a released version of Kubernetes, you
 almost certainly want the docs that go with that version.
 
 Documentation for specific releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md
index dec4d9c9..bb017814 100644
--- a/writing-a-getting-started-guide.md
+++ b/writing-a-getting-started-guide.md
@@ -2,13 +2,21 @@
-*** PLEASE NOTE: This document applies to the HEAD of the source
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+![WARNING](http://releases.k8s.io/HEAD/docs/warning.png)
+
+

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + -- cgit v1.2.3 From b0ed9396a58aca16fd72543b05ec68045c559be5 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 13 Jul 2015 15:15:35 -0700 Subject: Run gendocs --- autoscaling.md | 10 +++++++++- federation.md | 10 +++++++++- high-availability.md | 10 +++++++++- 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 9b6e83b2..114fd331 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/federation.md b/federation.md index e61163db..a21bb46c 100644 --- a/federation.md +++ b/federation.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + diff --git a/high-availability.md b/high-availability.md index ee03b28e..dd061bb5 100644 --- a/high-availability.md +++ b/high-availability.md @@ -2,13 +2,21 @@ -

*** PLEASE NOTE: This document applies to the HEAD of the source +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) + -- cgit v1.2.3 From 43bcff9826eb3752ae4ce19c2f76323387de3f9d Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 14 Jul 2015 17:28:47 -0700 Subject: Run gendocs --- README.md | 12 ++++++------ access.md | 12 ++++++------ admission_control.md | 12 ++++++------ admission_control_limit_range.md | 12 ++++++------ admission_control_resource_quota.md | 12 ++++++------ architecture.md | 12 ++++++------ clustering.md | 12 ++++++------ clustering/README.md | 12 ++++++------ command_execution_port_forwarding.md | 12 ++++++------ event_compression.md | 12 ++++++------ expansion.md | 12 ++++++------ identifiers.md | 12 ++++++------ namespaces.md | 12 ++++++------ networking.md | 12 ++++++------ persistent-storage.md | 12 ++++++------ principles.md | 12 ++++++------ resources.md | 12 ++++++------ secrets.md | 12 ++++++------ security.md | 12 ++++++------ security_context.md | 12 ++++++------ service_accounts.md | 12 ++++++------ simple-rolling-update.md | 12 ++++++------ versioning.md | 12 ++++++------ 23 files changed, 138 insertions(+), 138 deletions(-) diff --git a/README.md b/README.md index 8d98c34a..1f850ffb 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/access.md b/access.md index d8060025..c3ac41a0 100644 --- a/access.md +++ b/access.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/admission_control.md b/admission_control.md index ac488e6f..a80de2b2 100644 --- a/admission_control.md +++ b/admission_control.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index b95f87a5..125c6d06 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 22825e8d..d80f38bf 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/architecture.md b/architecture.md index 5202147f..3deeb3aa 100644 --- a/architecture.md +++ b/architecture.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/clustering.md b/clustering.md index a2ea1139..e5307fd7 100644 --- a/clustering.md +++ b/clustering.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/clustering/README.md b/clustering/README.md index 3e390f37..cf5a3d50 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index fc06c5d3..998d1cbd 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/event_compression.md b/event_compression.md index d8984e13..32e52607 100644 --- a/event_compression.md +++ b/event_compression.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/expansion.md b/expansion.md index eb2a78b5..f81db3c4 100644 --- a/expansion.md +++ b/expansion.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/identifiers.md b/identifiers.md index daadc90f..e66d2d7a 100644 --- a/identifiers.md +++ b/identifiers.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/namespaces.md b/namespaces.md index ff2ceb91..70f5e860 100644 --- a/namespaces.md +++ b/namespaces.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/networking.md b/networking.md index 235f8f19..052ec128 100644 --- a/networking.md +++ b/networking.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/persistent-storage.md b/persistent-storage.md index 5a18fa55..9639a521 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/principles.md b/principles.md index af831a07..83a1ae91 100644 --- a/principles.md +++ b/principles.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/resources.md b/resources.md index 70420ec2..4172cdb4 100644 --- a/resources.md +++ b/resources.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/secrets.md b/secrets.md index d0728044..33433dc0 100644 --- a/secrets.md +++ b/secrets.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/security.md b/security.md index 4f9ed395..e2ab4fb7 100644 --- a/security.md +++ b/security.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/security_context.md b/security_context.md index ca12db00..6b0601e6 100644 --- a/security_context.md +++ b/security_context.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/service_accounts.md b/service_accounts.md index e877b880..ddb127f2 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/simple-rolling-update.md b/simple-rolling-update.md index b8473682..ed2e5349 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/versioning.md b/versioning.md index fff6cbd7..85e3f56f 100644 --- a/versioning.md +++ b/versioning.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) -- cgit v1.2.3 From 70aa961049adb9d481b720e42a4e984f93eaf842 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 14 Jul 2015 17:28:47 -0700 Subject: Run gendocs --- README.md | 12 ++++++------ api_changes.md | 12 ++++++------ cherry-picks.md | 12 ++++++------ coding-conventions.md | 12 ++++++------ collab.md | 12 ++++++------ developer-guides/vagrant.md | 12 ++++++------ development.md | 12 ++++++------ faster_reviews.md | 12 ++++++------ flaky-tests.md | 12 ++++++------ getting-builds.md | 12 ++++++------ instrumentation.md | 12 ++++++------ issues.md | 12 ++++++------ logging.md | 12 ++++++------ making-release-notes.md | 12 ++++++------ profiling.md | 12 ++++++------ pull-requests.md | 12 ++++++------ releasing.md | 12 ++++++------ scheduler.md | 12 ++++++------ scheduler_algorithm.md | 12 ++++++------ writing-a-getting-started-guide.md | 12 ++++++------ 20 files changed, 120 insertions(+), 120 deletions(-) diff --git a/README.md b/README.md index 505e7f34..f97c49b4 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/api_changes.md b/api_changes.md index d132adf3..2d571eb5 100644 --- a/api_changes.md +++ b/api_changes.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/cherry-picks.md b/cherry-picks.md index 0453102f..b971f2fc 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/coding-conventions.md b/coding-conventions.md index 030b3448..76ba29e8 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/collab.md b/collab.md index e5fbf24d..caadc8de 100644 --- a/collab.md +++ b/collab.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 5234e88a..0ef31c68 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/development.md b/development.md index 435aac3a..e2ec2068 100644 --- a/development.md +++ b/development.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/faster_reviews.md b/faster_reviews.md index 8879075e..335d2a3e 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/flaky-tests.md b/flaky-tests.md index fe5af939..fb000ea6 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/getting-builds.md b/getting-builds.md index 53193e84..372d080d 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/instrumentation.md b/instrumentation.md index 39a9d922..95786c52 100644 --- a/instrumentation.md +++ b/instrumentation.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/issues.md b/issues.md index e73dcb1d..689a18ff 100644 --- a/issues.md +++ b/issues.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/logging.md b/logging.md index 68fd98f9..1a536d07 100644 --- a/logging.md +++ b/logging.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/making-release-notes.md b/making-release-notes.md index 482c05a1..5703965a 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/profiling.md b/profiling.md index 7eadfbbe..863dc4c1 100644 --- a/profiling.md +++ b/profiling.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/pull-requests.md b/pull-requests.md index cf325823..bdb7a172 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/releasing.md b/releasing.md index 3de00293..2f5035cc 100644 --- a/releasing.md +++ b/releasing.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/scheduler.md b/scheduler.md index 3617a1dd..912d1128 100644 --- a/scheduler.md +++ b/scheduler.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index d5ab280a..fc402516 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index bb017814..348faf9b 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) -- cgit v1.2.3 From 2d2958db9dc95022aece3ccc877cd63a4741c760 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 14 Jul 2015 17:28:47 -0700 Subject: Run gendocs --- autoscaling.md | 12 ++++++------ federation.md | 12 ++++++------ high-availability.md | 12 ++++++------ 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 114fd331..15071645 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/federation.md b/federation.md index a21bb46c..a573050f 100644 --- a/federation.md +++ b/federation.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) diff --git a/high-availability.md b/high-availability.md index dd061bb5..b61148f9 100644 --- a/high-availability.md +++ b/high-availability.md @@ -2,9 +2,9 @@ -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png)

PLEASE NOTE: This document applies to the HEAD of the source tree only. If you are using a released version of Kubernetes, you almost @@ -13,9 +13,9 @@ certainly want the docs that go with that version.

Documentation for specific releases can be found at [releases.k8s.io](http://releases.k8s.io). -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) -![WARNING](http://releases.k8s.io/HEAD/docs/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) -- cgit v1.2.3 From 8b825dd62649854c64185070384955f7f59b371c Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 14 Jul 2015 22:07:44 -0700 Subject: Move some docs from docs/ top-level into docs/{admin/,devel/,user-guide/}. --- principles.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/principles.md b/principles.md index 83a1ae91..212f04bd 100644 --- a/principles.md +++ b/principles.md @@ -26,7 +26,7 @@ Principles to follow when extending Kubernetes. ## API -See also the [API conventions](../api-conventions.md). +See also the [API conventions](../devel/api-conventions.md). * All APIs should be declarative. * API objects should be complementary and composable, not opaque wrappers. -- cgit v1.2.3 From cb5465e2c6af85fd4f5b0577b8e4b16d930001d1 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 14 Jul 2015 22:07:44 -0700 Subject: Move some docs from docs/ top-level into docs/{admin/,devel/,user-guide/}. 
--- api-conventions.md | 637 ++++++++++++++++++++++++++++++++++++++++++++++++++++ cli-roadmap.md | 105 +++++++++ client-libraries.md | 43 ++++ developer-guide.md | 62 +++++ 4 files changed, 847 insertions(+) create mode 100644 api-conventions.md create mode 100644 cli-roadmap.md create mode 100644 client-libraries.md create mode 100644 developer-guide.md diff --git a/api-conventions.md b/api-conventions.md new file mode 100644 index 00000000..4a0cfccb --- /dev/null +++ b/api-conventions.md @@ -0,0 +1,637 @@ + + + + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.

+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + + + + +API Conventions +=============== + +Updated: 4/16/2015 + +*This document is oriented at users who want a deeper understanding of the Kubernetes +API structure, and at developers wanting to extend the Kubernetes API. An introduction to +using resources with kubectl can be found in [working_with_resources.md](working_with_resources.md).* + +**Table of Contents** + + - [Types (Kinds)](#types-kinds) + - [Resources](#resources) + - [Objects](#objects) + - [Metadata](#metadata) + - [Spec and Status](#spec-and-status) + - [Typical status properties](#typical-status-properties) + - [References to related objects](#references-to-related-objects) + - [Lists of named subobjects preferred over maps](#lists-of-named-subobjects-preferred-over-maps) + - [Constants](#constants) + - [Lists and Simple kinds](#lists-and-simple-kinds) + - [Differing Representations](#differing-representations) + - [Verbs on Resources](#verbs-on-resources) + - [PATCH operations](#patch-operations) + - [Strategic Merge Patch](#strategic-merge-patch) + - [List Operations](#list-operations) + - [Map Operations](#map-operations) + - [Idempotency](#idempotency) + - [Defaulting](#defaulting) + - [Late Initialization](#late-initialization) + - [Concurrency Control and Consistency](#concurrency-control-and-consistency) + - [Serialization Format](#serialization-format) + - [Units](#units) + - [Selecting Fields](#selecting-fields) + - [HTTP Status codes](#http-status-codes) + - [Success codes](#success-codes) + - [Error codes](#error-codes) + - [Response Status Kind](#response-status-kind) + + + +The conventions of the [Kubernetes API](../api.md) (and related APIs in the ecosystem) are intended to ease client development and ensure that configuration mechanisms can be implemented that
work across a diverse set of use cases consistently. + +The general style of the Kubernetes API is RESTful - clients create, update, delete, or retrieve a description of an object via the standard HTTP verbs (POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return JSON. Kubernetes also exposes additional endpoints for non-standard verbs and allows alternative content types. All of the JSON accepted and returned by the server has a schema, identified by the "kind" and "apiVersion" fields. Where relevant HTTP header fields exist, they should mirror the content of JSON fields, but the information should not be represented only in the HTTP header. + +The following terms are defined: + +* **Kind**: the name of a particular object schema (e.g. the "Cat" and "Dog" kinds would have different attributes and properties) + +* **Resource**: a representation of a system entity, sent or retrieved as JSON via HTTP to the server. Resources are exposed via: + * Collections - a list of resources of the same type, which may be queryable + * Elements - an individual resource, addressable via a URL + +Each resource typically accepts and returns data of a single kind. A kind may be accepted or returned by multiple resources that reflect specific use cases. For instance, the kind "Pod" is exposed as a "pods" resource that allows end users to create, update, and delete pods, while a separate "pod status" resource (that acts on the "Pod" kind) allows automated processes to update a subset of the fields in that resource. A "restart" resource might be exposed for a number of different resources to allow the same action to have different results for each object. + +Resource collections should be all lowercase and plural, whereas kinds are CamelCase and singular. + + +## Types (Kinds) + +Kinds are grouped into three categories: + +1. **Objects** represent a persistent entity in the system.
+ +   Creating an API object is a record of intent - once created, the system will work to ensure that the resource exists. All API objects have common metadata. + +   An object may have multiple resources that clients can use to perform specific actions that create, update, delete, or get. + +   Examples: `Pods`, `ReplicationControllers`, `Services`, `Namespaces`, `Nodes` + +2. **Lists** are collections of **resources** of one (usually) or more (occasionally) kinds. + +   Lists have a limited set of common metadata. All lists use the "items" field to contain the array of objects they return. + +   Most objects defined in the system should have an endpoint that returns the full set of resources, as well as zero or more endpoints that return subsets of the full list. Some objects may be singletons (the current user, the system defaults) and may not have lists. + +   In addition, all lists that return objects with labels should support label filtering (see [docs/user-guide/labels.md](../user-guide/labels.md)), and most lists should support filtering by fields. + +   Examples: `PodList`, `ServiceList`, `NodeList` + +   TODO: Describe field filtering below or in a separate doc. + +3. **Simple** kinds are used for specific actions on objects and for non-persistent entities. + +   Given their limited scope, they have the same set of limited common metadata as lists. + +   For example, the "size" action may accept a simple resource that has only a single field as input (the number of things). The "status" kind is returned when errors occur and is not persisted in the system. + +   Examples: Binding, Status + +The standard REST verbs (defined below) MUST return singular JSON objects. Some API endpoints may deviate from the strict REST pattern and return resources that are not singular JSON objects, such as streams of JSON objects or unstructured text log data. + +The term "kind" is reserved for these "top-level" API types.
The term "type" should be used for distinguishing sub-categories within objects or subobjects. + +### Resources + +All JSON objects returned by an API MUST have the following fields: + +* kind: a string that identifies the schema this object should have +* apiVersion: a string that identifies the version of the schema the object should have + +These fields are required for proper decoding of the object. They may be populated by the server by default from the specified URL path, but the client likely needs to know the values in order to construct the URL path. + +### Objects + +#### Metadata + +Every object kind MUST have the following metadata in a nested object field called "metadata": + +* namespace: a namespace is a DNS compatible subdomain that objects are subdivided into. The default namespace is 'default'. See [docs/admin/namespaces.md](../admin/namespaces.md) for more. +* name: a string that uniquely identifies this object within the current namespace (see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)). This value is used in the path when retrieving an individual object. +* uid: a unique in time and space value (typically an RFC 4122 generated identifier, see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)) used to distinguish between objects with the same name that have been deleted and recreated + +Every object SHOULD have the following metadata in a nested object field called "metadata": + +* resourceVersion: a string that identifies the internal version of this object that can be used by clients to determine when objects have changed. This value MUST be treated as opaque by clients and passed unmodified back to the server. Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. 
(see [concurrency control](#concurrency-control-and-consistency), below, for more details) +* creationTimestamp: a string representing an RFC 3339 date and time at which this object was created +* deletionTimestamp: a string representing an RFC 3339 date and time after which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user, and is not directly settable by a client. The resource will be deleted (no longer visible from resource lists, and not reachable by name) after the time in this field. Once set, this value may not be unset or be set further into the future, although it may be shortened or the resource may be deleted prior to this time. +* labels: a map of string keys and values that can be used to organize and categorize objects (see [docs/user-guide/labels.md](../user-guide/labels.md)) +* annotations: a map of string keys and values that can be used by external tooling to store and retrieve arbitrary metadata about this object (see [docs/user-guide/annotations.md](../user-guide/annotations.md)) + +Labels are intended for organizational purposes by end users (select the pods that match this label query). Annotations enable third-party automation and tooling to decorate objects with additional metadata for their own use. + +#### Spec and Status + +By convention, the Kubernetes API makes a distinction between the specification of the desired state of an object (a nested object field called "spec") and the status of the object at the current time (a nested object field called "status"). The specification is a complete description of the desired state, including configuration settings provided by the user, [default values](#defaulting) expanded by the system, and properties initialized or otherwise changed after creation by other ecosystem components (e.g., schedulers, auto-scalers), and is persisted in stable storage with the API object.
If the specification is deleted, the object will be purged from the system. The status summarizes the current state of the object in the system, and is usually persisted with the object by automated processes but may be generated on the fly. At some cost and perhaps some temporary degradation in behavior, the status could be reconstructed by observation if it were lost. + +When a new version of an object is POSTed or PUT, the "spec" is updated and available immediately. Over time the system will work to bring the "status" into line with the "spec". The system will drive toward the most recent "spec" regardless of previous versions of that stanza. In other words, if a value is changed from 2 to 5 in one PUT and then back down to 3 in another PUT, the system is not required to 'touch base' at 5 before changing the "status" to 3. That is, the system's behavior is *level-based* rather than *edge-based*. This enables robust behavior in the presence of missed intermediate state changes. + +The Kubernetes API also serves as the foundation for the declarative configuration schema for the system. In order to facilitate level-based operation and expression of declarative configuration, fields in the specification should have declarative rather than imperative names and semantics -- they represent the desired state, not actions intended to yield the desired state. + +The PUT and POST verbs on objects will ignore the "status" values. A `/status` subresource is provided to enable system components to update statuses of resources they manage. + +Otherwise, PUT expects the whole object to be specified. Therefore, if a field is omitted it is assumed that the client wants to clear that field's value. The PUT verb does not accept partial updates. Modification of just part of an object may be achieved by GETting the resource, modifying part of the spec, labels, or annotations, and then PUTting it back.
See [concurrency control](#concurrency-control-and-consistency), below, regarding read-modify-write consistency when using this pattern. Some objects may expose alternative resource representations that allow mutation of the status, or performing custom actions on the object. + +All objects that represent a physical resource whose state may vary from the user's desired intent SHOULD have a "spec" and a "status". Objects whose state cannot vary from the user's desired intent MAY have only "spec", and MAY rename "spec" to a more appropriate name. + +Objects that contain both spec and status should not contain additional top-level fields other than the standard metadata fields. + +##### Typical status properties + +* **phase**: The phase is a simple, high-level summary of the phase of the lifecycle of an object. The phase should progress monotonically. Typical phase values are `Pending` (not yet fully physically realized), `Running` or `Active` (fully realized and active, but not necessarily operating correctly), and `Terminated` (no longer active), but may vary slightly for different types of objects. New phase values should not be added to existing objects in the future. Like other status fields, it must be possible to ascertain the lifecycle phase by observation. Additional details regarding the current phase may be contained in other fields. +* **conditions**: Conditions represent orthogonal observations of an object's current state. Objects may report multiple conditions, and new types of conditions may be added in the future. Condition status values may be `True`, `False`, or `Unknown`. Unlike the phase, conditions are not expected to be monotonic -- their values may change back and forth. A typical condition type is `Ready`, which indicates the object was believed to be fully operational at the time it was last probed. Conditions may carry additional information, such as the last probe time or last transition time. + +TODO(@vishh): Reason and Message. 
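+
+For example, a Pod following these conventions might report a status stanza like the following sketch (all field values invented for illustration):
+
+```yaml
+status:
+  phase: Running
+  conditions:
+    - type: Ready
+      status: "True"
+      lastProbeTime: "2015-05-20T18:10:42Z"
+```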
+ +Phases and conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects with behaviors associated with state transitions. The system is level-based and should assume an Open World. Additionally, new observations and details about these observations may be added over time. + +In order to preserve extensibility, in the future, we intend to explicitly convey properties that users and components care about rather than requiring those properties to be inferred from observations. + +Note that historical status information (e.g., last transition time, failure counts) is provided only on a best-effort basis, and is not guaranteed to be retained. + +Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](../design/resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data. + +#### References to related objects + +References to loosely coupled sets of objects, such as [pods](../user-guide/pods.md) overseen by a [replication controller](../user-guide/replication-controller.md), are usually best referred to using a [label selector](../user-guide/labels.md). In order to ensure that GETs of individual objects remain bounded in time and space, these sets may be queried via separate API queries, but will not be expanded in the referring object's status. + +References to specific objects, especially specific resource versions and/or specific fields of those objects, are specified using the `ObjectReference` type. Unlike partial URLs, the ObjectReference type facilitates flexible defaulting of fields from the referring object or other contextual information.
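+
+For instance, a status field that points at a specific object could hold an `ObjectReference` like the following sketch (the `targetRef` field name and all values here are invented for illustration):
+
+```yaml
+targetRef:
+  kind: Pod
+  namespace: default
+  name: nginx
+  uid: 4472a06e-b12c-11e4-bb06-0242ac110002
+  resourceVersion: "1234"
+```
+
+Fields left unset (for example, namespace) could then be defaulted from the referring object's context.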
+ +References in the status of the referee to the referrer may be permitted, when the references are one-to-one and do not need to be frequently updated, particularly in an edge-based manner. + +#### Lists of named subobjects preferred over maps + +Discussed in [#2004](https://github.com/GoogleCloudPlatform/kubernetes/issues/2004) and elsewhere. There are no maps of subobjects in any API objects. Instead, the convention is to use a list of subobjects containing name fields. + +For example: +```yaml +ports: + - name: www + containerPort: 80 +``` +vs. +```yaml +ports: + www: + containerPort: 80 +``` + +This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently, labels, selectors, and annotations), as opposed to sets of subobjects. + +#### Constants + +Some fields will have a list of allowed values (enumerations). These values will be strings, and they will be in CamelCase, with an initial uppercase letter. Examples: "ClusterFirst", "Pending", "ClientIP". + +### Lists and Simple kinds + +Every list or simple kind SHOULD have the following metadata in a nested object field called "metadata": + +* resourceVersion: a string that identifies the common version of the objects returned in a list. This value MUST be treated as opaque by clients and passed unmodified back to the server. A resource version is only valid within a single namespace on a single kind of resource. + +Every simple kind returned by the server, and any simple kind sent to the server that must support idempotency or optimistic concurrency, should return this value. Since simple resources are often used as input to alternate actions that modify objects, the resource version of the simple resource should correspond to the resource version of the object.
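+
+A list returned by the server might therefore look like the following sketch (contents abbreviated, and all values invented for illustration):
+
+```yaml
+kind: PodList
+apiVersion: v1
+metadata:
+  resourceVersion: "5678"
+items:
+  - metadata:
+      name: nginx
+      namespace: default
+      resourceVersion: "5421"
+```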
+ + +## Differing Representations + +An API may represent a single entity in different ways for different clients, or transform an object after certain transitions in the system occur. In these cases, one request object may have two representations available as different resources, or different kinds. + +An example is a Service, which represents the intent of the user to group a set of pods with common behavior on common ports. When Kubernetes detects a pod matches the service selector, the IP address and port of the pod are added to an Endpoints resource for that Service. The Endpoints resource exists only if the Service exists, but exposes only the IPs and ports of the selected pods. The full service is represented by two distinct resources - under the original Service resource the user created, as well as in the Endpoints resource. + +As another example, a "pod status" resource may accept a PUT with the "pod" kind, with different rules about what fields may be changed. + +Future versions of Kubernetes may allow alternative encodings of objects beyond JSON. + + +## Verbs on Resources + +API resources should use the traditional REST pattern: + +* GET /<resourceNamePlural> - Retrieve a list of type <resourceName>, e.g. GET /pods returns a list of Pods. +* POST /<resourceNamePlural> - Create a new resource from the JSON object provided by the client. +* GET /<resourceNamePlural>/<name> - Retrieves a single resource with the given name, e.g. GET /pods/first returns a Pod named 'first'. Should be constant time, and the resource should be bounded in size. +* DELETE /<resourceNamePlural>/<name> - Delete the single resource with the given name. DeleteOptions may specify gracePeriodSeconds, the optional duration in seconds before the object should be deleted. Individual kinds may declare fields which provide a default grace period, and different kinds may have differing kind-wide default grace periods. 
A user-provided grace period overrides a default grace period, including the zero grace period ("now"). +* PUT /<resourceNamePlural>/<name> - Update or create the resource with the given name with the JSON object provided by the client. +* PATCH /<resourceNamePlural>/<name> - Selectively modify the specified fields of the resource. See more information [below](#patch). + +Kubernetes by convention exposes additional verbs as new root endpoints with singular names. Examples: + +* GET /watch/<resourceNamePlural> - Receive a stream of JSON objects corresponding to changes made to any resource of the given kind over time. +* GET /watch/<resourceNamePlural>/<name> - Receive a stream of JSON objects corresponding to changes made to the named resource of the given kind over time. + +These are verbs which change the fundamental type of data returned (watch returns a stream of JSON instead of a single JSON object). Support of additional verbs is not required for all object types. + +Two additional verbs `redirect` and `proxy` provide access to cluster resources as described in [docs/user-guide/accessing-the-cluster.md](../user-guide/accessing-the-cluster.md). + +When resources wish to expose alternative actions that are closely coupled to a single resource, they should do so using new sub-resources. An example is allowing automated processes to update the "status" field of a Pod. The `/pods` endpoint only allows updates to "metadata" and "spec", since those reflect end-user intent. An automated process should be able to modify status for users to see by sending an updated Pod kind to the server's "/pods/<name>/status" endpoint - the alternate endpoint allows different rules to be applied to the update, and access to be appropriately restricted. Likewise, some actions like "stop" or "scale" are best represented as REST sub-resources that are POSTed to.
The POST action may require a simple kind to be provided if the action requires parameters, or may function without a request body. + +TODO: more documentation of Watch + +### PATCH operations + +The API supports three different PATCH operations, determined by their corresponding Content-Type header: + +* JSON Patch, `Content-Type: application/json-patch+json` + * As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is a sequence of operations that are executed on the resource, e.g. `{"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use JSON Patch, see the RFC. +* Merge Patch, `Content-Type: application/merge-patch+json` + * As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch is essentially a partial representation of the resource. The submitted JSON is "merged" with the current resource to create a new one, then the new one is saved. For more details on how to use Merge Patch, see the RFC. +* Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json` + * Strategic Merge Patch is a custom implementation of Merge Patch. For a detailed explanation of how it works and why it needed to be introduced, see below. + +#### Strategic Merge Patch + +In the standard JSON merge patch, JSON objects are always merged but lists are always replaced. Often that isn't what we want. Let's say we start with the following Pod: + +```yaml +spec: + containers: + - name: nginx + image: nginx-1.0 +``` + +...and we POST that to the server (as JSON). Then let's say we want to *add* a container to this Pod. + +```yaml +PATCH /api/v1/namespaces/default/pods/pod-name +spec: + containers: + - name: log-tailer + image: log-tailer-1.0 +``` + +If we were to use standard Merge Patch, the entire container list would be replaced with the single log-tailer container. However, our intent is for the container lists to merge together based on the `name` field.
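+
+Under a merge keyed on `name`, the intended outcome of the patch above would retain both containers (a sketch of the desired result, not actual server output):
+
+```yaml
+spec:
+  containers:
+    - name: nginx
+      image: nginx-1.0
+    - name: log-tailer
+      image: log-tailer-1.0
+```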
+ +To solve this problem, Strategic Merge Patch uses metadata attached to the API objects to determine what lists should be merged and which ones should not. Currently the metadata is available as struct tags on the API objects themselves, but will become available to clients as Swagger annotations in the future. In the above example, the `patchStrategy` metadata for the `containers` field would be `merge` and the `patchMergeKey` would be `name`. + +Note: If the patch results in merging two lists of scalars, the scalars are first deduplicated and then merged. + +Strategic Merge Patch also supports special operations as listed below. + +### List Operations + +To override the container list to be strictly replaced, regardless of the default: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: replace # any further $patch operations nested in this list will be ignored +``` + +To delete an element of a list that should be merged: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: delete + name: log-tailer # merge key and value go here +``` + +### Map Operations + +To indicate that a map should not be merged and instead should be taken literally: + +```yaml +$patch: replace # recursive and applies to all fields of the map it's in +containers: +- name: nginx + image: nginx-1.0 +``` + +To delete a field of a map: + +```yaml +name: nginx +image: nginx-1.0 +labels: + live: null # set the value of the map key to null +``` + + +## Idempotency + +All compatible Kubernetes APIs MUST support "name idempotency" and respond with an HTTP status code 409 when a request is made to POST an object that has the same name as an existing object in the system. See [docs/user-guide/identifiers.md](../user-guide/identifiers.md) for details. + +Names generated by the system may be requested using `metadata.generateName`. GenerateName indicates that the name should be made unique by the server prior to persisting it.
A non-empty value for the field indicates the name will be made unique (and the name returned to the client will be different than the name passed). The value of this field will be combined with a unique suffix on the server if the Name field has not been provided. The provided value must be valid within the rules for Name, and may be truncated by the length of the suffix required to make the value unique on the server. If this field is specified, and Name is not present, the server will NOT return a 409 if the generated name exists - instead, it will either return 201 Created or 504 with Reason `ServerTimeout` indicating a unique name could not be found in the time allotted, and the client should retry (optionally after the time indicated in the Retry-After header). + +## Defaulting + +Default resource values are API version-specific, and they are applied during +the conversion from API-versioned declarative configuration to internal objects +representing the desired state (`Spec`) of the resource. Subsequent GETs of the +resource will include the default values explicitly. + +Incorporating the default values into the `Spec` ensures that `Spec` depicts the +full desired state so that it is easier for the system to determine how to +achieve the state, and for the user to know what to anticipate. + +API version-specific default values are set by the API server. + +## Late Initialization + +Late initialization is when resource fields are set by a system controller +after an object is created/updated. + +For example, the scheduler sets the `pod.spec.nodeName` field after the pod is created. + +Late-initializers should only make the following types of modifications: + - Setting previously unset fields + - Adding keys to maps + - Adding values to arrays which have mergeable semantics (`patchStrategy:"merge"` attribute in + the type definition). + +These conventions: + 1. 
allow a user (with sufficient privilege) to override any system-default behaviors by setting + the fields that would otherwise have been defaulted. + 1. enable updates from users to be merged with changes made during late initialization, using + strategic merge patch, as opposed to clobbering the change. + 1. allow the component which does the late-initialization to use strategic merge patch, which + facilitates composition and concurrency of such components. + +Although the apiserver Admission Control stage acts prior to object creation, +Admission Control plugins should follow the Late Initialization conventions +too, to allow their implementation to be later moved to a 'controller', or to client libraries. + +## Concurrency Control and Consistency + +Kubernetes leverages the concept of *resource versions* to achieve optimistic concurrency. All Kubernetes resources have a "resourceVersion" field as part of their metadata. This resourceVersion is a string that identifies the internal version of an object that can be used by clients to determine when objects have changed. When a record is about to be updated, its version is checked against a pre-saved value, and if it doesn't match, the update fails with a StatusConflict (HTTP status code 409). + +The resourceVersion is changed by the server every time an object is modified. If resourceVersion is included with the PUT operation the system will verify that there have not been other successful mutations to the resource during a read/modify/write cycle, by verifying that the current value of resourceVersion matches the specified value. + +The resourceVersion is currently backed by [etcd's modifiedIndex](https://coreos.com/docs/distributed-configuration/etcd-api/). However, it's important to note that the application should *not* rely on the implementation details of the versioning system maintained by Kubernetes.
We may change the implementation of resourceVersion in the future, such as to change it to a timestamp or per-object counter. + +The only way for a client to know the expected value of resourceVersion is to have received it from the server in response to a prior operation, typically a GET. This value MUST be treated as opaque by clients and passed unmodified back to the server. Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. Currently, the value of resourceVersion is set to match etcd's sequencer. You could think of it as a logical clock the API server can use to order requests. However, we expect the implementation of resourceVersion to change in the future, such as in the case we shard the state by kind and/or namespace, or port to another storage system. + +In the case of a conflict, the correct client action at this point is to GET the resource again, apply the changes afresh, and try submitting again. This mechanism can be used to prevent races like the following: + +``` +Client #1 Client #2 +GET Foo GET Foo +Set Foo.Bar = "one" Set Foo.Baz = "two" +PUT Foo PUT Foo +``` + +When these sequences occur in parallel, either the change to Foo.Bar or the change to Foo.Baz can be lost. + +On the other hand, when specifying the resourceVersion, one of the PUTs will fail, since whichever write succeeds changes the resourceVersion for Foo. + +resourceVersion may be used as a precondition for other operations (e.g., GET, DELETE) in the future, such as for read-after-write consistency in the presence of caching. + +"Watch" operations specify resourceVersion using a query parameter. It is used to specify the point at which to begin watching the specified resources. This may be used to ensure that no mutations are missed between a GET of a resource (or list of resources) and a subsequent Watch, even if the current version of the resource is more recent. 
This is currently the main reason that list operations (GET on a collection) return resourceVersion. + + +## Serialization Format + +APIs may return alternative representations of any resource in response to an Accept header or under alternative endpoints, but the default serialization for input and output of API responses MUST be JSON. + +All dates should be serialized as RFC3339 strings. + + +## Units + +Units must either be explicit in the field name (e.g., `timeoutSeconds`), or must be specified as part of the value (e.g., `resource.Quantity`). Which approach is preferred is TBD. + + +## Selecting Fields + +Some APIs may need to identify which field in a JSON object is invalid, or to reference a value to extract from a separate resource. The current recommendation is to use standard JavaScript syntax for accessing that field, assuming the JSON object was transformed into a JavaScript object. + +Examples: + +* Find the field "current" in the object "state" in the second item in the array "fields": `fields[1].state.current` + +TODO: Plugins, extensions, nested kinds, headers + + +## HTTP Status codes + +The server will respond with HTTP status codes that match the HTTP spec. See the section below for a breakdown of the types of status codes the server will send. + +The following HTTP status codes may be returned by the API. + +#### Success codes + +* `200 StatusOK` + * Indicates that the request completed successfully. +* `201 StatusCreated` + * Indicates that the request to create kind completed successfully. +* `204 StatusNoContent` + * Indicates that the request completed successfully, and the response contains no body. + * Returned in response to HTTP OPTIONS requests. + +#### Error codes + +* `307 StatusTemporaryRedirect` + * Indicates that the address for the requested resource has changed. + * Suggested client recovery behavior: + * Follow the redirect. +* `400 StatusBadRequest` + * Indicates that the request is invalid.
+ * Suggested client recovery behavior: + * Do not retry. Fix the request. +* `401 StatusUnauthorized` + * Indicates that the server can be reached and understood the request, but refuses to take any further action, because the client must provide authorization. If the client has provided authorization, the server is indicating the provided authorization is unsuitable or invalid. + * Suggested client recovery behavior: + * If the user has not supplied authorization information, prompt them for the appropriate credentials + * If the user has supplied authorization information, inform them their credentials were rejected and optionally prompt them again. +* `403 StatusForbidden` + * Indicates that the server can be reached and understood the request, but refuses to take any further action, because it is configured to deny access for some reason to the requested resource by the client. + * Suggested client recovery behavior: + * Do not retry. Fix the request. +* `404 StatusNotFound` + * Indicates that the requested resource does not exist. + * Suggested client recovery behavior: + * Do not retry. Fix the request. +* `405 StatusMethodNotAllowed` + * Indicates that the action the client attempted to perform on the resource was not supported by the code. + * Suggested client recovery behavior: + * Do not retry. Fix the request. +* `409 StatusConflict` + * Indicates that either the resource the client attempted to create already exists or the requested update operation cannot be completed due to a conflict. + * Suggested client recovery behavior: + * If creating a new resource: + * Either change the identifier and try again, or GET and compare the fields in the pre-existing object and issue a PUT/update to modify the existing object. + * If updating an existing resource: + * See `Conflict` from the `status` response section below on how to retrieve more information about the nature of the conflict. + * GET and compare the fields in the pre-existing object, merge changes (if still valid according to preconditions), and retry with the updated request (including `ResourceVersion`). +* `422 StatusUnprocessableEntity` + * Indicates that the requested create or update operation cannot be completed due to invalid data provided as part of the request. + * Suggested client recovery behavior: + * Do not retry. Fix the request. +* `429 StatusTooManyRequests` + * Indicates that either the client's rate limit has been exceeded or the server has received more requests than it can process. + * Suggested client recovery behavior: + * Read the ```Retry-After``` HTTP header from the response, and wait at least that long before retrying. +* `500 StatusInternalServerError` + * Indicates that the server can be reached and understood the request, but either an unexpected internal error occurred and the outcome of the call is unknown, or the server cannot complete the action in a reasonable time (this may be due to temporary server load or a transient communication issue with another server). + * Suggested client recovery behavior: + * Retry with exponential backoff. +* `503 StatusServiceUnavailable` + * Indicates that a required service is unavailable. + * Suggested client recovery behavior: + * Retry with exponential backoff. +* `504 StatusServerTimeout` + * Indicates that the request could not be completed within the given time. Clients can get this response ONLY when they specified a timeout param in the request. + * Suggested client recovery behavior: + * Increase the value of the timeout param and retry with exponential backoff. + +## Response Status Kind + +Kubernetes will always return the ```Status``` kind from any API endpoint when an error occurs. +Clients SHOULD handle these types of objects when appropriate. + +A ```Status``` kind will be returned by the API in two cases: + * When an operation is not successful (i.e.
when the server would return a non-2xx HTTP status code). + * When an HTTP ```DELETE``` call is successful. + +The status object is encoded as JSON and provided as the body of the response. The status object contains fields for human and machine consumers of the API to get more detailed information about the cause of the failure. The information in the status object supplements, but does not override, the HTTP status code's meaning. When fields in the status object have the same meaning as generally defined HTTP headers and that header is returned with the response, the header should be considered as having higher priority. + +**Example:** +``` +$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana + +> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1 +> User-Agent: curl/7.26.0 +> Host: 10.240.122.184 +> Accept: */* +> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc +> + +< HTTP/1.1 404 Not Found +< Content-Type: application/json +< Date: Wed, 20 May 2015 18:10:42 GMT +< Content-Length: 232 +< +{ + "kind": "Status", + "apiVersion": "v1", + "metadata": {}, + "status": "Failure", + "message": "pods \"grafana\" not found", + "reason": "NotFound", + "details": { + "name": "grafana", + "kind": "pods" + }, + "code": 404 +} +``` + +The ```status``` field contains one of two possible values: +* `Success` +* `Failure` + +`message` may contain a human-readable description of the error. + +```reason``` may contain a machine-readable description of why this operation is in the `Failure` status. If this value is empty there is no information available. The `reason` clarifies an HTTP status code but does not override it. + +```details``` may contain extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type.
+ +Possible values for the `reason` and `details` fields: +* `BadRequest` + * Indicates that the request itself was invalid, because the request doesn't make any sense, for example deleting a read-only object. + * This is different from the `Invalid` reason below, which indicates that the API call could possibly succeed, but the data was invalid. + * API calls that return BadRequest can never succeed. + * HTTP status code: `400 StatusBadRequest` +* `Unauthorized` + * Indicates that the server can be reached and understood the request, but refuses to take any further action without the client providing appropriate authorization. If the client has provided authorization, this error indicates the provided credentials are insufficient or invalid. + * Details (optional): + * `kind string` + * The kind attribute of the unauthorized resource (on some operations may differ from the requested resource). + * `name string` + * The identifier of the unauthorized resource. + * HTTP status code: `401 StatusUnauthorized` +* `Forbidden` + * Indicates that the server can be reached and understood the request, but refuses to take any further action, because it is configured to deny access for some reason to the requested resource by the client. + * Details (optional): + * `kind string` + * The kind attribute of the forbidden resource (on some operations may differ from the requested resource). + * `name string` + * The identifier of the forbidden resource. + * HTTP status code: `403 StatusForbidden` +* `NotFound` + * Indicates that one or more resources required for this operation could not be found. + * Details (optional): + * `kind string` + * The kind attribute of the missing resource (on some operations may differ from the requested resource). + * `name string` + * The identifier of the missing resource. + * HTTP status code: `404 StatusNotFound` +* `AlreadyExists` + * Indicates that the resource you are creating already exists.
+ * Details (optional): + * `kind string` + * The kind attribute of the conflicting resource. + * `name string` + * The identifier of the conflicting resource. + * HTTP status code: `409 StatusConflict` +* `Conflict` + * Indicates that the requested update operation cannot be completed due to a conflict. The client may need to alter the request. Each resource may define custom details that indicate the nature of the conflict. + * HTTP status code: `409 StatusConflict` +* `Invalid` + * Indicates that the requested create or update operation cannot be completed due to invalid data provided as part of the request. + * Details (optional): + * `kind string` + * The kind attribute of the invalid resource. + * `name string` + * The identifier of the invalid resource. + * `causes` + * One or more `StatusCause` entries indicating the data in the provided resource that was invalid. The `reason`, `message`, and `field` attributes will be set. + * HTTP status code: `422 StatusUnprocessableEntity` +* `Timeout` + * Indicates that the request could not be completed within the given time. Clients may receive this response if the server has decided to rate limit the client, or if the server is overloaded and cannot process the request at this time. + * HTTP status code: `429 TooManyRequests` + * The server should set the `Retry-After` HTTP header and return `retryAfterSeconds` in the details field of the object. A value of `0` is the default. +* `ServerTimeout` + * Indicates that the server can be reached and understood the request, but cannot complete the action in a reasonable time. This may be due to temporary server load or a transient communication issue with another server. + * Details (optional): + * `kind string` + * The kind attribute of the resource being acted on. + * `name string` + * The operation that is being attempted. + * The server should set the `Retry-After` HTTP header and return `retryAfterSeconds` in the details field of the object.
A value of `0` is the default. + * HTTP status code: `504 StatusServerTimeout` +* `MethodNotAllowed` + * Indicates that the action the client attempted to perform on the resource was not supported by the code. + * For instance, attempting to delete a resource that can only be created. + * API calls that return MethodNotAllowed can never succeed. + * HTTP status code: `405 StatusMethodNotAllowed` +* `InternalError` + * Indicates that an internal error occurred; it is unexpected and the outcome of the call is unknown. + * Details (optional): + * `causes` + * The original error. + * HTTP status code: `500 StatusInternalServerError` + +`code` may contain the suggested HTTP return code for this status. + + +## Events + +TODO: Document events (refer to another doc for details) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api-conventions.md?pixel)]() + diff --git a/cli-roadmap.md b/cli-roadmap.md new file mode 100644 index 00000000..fe8d5b0f --- /dev/null +++ b/cli-roadmap.md @@ -0,0 +1,105 @@ + + + + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.

+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + + + + +# Kubernetes CLI/Configuration Roadmap + +See also issues with the following labels: +* [area/config-deployment](https://github.com/GoogleCloudPlatform/kubernetes/labels/area%2Fconfig-deployment) +* [component/CLI](https://github.com/GoogleCloudPlatform/kubernetes/labels/component%2FCLI) +* [component/client](https://github.com/GoogleCloudPlatform/kubernetes/labels/component%2Fclient) + +1. Create services before other objects, or at least before objects that depend upon them. Namespace-relative DNS mitigates this some, but most users are still using service environment variables. [#1768](https://github.com/GoogleCloudPlatform/kubernetes/issues/1768) +1. Finish rolling update [#1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) + 1. Friendly to auto-scaling [#2863](https://github.com/GoogleCloudPlatform/kubernetes/pull/2863#issuecomment-69701562) + 1. Rollback (make rolling-update reversible, and complete an in-progress rolling update by taking 2 replication controller names rather than always taking a file) + 1. Rollover (replace multiple replication controllers with one, such as to clean up an aborted partial rollout) + 1. Write a ReplicationController generator to derive the new ReplicationController from an old one (e.g., `--image-version=newversion`, which would apply a name suffix, update a label value, and apply an image tag) + 1. Use readiness [#620](https://github.com/GoogleCloudPlatform/kubernetes/issues/620) + 1. Perhaps factor this in a way that it can be shared with [Openshift’s deployment controller](https://github.com/GoogleCloudPlatform/kubernetes/issues/1743) + 1. Rolling update service as a plugin +1. 
Kind-based filtering on object streams -- only operate on the kinds of objects specified. This would make directory-based kubectl operations much more useful. Users should be able to instantiate the example applications using `kubectl create -f ...` +1. Improved pretty printing of endpoints, such as in the case that there are more than a few endpoints +1. Service address/port lookup command(s) +1. List supported resources +1. Swagger lookups [#3060](https://github.com/GoogleCloudPlatform/kubernetes/issues/3060) +1. --name, --name-suffix applied during creation and updates +1. --labels and opinionated label injection: --app=foo, --tier={fe,cache,be,db}, --uservice=redis, --env={dev,test,prod}, --stage={canary,final}, --track={hourly,daily,weekly}, --release=0.4.3c2. Exact ones TBD. We could allow arbitrary values -- the keys are important. The actual label keys would be (optionally?) namespaced with kubectl.kubernetes.io/, or perhaps the user’s namespace. +1. --annotations and opinionated annotation injection: --description, --revision +1. Imperative updates. We'll want to optionally make these safe(r) by supporting preconditions based on the current value and resourceVersion. + 1. annotation updates similar to label updates + 1. other custom commands for common imperative updates + 1. more user-friendly (but still generic) on-command-line json for patch +1. We also want to support the following flavors of more general updates: + 1. whichever we don’t support: + 1. safe update: update the full resource, guarded by resourceVersion precondition (and perhaps selected value-based preconditions) + 1. forced update: update the full resource, blowing away the previous Spec without preconditions; delete and re-create if necessary + 1. diff/dryrun: Compare new config with current Spec [#6284](https://github.com/GoogleCloudPlatform/kubernetes/issues/6284) + 1. submit/apply/reconcile/ensure/merge: Merge user-provided fields with current Spec. 
Keep track of user-provided fields using an annotation -- see [#1702](https://github.com/GoogleCloudPlatform/kubernetes/issues/1702). Delete all objects with deployment-specific labels. +1. --dry-run for all commands +1. Support full label selection syntax, including support for namespaces. +1. Wait on conditions [#1899](https://github.com/GoogleCloudPlatform/kubernetes/issues/1899) +1. Make kubectl scriptable: make output and exit code behavior consistent and useful for wrapping in workflows and piping back into kubectl and/or xargs (e.g., dump full URLs?, distinguish permanent and retry-able failure, identify objects that should be retried) + 1. Here's [an example](http://techoverflow.net/blog/2013/10/22/docker-remove-all-images-and-containers/) where multiple objects on the command line and an option to dump object names only (`-q`) would be useful in combination. [#5906](https://github.com/GoogleCloudPlatform/kubernetes/issues/5906) +1. Easy generation of clean configuration files from existing objects (including containers -- podex) -- remove readonly fields, status + 1. Export from one namespace, import into another is an important use case +1. Derive objects from other objects + 1. pod clone + 1. rc from pod + 1. --labels-from (services from pods or rcs) +1. Kind discovery (i.e., operate on objects of all kinds) [#5278](https://github.com/GoogleCloudPlatform/kubernetes/issues/5278) +1. A fairly general-purpose way to specify fields on the command line during creation and update, not just from a config file +1. Extensible API-based generator framework (i.e. invoke generators via an API/URL rather than building them into kubectl), so that complex client libraries don’t need to be rewritten in multiple languages, and so that the abstractions are available through all interfaces: API, CLI, UI, logs, ... [#5280](https://github.com/GoogleCloudPlatform/kubernetes/issues/5280) + 1. Need schema registry, and some way to invoke generator (e.g., using a container) + 1. 
Convert run command to API-based generator +1. Transformation framework + 1. More intelligent defaulting of fields (e.g., [#2643](https://github.com/GoogleCloudPlatform/kubernetes/issues/2643)) +1. Update preconditions based on the values of arbitrary object fields. +1. Deployment manager compatibility on GCP: [#3685](https://github.com/GoogleCloudPlatform/kubernetes/issues/3685) +1. Describe multiple objects, multiple kinds of objects [#5905](https://github.com/GoogleCloudPlatform/kubernetes/issues/5905) +1. Support yaml document separator [#5840](https://github.com/GoogleCloudPlatform/kubernetes/issues/5840) + +TODO: +* watch +* attach [#1521](https://github.com/GoogleCloudPlatform/kubernetes/issues/1521) +* image/registry commands +* do any other server paths make sense? validate? generic curl functionality? +* template parameterization +* dynamic/runtime configuration + +Server-side support: + +1. Default selectors from labels [#1698](https://github.com/GoogleCloudPlatform/kubernetes/issues/1698#issuecomment-71048278) +1. Stop [#1535](https://github.com/GoogleCloudPlatform/kubernetes/issues/1535) +1. Deleted objects [#2789](https://github.com/GoogleCloudPlatform/kubernetes/issues/2789) +1. Clone [#170](https://github.com/GoogleCloudPlatform/kubernetes/issues/170) +1. Resize [#1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) +1. Useful /operations API: wait for finalization/reification +1. List supported resources [#2057](https://github.com/GoogleCloudPlatform/kubernetes/issues/2057) +1. Reverse label lookup [#1348](https://github.com/GoogleCloudPlatform/kubernetes/issues/1348) +1. Field selection [#1362](https://github.com/GoogleCloudPlatform/kubernetes/issues/1362) +1. Field filtering [#1459](https://github.com/GoogleCloudPlatform/kubernetes/issues/1459) +1. 
Operate on uids + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/cli-roadmap.md?pixel)]() + diff --git a/client-libraries.md b/client-libraries.md new file mode 100644 index 00000000..b7529a01 --- /dev/null +++ b/client-libraries.md @@ -0,0 +1,43 @@ + + + + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.

+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + + + + +## kubernetes API client libraries + +### Supported + * [Go](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client) + +### User Contributed +*Note: Libraries provided by outside parties are supported by their authors, not the core Kubernetes team* + + * [Java (OSGI)](https://bitbucket.org/amdatulabs/amdatu-kubernetes) + * [Java (Fabric8)](https://github.com/fabric8io/fabric8/tree/master/components/kubernetes-api) + * [Ruby](https://github.com/Ch00k/kuber) + * [Ruby](https://github.com/abonas/kubeclient) + * [PHP](https://github.com/devstub/kubernetes-api-php-client) + * [PHP](https://github.com/maclof/kubernetes-client) + * [Node.js](https://github.com/tenxcloud/node-kubernetes-client) + * [Perl](https://metacpan.org/pod/Net::Kubernetes) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/client-libraries.md?pixel)]() + diff --git a/developer-guide.md b/developer-guide.md new file mode 100644 index 00000000..8801cb3d --- /dev/null +++ b/developer-guide.md @@ -0,0 +1,62 @@ + + + + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + +

PLEASE NOTE: This document applies to the HEAD of the source +tree only. If you are using a released version of Kubernetes, you almost +certainly want the docs that go with that version.

+ +Documentation for specific releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) +![WARNING](http://kubernetes.io/img/warning.png) + + + + +# Kubernetes Developer Guide + +The developer guide is for anyone wanting to either write code which directly accesses the +kubernetes API, or to contribute directly to the kubernetes project. +It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md) and the [Cluster Admin +Guide](../admin/README.md). + + +## Developing against the Kubernetes API + +* API objects are explained at [http://kubernetes.io/third_party/swagger-ui/](http://kubernetes.io/third_party/swagger-ui/). + +* **Annotations** ([docs/user-guide/annotations.md](../user-guide/annotations.md)): are for attaching arbitrary non-identifying metadata to objects. + Programs that automate Kubernetes objects may use annotations to store small amounts of their state. + +* **API Conventions** ([api-conventions.md](api-conventions.md)): + Defining the verbs and resources used in the Kubernetes API. + +* **API Client Libraries** ([client-libraries.md](client-libraries.md)): + A list of existing client libraries, both supported and user-contributed. + +## Writing Plugins + +* **Authentication Plugins** ([docs/admin/authentication.md](../admin/authentication.md)): + The current and planned states of authentication tokens. + +* **Authorization Plugins** ([docs/admin/authorization.md](../admin/authorization.md)): + Authorization applies to all HTTP requests on the main apiserver port. + This doc explains the available authorization implementations. + +* **Admission Control Plugins** ([admission_control](../design/admission_control.md)) + +## Contributing to the Kubernetes Project + +See this [README](README.md). 
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/developer-guide.md?pixel)]() + -- cgit v1.2.3 From 3a1db27f1f46e9276c6b1aa28b82d0793f1e0db2 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 14 Jul 2015 23:56:51 -0700 Subject: Move diagrams out of top-level docs/ directory and merge docs/devel/developer-guide.md into docs/devel/README.md --- README.md | 66 ++++++++++++++++++++++++++++++++++++++++++------------ developer-guide.md | 62 -------------------------------------------------- 2 files changed, 52 insertions(+), 76 deletions(-) delete mode 100644 developer-guide.md diff --git a/README.md b/README.md index f97c49b4..aed7276d 100644 --- a/README.md +++ b/README.md @@ -20,27 +20,35 @@ certainly want the docs that go with that version. -# Developing Kubernetes +# Kubernetes Developer Guide -Docs in this directory relate to developing Kubernetes. +The developer guide is for anyone wanting to either write code which directly accesses the +kubernetes API, or to contribute directly to the kubernetes project. +It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md) and the [Cluster Admin +Guide](../admin/README.md). -* **On Collaborative Development** ([collab.md](collab.md)): info on pull requests and code reviews. -* **Development Guide** ([development.md](development.md)): Setting up your environment tests. +## The process of developing and contributing code to the Kubernetes project -* **Making release notes** ([making-release-notes.md](making-release-notes.md)): Generating release nodes for a new release. - -* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. - Here's how to run your tests many times. +* **On Collaborative Development** ([collab.md](collab.md)): Info on pull requests and code reviews. * **GitHub Issues** ([issues.md](issues.md)): How incoming issues are reviewed and prioritized. 
-* **Logging Conventions** ([logging.md](logging.md)]: Glog levels. - * **Pull Request Process** ([pull-requests.md](pull-requests.md)): When and why pull requests are closed. -* **Releasing Kubernetes** ([releasing.md](releasing.md)): How to create a Kubernetes release (as in version) - and how the version information gets embedded into the built binaries. +* **Faster PR reviews** ([faster_reviews.md](faster_reviews.md)): How to get faster PR reviews. + +* **Getting Recent Builds** ([getting-builds.md](getting-builds.md)): How to get recent builds including the latest builds that pass CI. + + +## Setting up your dev environment, coding, and debugging + +* **Development Guide** ([development.md](development.md)): Setting up your development environment. + +* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. + Here's how to run your tests many times. + +* **Logging Conventions** ([logging.md](logging.md)): Glog levels.
+ +* **API Conventions** ([api-conventions.md](api-conventions.md)): + Defining the verbs and resources used in the Kubernetes API. + +* **API Client Libraries** ([client-libraries.md](client-libraries.md)): + A list of existing client libraries, both supported and user-contributed. + + +## Writing plugins + +* **Authentication Plugins** ([docs/admin/authentication.md](../admin/authentication.md)): + The current and planned states of authentication tokens. + +* **Authorization Plugins** ([docs/admin/authorization.md](../admin/authorization.md)): + Authorization applies to all HTTP requests on the main apiserver port. + This doc explains the available authorization implementations. + +* **Admission Control Plugins** ([admission_control](../design/admission_control.md)) + + +## Building releases + +* **Making release notes** ([making-release-notes.md](making-release-notes.md)): Generating release notes for a new release. + +* **Releasing Kubernetes** ([releasing.md](releasing.md)): How to create a Kubernetes release (as in version) + and how the version information gets embedded into the built binaries. diff --git a/developer-guide.md b/developer-guide.md deleted file mode 100644 index 8801cb3d..00000000 --- a/developer-guide.md +++ /dev/null @@ -1,62 +0,0 @@ - - - - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - - - - -# Kubernetes Developer Guide - -The developer guide is for anyone wanting to either write code which directly accesses the -kubernetes API, or to contribute directly to the kubernetes project. -It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md) and the [Cluster Admin -Guide](../admin/README.md). - - -## Developing against the Kubernetes API - -* API objects are explained at [http://kubernetes.io/third_party/swagger-ui/](http://kubernetes.io/third_party/swagger-ui/). - -* **Annotations** ([docs/user-guide/annotations.md](../user-guide/annotations.md)): are for attaching arbitrary non-identifying metadata to objects. - Programs that automate Kubernetes objects may use annotations to store small amounts of their state. - -* **API Conventions** ([api-conventions.md](api-conventions.md)): - Defining the verbs and resources used in the Kubernetes API. - -* **API Client Libraries** ([client-libraries.md](client-libraries.md)): - A list of existing client libraries, both supported and user-contributed. - -## Writing Plugins - -* **Authentication Plugins** ([docs/admin/authentication.md](../admin/authentication.md)): - The current and planned states of authentication tokens. - -* **Authorization Plugins** ([docs/admin/authorization.md](../admin/authorization.md)): - Authorization applies to all HTTP requests on the main apiserver port. - This doc explains the available authorization implementations. - -* **Admission Control Plugins** ([admission_control](../design/admission_control.md)) - -## Contributing to the Kubernetes Project - -See this [README](README.md). 
- - - -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/developer-guide.md?pixel)]() - -- cgit v1.2.3 From 915691255225423c86d92c2625dea9567a32f930 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 14 Jul 2015 23:56:51 -0700 Subject: Move diagrams out of top-level docs/ directory and merge docs/devel/developer-guide.md into docs/devel/README.md --- architecture.dia | Bin 0 -> 6522 bytes architecture.md | 2 +- architecture.png | Bin 0 -> 222407 bytes architecture.svg | 499 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 500 insertions(+), 1 deletion(-) create mode 100644 architecture.dia create mode 100644 architecture.png create mode 100644 architecture.svg diff --git a/architecture.dia b/architecture.dia new file mode 100644 index 00000000..26e0eed2 Binary files /dev/null and b/architecture.dia differ diff --git a/architecture.md b/architecture.md index 3deeb3aa..1591068f 100644 --- a/architecture.md +++ b/architecture.md @@ -24,7 +24,7 @@ certainly want the docs that go with that version. A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable. 
-![Architecture Diagram](../architecture.png?raw=true "Architecture overview") +![Architecture Diagram](architecture.png?raw=true "Architecture overview") ## The Kubernetes Node diff --git a/architecture.png b/architecture.png new file mode 100644 index 00000000..fa39039a Binary files /dev/null and b/architecture.png differ diff --git a/architecture.svg b/architecture.svg new file mode 100644 index 00000000..825c0ace --- /dev/null +++ b/architecture.svg @@ -0,0 +1,499 @@ + + + + + + + + + + + + + Node + + + + + + kubelet + + + + + + + + + + + container + + + + + + + container + + + + + + + cAdvisor + + + + + + + Pod + + + + + + + + + + + container + + + + + + + container + + + + + + + container + + + + + + + Pod + + + + + + + + + + + + container + + + + + + + container + + + + + + + container + + + + + + + Pod + + + + + + + Proxy + + + + + + + kubectl (user commands) + + + + + + + + + + + + + + + Firewall + + + + + + + Internet + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + replication controller + + + + + + + Scheduler + + + + + + + Scheduler + + + + Master components + Colocated, or spread across machines, + as dictated by cluster size. + + + + + + + + + + + + REST + (pods, services, + rep. controllers) + + + + + + + authorization + authentication + + + + + + + scheduling + actuator + + + + APIs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + docker + + + + + + + + .. + + + ... + + + + + + + + + + + + + + + + + + + + + + + + Node + + + + + + kubelet + + + + + + + + + + + container + + + + + + + container + + + + + + + cAdvisor + + + + + + + Pod + + + + + + + + + + + container + + + + + + + container + + + + + + + container + + + + + + + Pod + + + + + + + + + + + + container + + + + + + + container + + + + + + + container + + + + + + + Pod + + + + + + + Proxy + + + + + + + + + + + + + + + + + + + docker + + + + + + + + .. + + + ... 
+ + + + + + + + + + + + + + + + + + + + + + + + + + Distributed + Watchable + Storage + + (implemented via etcd) + + + -- cgit v1.2.3 From b6ca2b5bd605d4e65096d3cc2999f4d59d1f1495 Mon Sep 17 00:00:00 2001 From: Zach Loafman Date: Wed, 15 Jul 2015 09:31:28 -0700 Subject: Add hack/cherry_pick_list.sh to list all automated cherry picks * Adds hack/cherry_pick_list.sh to list all automated cherry picks since the last tag. * Adds a short python script to extract title/author and print it in markdown style like our current release notes. * Revises patch release instructions to use said script. --- releasing.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/releasing.md b/releasing.md index 2f5035cc..484620f0 100644 --- a/releasing.md +++ b/releasing.md @@ -137,7 +137,9 @@ manage cherry picks prior to cutting the release. version commit. 1. Follow the instructions given to you by that script. They are canon for the remainder of the Git process. If you don't understand something in that - process, please ask! + process, please ask! When proposing PRs, you can pre-fill the body with + `hack/cherry_pick_list.sh upstream/release-${VER}` to inform people of what + is already on the branch. **TODO**: how to fix tags, etc., if the release is changed. @@ -154,10 +156,10 @@ In your git repo (you still have `${VER}` and `${PATCH}` set from above right?): #### Writing Release Notes -Release notes for a patch release are relatives fast: `git log release-${VER}` -(If you followed the procedure in the first section, all the cherry-picks will -have the pull request number in the commit log). Unless there's some reason not -to, just include all the PRs back to the last release. +Run `hack/cherry_pick_list.sh ${VER}.${PATCH}~1` to get the release notes for +the patch release you just created. Feel free to prune anything internal, like +you would for a major release, but typically for patch releases we tend to +include everything in the release notes. 
## Origin of the Sources -- cgit v1.2.3 From 304af47c459ddc8041c0f03278e2f8d043d87360 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Wed, 15 Jul 2015 10:42:59 -0700 Subject: point kubectl -f examples to correct paths --- admission_control_limit_range.md | 2 +- admission_control_resource_quota.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 125c6d06..2420a274 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -131,7 +131,7 @@ For example, ```shell $ kubectl namespace myspace -$ kubectl create -f examples/limitrange/limit-range.json +$ kubectl create -f docs/user-guide/limitrange/limits.yaml $ kubectl get limits NAME limits diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index d80f38bf..7a323689 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -158,7 +158,7 @@ For example, ``` $ kubectl namespace myspace -$ kubectl create -f examples/resourcequota/resource-quota.json +$ kubectl create -f docs/user-guide/resourcequota/quota.yaml $ kubectl get quota NAME quota -- cgit v1.2.3 From dcb2c5ffc12e3ffbf2cef7ebe35911446294b2f5 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Wed, 15 Jul 2015 11:34:04 -0700 Subject: Remove requirement of specifying kubernetes ver. Now that things are more stable, and we have a conformance test, the binary version is no longer needed. Remove that requirement. --- writing-a-getting-started-guide.md | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 348faf9b..d7463a4c 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -20,10 +20,11 @@ certainly want the docs that go with that version. 
+ # Writing a Getting Started Guide This page gives some advice for anyone planning to write or update a Getting Started Guide for Kubernetes. It also gives some guidelines which reviewers should follow when reviewing a pull request for a -guide. +guide. A Getting Started Guide is instructions on how to create a Kubernetes cluster on top of a particular type(s) of infrastructure. Infrastructure includes: the IaaS provider for VMs; @@ -36,11 +37,12 @@ the combination of all these things needed to run on a particular type of infras which is similar to the one you have planned, consider improving that one. -Distros fall into two categories: +Distros fall into two categories: - **versioned distros** are tested to work with a particular binary release of Kubernetes. These come in a wide variety, reflecting a wide range of ideas and preferences in how to run a cluster. - **development distros** are tested work with the latest Kubernetes source code. But, there are - relatively few of these and the bar is much higher for creating one. + relatively few of these and the bar is much higher for creating one. They must support + fully automated cluster creation, deletion, and upgrade. There are different guidelines for each. @@ -51,17 +53,14 @@ These guidelines say *what* to do. See the Rationale section for *why*. search for uses of flags by guides. - We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your own repo. + - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md). + - State the binary version of kubernetes that you tested clearly in your Guide doc. - Setup a cluster and run the [conformance test](development.md#conformance-testing) against it, and report the results in your PR. - - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md). - - State the binary version of kubernetes that you tested clearly in your Guide doc and in The Matrix. 
- - Even if you are just updating the binary version used, please still do a conformance test. - - If it worked before and now fails, you can ask on IRC, - check the release notes since your last tested version, or look at git -logs for files in other distros - that are updated to the new version. - Versioned distros should typically not modify or add code in `cluster/`. That is just scripts for developer distros. - - If a versioned distro has not been updated for many binary releases, it may be dropped from the Matrix. + - When a new major or minor release of Kubernetes comes out, we may also release a new + conformance test, and require a new conformance test run to earn a conformance checkmark. If you have a cluster partially working, but doing all the above steps seems like too much work, we still want to hear from you. We suggest you write a blog post or a Gist, and we will link to it on our wiki page. @@ -93,11 +92,6 @@ These guidelines say *what* to do. See the Rationale section for *why*. - We want users to have a uniform experience with Kubernetes whenever they follow instructions anywhere in our Github repository. So, we ask that versioned distros pass a **conformance test** to make sure they really work. - - We ask versioned distros to **clearly state a version**. People pulling from Github may - expect any instructions there to work at Head, so stuff that has not been tested at Head needs - to be called out. We are still changing things really fast, and, while the REST API is versioned, - it is not practical at this point to version or limit changes that affect distros. We still change - flags at the Kubernetes/Infrastructure interface. - We want to **limit the number of development distros** for several reasons. Developers should only have to change a limited number of places to add a new feature. 
Also, since we will gate commits on passing CI for all distros, and since end-to-end tests are typically somewhat -- cgit v1.2.3 From 7b8b2772975fc34673eee47d44f49fc90b03d089 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Thu, 16 Jul 2015 02:20:30 -0700 Subject: Take availability.md doc and - extract the portion related to multi-cluster operation into a new multi-cluster.md doc - merge the remainder (that was basically high-level troubleshooting advice) into cluster-troubleshooting.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1f850ffb..2c0455da 100644 --- a/README.md +++ b/README.md @@ -32,7 +32,7 @@ Kubernetes enables users to ask a cluster to run a set of containers. The system Kubernetes is intended to run on a number of cloud providers, as well as on physical hosts. -A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../admin/availability.md) and [cluster federation proposal](../proposals/federation.md) for more details). +A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the multi-cluster doc](../admin/multi-cluster.md) and [cluster federation proposal](../proposals/federation.md) for more details). Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. 
Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner. -- cgit v1.2.3 From e854d97ff44c7a463a5350c546ce32eb3e7bc994 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Wed, 15 Jul 2015 17:20:39 -0700 Subject: Add munger to verify kubectl -f targets, fix docs --- flaky-tests.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/flaky-tests.md b/flaky-tests.md index fb000ea6..86c898d9 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -56,7 +56,7 @@ spec: Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default. ``` -kubectl create -f controller.yaml +kubectl create -f ./controller.yaml ``` This will spin up 24 instances of the test. They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test. 
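As a back-of-the-envelope check on why running many copies of a flaky test helps: if a test flakes independently with probability `p` per run, then a batch of `N` runs surfaces at least one failure with probability `1 - (1 - p)^N`. A small illustrative sketch (the function name and the numbers are ours, not from the docs):

```python
def p_catch(p_flake, runs):
    """Probability that at least one of `runs` independent test runs fails,
    given a per-run flake probability `p_flake`."""
    return 1 - (1 - p_flake) ** runs

# With a 1% flake rate, 24 parallel runs catch the flake only about 21% of
# the time; letting the kubelet restart the pods and accumulate hundreds of
# runs pushes that probability toward 1.
```

For example, `p_catch(0.01, 24)` is roughly 0.21, while `p_catch(0.01, 500)` exceeds 0.99, which is why the restart-and-accumulate approach above is effective.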
-- cgit v1.2.3 From c198491ead87e5a970a17f75c25a7f3843006f2a Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 16 Jul 2015 14:54:28 -0700 Subject: (mostly) auto fixed links --- event_compression.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/event_compression.md b/event_compression.md index 32e52607..294d3f41 100644 --- a/event_compression.md +++ b/event_compression.md @@ -35,7 +35,7 @@ Each binary that generates events (for example, ```kubelet```) should keep track Event compression should be best effort (not guaranteed). Meaning, in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries. ## Design -Instead of a single Timestamp, each event object [contains](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/types.go#L1111) the following fields: +Instead of a single Timestamp, each event object [contains](../../pkg/api/types.go#L1111) the following fields: * ```FirstTimestamp util.Time``` * The date/time of the first occurrence of the event. * ```LastTimestamp util.Time``` @@ -47,7 +47,7 @@ Instead of a single Timestamp, each event object [contains](https://github.com/G Each binary that generates events: * Maintains a historical record of previously generated events: - * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client/record/events_cache.go). + * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](../../pkg/client/record/events_cache.go). 
* The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following events fields are used to construct a unique key for an event: * ```event.Source.Component``` * ```event.Source.Host``` @@ -59,7 +59,7 @@ Each binary that generates events: * ```event.Reason``` * ```event.Message``` * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache. - * When an event is generated, the previously generated events cache is checked (see [```pkg/client/record/event.go```](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/client/record/event.go)). + * When an event is generated, the previously generated events cache is checked (see [```pkg/client/record/event.go```](../../pkg/client/record/event.go)). * If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd: * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. * The event is also updated in the previously generated events cache with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). 
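To make the deduplication flow above concrete, here is a minimal, hypothetical Python sketch of the scheme the document describes. The real implementation is the Go code in `pkg/client/record/events_cache.go`; the `EventCache` class, the flattened field names, and the capacity parameter here are illustrative stand-ins.

```python
from collections import OrderedDict

CACHE_CAP = 4096  # the doc caps the LRU cache at 4096 events

def event_key(event):
    """Build a dedup key from the identifying fields listed above
    (timestamps and count are deliberately excluded)."""
    fields = ("source_component", "source_host", "involved_kind",
              "involved_namespace", "involved_name", "involved_uid",
              "involved_apiversion", "reason", "message")
    return "/".join(str(event.get(f, "")) for f in fields)

class EventCache:
    """LRU cache of previously generated events, keyed by event_key."""
    def __init__(self, cap=CACHE_CAP):
        self.cap = cap
        self.entries = OrderedDict()  # key -> occurrence count

    def observe(self, event):
        """Return (count, is_duplicate) for a newly generated event."""
        key = event_key(event)
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.entries[key] += 1
            return self.entries[key], True  # duplicate -> PUT/update path
        self.entries[key] = 1               # new event -> POST/create path
        if len(self.entries) > self.cap:
            self.entries.popitem(last=False)  # evict least recently used
        return 1, False
```

A duplicate result corresponds to updating the existing etcd entry with a new last-seen timestamp and incremented count, while a miss corresponds to creating a fresh event entry; eviction bounds memory exactly as described for long-running components.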
-- cgit v1.2.3 From d43894cdce090482de0d25f9510603c9d806870c Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 16 Jul 2015 14:54:28 -0700 Subject: (mostly) auto fixed links --- cli-roadmap.md | 6 +++--- client-libraries.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/cli-roadmap.md b/cli-roadmap.md index fe8d5b0f..f2b9f8c1 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -23,9 +23,9 @@ certainly want the docs that go with that version. # Kubernetes CLI/Configuration Roadmap See also issues with the following labels: -* [area/config-deployment](https://github.com/GoogleCloudPlatform/kubernetes/labels/area%2Fconfig-deployment) -* [component/CLI](https://github.com/GoogleCloudPlatform/kubernetes/labels/component%2FCLI) -* [component/client](https://github.com/GoogleCloudPlatform/kubernetes/labels/component%2Fclient) +* [area/app-config-deployment](https://github.com/GoogleCloudPlatform/kubernetes/labels/area/app-config-deployment) +* [component/CLI](https://github.com/GoogleCloudPlatform/kubernetes/labels/component/CLI) +* [component/client](https://github.com/GoogleCloudPlatform/kubernetes/labels/component/client) 1. Create services before other objects, or at least before objects that depend upon them. Namespace-relative DNS mitigates this some, but most users are still using service environment variables. [#1768](https://github.com/GoogleCloudPlatform/kubernetes/issues/1768) 1. Finish rolling update [#1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) diff --git a/client-libraries.md b/client-libraries.md index b7529a01..ef9a1f69 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -23,7 +23,7 @@ certainly want the docs that go with that version. 
## kubernetes API client libraries ### Supported - * [Go](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/pkg/client) + * [Go](../../pkg/client/) ### User Contributed *Note: Libraries provided by outside parties are supported by their authors, not the core Kubernetes team* -- cgit v1.2.3 From a7425fa6891a042de69331dba36282276084edeb Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Wed, 15 Jul 2015 17:28:59 -0700 Subject: Ensure all docs and examples in user guide are reachable --- scheduler_algorithm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index fc402516..146c0190 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -31,7 +31,7 @@ The purpose of filtering the nodes is to filter out the nodes that do not meet c - `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of limits of all Pods on the node. - `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node. - `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. -- `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field. +- `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field ([Here](../user-guide/node-selection/) is an example of how to use `nodeSelector` field). - `CheckNodeLabelPresence`: Check if all the specified labels exist on a node or not, regardless of the value. The details of the above predicates can be found in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. 
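The filtering stage described above amounts to running a pod and a candidate node through a chain of predicate functions and keeping only the nodes that pass all of them. Below is an illustrative Python sketch of that idea; the simplified dict fields and function names are ours, while the real predicates live in `plugin/pkg/scheduler/algorithm/predicates/predicates.go`.

```python
def pod_fits_resources(pod, node):
    """Free resource = capacity minus the sum of limits of pods on the node."""
    for res in ("cpu", "memory"):
        used = sum(p["limits"].get(res, 0) for p in node["pods"])
        if node["capacity"].get(res, 0) - used < pod["limits"].get(res, 0):
            return False
    return True

def pod_fits_ports(pod, node):
    """Reject nodes where a requested HostPort is already occupied."""
    occupied = {port for p in node["pods"] for port in p.get("host_ports", ())}
    return not (set(pod.get("host_ports", ())) & occupied)

def pod_selector_matches(pod, node):
    """Every label in the pod's nodeSelector must be present on the node."""
    selector = pod.get("node_selector", {})
    return all(node["labels"].get(k) == v for k, v in selector.items())

PREDICATES = (pod_fits_resources, pod_fits_ports, pod_selector_matches)

def feasible_nodes(pod, nodes):
    """Keep only the nodes that pass every predicate, as in the filter stage."""
    return [n for n in nodes if all(pred(pod, n) for pred in PREDICATES)]
```

Because each predicate is an independent function, they compose freely, which is how Kubernetes can enable some predicates by default while leaving others available for custom filtering policies.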
You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). -- cgit v1.2.3 From 9f528c422685de25aafd8c76cdaef0125c005855 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Wed, 15 Jul 2015 17:28:59 -0700 Subject: Ensure all docs and examples in user guide are reachable --- admission_control_limit_range.md | 3 +++ admission_control_resource_quota.md | 3 +++ persistent-storage.md | 2 +- secrets.md | 4 ++-- simple-rolling-update.md | 4 ++-- 5 files changed, 11 insertions(+), 5 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 2420a274..addd8483 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -153,6 +153,9 @@ It is expected we will want to define limits for particular pods or containers b To make a **LimitRangeItem** more restrictive, we will intend to add these additional restrictions at a future point in time. +## Example +See the [example of Limit Range](../user-guide/limitrange) for more information. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_limit_range.md?pixel)]() diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 7a323689..ec2cb20d 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -174,6 +174,9 @@ resourcequotas 1 1 services 3 5 ``` +## More information +See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota) for more information. 
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_resource_quota.md?pixel)]() diff --git a/persistent-storage.md b/persistent-storage.md index 9639a521..1cbed771 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -28,7 +28,7 @@ This document proposes a model for managing persistent, cluster-scoped storage f Two new API kinds: -A `PersistentVolume` (PV) is a storage resource provisioned by an administrator. It is analogous to a node. +A `PersistentVolume` (PV) is a storage resource provisioned by an administrator. It is analogous to a node. See [Persistent Volume Guide](../user-guide/persistent-volumes/) for how to use it. A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to use in a pod. It is analogous to a pod. diff --git a/secrets.md b/secrets.md index 33433dc0..b4bc8385 100644 --- a/secrets.md +++ b/secrets.md @@ -23,8 +23,8 @@ certainly want the docs that go with that version. ## Abstract -A proposal for the distribution of secrets (passwords, keys, etc) to the Kubelet and to -containers inside Kubernetes using a custom volume type. +A proposal for the distribution of [secrets](../user-guide/secrets.md) (passwords, keys, etc) to the Kubelet and to +containers inside Kubernetes using a custom [volume](../user-guide/volumes.md#secrets) type. See the [secrets example](../user-guide/secrets/) for more information. ## Motivation diff --git a/simple-rolling-update.md b/simple-rolling-update.md index ed2e5349..b74264d6 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -21,9 +21,9 @@ certainly want the docs that go with that version. ## Simple rolling update -This is a lightweight design document for simple rolling update in ```kubectl``` +This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in ```kubectl```. -Complete execution flow can be found [here](#execution-details). 
+Complete execution flow can be found [here](#execution-details). See the [example of rolling update](../user-guide/update-demo/) for more information. ### Lightweight rollout Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1``` -- cgit v1.2.3 From 7c7cbb2a44d2b0acf06ad8256ab404cdae509894 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Thu, 16 Jul 2015 17:56:56 -0700 Subject: MUNGE generated table of contents should strip comma --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 4a0cfccb..3014d0cb 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -31,7 +31,7 @@ using resources with kubectl can be found in (working_with_resources.md).* **Table of Contents** - - [Types (Kinds)](#types-(kinds)) + - [Types (Kinds)](#types-kinds) - [Resources](#resources) - [Objects](#objects) - [Metadata](#metadata) -- cgit v1.2.3 From 6a198dfa61ce281f5d092a9c2576c46bb01a1482 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 16 Jul 2015 10:02:26 -0700 Subject: Better scary message --- README.md | 38 ++++++++++++++++++++++++-------------- api-conventions.md | 38 ++++++++++++++++++++++++-------------- api_changes.md | 38 ++++++++++++++++++++++++-------------- cherry-picks.md | 38 ++++++++++++++++++++++++-------------- cli-roadmap.md | 32 +++++++++++++++++++++----------- client-libraries.md | 38 ++++++++++++++++++++++++-------------- coding-conventions.md | 38 ++++++++++++++++++++++++-------------- collab.md | 38 ++++++++++++++++++++++++-------------- developer-guides/vagrant.md | 38 ++++++++++++++++++++++++-------------- development.md | 38 ++++++++++++++++++++++++-------------- faster_reviews.md | 38 ++++++++++++++++++++++++-------------- flaky-tests.md | 38 ++++++++++++++++++++++++-------------- getting-builds.md | 38 ++++++++++++++++++++++++-------------- instrumentation.md | 38 ++++++++++++++++++++++++-------------- issues.md | 38 
++++++++++++++++++++++++-------------- logging.md | 38 ++++++++++++++++++++++++-------------- making-release-notes.md | 38 ++++++++++++++++++++++++-------------- profiling.md | 38 ++++++++++++++++++++++++-------------- pull-requests.md | 38 ++++++++++++++++++++++++-------------- releasing.md | 38 ++++++++++++++++++++++++-------------- scheduler.md | 38 ++++++++++++++++++++++++-------------- scheduler_algorithm.md | 32 +++++++++++++++++++++----------- writing-a-getting-started-guide.md | 38 ++++++++++++++++++++++++-------------- 23 files changed, 546 insertions(+), 316 deletions(-) diff --git a/README.md b/README.md index aed7276d..a06efc8d 100644 --- a/README.md +++ b/README.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/README.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/api-conventions.md b/api-conventions.md index 3014d0cb..50e1e1d2 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/api-conventions.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/api_changes.md b/api_changes.md index 2d571eb5..c7458d53 100644 --- a/api_changes.md +++ b/api_changes.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/api_changes.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/cherry-picks.md b/cherry-picks.md index b971f2fc..1d59eaef 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/cherry-picks.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/cli-roadmap.md b/cli-roadmap.md index f2b9f8c1..45c26827 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

+

PLEASE NOTE: This document applies to the HEAD of the source tree

-Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/cli-roadmap.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/client-libraries.md b/client-libraries.md index ef9a1f69..ae7cb623 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/client-libraries.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/coding-conventions.md b/coding-conventions.md index 76ba29e8..ac3d353f 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/coding-conventions.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/collab.md b/collab.md index caadc8de..38b6d586 100644 --- a/collab.md +++ b/collab.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/collab.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 0ef31c68..5b4013e3 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/developer-guides/vagrant.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/development.md b/development.md index e2ec2068..1255b7a8 100644 --- a/development.md +++ b/development.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/development.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/faster_reviews.md b/faster_reviews.md index 335d2a3e..20e3e990 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/faster_reviews.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/flaky-tests.md b/flaky-tests.md
index 86c898d9..52ba45a2 100644
--- a/flaky-tests.md
+++ b/flaky-tests.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/flaky-tests.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/getting-builds.md b/getting-builds.md
index 372d080d..e41c4fbf 100644
--- a/getting-builds.md
+++ b/getting-builds.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/getting-builds.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/instrumentation.md b/instrumentation.md
index 95786c52..8cc9e2b2 100644
--- a/instrumentation.md
+++ b/instrumentation.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/instrumentation.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/issues.md b/issues.md
index 689a18ff..46beb9ce 100644
--- a/issues.md
+++ b/issues.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/issues.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/logging.md b/logging.md
index 1a536d07..3870c4c3 100644
--- a/logging.md
+++ b/logging.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/logging.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/making-release-notes.md b/making-release-notes.md
index 5703965a..b362d857 100644
--- a/making-release-notes.md
+++ b/making-release-notes.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/making-release-notes.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/profiling.md b/profiling.md
index 863dc4c1..215f0c41 100644
--- a/profiling.md
+++ b/profiling.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/profiling.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/pull-requests.md b/pull-requests.md
index bdb7a172..e42faa51 100644
--- a/pull-requests.md
+++ b/pull-requests.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/pull-requests.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/releasing.md b/releasing.md
index 484620f0..8469fc40 100644
--- a/releasing.md
+++ b/releasing.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/releasing.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/scheduler.md b/scheduler.md
index 912d1128..1fccc7ad 100644
--- a/scheduler.md
+++ b/scheduler.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/scheduler.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md
index 146c0190..791de7c4 100644
--- a/scheduler_algorithm.md
+++ b/scheduler_algorithm.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/scheduler_algorithm.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md
index d7463a4c..3e67b632 100644
--- a/writing-a-getting-started-guide.md
+++ b/writing-a-getting-started-guide.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/writing-a-getting-started-guide.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
-- 
cgit v1.2.3


From 4510fab29da1d028321cf708a5665a14757b7ca7 Mon Sep 17 00:00:00 2001
From: Tim Hockin
Date: Thu, 16 Jul 2015 10:02:26 -0700
Subject: Better scary message

---
 README.md                            | 38 +++++++++++++++++++++++-------------
 access.md                            | 38 +++++++++++++++++++++++-------------
 admission_control.md                 | 38 +++++++++++++++++++++++-------------
 admission_control_limit_range.md     | 38 +++++++++++++++++++++++-------------
 admission_control_resource_quota.md  | 38 +++++++++++++++++++++++-------------
 architecture.md                      | 38 +++++++++++++++++++++++-------------
 clustering.md                        | 38 +++++++++++++++++++++++-------------
 clustering/README.md                 | 38 +++++++++++++++++++++++-------------
 command_execution_port_forwarding.md | 38 +++++++++++++++++++++++-------------
 event_compression.md                 | 38 +++++++++++++++++++++++-------------
 expansion.md                         | 38 +++++++++++++++++++++++-------------
 identifiers.md                       | 38 +++++++++++++++++++++++-------------
 namespaces.md                        | 38 +++++++++++++++++++++++-------------
 networking.md                        | 38 +++++++++++++++++++++++-------------
 persistent-storage.md                | 38 +++++++++++++++++++++++-------------
 principles.md                        | 38 +++++++++++++++++++++++-------------
 resources.md                         | 38 +++++++++++++++++++++++-------------
 secrets.md                           | 38 +++++++++++++++++++++++-------------
 security.md                          | 38 +++++++++++++++++++++++-------------
 security_context.md                  | 38 +++++++++++++++++++++++-------------
 service_accounts.md                  | 38 +++++++++++++++++++++++-------------
 simple-rolling-update.md             | 38 +++++++++++++++++++++++-------------
 versioning.md                        | 38 +++++++++++++++++++++++-------------
 23 files changed, 552 insertions(+), 322 deletions(-)

diff --git a/README.md b/README.md
index 2c0455da..b0f3115a 100644
--- a/README.md
+++ b/README.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/README.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/access.md b/access.md
index c3ac41a0..e42d7859 100644
--- a/access.md
+++ b/access.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/access.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/admission_control.md b/admission_control.md
index a80de2b2..aaa6ed16 100644
--- a/admission_control.md
+++ b/admission_control.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/admission_control.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md
index addd8483..824d4a35 100644
--- a/admission_control_limit_range.md
+++ b/admission_control_limit_range.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/admission_control_limit_range.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md
index ec2cb20d..e262eb2d 100644
--- a/admission_control_resource_quota.md
+++ b/admission_control_resource_quota.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/admission_control_resource_quota.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/architecture.md b/architecture.md
index 1591068f..2e4afc62 100644
--- a/architecture.md
+++ b/architecture.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/architecture.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/clustering.md b/clustering.md
index e5307fd7..8673284f 100644
--- a/clustering.md
+++ b/clustering.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/clustering.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/clustering/README.md b/clustering/README.md
index cf5a3d50..f05168d6 100644
--- a/clustering/README.md
+++ b/clustering/README.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/clustering/README.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md
index 998d1cbd..c7408b58 100644
--- a/command_execution_port_forwarding.md
+++ b/command_execution_port_forwarding.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/command_execution_port_forwarding.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/event_compression.md b/event_compression.md
index 294d3f41..0b458c8d 100644
--- a/event_compression.md
+++ b/event_compression.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/event_compression.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/expansion.md b/expansion.md
index f81db3c4..5cc08c6c 100644
--- a/expansion.md
+++ b/expansion.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/expansion.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/identifiers.md b/identifiers.md
index e66d2d7a..eda7254b 100644
--- a/identifiers.md
+++ b/identifiers.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/identifiers.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
diff --git a/namespaces.md b/namespaces.md
index 70f5e860..7bd7ab67 100644
--- a/namespaces.md
+++ b/namespaces.md
@@ -2,20 +2,30 @@
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-
-
-PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.
-
-
-Documentation for specific releases can be found at
-[releases.k8s.io](http://releases.k8s.io).
-
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
-![WARNING](http://kubernetes.io/img/warning.png)
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+
+PLEASE NOTE: This document applies to the HEAD of the source tree
+
+
+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/namespaces.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/networking.md b/networking.md index 052ec128..ac6e5794 100644 --- a/networking.md +++ b/networking.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source -tree only. If you are using a released version of Kubernetes, you almost -certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/networking.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/persistent-storage.md b/persistent-storage.md index 1cbed771..f919baa9 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/persistent-storage.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/principles.md b/principles.md index 212f04bd..1ae3bc3a 100644 --- a/principles.md +++ b/principles.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/principles.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/resources.md b/resources.md index 4172cdb4..0457eb44 100644 --- a/resources.md +++ b/resources.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/resources.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/secrets.md b/secrets.md index b4bc8385..8aab1088 100644 --- a/secrets.md +++ b/secrets.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/secrets.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/security.md b/security.md index e2ab4fb7..2989148b 100644 --- a/security.md +++ b/security.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/security.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/security_context.md b/security_context.md index 6b0601e6..6940aae2 100644 --- a/security_context.md +++ b/security_context.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/security_context.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/service_accounts.md b/service_accounts.md index ddb127f2..c53b4633 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/service_accounts.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/simple-rolling-update.md b/simple-rolling-update.md index b74264d6..b142c6e5 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/simple-rolling-update.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/versioning.md b/versioning.md index 85e3f56f..3f9bf614 100644 --- a/versioning.md +++ b/versioning.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/versioning.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- -- cgit v1.2.3 From 5ecd6ef5041931b9d768dd2deeb4ffaf48a45ce1 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 16 Jul 2015 10:02:26 -0700 Subject: Better scary message --- autoscaling.md | 38 ++++++++++++++++++++++++-------------- federation.md | 38 ++++++++++++++++++++++++-------------- high-availability.md | 38 ++++++++++++++++++++++++-------------- 3 files changed, 72 insertions(+), 42 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 15071645..ebc49905 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/federation.md b/federation.md index a573050f..713db4b3 100644 --- a/federation.md +++ b/federation.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/federation.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- diff --git a/high-availability.md b/high-availability.md index b61148f9..fd6bef7b 100644 --- a/high-availability.md +++ b/high-availability.md @@ -2,20 +2,30 @@ -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) - -

PLEASE NOTE: This document applies to the HEAD of the source
-tree only. If you are using a released version of Kubernetes, you almost
-certainly want the docs that go with that version.

- -Documentation for specific releases can be found at -[releases.k8s.io](http://releases.k8s.io). - -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) -![WARNING](http://kubernetes.io/img/warning.png) +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/high-availability.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- -- cgit v1.2.3 From c4d505d98dd2d215e0b52d8d0332bce5d9be11e0 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Fri, 17 Jul 2015 10:12:08 -0700 Subject: Various minor edits/clarifications to docs/admin/ docs. Deleted docs/admin/namespaces.md as it was content-free and the topic is already covered well in docs/user-guide/namespaces.md --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 50e1e1d2..9f362097 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -138,7 +138,7 @@ These fields are required for proper decoding of the object. They may be populat Every object kind MUST have the following metadata in a nested object field called "metadata": -* namespace: a namespace is a DNS compatible subdomain that objects are subdivided into. The default namespace is 'default'. See [docs/admin/namespaces.md](../admin/namespaces.md) for more. +* namespace: a namespace is a DNS compatible subdomain that objects are subdivided into. The default namespace is 'default'. See [docs/user-guide/namespaces.md](../user-guide/namespaces.md) for more. * name: a string that uniquely identifies this object within the current namespace (see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)). This value is used in the path when retrieving an individual object. 
* uid: a unique in time and space value (typically an RFC 4122 generated identifier, see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)) used to distinguish between objects with the same name that have been deleted and recreated -- cgit v1.2.3 From 35f2829ae014c08b847b59ce06a205cc3fbb8770 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 16 Jul 2015 19:01:02 -0700 Subject: apply changes --- api-conventions.md | 4 ++++ api_changes.md | 1 + developer-guides/vagrant.md | 5 +++++ development.md | 11 +++++++++++ flaky-tests.md | 2 ++ getting-builds.md | 1 + making-release-notes.md | 1 + profiling.md | 6 ++++++ releasing.md | 2 ++ 9 files changed, 33 insertions(+) diff --git a/api-conventions.md b/api-conventions.md index 50e1e1d2..f3c9ed6b 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -196,12 +196,15 @@ References in the status of the referee to the referrer may be permitted, when t Discussed in [#2004](https://github.com/GoogleCloudPlatform/kubernetes/issues/2004) and elsewhere. There are no maps of subobjects in any API objects. Instead, the convention is to use a list of subobjects containing name fields. For example: + ```yaml ports: - name: www containerPort: 80 ``` + vs. + ```yaml ports: www: @@ -518,6 +521,7 @@ A ```Status``` kind will be returned by the API in two cases: The status object is encoded as JSON and provided as the body of the response. The status object contains fields for humans and machine consumers of the API to get more detailed information for the cause of the failure. The information in the status object supplements, but does not override, the HTTP status code's meaning. When fields in the status object have the same meaning as generally defined HTTP headers and that header is returned with the response, the header should be considered as having higher priority. 
**Example:** + ``` $ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana diff --git a/api_changes.md b/api_changes.md index c7458d53..edf227cc 100644 --- a/api_changes.md +++ b/api_changes.md @@ -282,6 +282,7 @@ conversion functions when writing your conversion functions. Once all the necessary manually written conversions are added, you need to regenerate auto-generated ones. To regenerate them: - run + ``` $ hack/update-generated-conversions.sh ``` diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 5b4013e3..2b6fcc42 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -83,6 +83,7 @@ vagrant ssh minion-3 ``` To view the service status and/or logs on the kubernetes-master: + ```sh vagrant ssh master [vagrant@kubernetes-master ~] $ sudo systemctl status kube-apiserver @@ -96,6 +97,7 @@ vagrant ssh master ``` To view the services on any of the nodes: + ```sh vagrant ssh minion-1 [vagrant@kubernetes-minion-1] $ sudo systemctl status docker @@ -109,17 +111,20 @@ vagrant ssh minion-1 With your Kubernetes cluster up, you can manage the nodes in your cluster with the regular Vagrant commands. To push updates to new Kubernetes code after making source changes: + ```sh ./cluster/kube-push.sh ``` To stop and then restart the cluster: + ```sh vagrant halt ./cluster/kube-up.sh ``` To destroy the cluster: + ```sh vagrant destroy ``` diff --git a/development.md b/development.md index 1255b7a8..e258f841 100644 --- a/development.md +++ b/development.md @@ -109,6 +109,7 @@ source control system). Use ```apt-get install mercurial``` or ```yum install m directly from mercurial. 2) Create a new GOPATH for your tools and install godep: + ``` export GOPATH=$HOME/go-tools mkdir -p $GOPATH @@ -116,6 +117,7 @@ go get github.com/tools/godep ``` 3) Add $GOPATH/bin to your path. 
Typically you'd add this to your ~/.profile: + ``` export GOPATH=$HOME/go-tools export PATH=$PATH:$GOPATH/bin @@ -125,6 +127,7 @@ export PATH=$PATH:$GOPATH/bin Here's a quick walkthrough of one way to use godeps to add or update a Kubernetes dependency into Godeps/_workspace. For more details, please see the instructions in [godep's documentation](https://github.com/tools/godep). 1) Devote a directory to this endeavor: + ``` export KPATH=$HOME/code/kubernetes mkdir -p $KPATH/src/github.com/GoogleCloudPlatform/kubernetes @@ -134,6 +137,7 @@ git clone https://path/to/your/fork . ``` 2) Set up your GOPATH. + ``` # Option A: this will let your builds see packages that exist elsewhere on your system. export GOPATH=$KPATH:$GOPATH @@ -143,12 +147,14 @@ export GOPATH=$KPATH ``` 3) Populate your new GOPATH. + ``` cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes godep restore ``` 4) Next, you can either add a new dependency or update an existing one. + ``` # To add a new dependency, do: cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes @@ -218,6 +224,7 @@ KUBE_COVER=y hack/test-go.sh At the end of the run, an the HTML report will be generated with the path printed to stdout. To run tests and collect coverage in only one package, pass its relative path under the `kubernetes` directory as an argument, for example: + ``` cd kubernetes KUBE_COVER=y hack/test-go.sh pkg/kubectl @@ -230,6 +237,7 @@ Coverage results for the project can also be viewed on [Coveralls](https://cover ## Integration tests You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.0) in your path, please make sure it is installed and in your ``$PATH``. + ``` cd kubernetes hack/test-integration.sh @@ -238,12 +246,14 @@ hack/test-integration.sh ## End-to-End tests You can run an end-to-end test which will bring up a master and two nodes, perform some tests, and then tear everything down. 
Make sure you have followed the getting started steps for your chosen cloud platform (which might involve changing the `KUBERNETES_PROVIDER` environment variable to something other than "gce". + ``` cd kubernetes hack/e2e-test.sh ``` Pressing control-C should result in an orderly shutdown but if something goes wrong and you still have some VMs running you can force a cleanup with this command: + ``` go run hack/e2e.go --down ``` @@ -281,6 +291,7 @@ hack/ginkgo-e2e.sh --ginkgo.focus=Pods.*env ``` ### Combining flags + ```sh # Flags can be combined, and their actions will take place in this order: # -build, -push|-up|-pushup, -test|-tests=..., -down diff --git a/flaky-tests.md b/flaky-tests.md index 52ba45a2..0fbf643c 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -42,6 +42,7 @@ _Note: these instructions are mildly hacky for now, as we get run once semantics There is a testing image ```brendanburns/flake``` up on the docker hub. We will use this image to test our fix. Create a replication controller with the following config: + ```yaml apiVersion: v1 kind: ReplicationController @@ -63,6 +64,7 @@ spec: - name: REPO_SPEC value: https://github.com/GoogleCloudPlatform/kubernetes ``` + Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default. ``` diff --git a/getting-builds.md b/getting-builds.md index e41c4fbf..f59a753b 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -45,6 +45,7 @@ usage: ``` You can also use the gsutil tool to explore the Google Cloud Storage release bucket. 
Here are some examples: + ``` gsutil cat gs://kubernetes-release/ci/latest.txt # output the latest ci version number gsutil cat gs://kubernetes-release/ci/latest-green.txt # output the latest ci version number that passed gce e2e diff --git a/making-release-notes.md b/making-release-notes.md index b362d857..343b9203 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -40,6 +40,7 @@ _TODO_: Figure out a way to record this somewhere to save the next release engin Find the most-recent PR that was merged with the current .0 release. Remeber this as $CURRENTPR. ### 2) Run the release-notes tool + ```bash ${KUBERNETES_ROOT}/build/make-release-notes.sh $LASTPR $CURRENTPR ``` diff --git a/profiling.md b/profiling.md index 215f0c41..fbb54c9f 100644 --- a/profiling.md +++ b/profiling.md @@ -41,24 +41,30 @@ Go comes with inbuilt 'net/http/pprof' profiling library and profiling web servi ## Adding profiling to services to APIserver. TL;DR: Add lines: + ``` m.mux.HandleFunc("/debug/pprof/", pprof.Index) m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile) m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) ``` + to the init(c *Config) method in 'pkg/master/master.go' and import 'net/http/pprof' package. In most use cases to use profiler service it's enough to do 'import _ net/http/pprof', which automatically registers a handler in the default http.Server. Slight inconvenience is that APIserver uses default server for intra-cluster communication, so plugging profiler to it is not really useful. In 'pkg/master/server/server.go' more servers are created and started as separate goroutines. The one that is usually serving external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in Handler variable. It is created from HTTP multiplexer, so the only thing that needs to be done is adding profiler handler functions to this multiplexer. This is exactly what lines after TL;DR do. 
## Connecting to the profiler Even when running profiler I found not really straightforward to use 'go tool pprof' with it. The problem is that at least for dev purposes certificates generated for APIserver are not signed by anyone trusted and because secureServer serves only secure traffic it isn't straightforward to connect to the service. The best workaround I found is by creating an ssh tunnel from the kubernetes_master open unsecured port to some external server, and use this server as a proxy. To save everyone looking for correct ssh flags, it is done by running: + ``` ssh kubernetes_master -L:localhost:8080 ``` + or analogous one for you Cloud provider. Afterwards you can e.g. run + ``` go tool pprof http://localhost:/debug/pprof/profile ``` + to get 30 sec. CPU profile. ## Contention profiling diff --git a/releasing.md b/releasing.md index 8469fc40..8b1a661c 100644 --- a/releasing.md +++ b/releasing.md @@ -78,9 +78,11 @@ and you're trying to cut a release, don't hesitate to contact the GKE oncall. Before proceeding to the next step: + ``` export BRANCHPOINT=v0.20.2-322-g974377b ``` + Where `v0.20.2-322-g974377b` is the git hash you decided on. This will become our (retroactive) branch point. 
-- cgit v1.2.3 From 60cec0f5fa87f28f2a7f1357817d06db433b1e75 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 16 Jul 2015 19:01:02 -0700 Subject: apply changes --- admission_control_limit_range.md | 2 +- admission_control_resource_quota.md | 2 +- event_compression.md | 1 + resources.md | 7 +++++++ security_context.md | 1 + service_accounts.md | 1 + 6 files changed, 12 insertions(+), 2 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 824d4a35..90329815 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -164,7 +164,7 @@ It is expected we will want to define limits for particular pods or containers b To make a **LimitRangeItem** more restrictive, we will intend to add these additional restrictions at a future point in time. ## Example -See the [example of Limit Range](../user-guide/limitrange) for more information. +See the [example of Limit Range](../user-guide/limitrange/) for more information. diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index e262eb2d..d5cdc9a1 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -185,7 +185,7 @@ services 3 5 ``` ## More information -See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota) for more information. +See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota/) for more information. 
diff --git a/event_compression.md b/event_compression.md index 0b458c8d..af823972 100644 --- a/event_compression.md +++ b/event_compression.md @@ -84,6 +84,7 @@ Each binary that generates events: ## Example Sample kubectl output + ``` FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet. diff --git a/resources.md b/resources.md index 0457eb44..2effb5cf 100644 --- a/resources.md +++ b/resources.md @@ -87,23 +87,27 @@ Internally (i.e., everywhere else), Kubernetes will represent resource quantitie Both users and a number of system components, such as schedulers, (horizontal) auto-scalers, (vertical) auto-sizers, load balancers, and worker-pool managers need to reason about resource requirements of workloads, resource capacities of nodes, and resource usage. Kubernetes divides specifications of *desired state*, aka the Spec, and representations of *current state*, aka the Status. Resource requirements and total node capacity fall into the specification category, while resource usage, characterizations derived from usage (e.g., maximum usage, histograms), and other resource demand signals (e.g., CPU load) clearly fall into the status category and are discussed in the Appendix for now. Resource requirements for a container or pod should have the following form: + ``` resourceRequirementSpec: [ request: [ cpu: 2.5, memory: "40Mi" ], limit: [ cpu: 4.0, memory: "99Mi" ], ] ``` + Where: * _request_ [optional]: the amount of resources being requested, or that were requested and have been allocated. Scheduler algorithms will use these quantities to test feasibility (whether a pod will fit onto a node). 
If a container (or pod) tries to use more resources than its _request_, any associated SLOs are voided — e.g., the program it is running may be throttled (compressible resource types), or the attempt may be denied. If _request_ is omitted for a container, it defaults to _limit_ if that is explicitly specified, otherwise to an implementation-defined value; this will always be 0 for a user-defined resource type. If _request_ is omitted for a pod, it defaults to the sum of the (explicit or implicit) _request_ values for the containers it encloses. * _limit_ [optional]: an upper bound or cap on the maximum amount of resources that will be made available to a container or pod; if a container or pod uses more resources than its _limit_, it may be terminated. The _limit_ defaults to "unbounded"; in practice, this probably means the capacity of an enclosing container, pod, or node, but may result in non-deterministic behavior, especially for memory. Total capacity for a node should have a similar structure: + ``` resourceCapacitySpec: [ total: [ cpu: 12, memory: "128Gi" ] ] ``` + Where: * _total_: the total allocatable resources of a node. Initially, the resources at a given scope will bound the resources of the sum of inner scopes. @@ -149,6 +153,7 @@ rather than decimal ones: "64MiB" rather than "64MB". ## Resource metadata A resource type may have an associated read-only ResourceType structure, that contains metadata about the type. For example: + ``` resourceTypes: [ "kubernetes.io/memory": [ @@ -194,6 +199,7 @@ resourceStatus: [ ``` where a `` or `` structure looks like this: + ``` { mean: # arithmetic mean @@ -209,6 +215,7 @@ where a `` or `` structure looks like this: ] } ``` + All parts of this structure are optional, although we strongly encourage including quantities for 50, 90, 95, 99, 99.5, and 99.9 percentiles. 
_[In practice, it will be important to include additional info such as the length of the time window over which the averages are calculated, the confidence level, and information-quality metrics such as the number of dropped or discarded data points.]_ and predicted diff --git a/security_context.md b/security_context.md index 6940aae2..bc76495a 100644 --- a/security_context.md +++ b/security_context.md @@ -179,6 +179,7 @@ type SELinuxOptions struct { Level string } ``` + ### Admission It is up to an admission plugin to determine if the security context is acceptable or not. At the diff --git a/service_accounts.md b/service_accounts.md index c53b4633..c6acbd24 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -61,6 +61,7 @@ A service account binds together several things: ## Design Discussion A new object Kind is added: + ```go type ServiceAccount struct { TypeMeta `json:",inline" yaml:",inline"` -- cgit v1.2.3 From e1a268be8375b68f4b4a1d546d2538fcdaa33da1 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 17 Jul 2015 09:20:19 -0700 Subject: Make TOC munge include blank line before TOC --- api-conventions.md | 1 + 1 file changed, 1 insertion(+) diff --git a/api-conventions.md b/api-conventions.md index 323cde41..271efed4 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -41,6 +41,7 @@ using resources with kubectl can be found in (working_with_resources.md).* **Table of Contents** + - [Types (Kinds)](#types-kinds) - [Resources](#resources) - [Objects](#objects) -- cgit v1.2.3 From da3e5f056b57f17ae5234085a00e792adaa02d57 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 17 Jul 2015 15:35:41 -0700 Subject: Run gendocs --- README.md | 1 + api-conventions.md | 1 + api_changes.md | 2 ++ cherry-picks.md | 1 + cli-roadmap.md | 1 + client-libraries.md | 3 +++ collab.md | 1 + developer-guides/vagrant.md | 3 +++ development.md | 8 ++++++++ faster_reviews.md | 1 + flaky-tests.md | 2 ++ getting-builds.md | 1 + making-release-notes.md | 6 ++++++ 
profiling.md | 2 ++ releasing.md | 2 ++ scheduler_algorithm.md | 2 ++ writing-a-getting-started-guide.md | 4 ++++ 17 files changed, 41 insertions(+) diff --git a/README.md b/README.md index a06efc8d..9a73d949 100644 --- a/README.md +++ b/README.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Kubernetes Developer Guide The developer guide is for anyone wanting to either write code which directly accesses the diff --git a/api-conventions.md b/api-conventions.md index 323cde41..7f46d5be 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -455,6 +455,7 @@ The following HTTP status codes may be returned by the API. * Returned in response to HTTP OPTIONS requests. #### Error codes + * `307 StatusTemporaryRedirect` * Indicates that the address for the requested resource has changed. * Suggested client recovery behavior diff --git a/api_changes.md b/api_changes.md index edf227cc..7a0418e8 100644 --- a/api_changes.md +++ b/api_changes.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # So you want to change the API? The Kubernetes API has two major components - the internal structures and @@ -365,6 +366,7 @@ $ hack/update-swagger-spec.sh The API spec changes should be in a commit separate from your other changes. ## Incompatible API changes + If your change is going to be backward incompatible or might be a breaking change for API consumers, please send an announcement to `kubernetes-dev@googlegroups.com` before the change gets in. If you are unsure, ask. 
Also make sure that the change gets documented in diff --git a/cherry-picks.md b/cherry-picks.md index 1d59eaef..7ed63d08 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Overview This document explains how cherry picks are managed on release branches within the diff --git a/cli-roadmap.md b/cli-roadmap.md index 45c26827..00b454fa 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Kubernetes CLI/Configuration Roadmap See also issues with the following labels: diff --git a/client-libraries.md b/client-libraries.md index ae7cb623..69cba1e6 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -30,12 +30,15 @@ Documentation for other releases can be found at + ## kubernetes API client libraries ### Supported + * [Go](../../pkg/client/) ### User Contributed + *Note: Libraries provided by outside parties are supported by their authors, not the core Kubernetes team* * [Java (OSGI)](https://bitbucket.org/amdatulabs/amdatu-kubernetes) diff --git a/collab.md b/collab.md index 38b6d586..96db64c8 100644 --- a/collab.md +++ b/collab.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # On Collaborative Development Kubernetes is open source, but many of the people working on it do so as their day job. In order to avoid forcing people to be "at work" effectively 24/7, we want to establish some semi-formal protocols around development. Hopefully these rules make things go more smoothly. If you find that this is not the case, please complain loudly. 
diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 2b6fcc42..e704bf3b 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -30,11 +30,13 @@ Documentation for other releases can be found at + ## Getting started with Vagrant Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/develop on your local machine (Linux, Mac OS X). ### Prerequisites + 1. Install latest version >= 1.6.2 of vagrant from http://www.vagrantup.com/downloads.html 2. Install one of: 1. The latest version of Virtual Box from https://www.virtualbox.org/wiki/Downloads @@ -371,6 +373,7 @@ export KUBERNETES_MINION_MEMORY=2048 ``` #### I ran vagrant suspend and nothing works! + ```vagrant suspend``` seems to mess up the network. It's not supported at this time. diff --git a/development.md b/development.md index e258f841..6822ab5e 100644 --- a/development.md +++ b/development.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Development Guide # Releases and Official Builds @@ -45,6 +46,7 @@ Kubernetes is written in [Go](http://golang.org) programming language. If you ha Below, we outline one of the more common git workflows that core developers use. Other git workflows are also valid. ### Visual overview + ![Git workflow](git_workflow.png) ### Fork the main repository @@ -93,6 +95,7 @@ $ git push -f origin myfeature ``` ### Creating a pull request + 1. Visit http://github.com/$YOUR_GITHUB_USERNAME/kubernetes 2. Click the "Compare and pull request" button next to your "myfeature" branch. @@ -102,6 +105,7 @@ $ git push -f origin myfeature Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. It is not strictly required for building Kubernetes but it is required when managing dependencies under the Godeps/ tree, and is required by a number of the build and test scripts. Please make sure that ``godep`` is installed and in your ``$PATH``. 
### Installing godep + There are many ways to build and host go binaries. Here is an easy way to get utilities like ```godep``` installed: 1) Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is installed on your system. (some of godep's dependencies use the mercurial @@ -124,6 +128,7 @@ export PATH=$PATH:$GOPATH/bin ``` ### Using godep + Here's a quick walkthrough of one way to use godep to add or update a Kubernetes dependency into Godeps/_workspace. For more details, please see the instructions in [godep's documentation](https://github.com/tools/godep). 1) Devote a directory to this endeavor: @@ -259,6 +264,7 @@ go run hack/e2e.go --down ``` ### Flag options + See the flag definitions in `hack/e2e.go` for more options, such as reusing an existing cluster. Here is an overview: ```sh @@ -309,6 +315,7 @@ go run hack/e2e.go -v -ctl='delete pod foobar' ``` ## Conformance testing + End-to-end testing, as described above, is for [development distributions](writing-a-getting-started-guide.md). A conformance test is used on a [versioned distro](writing-a-getting-started-guide.md). @@ -320,6 +327,7 @@ intended to run against a cluster at a specific binary release of Kubernetes. See [conformance-test.sh](../../hack/conformance-test.sh). ## Testing out flaky tests + [Instructions here](flaky-tests.md) ## Regenerating the CLI documentation diff --git a/faster_reviews.md b/faster_reviews.md index 20e3e990..d28e9b55 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # How to get faster PR reviews Most of what is written here is not at all specific to Kubernetes, but it bears 
This means that due to (usually) race conditions, they will occasionally fail, even though most of the time they pass. We have a goal of 99.9% flake free tests. This means that there is only one flake in one thousand runs of a test. diff --git a/getting-builds.md b/getting-builds.md index f59a753b..4c92a446 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Getting Kubernetes Builds You can use [hack/get-build.sh](../../hack/get-build.sh) to get the most recent builds with curl, or use it as a reference. With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our CI and GCE e2e tests (essentially a nightly build). diff --git a/making-release-notes.md b/making-release-notes.md index 343b9203..d76f7415 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -30,10 +30,13 @@ Documentation for other releases can be found at + ## Making release notes + This documents the process for making release notes for a release. ### 1) Note the PR number of the previous release + Find the most-recent PR that was merged with the previous .0 release. Remember this as $LASTPR. _TODO_: Figure out a way to record this somewhere to save the next release engineer time. @@ -46,6 +49,7 @@ ${KUBERNETES_ROOT}/build/make-release-notes.sh $LASTPR $CURRENTPR ``` ### 3) Trim the release notes + This generates a list of the entire set of PRs merged since the last minor release. It is likely long and many PRs aren't worth mentioning. If any of the PRs were cherrypicked into patches on the last minor release, you should exclude @@ -57,9 +61,11 @@ Remove, regroup, organize to your heart's content. ### 4) Update CHANGELOG.md + With the final markdown all set, cut and paste it to the top of ```CHANGELOG.md``` ### 5) Update the Release page + * Switch to the [releases](https://github.com/GoogleCloudPlatform/kubernetes/releases) page. 
* Open up the release you are working on. * Cut and paste the final markdown from above into the release notes diff --git a/profiling.md b/profiling.md index fbb54c9f..d36885dd 100644 --- a/profiling.md +++ b/profiling.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Profiling Kubernetes This document explains how to plug in the profiler and how to profile Kubernetes services. @@ -53,6 +54,7 @@ to the init(c *Config) method in 'pkg/master/master.go' and import 'net/http/ppr In most use cases, to use the profiler service it's enough to do 'import _ net/http/pprof', which automatically registers a handler in the default http.Server. A slight inconvenience is that the APIserver uses the default server for intra-cluster communication, so plugging the profiler into it is not really useful. In 'pkg/master/server/server.go' more servers are created and started as separate goroutines. The one that is usually serving external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in the Handler variable. It is created from an HTTP multiplexer, so the only thing that needs to be done is adding the profiler handler functions to this multiplexer. This is exactly what the lines after TL;DR do. ## Connecting to the profiler + Even with the profiler running, I found it not really straightforward to use 'go tool pprof' with it. The problem is that, at least for dev purposes, the certificates generated for the APIserver are not signed by anyone trusted, and because secureServer serves only secure traffic it isn't straightforward to connect to the service. The best workaround I found is creating an ssh tunnel from the kubernetes_master open unsecured port to some external server, and using this server as a proxy. 
To save everyone looking for correct ssh flags, it is done by running: ``` diff --git a/releasing.md b/releasing.md index 8b1a661c..65db081d 100644 --- a/releasing.md +++ b/releasing.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Releasing Kubernetes This document explains how to cut a release, and the theory behind it. If you @@ -87,6 +88,7 @@ Where `v0.20.2-322-g974377b` is the git hash you decided on. This will become our (retroactive) branch point. #### Branching, Tagging and Merging + Do the following: 1. `export VER=x.y` (e.g. `0.20` for v0.20) diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index 791de7c4..e73e4f27 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -30,11 +30,13 @@ Documentation for other releases can be found at + # Scheduler Algorithm in Kubernetes For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. ## Filtering the nodes + The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource limits of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: - `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. 
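The two-step scheme above — filter, then rank — means each predicate such as `NoDiskConflict` is just a boolean test applied to every candidate node. A minimal sketch of the filtering phase, with illustrative names rather than the scheduler's actual plugin API:

```go
package main

import "fmt"

// Pod and Node are reduced to the one attribute this sketch needs.
type Pod struct {
	Name        string
	RequiredCPU float64 // resource required by the pod, in cores
}

type Node struct {
	Name    string
	FreeCPU float64 // capacity minus limits of pods already on the node
}

// Predicate reports whether the pod may run on the node.
type Predicate func(Pod, Node) bool

// podFitsResources mirrors the free-resource check described above.
func podFitsResources(p Pod, n Node) bool { return n.FreeCPU >= p.RequiredCPU }

// filterNodes keeps only nodes passing every predicate; survivors go on to
// the ranking phase.
func filterNodes(p Pod, nodes []Node, preds []Predicate) []Node {
	var fit []Node
	for _, n := range nodes {
		ok := true
		for _, pred := range preds {
			if !pred(p, n) {
				ok = false
				break
			}
		}
		if ok {
			fit = append(fit, n)
		}
	}
	return fit
}

func main() {
	nodes := []Node{{"node-1", 1.0}, {"node-2", 3.0}}
	pod := Pod{"web", 2.0}
	for _, n := range filterNodes(pod, nodes, []Predicate{podFitsResources}) {
		fmt.Println(n.Name) // node-1 is filtered out; only node-2 survives
	}
}
```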
diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 3e67b632..c22d9204 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -32,6 +32,7 @@ Documentation for other releases can be found at # Writing a Getting Started Guide + This page gives some advice for anyone planning to write or update a Getting Started Guide for Kubernetes. It also gives some guidelines which reviewers should follow when reviewing a pull request for a guide. @@ -57,6 +58,7 @@ Distros fall into two categories: There are different guidelines for each. ## Versioned Distro Guidelines + These guidelines say *what* to do. See the Rationale section for *why*. - Send us a PR. - Put the instructions in `docs/getting-started-guides/...`. Scripts go there too. This helps devs easily @@ -77,6 +79,7 @@ we still want to hear from you. We suggest you write a blog post or a Gist, and Just file an issue or chat us on IRC and one of the committers will link to it from the wiki. ## Development Distro Guidelines + These guidelines say *what* to do. See the Rationale section for *why*. - the main reason to add a new development distro is to support a new IaaS provider (VM and network management). This means implementing a new `pkg/cloudprovider/$IAAS_NAME`. @@ -93,6 +96,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. refactoring and feature additions that affect code for their IaaS. ## Rationale + - We want people to create Kubernetes clusters with whatever IaaS, Node OS, configuration management tools, and so on, which they are familiar with. The guidelines for **versioned distros** are designed for flexibility. 
-- cgit v1.2.3 From fabd20afce30e947425346fa2938ad0edfa8b867 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 17 Jul 2015 15:35:41 -0700 Subject: Run gendocs --- README.md | 1 + access.md | 15 ++++++++++++--- admission_control.md | 1 + admission_control_limit_range.md | 2 ++ admission_control_resource_quota.md | 2 ++ architecture.md | 2 ++ clustering.md | 2 ++ clustering/README.md | 1 + command_execution_port_forwarding.md | 3 +++ event_compression.md | 6 ++++++ expansion.md | 1 + identifiers.md | 1 + namespaces.md | 1 + networking.md | 1 + persistent-storage.md | 1 + principles.md | 1 + resources.md | 10 ++++++++++ security.md | 1 + security_context.md | 6 ++++++ service_accounts.md | 6 +++++- simple-rolling-update.md | 9 +++++++++ versioning.md | 1 + 22 files changed, 70 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index b0f3115a..62946cb6 100644 --- a/README.md +++ b/README.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Kubernetes Design Overview Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications. diff --git a/access.md b/access.md index e42d7859..9a0c0d3d 100644 --- a/access.md +++ b/access.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # K8s Identity and Access Management Sketch This document suggests a direction for identity and access management in the Kubernetes system. @@ -43,6 +44,7 @@ High level goals are: - Ease integration with existing enterprise and hosted scenarios. ### Actors + Each of these can act as normal users or attackers. - External Users: People who are accessing applications running on K8s (e.g. a web site served by webserver running in a container on K8s), but who do not have K8s API access. - K8s Users : People who access the K8s API (e.g. create K8s API objects like Pods) @@ -51,6 +53,7 @@ Each of these can act as normal users or attackers. 
- K8s Admin means K8s Cluster Admins and K8s Project Admins taken together. ### Threats + Both intentional attacks and accidental use of privilege are concerns. For both cases it may be useful to think about these categories differently: @@ -81,6 +84,7 @@ K8s Cluster assets: This document is primarily about protecting K8s User assets and K8s cluster assets from other K8s Users and K8s Project and Cluster Admins. ### Usage environments + Cluster in Small organization: - K8s Admins may be the same people as K8s Users. - few K8s Admins. @@ -112,6 +116,7 @@ Pods configs should be largely portable between Org-run and hosted configuration # Design + Related discussion: - https://github.com/GoogleCloudPlatform/kubernetes/issues/442 - https://github.com/GoogleCloudPlatform/kubernetes/issues/443 @@ -125,7 +130,9 @@ K8s distribution should include templates of config, and documentation, for simp Features in this doc are divided into "Initial Feature", and "Improvements". Initial features would be candidates for version 1.00. ## Identity -###userAccount + +### userAccount + K8s will have a `userAccount` API object. - `userAccount` has a UID which is immutable. This is used to associate users with objects and to record actions in audit logs. - `userAccount` has a name which is a string and human readable and unique among userAccounts. It is used to refer to users in Policies, to ensure that the Policies are human readable. It can be changed only when there are no Policy objects or other objects which refer to that name. An email address is a suggested format for this field. @@ -158,7 +165,8 @@ Enterprise Profile: - each service using the API has own `userAccount` too. (e.g. `scheduler`, `repcontroller`) - automated jobs to denormalize the ldap group info into the local system list of users into the K8s userAccount file. -###Unix accounts +### Unix accounts + A `userAccount` is not a Unix user account. 
The fact that a pod is started by a `userAccount` does not mean that the processes in that pod's containers run as a Unix user with a corresponding name or identity. Initially: @@ -170,7 +178,8 @@ Improvements: - requires docker to integrate user namespace support, and deciding what getpwnam() does for these uids. - any features that help users avoid use of privileged containers (https://github.com/GoogleCloudPlatform/kubernetes/issues/391) -###Namespaces +### Namespaces + K8s will have a `namespace` API object. It is similar to a Google Compute Engine `project`. It provides a namespace for objects created by a group of people co-operating together, preventing name collisions with non-cooperating groups. It also serves as a reference point for authorization policies. Namespaces are described in [namespaces.md](namespaces.md). diff --git a/admission_control.md b/admission_control.md index aaa6ed16..c75d5535 100644 --- a/admission_control.md +++ b/admission_control.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Kubernetes Proposal - Admission Control **Related PR:** diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 90329815..ccdb44d8 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Admission control plugin: LimitRanger ## Background @@ -164,6 +165,7 @@ It is expected we will want to define limits for particular pods or containers b To make a **LimitRangeItem** more restrictive, we intend to add these additional restrictions at a future point in time. ## Example + See the [example of Limit Range](../user-guide/limitrange/) for more information. 
diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index d5cdc9a1..99d5431a 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Admission control plugin: ResourceQuota ## Background @@ -185,6 +186,7 @@ services 3 5 ``` ## More information + See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota/) for more information. diff --git a/architecture.md b/architecture.md index 2e4afc62..f7c55171 100644 --- a/architecture.md +++ b/architecture.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Kubernetes architecture A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable. @@ -45,6 +46,7 @@ The Kubernetes node has the services necessary to run application containers and Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers. ### Kubelet + The **Kubelet** manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc. ### Kube-Proxy diff --git a/clustering.md b/clustering.md index 8673284f..1fcb8aa3 100644 --- a/clustering.md +++ b/clustering.md @@ -30,10 +30,12 @@ Documentation for other releases can be found at + # Clustering in Kubernetes ## Overview + The term "clustering" refers to the process of having all members of the kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles. 
This document attempts to lay out the user experiences for clustering that Kubernetes aims to address. Once a cluster is established, the following is true: diff --git a/clustering/README.md b/clustering/README.md index f05168d6..53649a31 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -41,6 +41,7 @@ pip install seqdiag Just call `make` to regenerate the diagrams. ## Building with Docker + If you are on a Mac or your pip install is messed up, you can easily build with docker. ``` diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index c7408b58..1d319adf 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Container Command Execution & Port Forwarding in Kubernetes ## Abstract @@ -87,12 +88,14 @@ won't be able to work with this mechanism, unless adapters can be written. ## Process Flow ### Remote Command Execution Flow + 1. The client connects to the Kubernetes Master to initiate a remote command execution request 2. The Master proxies the request to the Kubelet where the container lives 3. The Kubelet executes nsenter + the requested command and streams stdin/stdout/stderr back and forth between the client and the container ### Port Forwarding Flow + 1. The client connects to the Kubernetes Master to initiate a remote command execution request 2. The Master proxies the request to the Kubelet where the container lives diff --git a/event_compression.md b/event_compression.md index af823972..29e65917 100644 --- a/event_compression.md +++ b/event_compression.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Kubernetes Event Compression This document captures the design of event compression. @@ -40,11 +41,13 @@ This document captures the design of event compression. 
Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate ```image_not_existing``` and ```container_is_waiting``` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)). ## Proposal + Each binary that generates events (for example, ```kubelet```) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. Event compression should be best effort (not guaranteed). Meaning, in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries. ## Design + Instead of a single Timestamp, each event object [contains](../../pkg/api/types.go#L1111) the following fields: * ```FirstTimestamp util.Time``` * The date/time of the first occurrence of the event. @@ -78,11 +81,13 @@ Each binary that generates events: * An entry for the event is also added to the previously generated events cache. ## Issues/Risks + * Compression is not guaranteed, because each component keeps track of event history in memory * An application restart causes event history to be cleared, meaning event history is not preserved across application restarts and compression will not occur across component restarts. * Because an LRU cache is used to keep track of previously generated events, if too many unique events are generated, old events will be evicted from the cache, so events will only be compressed until they age out of the events cache, at which point any new instance of the event will cause a new entry to be created in etcd. 
## Example + Sample kubectl output ``` @@ -104,6 +109,7 @@ Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 This demonstrates what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries. ## Related Pull Requests/Issues + * Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events * PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event diff --git a/expansion.md b/expansion.md index 5cc08c6c..096b8a9d 100644 --- a/expansion.md +++ b/expansion.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Variable expansion in pod command, args, and env ## Abstract diff --git a/identifiers.md b/identifiers.md index eda7254b..9e269993 100644 --- a/identifiers.md +++ b/identifiers.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Identifiers and Names in Kubernetes A summarization of the goals and recommendations for identifiers in Kubernetes. Described in [GitHub issue #199](https://github.com/GoogleCloudPlatform/kubernetes/issues/199). 
diff --git a/namespaces.md b/namespaces.md index 7bd7ab67..1f1a767c 100644 --- a/namespaces.md +++ b/namespaces.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Namespaces ## Abstract diff --git a/networking.md b/networking.md index ac6e5794..d7822d4d 100644 --- a/networking.md +++ b/networking.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Networking There are 4 distinct networking problems to solve: diff --git a/persistent-storage.md b/persistent-storage.md index f919baa9..3e9edd3e 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Persistent Storage This document proposes a model for managing persistent, cluster-scoped storage for applications requiring long lived data. diff --git a/principles.md b/principles.md index 1ae3bc3a..c208fb6b 100644 --- a/principles.md +++ b/principles.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Design Principles Principles to follow when extending Kubernetes. diff --git a/resources.md b/resources.md index 2effb5cf..055c5d86 100644 --- a/resources.md +++ b/resources.md @@ -48,6 +48,7 @@ The resource model aims to be: * precise, to avoid misunderstandings and promote pod portability. ## The resource model + A Kubernetes _resource_ is something that can be requested by, allocated to, or consumed by a pod or container. Examples include memory (RAM), CPU, disk-time, and network bandwidth. Once resources on a node have been allocated to one pod, they should not be allocated to another until that pod is removed or exits. This means that Kubernetes schedulers should ensure that the sum of the resources allocated (requested and granted) to its pods never exceeds the usable capacity of the node. Testing whether a pod will fit on a node is called _feasibility checking_. 
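The feasibility checking mentioned above reduces to one inequality per resource type: the sum already allocated on the node plus the candidate pod's request must not exceed usable capacity. A minimal sketch, with illustrative names rather than scheduler code:

```go
package main

import "fmt"

// fits reports whether a pod requesting `request` of some resource (here,
// bytes of memory) can be placed on a node of the given capacity, given the
// amounts already allocated to other pods.
func fits(allocated []int64, request, capacity int64) bool {
	var sum int64
	for _, a := range allocated {
		sum += a
	}
	return sum+request <= capacity
}

func main() {
	gi := int64(1 << 30) // 1Gi in bytes
	// Node with 4Gi usable memory, two pods already granted 1Gi each.
	fmt.Println(fits([]int64{1 * gi, 1 * gi}, 2*gi, 4*gi)) // true: exactly fits
	fmt.Println(fits([]int64{1 * gi, 1 * gi}, 3*gi, 4*gi)) // false: would oversubscribe
}
```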
@@ -124,9 +125,11 @@ Where: ## Kubernetes-defined resource types + The following resource types are predefined ("reserved") by Kubernetes in the `kubernetes.io` namespace, and so cannot be used for user-defined resources. Note that the syntax of all resource types in the resource spec is deliberately similar, but some resource types (e.g., CPU) may receive significantly more support than simply tracking quantities in the schedulers and/or the Kubelet. ### Processor cycles + * Name: `cpu` (or `kubernetes.io/cpu`) * Units: Kubernetes Compute Unit seconds/second (i.e., CPU cores normalized to a canonical "Kubernetes CPU") * Internal representation: milli-KCUs @@ -141,6 +144,7 @@ Note that requesting 2 KCU won't guarantee that precisely 2 physical cores will ### Memory + * Name: `memory` (or `kubernetes.io/memory`) * Units: bytes * Compressible? no (at least initially) @@ -152,6 +156,7 @@ rather than decimal ones: "64MiB" rather than "64MB". ## Resource metadata + A resource type may have an associated read-only ResourceType structure, that contains metadata about the type. For example: ``` @@ -222,16 +227,19 @@ and predicted ## Future resource types ### _[future] Network bandwidth_ + * Name: "network-bandwidth" (or `kubernetes.io/network-bandwidth`) * Units: bytes per second * Compressible? yes ### _[future] Network operations_ + * Name: "network-iops" (or `kubernetes.io/network-iops`) * Units: operations (messages) per second * Compressible? yes ### _[future] Storage space_ + * Name: "storage-space" (or `kubernetes.io/storage-space`) * Units: bytes * Compressible? no @@ -239,6 +247,7 @@ and predicted The amount of secondary storage space available to a container. The main target is local disk drives and SSDs, although this could also be used to qualify remotely-mounted volumes. Specifying whether a resource is a raw disk, an SSD, a disk array, or a file system fronting any of these, is left for future work. 
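The conventions above — binary suffixes like "64MiB" for memory, and milli-units internally for CPU — can be illustrated with a small parser. This is a sketch, not the actual Kubernetes quantity code, and it handles only the binary suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Binary suffixes as recommended above; decimal suffixes (K, M, G) are
// deliberately omitted from this sketch.
var binSuffix = map[string]int64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30}

// parseMemory turns a quantity like "64Mi" into a byte count.
func parseMemory(s string) (int64, error) {
	for suf, mult := range binSuffix {
		if strings.HasSuffix(s, suf) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, suf), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64) // bare number of bytes
}

// milliCPU converts cores (e.g. 2.5 KCUs) into the milli-unit internal
// representation mentioned above.
func milliCPU(cores float64) int64 { return int64(cores * 1000) }

func main() {
	b, _ := parseMemory("64Mi")
	fmt.Println(b)             // 67108864
	fmt.Println(milliCPU(2.5)) // 2500
}
```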
### _[future] Storage time_ + * Name: storage-time (or `kubernetes.io/storage-time`) * Units: seconds per second of disk time * Internal representation: milli-units @@ -247,6 +256,7 @@ The amount of secondary storage space available to a container. The main target This is the amount of time a container spends accessing disk, including actuator and transfer time. A standard disk drive provides 1.0 diskTime seconds per second. ### _[future] Storage operations_ + * Name: "storage-iops" (or `kubernetes.io/storage-iops`) * Units: operations per second * Compressible? yes diff --git a/security.md b/security.md index 2989148b..522ff4ca 100644 --- a/security.md +++ b/security.md @@ -30,6 +30,7 @@ Documentation for other releases can be found at + # Security in Kubernetes Kubernetes should define a reasonable set of security best practices that allows processes to be isolated from each other, from the cluster infrastructure, and which preserves important boundaries between those who manage the cluster, and those who use the cluster. diff --git a/security_context.md b/security_context.md index bc76495a..03213927 100644 --- a/security_context.md +++ b/security_context.md @@ -30,8 +30,11 @@ Documentation for other releases can be found at + # Security Contexts + ## Abstract + A security context is a set of constraints that are applied to a container in order to achieve the following goals (from [security design](security.md)): 1. Ensure a clear isolation between container and the underlying host it runs on @@ -53,11 +56,13 @@ to the container process. Support for user namespaces has recently been [merged](https://github.com/docker/libcontainer/pull/304) into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers. 
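The uid-range assignment that user namespaces enable can be pictured as a fixed per-container offset into the host's uid space. All numbers and names below are invented for illustration; real mappings are configured through the kernel's user-namespace uid_map, not computed like this:

```go
package main

import "fmt"

// hostUID maps a uid inside container number n to a host uid, assuming
// each container is assigned a disjoint range of rangeSize uids
// starting at base. Purely an illustrative model of the idea.
func hostUID(n, containerUID, rangeSize, base int) int {
	return base + n*rangeSize + containerUID
}

func main() {
	// "root" (uid 0) in two different containers maps to two distinct,
	// unprivileged host uids, so neither process is root on the host.
	fmt.Println(hostUID(0, 0, 65536, 100000)) // 100000
	fmt.Println(hostUID(1, 0, 65536, 100000)) // 165536
}
```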
### External integration with shared storage + In order to support external integration with shared storage, processes running in a Kubernetes cluster should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established. Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks. ## Constraints and Assumptions + * It is out of the scope of this document to prescribe a specific set of constraints to isolate containers from their host. Different use cases need different settings. @@ -96,6 +101,7 @@ be addressed with security contexts: ## Proposed Design ### Overview + A *security context* consists of a set of constraints that determine how a container is secured before getting created and run. A security context resides on the container and represents the runtime parameters that will be used to create and run the container via container APIs. A *security context provider* is passed to the Kubelet so it can have a chance diff --git a/service_accounts.md b/service_accounts.md index c6acbd24..d9535de5 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -30,7 +30,8 @@ Documentation for other releases can be found at -#Service Accounts + +# Service Accounts ## Motivation @@ -50,6 +51,7 @@ They also may interact with services other than the Kubernetes API, such as: - accessing files in an NFS volume attached to the pod ## Design Overview + A service account binds together several things: - a *name*, understood by users, and perhaps by peripheral systems, for an identity - a *principal* that can be authenticated and [authorized](../admin/authorization.md) @@ -137,6 +139,7 @@ are added to the map of tokens used by the authentication process in the apiserv might have some types that do not do anything on apiserver but just get pushed to the kubelet.) ### Pods + The `PodSpec` is extended to have a `Pods.Spec.ServiceAccountUsername` field. 
If this is unset, then a default value is chosen. If it is set, then the corresponding value of `Pods.Spec.SecurityContext` is set by the Service Account Finalizer (see below). @@ -144,6 +147,7 @@ Service Account Finalizer (see below). TBD: how policy limits which users can make pods with which service accounts. ### Authorization + Kubernetes API Authorization Policies refer to users. Pods created with a `Pods.Spec.ServiceAccountUsername` typically get a `Secret` which allows them to authenticate to the Kubernetes APIserver as a particular user. So any policy that is desired can be applied to them. diff --git a/simple-rolling-update.md b/simple-rolling-update.md index b142c6e5..80bc6566 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -30,12 +30,15 @@ Documentation for other releases can be found at + ## Simple rolling update + This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in ```kubectl```. Complete execution flow can be found [here](#execution-details). See the [example of rolling update](../user-guide/update-demo/) for more information. ### Lightweight rollout + Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1``` ```kubectl rolling-update foo [foo-v2] --image=myimage:v2``` @@ -51,6 +54,7 @@ and the old 'foo' replication controller is deleted. For the purposes of the ro The value of that label is the hash of the complete JSON representation of the```foo-next``` or```foo``` replication controller. The name of this label can be overridden by the user with the ```--deployment-label-key``` flag. #### Recovery + If a rollout fails or is terminated in the middle, it is important that the user be able to resume the roll out. 
To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replication controller in the ```kubernetes.io/``` annotation namespace: * ```desired-replicas``` The desired number of replicas for this replication controller (either N or zero) @@ -68,6 +72,7 @@ it is assumed that the rollout is nearly completed, and ```foo-next``` is rename ### Aborting a rollout + Abort is assumed to want to reverse a rollout in progress. ```kubectl rolling-update foo [foo-v2] --rollback``` @@ -87,6 +92,7 @@ If the user doesn't specify a ```foo-next``` name, then it is either discovered then ```foo-next``` is synthesized using the pattern ```-``` #### Initialization + * If ```foo``` and ```foo-next``` do not exist: * Exit, and indicate an error to the user, that the specified controller doesn't exist. * If ```foo``` exists, but ```foo-next``` does not: @@ -102,6 +108,7 @@ then ```foo-next``` is synthesized using the pattern ```- 0 @@ -109,11 +116,13 @@ then ```foo-next``` is synthesized using the pattern ```- + # Kubernetes API and Release Versioning Legend: -- cgit v1.2.3 From 665ea7d2cf7acf22aec1ff8eac8609d13fe50768 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 17 Jul 2015 15:35:41 -0700 Subject: Run gendocs --- autoscaling.md | 4 ++++ federation.md | 9 ++++++--- high-availability.md | 7 +++++++ 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index ebc49905..86a9a819 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -30,7 +30,9 @@ Documentation for other releases can be found at + ## Abstract + Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the number of pods deployed within the system automatically. @@ -230,6 +232,7 @@ Since an auto-scaler is a durable object it is best represented as a resource. 
``` #### Boundary Definitions + The `AutoScaleThreshold` definitions provide the boundaries for the auto-scaler. By defining comparisons that form a range along with positive and negative increments you may define bi-directional scaling. For example the upper bound may be specified as "when requests per second rise above 50 for 30 seconds scale the application up by 1" and a lower bound may @@ -251,6 +254,7 @@ Of note: If the statistics gathering mechanisms can be initialized with a regist potentially piggyback on this registry. ### Multi-target Scaling Policy + If multiple scalable targets satisfy the `TargetSelector` criteria the auto-scaler should be configurable as to which target(s) are scaled. To begin with, if multiple targets are found the auto-scaler will scale the largest target up or down as appropriate. In the future this may be more configurable. diff --git a/federation.md b/federation.md index 713db4b3..8de05a9c 100644 --- a/federation.md +++ b/federation.md @@ -30,12 +30,15 @@ Documentation for other releases can be found at -#Kubernetes Cluster Federation -##(a.k.a. "Ubernetes") + +# Kubernetes Cluster Federation + +## (a.k.a. "Ubernetes") ## Requirements Analysis and Product Proposal ## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ + _Initial revision: 2015-03-05_ _Last updated: 2015-03-09_ This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) @@ -417,7 +420,7 @@ TBD: All very hand-wavey still, but some initial thoughts to get the conversatio ![image](federation-high-level-arch.png) -## Ubernetes API +## Ubernetes API This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. 
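The `AutoScaleThreshold` semantics sketched earlier ("when requests per second rise above 50 for 30 seconds, scale up by 1") reduce to a window check over recent metric samples. The types below are hypothetical stand-ins for the proposed API, not its actual definition:

```go
package main

import "fmt"

// Threshold is a hypothetical stand-in for an AutoScaleThreshold:
// trigger when a metric stays on one side of Value for
// DurationSamples consecutive samples, then adjust the replica count
// by Adjustment (positive to scale up, negative to scale down).
type Threshold struct {
	Value           float64
	Above           bool
	DurationSamples int
	Adjustment      int
}

// Triggered reports whether the most recent DurationSamples samples
// all satisfy the comparison.
func (t Threshold) Triggered(samples []float64) bool {
	if len(samples) < t.DurationSamples {
		return false
	}
	for _, s := range samples[len(samples)-t.DurationSamples:] {
		if t.Above && s <= t.Value {
			return false
		}
		if !t.Above && s >= t.Value {
			return false
		}
	}
	return true
}

func main() {
	// Upper bound: requests/sec above 50 for 3 samples => scale up by 1.
	up := Threshold{Value: 50, Above: true, DurationSamples: 3, Adjustment: 1}
	fmt.Println(up.Triggered([]float64{40, 60, 70, 80})) // true
	fmt.Println(up.Triggered([]float64{60, 70, 45}))     // false
}
```

Pairing one `Above` threshold with a negative-adjustment lower bound gives the bi-directional scaling the proposal describes.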
diff --git a/high-availability.md b/high-availability.md index fd6bef7b..ecb9966e 100644 --- a/high-availability.md +++ b/high-availability.md @@ -30,10 +30,13 @@ Documentation for other releases can be found at + # High Availability of Scheduling and Controller Components in Kubernetes + This document serves as a proposal for high availability of the scheduler and controller components in kubernetes. This proposal is intended to provide a simple High Availability api for kubernetes components with the potential to extend to services running on kubernetes. Those services would be subject to their own constraints. ## Design Options + For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en) 1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. @@ -43,6 +46,7 @@ For complete reference see [this](https://www.ibm.com/developerworks/community/b 3. Active-Active (Load Balanced): Clients can simply load-balance across any number of servers that are currently running. Their general availability can be continuously updated, or published, such that load balancing only occurs across active participants. This aspect of HA is outside of the scope of *this* proposal because there is already a partial implementation in the apiserver. 
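The leader-election approach centers on a master lease: acquire it once, renew it periodically, and exit the whole process if renewal fails. A minimal in-memory sketch of that acquire/renew/release protocol follows — a real implementation would back the lease with a store supporting compare-and-swap (such as etcd) and attach an expiry to it:

```go
package main

import "fmt"

// Lease is an in-memory stand-in for a master-lease key; the real
// thing would live in etcd with compare-and-swap semantics and a TTL.
type Lease struct{ holder string }

// Acquire succeeds only if the lease is unheld or already ours.
func (l *Lease) Acquire(id string) bool {
	if l.holder == "" || l.holder == id {
		l.holder = id
		return true
	}
	return false
}

// Renew succeeds only while we still hold the lease. On failure the
// component is expected to exit so its supervisor restarts it as a
// standby, avoiding split-brain.
func (l *Lease) Renew(id string) bool { return l.holder == id }

// Release gives up the lease on clean shutdown.
func (l *Lease) Release(id string) {
	if l.holder == id {
		l.holder = ""
	}
}

func main() {
	var lease Lease
	fmt.Println(lease.Acquire("scheduler-a")) // true: first requester becomes master
	fmt.Println(lease.Acquire("scheduler-b")) // false: standby keeps retrying
	fmt.Println(lease.Renew("scheduler-a"))   // true: master renews before expiry
	lease.Release("scheduler-a")              // clean shutdown releases the lease
	fmt.Println(lease.Acquire("scheduler-b")) // true: standby takes over
}
```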
## Design Discussion Notes on Leader Election + Implementation References: * [zookeeper](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection) * [etcd](https://groups.google.com/forum/#!topic/etcd-dev/EbAa4fjypb4) @@ -55,11 +59,13 @@ The first component to request leadership will become the master. All other com The component that becomes master should create a thread to manage the lease. This thread should be created with a channel that the main process can use to release the master lease. The master should release the lease in cases of an unrecoverable error and clean shutdown. Otherwise, this process will renew the lease and sleep, waiting for the next renewal time or notification to release the lease. If there is a failure to renew the lease, this process should force the entire component to exit. Daemon exit is meant to prevent potential split-brain conditions. Daemon restart is implied in this scenario, by either the init system (systemd), or possible watchdog processes. (See Design Discussion Notes) ## Options added to components with HA functionality + Some command line options would be added to components that can do HA: * Lease Duration - How long a component can be master ## Design Discussion Notes + Some components may run numerous threads in order to perform tasks in parallel. Upon losing master status, such components should exit instantly instead of attempting to gracefully shut down such threads. This is to ensure that, in the case there's some propagation delay in informing the threads they should stop, the lame-duck threads won't interfere with the new master. The component should exit with an exit code indicating that the component is not the master. Since all components will be run by systemd or some other monitoring system, this will just result in a restart. There is a short window after a new master acquires the lease, during which data from the old master might be committed. 
This is because there is currently no way to condition a write on its source being the master. Having the daemons exit shortens this window but does not eliminate it. A proper solution for this problem will be addressed at a later date. The proposed solution is: @@ -75,6 +81,7 @@ There is a short window after a new master acquires the lease, during which data 5. When the API server makes the corresponding write to etcd, it includes it in a transaction that does a compare-and-swap on the "current master" entry (old value == new value == host:port and sequence number from the replica that sent the mutating operation). This basically guarantees that if we elect the new master, all transactions coming from the old master will fail. You can think of this as the master attaching a "precondition" of its belief about who is the latest master. ## Open Questions + * Is there a desire to keep track of all nodes for a specific component type? -- cgit v1.2.3 From 9d1ae2e76424babe7f7975ddb86433a6b93e1812 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Sat, 18 Jul 2015 00:05:57 +0000 Subject: Gut stale roadmaps. Move useful content elsewhere. --- cli-roadmap.md | 74 +--------------------------------------------------------- 1 file changed, 1 insertion(+), 73 deletions(-) diff --git a/cli-roadmap.md b/cli-roadmap.md index 00b454fa..69084555 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -33,83 +33,11 @@ Documentation for other releases can be found at # Kubernetes CLI/Configuration Roadmap -See also issues with the following labels: +See github issues with the following labels: * [area/app-config-deployment](https://github.com/GoogleCloudPlatform/kubernetes/labels/area/app-config-deployment) * [component/CLI](https://github.com/GoogleCloudPlatform/kubernetes/labels/component/CLI) * [component/client](https://github.com/GoogleCloudPlatform/kubernetes/labels/component/client) -1. Create services before other objects, or at least before objects that depend upon them. 
Namespace-relative DNS mitigates this some, but most users are still using service environment variables. [#1768](https://github.com/GoogleCloudPlatform/kubernetes/issues/1768) -1. Finish rolling update [#1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) - 1. Friendly to auto-scaling [#2863](https://github.com/GoogleCloudPlatform/kubernetes/pull/2863#issuecomment-69701562) - 1. Rollback (make rolling-update reversible, and complete an in-progress rolling update by taking 2 replication controller names rather than always taking a file) - 1. Rollover (replace multiple replication controllers with one, such as to clean up an aborted partial rollout) - 1. Write a ReplicationController generator to derive the new ReplicationController from an old one (e.g., `--image-version=newversion`, which would apply a name suffix, update a label value, and apply an image tag) - 1. Use readiness [#620](https://github.com/GoogleCloudPlatform/kubernetes/issues/620) - 1. Perhaps factor this in a way that it can be shared with [Openshift’s deployment controller](https://github.com/GoogleCloudPlatform/kubernetes/issues/1743) - 1. Rolling update service as a plugin -1. Kind-based filtering on object streams -- only operate on the kinds of objects specified. This would make directory-based kubectl operations much more useful. Users should be able to instantiate the example applications using `kubectl create -f ...` -1. Improved pretty printing of endpoints, such as in the case that there are more than a few endpoints -1. Service address/port lookup command(s) -1. List supported resources -1. Swagger lookups [#3060](https://github.com/GoogleCloudPlatform/kubernetes/issues/3060) -1. --name, --name-suffix applied during creation and updates -1. --labels and opinionated label injection: --app=foo, --tier={fe,cache,be,db}, --uservice=redis, --env={dev,test,prod}, --stage={canary,final}, --track={hourly,daily,weekly}, --release=0.4.3c2. Exact ones TBD. 
We could allow arbitrary values -- the keys are important. The actual label keys would be (optionally?) namespaced with kubectl.kubernetes.io/, or perhaps the user’s namespace. -1. --annotations and opinionated annotation injection: --description, --revision -1. Imperative updates. We'll want to optionally make these safe(r) by supporting preconditions based on the current value and resourceVersion. - 1. annotation updates similar to label updates - 1. other custom commands for common imperative updates - 1. more user-friendly (but still generic) on-command-line json for patch -1. We also want to support the following flavors of more general updates: - 1. whichever we don’t support: - 1. safe update: update the full resource, guarded by resourceVersion precondition (and perhaps selected value-based preconditions) - 1. forced update: update the full resource, blowing away the previous Spec without preconditions; delete and re-create if necessary - 1. diff/dryrun: Compare new config with current Spec [#6284](https://github.com/GoogleCloudPlatform/kubernetes/issues/6284) - 1. submit/apply/reconcile/ensure/merge: Merge user-provided fields with current Spec. Keep track of user-provided fields using an annotation -- see [#1702](https://github.com/GoogleCloudPlatform/kubernetes/issues/1702). Delete all objects with deployment-specific labels. -1. --dry-run for all commands -1. Support full label selection syntax, including support for namespaces. -1. Wait on conditions [#1899](https://github.com/GoogleCloudPlatform/kubernetes/issues/1899) -1. Make kubectl scriptable: make output and exit code behavior consistent and useful for wrapping in workflows and piping back into kubectl and/or xargs (e.g., dump full URLs?, distinguish permanent and retry-able failure, identify objects that should be retried) - 1. 
Here's [an example](http://techoverflow.net/blog/2013/10/22/docker-remove-all-images-and-containers/) where multiple objects on the command line and an option to dump object names only (`-q`) would be useful in combination. [#5906](https://github.com/GoogleCloudPlatform/kubernetes/issues/5906) -1. Easy generation of clean configuration files from existing objects (including containers -- podex) -- remove readonly fields, status - 1. Export from one namespace, import into another is an important use case -1. Derive objects from other objects - 1. pod clone - 1. rc from pod - 1. --labels-from (services from pods or rcs) -1. Kind discovery (i.e., operate on objects of all kinds) [#5278](https://github.com/GoogleCloudPlatform/kubernetes/issues/5278) -1. A fairly general-purpose way to specify fields on the command line during creation and update, not just from a config file -1. Extensible API-based generator framework (i.e. invoke generators via an API/URL rather than building them into kubectl), so that complex client libraries don’t need to be rewritten in multiple languages, and so that the abstractions are available through all interfaces: API, CLI, UI, logs, ... [#5280](https://github.com/GoogleCloudPlatform/kubernetes/issues/5280) - 1. Need schema registry, and some way to invoke generator (e.g., using a container) - 1. Convert run command to API-based generator -1. Transformation framework - 1. More intelligent defaulting of fields (e.g., [#2643](https://github.com/GoogleCloudPlatform/kubernetes/issues/2643)) -1. Update preconditions based on the values of arbitrary object fields. -1. Deployment manager compatibility on GCP: [#3685](https://github.com/GoogleCloudPlatform/kubernetes/issues/3685) -1. Describe multiple objects, multiple kinds of objects [#5905](https://github.com/GoogleCloudPlatform/kubernetes/issues/5905) -1. 
Support yaml document separator [#5840](https://github.com/GoogleCloudPlatform/kubernetes/issues/5840) - -TODO: -* watch -* attach [#1521](https://github.com/GoogleCloudPlatform/kubernetes/issues/1521) -* image/registry commands -* do any other server paths make sense? validate? generic curl functionality? -* template parameterization -* dynamic/runtime configuration - -Server-side support: - -1. Default selectors from labels [#1698](https://github.com/GoogleCloudPlatform/kubernetes/issues/1698#issuecomment-71048278) -1. Stop [#1535](https://github.com/GoogleCloudPlatform/kubernetes/issues/1535) -1. Deleted objects [#2789](https://github.com/GoogleCloudPlatform/kubernetes/issues/2789) -1. Clone [#170](https://github.com/GoogleCloudPlatform/kubernetes/issues/170) -1. Resize [#1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) -1. Useful /operations API: wait for finalization/reification -1. List supported resources [#2057](https://github.com/GoogleCloudPlatform/kubernetes/issues/2057) -1. Reverse label lookup [#1348](https://github.com/GoogleCloudPlatform/kubernetes/issues/1348) -1. Field selection [#1362](https://github.com/GoogleCloudPlatform/kubernetes/issues/1362) -1. Field filtering [#1459](https://github.com/GoogleCloudPlatform/kubernetes/issues/1459) -1. Operate on uids - [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/cli-roadmap.md?pixel)]() -- cgit v1.2.3 From 33ff550b17290853b10e8106492b05d184c3b98e Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Sun, 19 Jul 2015 08:46:02 +0000 Subject: Improve design docs syntax highlighting. 
--- admission_control_limit_range.md | 4 ++-- admission_control_resource_quota.md | 4 ++-- clustering/README.md | 4 ++-- event_compression.md | 2 +- namespaces.md | 14 +++++++------- networking.md | 2 +- persistent-storage.md | 38 ++++++++++++++----------------------- resources.md | 17 +++++++++-------- simple-rolling-update.md | 2 +- 9 files changed, 39 insertions(+), 48 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index ccdb44d8..48a7880f 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -128,7 +128,7 @@ The server is updated to be aware of **LimitRange** objects. The constraints are only enforced if the kube-apiserver is started as follows: -``` +```console $ kube-apiserver -admission_control=LimitRanger ``` @@ -140,7 +140,7 @@ kubectl is modified to support the **LimitRange** resource. For example, -```shell +```console $ kubectl namespace myspace $ kubectl create -f docs/user-guide/limitrange/limits.yaml $ kubectl get limits diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 99d5431a..a3781d64 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -140,7 +140,7 @@ The server is updated to be aware of **ResourceQuota** objects. The quota is only enforced if the kube-apiserver is started as follows: -``` +```console $ kube-apiserver -admission_control=ResourceQuota ``` @@ -167,7 +167,7 @@ kubectl is modified to support the **ResourceQuota** resource. For example, -``` +```console $ kubectl namespace myspace $ kubectl create -f docs/user-guide/resourcequota/quota.yaml $ kubectl get quota diff --git a/clustering/README.md b/clustering/README.md index 53649a31..d02b7d50 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -34,7 +34,7 @@ This directory contains diagrams for the clustering design doc. This depends on the `seqdiag` [utility](http://blockdiag.com/en/seqdiag/index.html). 
Assuming you have a non-borked python install, this should be installable with -```bash +```sh pip install seqdiag ``` @@ -44,7 +44,7 @@ Just call `make` to regenerate the diagrams. If you are on a Mac or your pip install is messed up, you can easily build with docker. -``` +```sh make docker ``` diff --git a/event_compression.md b/event_compression.md index 29e65917..3b988048 100644 --- a/event_compression.md +++ b/event_compression.md @@ -90,7 +90,7 @@ Each binary that generates events: Sample kubectl output -``` +```console FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet. Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet. diff --git a/namespaces.md b/namespaces.md index 1f1a767c..da3bb2c5 100644 --- a/namespaces.md +++ b/namespaces.md @@ -74,7 +74,7 @@ The Namespace provides a unique scope for: A *Namespace* defines a logically named group for multiple *Kind*s of resources. -``` +```go type Namespace struct { TypeMeta `json:",inline"` ObjectMeta `json:"metadata,omitempty"` @@ -125,7 +125,7 @@ See [Admission control: Resource Quota](admission_control_resource_quota.md) Upon creation of a *Namespace*, the creator may provide a list of *Finalizer* objects. -``` +```go type FinalizerName string // These are internal finalizers to Kubernetes, must be qualified name unless defined here @@ -154,7 +154,7 @@ set by default. A *Namespace* may exist in the following phases. -``` +```go type NamespacePhase string const( NamespaceActive NamespacePhase = "Active" @@ -262,7 +262,7 @@ to take part in Namespace termination. 
OpenShift creates a Namespace in Kubernetes -``` +```json { "apiVersion":"v1", "kind": "Namespace", @@ -287,7 +287,7 @@ own storage associated with the "development" namespace unknown to Kubernetes. User deletes the Namespace in Kubernetes, and Namespace now has following state: -``` +```json { "apiVersion":"v1", "kind": "Namespace", @@ -312,7 +312,7 @@ and begins to terminate all of the content in the namespace that it knows about. success, it executes a *finalize* action that modifies the *Namespace* by removing *kubernetes* from the list of finalizers: -``` +```json { "apiVersion":"v1", "kind": "Namespace", @@ -340,7 +340,7 @@ from the list of finalizers. This results in the following state: -``` +```json { "apiVersion":"v1", "kind": "Namespace", diff --git a/networking.md b/networking.md index d7822d4d..b1d5a460 100644 --- a/networking.md +++ b/networking.md @@ -131,7 +131,7 @@ differentiate it from `docker0`) is set up outside of Docker proper. Example of GCE's advanced routing rules: -``` +```sh gcloud compute routes add "${MINION_NAMES[$i]}" \ --project "${PROJECT}" \ --destination-range "${MINION_IP_RANGES[$i]}" \ diff --git a/persistent-storage.md b/persistent-storage.md index 3e9edd3e..9b0cd0d7 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -127,7 +127,7 @@ Events that communicate the state of a mounted volume are left to the volume plu An administrator provisions storage by posting PVs to the API. Various way to automate this task can be scripted. Dynamic provisioning is a future feature that can maintain levels of PVs. -``` +```yaml POST: kind: PersistentVolume @@ -140,15 +140,13 @@ spec: persistentDisk: pdName: "abc123" fsType: "ext4" +``` --------------------------------------------------- - -kubectl get pv +```console +$ kubectl get pv NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON pv0001 map[] 10737418240 RWO Pending - - ``` #### Users request storage @@ -157,9 +155,9 @@ A user requests storage by posting a PVC to the API. 
Their request contains the The user must be within a namespace to create PVCs. -``` - +```yaml POST: + kind: PersistentVolumeClaim apiVersion: v1 metadata: @@ -170,15 +168,13 @@ spec: resources: requests: storage: 3 +``` --------------------------------------------------- - -kubectl get pvc - +```console +$ kubectl get pvc NAME LABELS STATUS VOLUME myclaim-1 map[] pending - ``` @@ -186,9 +182,8 @@ myclaim-1 map[] pending The ```PersistentVolumeClaimBinder``` attempts to find an available volume that most closely matches the user's request. If one exists, they are bound by putting a reference on the PV to the PVC. Requests can go unfulfilled if a suitable match is not found. -``` - -kubectl get pv +```console +$ kubectl get pv NAME LABELS CAPACITY ACCESSMODES STATUS CLAIM REASON pv0001 map[] 10737418240 RWO Bound myclaim-1 / f4b3d283-c0ef-11e4-8be4-80e6500a981e @@ -198,8 +193,6 @@ kubectl get pvc NAME LABELS STATUS VOLUME myclaim-1 map[] Bound b16e91d6-c0ef-11e4-8be4-80e6500a981e - - ``` #### Claim usage @@ -208,7 +201,7 @@ The claim holder can use their claim as a volume. The ```PersistentVolumeClaimV The claim holder owns the claim and its data for as long as the claim exists. The pod using the claim can be deleted, but the claim remains in the user's namespace. It can be used again and again by many pods. -``` +```yaml POST: kind: Pod @@ -229,17 +222,14 @@ spec: accessMode: ReadWriteOnce claimRef: name: myclaim-1 - ``` #### Releasing a claim and Recycling a volume When a claim holder is finished with their data, they can delete their claim. -``` - -kubectl delete pvc myclaim-1 - +```console +$ kubectl delete pvc myclaim-1 ``` The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. 
diff --git a/resources.md b/resources.md index 055c5d86..7bcce84a 100644 --- a/resources.md +++ b/resources.md @@ -89,7 +89,7 @@ Both users and a number of system components, such as schedulers, (horizontal) a Resource requirements for a container or pod should have the following form: -``` +```yaml resourceRequirementSpec: [ request: [ cpu: 2.5, memory: "40Mi" ], limit: [ cpu: 4.0, memory: "99Mi" ], @@ -103,7 +103,7 @@ Where: Total capacity for a node should have a similar structure: -``` +```yaml resourceCapacitySpec: [ total: [ cpu: 12, memory: "128Gi" ] ] @@ -159,15 +159,16 @@ rather than decimal ones: "64MiB" rather than "64MB". A resource type may have an associated read-only ResourceType structure, that contains metadata about the type. For example: -``` +```yaml resourceTypes: [ "kubernetes.io/memory": [ isCompressible: false, ... ] "kubernetes.io/cpu": [ - isCompressible: true, internalScaleExponent: 3, ... + isCompressible: true, + internalScaleExponent: 3, ... ] - "kubernetes.io/disk-space": [ ... } + "kubernetes.io/disk-space": [ ... ] ] ``` @@ -195,7 +196,7 @@ Because resource usage and related metrics change continuously, need to be track Singleton values for observed and predicted future usage will rapidly prove inadequate, so we will support the following structure for extended usage information: -``` +```yaml resourceStatus: [ usage: [ cpu: , memory: ], maxusage: [ cpu: , memory: ], @@ -205,7 +206,7 @@ resourceStatus: [ where a `` or `` structure looks like this: -``` +```yaml { mean: # arithmetic mean max: # minimum value @@ -218,7 +219,7 @@ where a `` or `` structure looks like this: "99.9": <99.9th-percentile-value>, ... ] - } +} ``` All parts of this structure are optional, although we strongly encourage including quantities for 50, 90, 95, 99, 99.5, and 99.9 percentiles. 
_[In practice, it will be important to include additional info such as the length of the time window over which the averages are calculated, the confidence level, and information-quality metrics such as the number of dropped or discarded data points.]_ diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 80bc6566..f5ef348a 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -62,7 +62,7 @@ To facilitate recovery in the case of a crash of the updating process itself, we Recovery is achieved by issuing the same command again: -``` +```sh kubectl rolling-update foo [foo-v2] --image=myimage:v2 ``` -- cgit v1.2.3 From 883791a848441058457a4ab4ac50388b42396af8 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Sun, 19 Jul 2015 08:54:49 +0000 Subject: Improve devel docs syntax highlighting. --- api-conventions.md | 2 +- api_changes.md | 8 +++--- cherry-picks.md | 2 +- developer-guides/vagrant.md | 34 ++++++++++++------------- development.md | 62 ++++++++++++++++++++++----------------------- flaky-tests.md | 2 +- getting-builds.md | 4 +-- profiling.md | 14 +++++----- releasing.md | 26 +++++++++---------- 9 files changed, 77 insertions(+), 77 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index c2d71078..64509dae 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -524,7 +524,7 @@ The status object is encoded as JSON and provided as the body of the response. **Example:** -``` +```console $ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana > GET /api/v1/namespaces/default/pods/grafana HTTP/1.1 diff --git a/api_changes.md b/api_changes.md index 7a0418e8..d8e20014 100644 --- a/api_changes.md +++ b/api_changes.md @@ -284,8 +284,8 @@ Once all the necessary manually written conversions are added, you need to regenerate auto-generated ones. 
To regenerate them: - run -``` - $ hack/update-generated-conversions.sh +```sh +hack/update-generated-conversions.sh ``` If running the above script is impossible due to compile errors, the easiest @@ -359,8 +359,8 @@ an example to illustrate your change. Make sure you update the swagger API spec by running: -```shell -$ hack/update-swagger-spec.sh +```sh +hack/update-swagger-spec.sh ``` The API spec changes should be in a commit separate from your other changes. diff --git a/cherry-picks.md b/cherry-picks.md index 7ed63d08..c36741c4 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -40,7 +40,7 @@ Kubernetes projects. Any contributor can propose a cherry pick of any pull request, like so: -``` +```sh hack/cherry_pick_pull.sh upstream/release-3.14 98765 ``` diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index e704bf3b..c1e02ff4 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -86,8 +86,8 @@ vagrant ssh minion-3 To view the service status and/or logs on the kubernetes-master: -```sh -vagrant ssh master +```console +$ vagrant ssh master [vagrant@kubernetes-master ~] $ sudo systemctl status kube-apiserver [vagrant@kubernetes-master ~] $ sudo journalctl -r -u kube-apiserver @@ -100,8 +100,8 @@ vagrant ssh master To view the services on any of the nodes: -```sh -vagrant ssh minion-1 +```console +$ vagrant ssh minion-1 [vagrant@kubernetes-minion-1] $ sudo systemctl status docker [vagrant@kubernetes-minion-1] $ sudo journalctl -r -u docker [vagrant@kubernetes-minion-1] $ sudo systemctl status kubelet @@ -135,7 +135,7 @@ Once your Vagrant machines are up and provisioned, the first thing to do is to c You may need to build the binaries first, you can do this with ```make``` -```sh +```console $ ./cluster/kubectl.sh get nodes NAME LABELS STATUS @@ -182,8 +182,8 @@ Interact with the cluster When using the vagrant provider in Kubernetes, the `cluster/kubectl.sh` script will cache your credentials in a 
`~/.kubernetes_vagrant_auth` file so you will not be prompted for them in the future. -```sh -cat ~/.kubernetes_vagrant_auth +```console +$ cat ~/.kubernetes_vagrant_auth { "User": "vagrant", "Password": "vagrant" "CAFile": "/home/k8s_user/.kubernetes.vagrant.ca.crt", @@ -202,7 +202,7 @@ You should now be set to use the `cluster/kubectl.sh` script. For example try to Your cluster is running, you can list the nodes in your cluster: -```sh +```console $ ./cluster/kubectl.sh get nodes NAME LABELS STATUS @@ -216,7 +216,7 @@ Now start running some containers! You can now use any of the cluster/kube-*.sh commands to interact with your VM machines. Before starting a container there will be no pods, services and replication controllers. -``` +```console $ cluster/kubectl.sh get pods NAME READY STATUS RESTARTS AGE @@ -229,7 +229,7 @@ CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS Start a container running nginx with a replication controller and three replicas -``` +```console $ cluster/kubectl.sh run my-nginx --image=nginx --replicas=3 --port=80 CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS my-nginx my-nginx nginx run=my-nginx 3 @@ -237,7 +237,7 @@ my-nginx my-nginx nginx run=my-nginx 3 When listing the pods, you will see that three containers have been started and are in Waiting state: -``` +```console $ cluster/kubectl.sh get pods NAME READY STATUS RESTARTS AGE my-nginx-389da 1/1 Waiting 0 33s @@ -247,7 +247,7 @@ my-nginx-nyj3x 1/1 Waiting 0 33s You need to wait for the provisioning to complete, you can monitor the minions by doing: -```sh +```console $ sudo salt '*minion-1' cmd.run 'docker images' kubernetes-minion-1: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE @@ -257,7 +257,7 @@ kubernetes-minion-1: Once the docker image for nginx has been downloaded, the container will start and you can list it: -```sh +```console $ sudo salt '*minion-1' cmd.run 'docker ps' kubernetes-minion-1: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES @@ -267,7 +267,7 @@ 
kubernetes-minion-1: Going back to listing the pods, services and replicationcontrollers, you now have: -``` +```console $ cluster/kubectl.sh get pods NAME READY STATUS RESTARTS AGE my-nginx-389da 1/1 Running 0 33s @@ -286,7 +286,7 @@ We did not start any services, hence there are none listed. But we see three rep Check the [guestbook](../../../examples/guestbook/README.md) application to learn how to create a service. You can already play with scaling the replicas with: -```sh +```console $ ./cluster/kubectl.sh scale rc my-nginx --replicas=2 $ ./cluster/kubectl.sh get pods NAME READY STATUS RESTARTS AGE @@ -327,8 +327,8 @@ rm ~/.kubernetes_vagrant_auth After using kubectl.sh make sure that the correct credentials are set: -```sh -cat ~/.kubernetes_vagrant_auth +```console +$ cat ~/.kubernetes_vagrant_auth { "User": "vagrant", "Password": "vagrant" diff --git a/development.md b/development.md index 6822ab5e..bb233051 100644 --- a/development.md +++ b/development.md @@ -58,40 +58,40 @@ Below, we outline one of the more common git workflows that core developers use. The commands below require that you have $GOPATH set ([$GOPATH docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put kubernetes' code into your GOPATH. Note: the commands below will not work if there is more than one directory in your `$GOPATH`. 
-``` -$ mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ -$ cd $GOPATH/src/github.com/GoogleCloudPlatform/ +```sh +mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ +cd $GOPATH/src/github.com/GoogleCloudPlatform/ # Replace "$YOUR_GITHUB_USERNAME" below with your github username -$ git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git -$ cd kubernetes -$ git remote add upstream 'https://github.com/GoogleCloudPlatform/kubernetes.git' +git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git +cd kubernetes +git remote add upstream 'https://github.com/GoogleCloudPlatform/kubernetes.git' ``` ### Create a branch and make changes -``` -$ git checkout -b myfeature +```sh +git checkout -b myfeature # Make your code changes ``` ### Keeping your development fork in sync -``` -$ git fetch upstream -$ git rebase upstream/master +```sh +git fetch upstream +git rebase upstream/master ``` Note: If you have write access to the main repository at github.com/GoogleCloudPlatform/kubernetes, you should modify your git configuration so that you can't accidentally push to upstream: -``` +```sh git remote set-url --push upstream no_push ``` ### Committing changes to your fork -``` -$ git commit -$ git push -f origin myfeature +```sh +git commit +git push -f origin myfeature ``` ### Creating a pull request @@ -114,7 +114,7 @@ directly from mercurial. 2) Create a new GOPATH for your tools and install godep: -``` +```sh export GOPATH=$HOME/go-tools mkdir -p $GOPATH go get github.com/tools/godep @@ -122,7 +122,7 @@ go get github.com/tools/godep 3) Add $GOPATH/bin to your path.
Typically you'd add this to your ~/.profile: -``` +```sh export GOPATH=$HOME/go-tools export PATH=$PATH:$GOPATH/bin ``` @@ -133,7 +133,7 @@ Here's a quick walkthrough of one way to use godeps to add or update a Kubernete 1) Devote a directory to this endeavor: -``` +```sh export KPATH=$HOME/code/kubernetes mkdir -p $KPATH/src/github.com/GoogleCloudPlatform/kubernetes cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes @@ -143,7 +143,7 @@ git clone https://path/to/your/fork . 2) Set up your GOPATH. -``` +```sh # Option A: this will let your builds see packages that exist elsewhere on your system. export GOPATH=$KPATH:$GOPATH # Option B: This will *not* let your local builds see packages that exist elsewhere on your system. @@ -153,14 +153,14 @@ export GOPATH=$KPATH 3) Populate your new GOPATH. -``` +```sh cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes godep restore ``` 4) Next, you can either add a new dependency or update an existing one. -``` +```sh # To add a new dependency, do: cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes go get path/to/dependency @@ -185,28 +185,28 @@ Please send dependency updates in separate commits within your PR, for easier re Before committing any changes, please link/copy these hooks into your .git directory. This will keep you from accidentally committing non-gofmt'd go code. -``` +```sh cd kubernetes/.git/hooks/ ln -s ../../hooks/pre-commit . ``` ## Unit tests -``` +```sh cd kubernetes hack/test-go.sh ``` Alternatively, you could also run: -``` +```sh cd kubernetes godep go test ./... ``` If you only want to run unit tests in one package, you could run ``godep go test`` under the package directory. For example, the following commands will run all unit tests in package kubelet: -``` +```console $ cd kubernetes # step into kubernetes' directory. $ cd pkg/kubelet $ godep go test @@ -221,7 +221,7 @@ Currently, collecting coverage is only supported for the Go unit tests. 
To run all unit tests and generate an HTML coverage report, run the following: -``` +```sh cd kubernetes KUBE_COVER=y hack/test-go.sh ``` @@ -230,7 +230,7 @@ At the end of the run, the HTML report will be generated with the path printe To run tests and collect coverage in only one package, pass its relative path under the `kubernetes` directory as an argument, for example: -``` +```sh cd kubernetes KUBE_COVER=y hack/test-go.sh pkg/kubectl ``` @@ -243,7 +243,7 @@ Coverage results for the project can also be viewed on [Coveralls](https://cover You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.0) in your path, please make sure it is installed and in your ``$PATH``. -``` +```sh cd kubernetes hack/test-integration.sh ``` @@ -252,14 +252,14 @@ hack/test-integration.sh You can run an end-to-end test which will bring up a master and two nodes, perform some tests, and then tear everything down. Make sure you have followed the getting started steps for your chosen cloud platform (which might involve changing the `KUBERNETES_PROVIDER` environment variable to something other than "gce"). -``` +```sh cd kubernetes hack/e2e-test.sh ``` Pressing control-C should result in an orderly shutdown but if something goes wrong and you still have some VMs running you can force a cleanup with this command: -``` +```sh go run hack/e2e.go --down ``` @@ -332,7 +332,7 @@ See [conformance-test.sh](../../hack/conformance-test.sh). ## Regenerating the CLI documentation -``` +```sh hack/run-gendocs.sh ``` diff --git a/flaky-tests.md b/flaky-tests.md index 1e7f5fcb..1568baed 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -69,7 +69,7 @@ spec: Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default.
-``` +```sh kubectl create -f ./controller.yaml ``` diff --git a/getting-builds.md b/getting-builds.md index 4c92a446..4265b77a 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -35,7 +35,7 @@ Documentation for other releases can be found at You can use [hack/get-build.sh](../../hack/get-build.sh), or use it as a reference for how to get the most recent builds with curl. With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our ci and gce e2e tests (essentially a nightly build). -``` +```console usage: ./hack/get-build.sh [stable|release|latest|latest-green] @@ -47,7 +47,7 @@ usage: You can also use the gsutil tool to explore the Google Cloud Storage release bucket. Here are some examples: -``` +```sh gsutil cat gs://kubernetes-release/ci/latest.txt # output the latest ci version number gsutil cat gs://kubernetes-release/ci/latest-green.txt # output the latest ci version number that passed gce e2e gsutil ls gs://kubernetes-release/ci/v0.20.0-29-g29a55cc/ # list the contents of a ci release diff --git a/profiling.md b/profiling.md index d36885dd..36bbfbae 100644 --- a/profiling.md +++ b/profiling.md @@ -43,10 +43,10 @@ Go comes with inbuilt 'net/http/pprof' profiling library and profiling web servi TL;DR: Add lines: -``` - m.mux.HandleFunc("/debug/pprof/", pprof.Index) - m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile) - m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) +```go +m.mux.HandleFunc("/debug/pprof/", pprof.Index) +m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile) +m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) ``` to the init(c *Config) method in 'pkg/master/master.go' and import 'net/http/pprof' package. @@ -57,13 +57,13 @@ In most use cases to use profiler service it's enough to do 'import _ net/http/p Even when running the profiler I found it not really straightforward to use 'go tool pprof' with it.
The problem is that at least for dev purposes certificates generated for APIserver are not signed by anyone trusted and because secureServer serves only secure traffic it isn't straightforward to connect to the service. The best workaround I found is by creating an ssh tunnel from the kubernetes_master open unsecured port to some external server, and using this server as a proxy. To save everyone looking for correct ssh flags, it is done by running: -``` - ssh kubernetes_master -L:localhost:8080 +```sh +ssh kubernetes_master -L:localhost:8080 ``` or the analogous one for your cloud provider. Afterwards you can e.g. run -``` +```sh go tool pprof http://localhost:/debug/pprof/profile ``` diff --git a/releasing.md b/releasing.md index 65db081d..9950e6e4 100644 --- a/releasing.md +++ b/releasing.md @@ -65,7 +65,7 @@ to make sure they're solid around then as well. Once you find some greens, you can find the Git hash for a build by looking at the "Console Log", then look for `githash=`. You should see a line like: -``` +```console + githash=v0.20.2-322-g974377b ``` @@ -80,7 +80,7 @@ oncall. Before proceeding to the next step: -``` +```sh export BRANCHPOINT=v0.20.2-322-g974377b ``` @@ -230,11 +230,11 @@ present. We are using `pkg/version/base.go` as the source of versioning in absence of information from git. Here is a sample of that file's contents: -``` - var ( - gitVersion string = "v0.4-dev" // version from git, output of $(git describe) - gitCommit string = "" // sha1 from git, output of $(git rev-parse HEAD) - ) +```go +var ( + gitVersion string = "v0.4-dev" // version from git, output of $(git describe) + gitCommit string = "" // sha1 from git, output of $(git rev-parse HEAD) +) ``` This means a build with `go install` or `go get` or a build from a tarball will
As an example, Docker commit a327d9b91edf has a `v1.1.1-N-gXXX` label but it is not present in Docker `v1.2.0`: -``` - $ git describe a327d9b91edf - v1.1.1-822-ga327d9b91edf +```console +$ git describe a327d9b91edf +v1.1.1-822-ga327d9b91edf - $ git log --oneline v1.2.0..a327d9b91edf - a327d9b91edf Fix data space reporting from Kb/Mb to KB/MB +$ git log --oneline v1.2.0..a327d9b91edf +a327d9b91edf Fix data space reporting from Kb/Mb to KB/MB - (Non-empty output here means the commit is not present on v1.2.0.) +(Non-empty output here means the commit is not present on v1.2.0.) ``` ## Release Notes -- cgit v1.2.3 From dc711364b082ae691bcf0592653b025db0fa2ef5 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Sun, 19 Jul 2015 09:04:42 +0000 Subject: Fix gendocs --- api-conventions.md | 1 + 1 file changed, 1 insertion(+) diff --git a/api-conventions.md b/api-conventions.md index c2d71078..0c12e5a6 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -69,6 +69,7 @@ using resources with kubectl can be found in (working_with_resources.md).* - [Success codes](#success-codes) - [Error codes](#error-codes) - [Response Status Kind](#response-status-kind) + - [Events](#events) -- cgit v1.2.3 From 4bef20df2177f38a04f0cab82d8d1ca5abe8be5c Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Sun, 19 Jul 2015 05:58:13 +0000 Subject: Replace ``` with ` when emphasizing something inline in docs/ --- admission_control_limit_range.md | 2 +- admission_control_resource_quota.md | 2 +- event_compression.md | 34 +++++++++--------- persistent-storage.md | 2 +- secrets.md | 4 +-- simple-rolling-update.md | 72 ++++++++++++++++++------------------- 6 files changed, 58 insertions(+), 58 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index ccdb44d8..d7a478ab 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -136,7 +136,7 @@ $ kube-apiserver -admission_control=LimitRanger kubectl is modified to support the 
**LimitRange** resource. -```kubectl describe``` provides a human-readable output of limits. +`kubectl describe` provides a human-readable output of limits. For example, diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 99d5431a..9ac3dd80 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -163,7 +163,7 @@ this being the resource most closely running at the prescribed quota limits. kubectl is modified to support the **ResourceQuota** resource. -```kubectl describe``` provides a human-readable output of quota. +`kubectl describe` provides a human-readable output of quota. For example, diff --git a/event_compression.md b/event_compression.md index 29e65917..bbc63155 100644 --- a/event_compression.md +++ b/event_compression.md @@ -38,41 +38,41 @@ This document captures the design of event compression. ## Background -Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate ```image_not_existing``` and ```container_is_waiting``` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)). +Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate `image_not_existing` and `container_is_waiting` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)). 
## Proposal -Each binary that generates events (for example, ```kubelet```) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. +Each binary that generates events (for example, `kubelet`) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. -Event compression should be best effort (not guaranteed). Meaning, in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries. +Event compression should be best effort (not guaranteed). Meaning, in the worst case, `n` identical (minus timestamp) events may still result in `n` event entries. ## Design Instead of a single Timestamp, each event object [contains](../../pkg/api/types.go#L1111) the following fields: - * ```FirstTimestamp util.Time``` + * `FirstTimestamp util.Time` * The date/time of the first occurrence of the event. - * ```LastTimestamp util.Time``` + * `LastTimestamp util.Time` * The date/time of the most recent occurrence of the event. * On first occurrence, this is equal to the FirstTimestamp. - * ```Count int``` + * `Count int` * The number of occurrences of this event between FirstTimestamp and LastTimestamp * On first occurrence, this is 1. Each binary that generates events: * Maintains a historical record of previously generated events: - * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [```pkg/client/record/events_cache.go```](../../pkg/client/record/events_cache.go). + * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/record/events_cache.go`](../../pkg/client/record/events_cache.go). 
* The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following events fields are used to construct a unique key for an event: - * ```event.Source.Component``` - * ```event.Source.Host``` - * ```event.InvolvedObject.Kind``` - * ```event.InvolvedObject.Namespace``` - * ```event.InvolvedObject.Name``` - * ```event.InvolvedObject.UID``` - * ```event.InvolvedObject.APIVersion``` - * ```event.Reason``` - * ```event.Message``` + * `event.Source.Component` + * `event.Source.Host` + * `event.InvolvedObject.Kind` + * `event.InvolvedObject.Namespace` + * `event.InvolvedObject.Name` + * `event.InvolvedObject.UID` + * `event.InvolvedObject.APIVersion` + * `event.Reason` + * `event.Message` * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache. - * When an event is generated, the previously generated events cache is checked (see [```pkg/client/record/event.go```](../../pkg/client/record/event.go)). + * When an event is generated, the previously generated events cache is checked (see [`pkg/client/record/event.go`](../../pkg/client/record/event.go)). * If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd: * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. * The event is also updated in the previously generated events cache with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). 
diff --git a/persistent-storage.md b/persistent-storage.md index 3e9edd3e..d064e701 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -65,7 +65,7 @@ Kubernetes makes no guarantees at runtime that the underlying storage exists or #### Describe available storage -Cluster administrators use the API to manage *PersistentVolumes*. A custom store ```NewPersistentVolumeOrderedIndex``` will index volumes by access modes and sort by storage capacity. The ```PersistentVolumeClaimBinder``` watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. +Cluster administrators use the API to manage *PersistentVolumes*. A custom store `NewPersistentVolumeOrderedIndex` will index volumes by access modes and sort by storage capacity. The `PersistentVolumeClaimBinder` watches for new claims for storage and binds them to an available volume by matching the volume's characteristics (AccessModes and storage size) to the user's request. PVs are system objects and, thus, have no namespace. diff --git a/secrets.md b/secrets.md index 8aab1088..876a9390 100644 --- a/secrets.md +++ b/secrets.md @@ -297,7 +297,7 @@ storing it. Secrets contain multiple pieces of data that are presented as differ the secret volume (example: SSH key pair). In order to remove the burden from the end user in specifying every file that a secret consists of, -it should be possible to mount all files provided by a secret with a single ```VolumeMount``` entry +it should be possible to mount all files provided by a secret with a single `VolumeMount` entry in the container specification. ### Secret API Resource @@ -349,7 +349,7 @@ finer points of secrets and resource allocation are fleshed out. 
### Secret Volume Source -A new `SecretSource` type of volume source will be added to the ```VolumeSource``` struct in the +A new `SecretSource` type of volume source will be added to the `VolumeSource` struct in the API: ```go diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 80bc6566..be38f20e 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -33,15 +33,15 @@ Documentation for other releases can be found at ## Simple rolling update -This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in ```kubectl```. +This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in `kubectl`. Complete execution flow can be found [here](#execution-details). See the [example of rolling update](../user-guide/update-demo/) for more information. ### Lightweight rollout -Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1``` +Assume that we have a current replication controller named `foo` and it is running image `image:v1` -```kubectl rolling-update foo [foo-v2] --image=myimage:v2``` +`kubectl rolling-update foo [foo-v2] --image=myimage:v2` If the user doesn't specify a name for the 'next' replication controller, then the 'next' replication controller is renamed to the name of the original replication controller. @@ -50,15 +50,15 @@ Obviously there is a race here, where if you kill the client between delete foo, See [Recovery](#recovery) below If the user does specify a name for the 'next' replication controller, then the 'next' replication controller is retained with its existing name, -and the old 'foo' replication controller is deleted. For the purposes of the rollout, we add a unique-ifying label ```kubernetes.io/deployment``` to both the ```foo``` and ```foo-next``` replication controllers. 
-The value of that label is the hash of the complete JSON representation of the```foo-next``` or```foo``` replication controller. The name of this label can be overridden by the user with the ```--deployment-label-key``` flag. +and the old 'foo' replication controller is deleted. For the purposes of the rollout, we add a unique-ifying label `kubernetes.io/deployment` to both the `foo` and `foo-next` replication controllers. +The value of that label is the hash of the complete JSON representation of the `foo-next` or `foo` replication controller. The name of this label can be overridden by the user with the `--deployment-label-key` flag. #### Recovery If a rollout fails or is terminated in the middle, it is important that the user be able to resume the roll out. -To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replication controller in the ```kubernetes.io/``` annotation namespace: - * ```desired-replicas``` The desired number of replicas for this replication controller (either N or zero) - * ```update-partner``` A pointer to the replication controller resource that is the other half of this update (syntax `````` the namespace is assumed to be identical to the namespace of this replication controller.) +To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replication controller in the `kubernetes.io/` annotation namespace: + * `desired-replicas` The desired number of replicas for this replication controller (either N or zero) + * `update-partner` A pointer to the replication controller resource that is the other half of this update (syntax `` the namespace is assumed to be identical to the namespace of this replication controller.)
Recovery is achieved by issuing the same command again: @@ -66,70 +66,70 @@ Recovery is achieved by issuing the same command again: kubectl rolling-update foo [foo-v2] --image=myimage:v2 ``` -Whenever the rolling update command executes, the kubectl client looks for replication controllers called ```foo``` and ```foo-next```, if they exist, an attempt is -made to roll ```foo``` to ```foo-next```. If ```foo-next``` does not exist, then it is created, and the rollout is a new rollout. If ```foo``` doesn't exist, then -it is assumed that the rollout is nearly completed, and ```foo-next``` is renamed to ```foo```. Details of the execution flow are given below. +Whenever the rolling update command executes, the kubectl client looks for replication controllers called `foo` and `foo-next`, if they exist, an attempt is +made to roll `foo` to `foo-next`. If `foo-next` does not exist, then it is created, and the rollout is a new rollout. If `foo` doesn't exist, then +it is assumed that the rollout is nearly completed, and `foo-next` is renamed to `foo`. Details of the execution flow are given below. ### Aborting a rollout Abort is assumed to want to reverse a rollout in progress. 
-```kubectl rolling-update foo [foo-v2] --rollback``` +`kubectl rolling-update foo [foo-v2] --rollback` This is really just semantic sugar for: -```kubectl rolling-update foo-v2 foo``` +`kubectl rolling-update foo-v2 foo` -With the added detail that it moves the ```desired-replicas``` annotation from ```foo-v2``` to ```foo``` +With the added detail that it moves the `desired-replicas` annotation from `foo-v2` to `foo` ### Execution Details -For the purposes of this example, assume that we are rolling from ```foo``` to ```foo-next``` where the only change is an image update from `v1` to `v2` +For the purposes of this example, assume that we are rolling from `foo` to `foo-next` where the only change is an image update from `v1` to `v2` -If the user doesn't specify a ```foo-next``` name, then it is either discovered from the ```update-partner``` annotation on ```foo```. If that annotation doesn't exist, -then ```foo-next``` is synthesized using the pattern ```-``` +If the user doesn't specify a `foo-next` name, then it is either discovered from the `update-partner` annotation on `foo`. If that annotation doesn't exist, +then `foo-next` is synthesized using the pattern `-` #### Initialization - * If ```foo``` and ```foo-next``` do not exist: + * If `foo` and `foo-next` do not exist: * Exit, and indicate an error to the user, that the specified controller doesn't exist. - * If ```foo``` exists, but ```foo-next``` does not: - * Create ```foo-next``` populate it with the ```v2``` image, set ```desired-replicas``` to ```foo.Spec.Replicas``` + * If `foo` exists, but `foo-next` does not: + * Create `foo-next` populate it with the `v2` image, set `desired-replicas` to `foo.Spec.Replicas` * Goto Rollout - * If ```foo-next``` exists, but ```foo``` does not: + * If `foo-next` exists, but `foo` does not: * Assume that we are in the rename phase. 
* Goto Rename - * If both ```foo``` and ```foo-next``` exist: + * If both `foo` and `foo-next` exist: * Assume that we are in a partial rollout - * If ```foo-next``` is missing the ```desired-replicas``` annotation - * Populate the ```desired-replicas``` annotation to ```foo-next``` using the current size of ```foo``` + * If `foo-next` is missing the `desired-replicas` annotation + * Populate the `desired-replicas` annotation to `foo-next` using the current size of `foo` * Goto Rollout #### Rollout - * While size of ```foo-next``` < ```desired-replicas``` annotation on ```foo-next``` - * increase size of ```foo-next``` - * if size of ```foo``` > 0 - decrease size of ```foo``` + * While size of `foo-next` < `desired-replicas` annotation on `foo-next` + * increase size of `foo-next` + * if size of `foo` > 0 + decrease size of `foo` * Goto Rename #### Rename - * delete ```foo``` - * create ```foo``` that is identical to ```foo-next``` - * delete ```foo-next``` + * delete `foo` + * create `foo` that is identical to `foo-next` + * delete `foo-next` #### Abort - * If ```foo-next``` doesn't exist + * If `foo-next` doesn't exist * Exit and indicate to the user that they may want to simply do a new rollout with the old version - * If ```foo``` doesn't exist + * If `foo` doesn't exist * Exit and indicate not found to the user - * Otherwise, ```foo-next``` and ```foo``` both exist - * Set ```desired-replicas``` annotation on ```foo``` to match the annotation on ```foo-next``` - * Goto Rollout with ```foo``` and ```foo-next``` trading places. + * Otherwise, `foo-next` and `foo` both exist + * Set `desired-replicas` annotation on `foo` to match the annotation on `foo-next` + * Goto Rollout with `foo` and `foo-next` trading places. 
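The Initialization/Rollout/Rename flow above boils down to a simple resize loop. A minimal simulation of just the Rollout step (illustrative only; the real kubectl implementation also waits for pods to become ready between resizes):

```go
package main

import "fmt"

// rollout simulates the Rollout loop from the design: grow foo-next by
// one, shrink foo by one while it is non-empty, until foo-next reaches
// the desired-replicas annotation value. Returns the final sizes and the
// number of resize steps taken.
func rollout(foo, fooNext, desiredReplicas int) (int, int, int) {
	steps := 0
	for fooNext < desiredReplicas {
		fooNext++
		if foo > 0 {
			foo--
		}
		steps++
	}
	return foo, fooNext, steps
}

func main() {
	// Fresh rollout: foo starts at 3 replicas, foo-next at 0.
	foo, next, steps := rollout(3, 0, 3)
	fmt.Printf("foo=%d foo-next=%d steps=%d\n", foo, next, steps)
}
```

Because the loop is driven only by the current sizes and the `desired-replicas` annotation, re-running it after a crash (a partial rollout, per the Recovery section) converges to the same end state, which is what makes the same command safe to issue again.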
-- cgit v1.2.3 From 753fab889e0f6de95ba44a06b3b0c60a8fd34f5b Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Sun, 19 Jul 2015 05:58:13 +0000 Subject: Replace ``` with ` when emphasizing something inline in docs/ --- api-conventions.md | 16 ++++++++-------- developer-guides/vagrant.md | 4 ++-- development.md | 6 +++--- flaky-tests.md | 6 +++--- making-release-notes.md | 4 ++-- profiling.md | 2 +- 6 files changed, 19 insertions(+), 19 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index 0c12e5a6..1438bc8c 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -498,7 +498,7 @@ The following HTTP status codes may be returned by the API. * `429 StatusTooManyRequests` * Indicates that the either the client rate limit has been exceeded or the server has received more requests then it can process. * Suggested client recovery behavior: - * Read the ```Retry-After``` HTTP header from the response, and wait at least that long before retrying. + * Read the `Retry-After` HTTP header from the response, and wait at least that long before retrying. * `500 StatusInternalServerError` * Indicates that the server can be reached and understood the request, but either an unexpected internal error occurred and the outcome of the call is unknown, or the server cannot complete the action in a reasonable time (this maybe due to temporary server load or a transient communication issue with another server). * Suggested client recovery behavior: @@ -514,12 +514,12 @@ The following HTTP status codes may be returned by the API. ## Response Status Kind -Kubernetes will always return the ```Status``` kind from any API endpoint when an error occurs. +Kubernetes will always return the `Status` kind from any API endpoint when an error occurs. Clients SHOULD handle these types of objects when appropriate. -A ```Status``` kind will be returned by the API in two cases: +A `Status` kind will be returned by the API in two cases: * When an operation is not successful (i.e. 
when the server would return a non 2xx HTTP status code). - * When a HTTP ```DELETE``` call is successful. + * When a HTTP `DELETE` call is successful. The status object is encoded as JSON and provided as the body of the response. The status object contains fields for humans and machine consumers of the API to get more detailed information for the cause of the failure. The information in the status object supplements, but does not override, the HTTP status code's meaning. When fields in the status object have the same meaning as generally defined HTTP headers and that header is returned with the response, the header should be considered as having higher priority. @@ -555,17 +555,17 @@ $ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https:/ } ``` -```status``` field contains one of two possible values: +`status` field contains one of two possible values: * `Success` * `Failure` `message` may contain human-readable description of the error -```reason``` may contain a machine-readable description of why this operation is in the `Failure` status. If this value is empty there is no information available. The `reason` clarifies an HTTP status code but does not override it. +`reason` may contain a machine-readable description of why this operation is in the `Failure` status. If this value is empty there is no information available. The `reason` clarifies an HTTP status code but does not override it. -```details``` may contain extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type. +`details` may contain extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type. 
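A client-side sketch of consuming such a `Status` object is shown below. The field names (`status`, `reason`, `message`, `code`) come from the conventions above; the helper itself and the sample body are illustrative, not part of any real client library.

```python
import json

# Hedged sketch: interpret a Status object returned on an API error, honoring
# the rule that same-meaning HTTP headers (e.g. Retry-After) take priority.

def interpret_status(body, headers=None):
    """Return (failed, reason, retry_after_seconds) for an error response body."""
    status = json.loads(body)
    failed = status.get("status") == "Failure"      # "Success" or "Failure"
    reason = status.get("reason") or None           # empty reason: no info available
    # Headers override status-object fields of the same meaning, so read
    # Retry-After from the headers before deciding how long to wait.
    retry_after = int((headers or {}).get("Retry-After", "0"))
    return failed, reason, retry_after

failed, reason, retry_after = interpret_status(
    '{"kind": "Status", "status": "Failure",'
    ' "message": "resource not found", "reason": "NotFound", "code": 404}',
    headers={"Retry-After": "5"})
```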
-Possible values for the ```reason``` and ```details``` fields: +Possible values for the `reason` and `details` fields: * `BadRequest` * Indicates that the request itself was invalid, because the request doesn't make any sense, for example deleting a read-only object. * This is different than `status reason` `Invalid` above which indicates that the API call could possibly succeed, but the data was invalid. diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index e704bf3b..bf4ca862 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -133,7 +133,7 @@ vagrant destroy Once your Vagrant machines are up and provisioned, the first thing to do is to check that you can use the `kubectl.sh` script. -You may need to build the binaries first, you can do this with ```make``` +You may need to build the binaries first, you can do this with `make` ```sh $ ./cluster/kubectl.sh get nodes @@ -374,7 +374,7 @@ export KUBERNETES_MINION_MEMORY=2048 #### I ran vagrant suspend and nothing works! -```vagrant suspend``` seems to mess up the network. It's not supported at this time. +`vagrant suspend` seems to mess up the network. It's not supported at this time. diff --git a/development.md b/development.md index 6822ab5e..cbcac1de 100644 --- a/development.md +++ b/development.md @@ -106,10 +106,10 @@ Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. ### Installing godep -There are many ways to build and host go binaries. Here is an easy way to get utilities like ```godep``` installed: +There are many ways to build and host go binaries. Here is an easy way to get utilities like `godep` installed: 1) Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is installed on your system. (some of godep's dependencies use the mercurial -source control system). Use ```apt-get install mercurial``` or ```yum install mercurial``` on Linux, or [brew.sh](http://brew.sh) on OS X, or download +source control system). 
Use `apt-get install mercurial` or `yum install mercurial` on Linux, or [brew.sh](http://brew.sh) on OS X, or download directly from mercurial. 2) Create a new GOPATH for your tools and install godep: @@ -174,7 +174,7 @@ go get -u path/to/dependency godep update path/to/dependency ``` -5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by re-restoring: ```godep restore``` +5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by re-restoring: `godep restore` It is sometimes expedient to manually fix the /Godeps/godeps.json file to minimize the changes. diff --git a/flaky-tests.md b/flaky-tests.md index 1e7f5fcb..522c684f 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -41,7 +41,7 @@ Running a test 1000 times on your own machine can be tedious and time consuming. _Note: these instructions are mildly hacky for now, as we get run once semantics and logging they will get better_ -There is a testing image ```brendanburns/flake``` up on the docker hub. We will use this image to test our fix. +There is a testing image `brendanburns/flake` up on the docker hub. We will use this image to test our fix. Create a replication controller with the following config: @@ -74,7 +74,7 @@ kubectl create -f ./controller.yaml ``` This will spin up 24 instances of the test. They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test. -You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently. +You can examine the recent runs of the test by calling `docker ps -a` and looking for tasks that exited with non-zero exit codes. 
Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently. You can use this script to automate checking for failures, assuming your cluster is running on GCE and has four nodes: ```sh @@ -93,7 +93,7 @@ Eventually you will have sufficient runs for your purposes. At that point you ca kubectl stop replicationcontroller flakecontroller ``` -If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller. +If you do a final check for flakes with `docker ps -a`, ignore tasks that exited -1, since that's what happens when you stop the replication controller. Happy flake hunting! diff --git a/making-release-notes.md b/making-release-notes.md index d76f7415..d4ec6ccf 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -55,14 +55,14 @@ release. It is likely long and many PRs aren't worth mentioning. If any of the PRs were cherrypicked into patches on the last minor release, you should exclude them from the current release's notes. -Open up ```candidate-notes.md``` in your favorite editor. +Open up `candidate-notes.md` in your favorite editor. Remove, regroup, organize to your hearts content. ### 4) Update CHANGELOG.md -With the final markdown all set, cut and paste it to the top of ```CHANGELOG.md``` +With the final markdown all set, cut and paste it to the top of `CHANGELOG.md` ### 5) Update the Release page diff --git a/profiling.md b/profiling.md index d36885dd..816e600c 100644 --- a/profiling.md +++ b/profiling.md @@ -71,7 +71,7 @@ to get 30 sec. CPU profile. ## Contention profiling -To enable contention profiling you need to add line ```rt.SetBlockProfileRate(1)``` in addition to ```m.mux.HandleFunc(...)``` added before (```rt``` stands for ```runtime``` in ```master.go```). This enables 'debug/pprof/block' subpage, which can be used as an input to ```go tool pprof```. 
+To enable contention profiling you need to add line `rt.SetBlockProfileRate(1)` in addition to `m.mux.HandleFunc(...)` added before (`rt` stands for `runtime` in `master.go`). This enables 'debug/pprof/block' subpage, which can be used as an input to `go tool pprof`. -- cgit v1.2.3 From 0302cf3c1a511e975f8be11395603a508c52d348 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Mon, 20 Jul 2015 00:25:07 -0700 Subject: Absolutize links that leave the docs/ tree to go anywhere other than to examples/ or back to docs/ --- event_compression.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/event_compression.md b/event_compression.md index 5dfb0311..aea04e41 100644 --- a/event_compression.md +++ b/event_compression.md @@ -48,7 +48,7 @@ Event compression should be best effort (not guaranteed). Meaning, in the worst ## Design -Instead of a single Timestamp, each event object [contains](../../pkg/api/types.go#L1111) the following fields: +Instead of a single Timestamp, each event object [contains](http://releases.k8s.io/HEAD/pkg/api/types.go#L1111) the following fields: * `FirstTimestamp util.Time` * The date/time of the first occurrence of the event. * `LastTimestamp util.Time` @@ -72,7 +72,7 @@ Each binary that generates events: * `event.Reason` * `event.Message` * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache. - * When an event is generated, the previously generated events cache is checked (see [`pkg/client/record/event.go`](../../pkg/client/record/event.go)). + * When an event is generated, the previously generated events cache is checked (see [`pkg/client/record/event.go`](http://releases.k8s.io/HEAD/pkg/client/record/event.go)). 
* If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd: * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. * The event is also updated in the previously generated events cache with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). -- cgit v1.2.3 From 4ebeb731ad8c73ebd05b63c160c033ced6904505 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Mon, 20 Jul 2015 00:25:07 -0700 Subject: Absolutize links that leave the docs/ tree to go anywhere other than to examples/ or back to docs/ --- cherry-picks.md | 2 +- client-libraries.md | 2 +- development.md | 4 +-- getting-builds.md | 2 +- scheduler.md | 12 ++++----- scheduler_algorithm.md | 68 +++++++++++++++++++++++++------------------------- 6 files changed, 45 insertions(+), 45 deletions(-) diff --git a/cherry-picks.md b/cherry-picks.md index c36741c4..519c73c3 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -54,7 +54,7 @@ particular, they may be self-merged by the release branch owner without fanfare, in the case the release branch owner knows the cherry pick was already requested - this should not be the norm, but it may happen. -[Contributor License Agreements](../../CONTRIBUTING.md) is considered implicit +[Contributor License Agreements](http://releases.k8s.io/HEAD/CONTRIBUTING.md) is considered implicit for all code within cherry-pick pull requests, ***unless there is a large conflict***. 
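Returning to the event-compression design above, the LRU-keyed deduplication it describes can be sketched like this. The key fields and the 4096-entry cap follow the prose; this is an illustrative model, not the real `pkg/client/record` implementation.

```python
from collections import OrderedDict

# Sketch of event compression: a bounded LRU cache keyed by the event's
# identifying fields. A hit increments Count and bumps LastTimestamp (the
# PUT/update path); a miss creates a fresh entry (the POST/create path).

CACHE_CAPACITY = 4096

def record_event(cache, involved_object, reason, message, timestamp):
    key = (involved_object, reason, message)
    if key in cache:                      # duplicate: update the existing entry
        entry = cache[key]
        entry["count"] += 1
        entry["lastTimestamp"] = timestamp
        cache.move_to_end(key)            # mark as most recently used
    else:                                 # new unique event: create an entry
        cache[key] = {"firstTimestamp": timestamp,
                      "lastTimestamp": timestamp, "count": 1}
        if len(cache) > CACHE_CAPACITY:   # evict the oldest unique event
            cache.popitem(last=False)
    return cache[key]

cache = OrderedDict()
record_event(cache, "pod/monitoring-heapster", "failedScheduling",
             "Error scheduling: no minions available", "01:13:05")
entry = record_event(cache, "pod/monitoring-heapster", "failedScheduling",
                     "Error scheduling: no minions available", "01:13:12")
```

This is how 20 identical scheduling failures collapse into one entry with a count of 20 rather than 20 separate records.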
diff --git a/client-libraries.md b/client-libraries.md index 69cba1e6..e41c6514 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -35,7 +35,7 @@ Documentation for other releases can be found at ### Supported - * [Go](../../pkg/client/) + * [Go](http://releases.k8s.io/HEAD/pkg/client/) ### User Contributed diff --git a/development.md b/development.md index 3ff03fdd..f5233a0e 100644 --- a/development.md +++ b/development.md @@ -35,7 +35,7 @@ Documentation for other releases can be found at # Releases and Official Builds -Official releases are built in Docker containers. Details are [here](../../build/README.md). You can do simple builds and development with just a local Docker installation. If want to build go locally outside of docker, please continue below. +Official releases are built in Docker containers. Details are [here](http://releases.k8s.io/HEAD/build/README.md). You can do simple builds and development with just a local Docker installation. If want to build go locally outside of docker, please continue below. ## Go development environment @@ -324,7 +324,7 @@ The conformance test runs a subset of the e2e-tests against a manually-created c require support for up/push/down and other operations. To run a conformance test, you need to know the IP of the master for your cluster and the authorization arguments to use. The conformance test is intended to run against a cluster at a specific binary release of Kubernetes. -See [conformance-test.sh](../../hack/conformance-test.sh). +See [conformance-test.sh](http://releases.k8s.io/HEAD/hack/conformance-test.sh). ## Testing out flaky tests diff --git a/getting-builds.md b/getting-builds.md index 4265b77a..bcb981c4 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at # Getting Kubernetes Builds -You can use [hack/get-build.sh](../../hack/get-build.sh) to or use as a reference on how to get the most recent builds with curl. 
With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our ci and gce e2e tests (essentially a nightly build). +You can use [hack/get-build.sh](http://releases.k8s.io/HEAD/hack/get-build.sh) to or use as a reference on how to get the most recent builds with curl. With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our ci and gce e2e tests (essentially a nightly build). ```console usage: diff --git a/scheduler.md b/scheduler.md index 1fccc7ad..b2a137d5 100644 --- a/scheduler.md +++ b/scheduler.md @@ -53,30 +53,30 @@ divided by the node's capacity). Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in -[plugin/pkg/scheduler/generic_scheduler.go](../../plugin/pkg/scheduler/generic_scheduler.go) +[plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/generic_scheduler.go) ## Scheduler extensibility The scheduler is extensible: the cluster administrator can choose which of the pre-defined scheduling policies to apply, and can add new ones. The built-in predicates and priorities are -defined in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go) and -[plugin/pkg/scheduler/algorithm/priorities/priorities.go](../../plugin/pkg/scheduler/algorithm/priorities/priorities.go), respectively. +defined in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/predicates/predicates.go) and +[plugin/pkg/scheduler/algorithm/priorities/priorities.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/priorities.go), respectively. 
The policies that are applied when scheduling can be chosen in one of two ways. Normally, the policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in -[plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). +[plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). However, the choice of policies can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON file specifying which scheduling policies to use. See [examples/scheduler-policy-config.json](../../examples/scheduler-policy-config.json) for an example config file. (Note that the config file format is versioned; the API is defined in -[plugin/pkg/scheduler/api](../../plugin/pkg/scheduler/api/)). +[plugin/pkg/scheduler/api](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/api/)). Thus to add a new scheduling policy, you should modify predicates.go or priorities.go, and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. ## Exploring the code If you want to get a global picture of how the scheduler works, you can start in -[plugin/cmd/kube-scheduler/app/server.go](../../plugin/cmd/kube-scheduler/app/server.go) +[plugin/cmd/kube-scheduler/app/server.go](http://releases.k8s.io/HEAD/plugin/cmd/kube-scheduler/app/server.go) diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index e73e4f27..c67bcdbf 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -31,40 +31,40 @@ Documentation for other releases can be found at -# Scheduler Algorithm in Kubernetes - -For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). 
In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. - -## Filtering the nodes - -The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource limits of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: - -- `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. -- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of limits of all Pods on the node. -- `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node. -- `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. -- `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field ([Here](../user-guide/node-selection/) is an example of how to use `nodeSelector` field). -- `CheckNodeLabelPresence`: Check if all the specified labels exist on a node or not, regardless of the value. - -The details of the above predicates can be found in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](../../plugin/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. 
You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). - -## Ranking the nodes - -The filtered nodes are considered suitable to host the Pod, and it is often that there are more than one nodes remaining. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score which scales from 0-10 with 10 representing for "most preferred" and 0 for "least preferred". Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2` with weighting factors `weight1` and `weight2` respectively, the final score of some NodeA is: - - finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) - -After the scores of all nodes are calculated, the node with highest score is chosen as the host of the Pod. If there are more than one nodes with equal highest scores, a random one among them is chosen. - -Currently, Kubernetes scheduler provides some practical priority functions, including: - -- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of limits of all Pods already on the node - limit of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. -- `CalculateNodeLabelPriority`: Prefer nodes that have the specified label. 
-- `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed. -- `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. -- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label. - -The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](../../plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](../../plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [scheduler.md](scheduler.md) for how to customize). +# Scheduler Algorithm in Kubernetes + +For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. + +## Filtering the nodes + +The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource limits of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. 
Currently, there are several "predicates" implementing different filtering policies, including: + +- `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. +- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of limits of all Pods on the node. +- `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node. +- `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. +- `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field ([Here](../user-guide/node-selection/) is an example of how to use `nodeSelector` field). +- `CheckNodeLabelPresence`: Check if all the specified labels exist on a node or not, regardless of the value. + +The details of the above predicates can be found in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). + +## Ranking the nodes + +The filtered nodes are considered suitable to host the Pod, and it is often that there are more than one nodes remaining. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score which scales from 0-10 with 10 representing for "most preferred" and 0 for "least preferred". 
Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2` with weighting factors `weight1` and `weight2` respectively, the final score of some NodeA is: + + finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + +After the scores of all nodes are calculated, the node with highest score is chosen as the host of the Pod. If there are more than one nodes with equal highest scores, a random one among them is chosen. + +Currently, Kubernetes scheduler provides some practical priority functions, including: + +- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of limits of all Pods already on the node - limit of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. +- `CalculateNodeLabelPriority`: Prefer nodes that have the specified label. +- `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed. +- `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. +- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label. + +The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. 
You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [scheduler.md](scheduler.md) for how to customize). -- cgit v1.2.3 From 19a1346560fc7b5681e29427e9c1899b5c551b24 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 20 Jul 2015 09:40:32 -0700 Subject: Collected markedown fixes around syntax. --- admission_control_resource_quota.md | 1 - event_compression.md | 1 - 2 files changed, 2 deletions(-) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 1cc81771..c86577ac 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -100,7 +100,6 @@ type ResourceQuotaList struct { // Items is a list of ResourceQuota objects Items []ResourceQuota `json:"items"` } - ``` ## AdmissionControl plugin: ResourceQuota diff --git a/event_compression.md b/event_compression.md index aea04e41..bfa2c5d6 100644 --- a/event_compression.md +++ b/event_compression.md @@ -103,7 +103,6 @@ Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey BoundPod implicitly required container POD pulled {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest" Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-minion-4.c.saad-dev-vms.internal - ``` This demonstrates 
what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries. -- cgit v1.2.3 From 51f581c03534c250238c6ec0531fc2c1f0f70f95 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Mon, 20 Jul 2015 13:45:36 -0700 Subject: Fix capitalization of Kubernetes in the documentation. --- access.md | 2 +- clustering.md | 2 +- expansion.md | 2 +- secrets.md | 4 ++-- security.md | 6 +++--- service_accounts.md | 12 ++++++------ 6 files changed, 14 insertions(+), 14 deletions(-) diff --git a/access.md b/access.md index 9a0c0d3d..d2fe44ca 100644 --- a/access.md +++ b/access.md @@ -200,7 +200,7 @@ Namespaces versus userAccount vs Labels: Goals for K8s authentication: - Include a built-in authentication system with no configuration required to use in single-user mode, and little configuration required to add several user accounts, and no https proxy required. -- Allow for authentication to be handled by a system external to Kubernetes, to allow integration with existing to enterprise authorization systems. The kubernetes namespace itself should avoid taking contributions of multiple authorization schemes. Instead, a trusted proxy in front of the apiserver can be used to authenticate users. +- Allow for authentication to be handled by a system external to Kubernetes, to allow integration with existing to enterprise authorization systems. The Kubernetes namespace itself should avoid taking contributions of multiple authorization schemes. Instead, a trusted proxy in front of the apiserver can be used to authenticate users. - For organizations whose security requirements only allow FIPS compliant implementations (e.g. apache) for authentication. - So the proxy can terminate SSL, and isolate the CA-signed certificate from less trusted, higher-touch APIserver. - For organizations that already have existing SaaS web services (e.g. storage, VMs) and want a common authentication portal. 
diff --git a/clustering.md b/clustering.md index 1fcb8aa3..757c1f0b 100644 --- a/clustering.md +++ b/clustering.md @@ -36,7 +36,7 @@ Documentation for other releases can be found at ## Overview -The term "clustering" refers to the process of having all members of the kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles. This document attempts to lay out the user experiences for clustering that Kubernetes aims to address. +The term "clustering" refers to the process of having all members of the Kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles. This document attempts to lay out the user experiences for clustering that Kubernetes aims to address. Once a cluster is established, the following is true: diff --git a/expansion.md b/expansion.md index 096b8a9d..75c748ca 100644 --- a/expansion.md +++ b/expansion.md @@ -94,7 +94,7 @@ script that sets up the environment and runs the command. This has a number of 1. Solutions that require a shell are unfriendly to images that do not contain a shell 2. Wrapper scripts make it harder to use images as base images -3. Wrapper scripts increase coupling to kubernetes +3. Wrapper scripts increase coupling to Kubernetes Users should be able to do the 80% case of variable expansion in command without writing a wrapper script or adding a shell invocation to their containers' commands. diff --git a/secrets.md b/secrets.md index 876a9390..f5793133 100644 --- a/secrets.md +++ b/secrets.md @@ -81,7 +81,7 @@ Goals of this design: the kubelet implement some reserved behaviors based on the types of secrets the service account consumes: 1. Use credentials for a docker registry to pull the pod's docker image - 2. Present kubernetes auth token to the pod or transparently decorate traffic between the pod + 2. 
Present Kubernetes auth token to the pod or transparently decorate traffic between the pod and master service 4. As a user, I want to be able to indicate that a secret expires and for that secret's value to be rotated once it expires, so that the system can help me follow good practices @@ -112,7 +112,7 @@ other system components to take action based on the secret's type. #### Example: service account consumes auth token secret As an example, the service account proposal discusses service accounts consuming secrets which -contain kubernetes auth tokens. When a Kubelet starts a pod associated with a service account +contain Kubernetes auth tokens. When a Kubelet starts a pod associated with a service account which consumes this type of secret, the Kubelet may take a number of actions: 1. Expose the secret in a `.kubernetes_auth` file in a well-known location in the container's diff --git a/security.md b/security.md index 522ff4ca..1d73a529 100644 --- a/security.md +++ b/security.md @@ -55,14 +55,14 @@ While Kubernetes today is not primarily a multi-tenant system, the long term evo We define "user" as a unique identity accessing the Kubernetes API server, which may be a human or an automated process. Human users fall into the following categories: -1. k8s admin - administers a kubernetes cluster and has access to the underlying components of the system +1. k8s admin - administers a Kubernetes cluster and has access to the underlying components of the system 2. k8s project administrator - administrates the security of a small subset of the cluster -3. k8s developer - launches pods on a kubernetes cluster and consumes cluster resources +3. k8s developer - launches pods on a Kubernetes cluster and consumes cluster resources Automated process users fall into the following categories: 1. k8s container user - a user that processes running inside a container (on the cluster) can use to access other cluster resources independent of the human users attached to a project -2. 
k8s infrastructure user - the user that kubernetes infrastructure components use to perform cluster functions with clearly defined roles +2. k8s infrastructure user - the user that Kubernetes infrastructure components use to perform cluster functions with clearly defined roles ### Description of roles diff --git a/service_accounts.md b/service_accounts.md index d9535de5..8e63e045 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -76,7 +76,7 @@ type ServiceAccount struct { ``` The name ServiceAccount is chosen because it is widely used already (e.g. by Kerberos and LDAP) -to refer to this type of account. Note that it has no relation to kubernetes Service objects. +to refer to this type of account. Note that it has no relation to Kubernetes Service objects. The ServiceAccount object does not include any information that could not be defined separately: - username can be defined however users are defined. @@ -90,12 +90,12 @@ These features are explained later. ### Names -From the standpoint of the Kubernetes API, a `user` is any principal which can authenticate to kubernetes API. +From the standpoint of the Kubernetes API, a `user` is any principal which can authenticate to Kubernetes API. This includes a human running `kubectl` on her desktop and a container in a Pod on a Node making API calls. -There is already a notion of a username in kubernetes, which is populated into a request context after authentication. +There is already a notion of a username in Kubernetes, which is populated into a request context after authentication. However, there is no API object representing a user. While this may evolve, it is expected that in mature installations, -the canonical storage of user identifiers will be handled by a system external to kubernetes. +the canonical storage of user identifiers will be handled by a system external to Kubernetes. Kubernetes does not dictate how to divide up the space of user identifier strings. 
User names can be simple Unix-style short usernames, (e.g. `alice`), or may be qualified to allow for federated identity ( @@ -104,7 +104,7 @@ accounts (e.g. `alice@example.com` vs `build-service-account-a3b7f0@foo-namespac but Kubernetes does not require this. Kubernetes also does not require that there be a distinction between human and Pod users. It will be possible -to setup a cluster where Alice the human talks to the kubernetes API as username `alice` and starts pods that +to setup a cluster where Alice the human talks to the Kubernetes API as username `alice` and starts pods that also talk to the API as user `alice` and write files to NFS as user `alice`. But, this is not recommended. Instead, it is recommended that Pods and Humans have distinct identities, and reference implementations will @@ -153,7 +153,7 @@ get a `Secret` which allows them to authenticate to the Kubernetes APIserver as policy that is desired can be applied to them. A higher level workflow is needed to coordinate creation of serviceAccounts, secrets and relevant policy objects. -Users are free to extend kubernetes to put this business logic wherever is convenient for them, though the +Users are free to extend Kubernetes to put this business logic wherever is convenient for them, though the Service Account Finalizer is one place where this can happen (see below). ### Kubelet -- cgit v1.2.3 From 68f5167d9ca0bb17200d638df0662ff2ac749878 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Mon, 20 Jul 2015 13:45:36 -0700 Subject: Fix capitalization of Kubernetes in the documentation. --- federation.md | 2 +- high-availability.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/federation.md b/federation.md index 8de05a9c..99dbe904 100644 --- a/federation.md +++ b/federation.md @@ -354,7 +354,7 @@ time. This is closely related to location affinity above, and also discussed there. 
The basic idea is that some controller, logically outside of -the basic kubernetes control plane of the clusters in question, needs +the basic Kubernetes control plane of the clusters in question, needs to be able to: 1. Receive "global" resource creation requests. diff --git a/high-availability.md b/high-availability.md index ecb9966e..6318921e 100644 --- a/high-availability.md +++ b/high-availability.md @@ -33,13 +33,13 @@ Documentation for other releases can be found at # High Availability of Scheduling and Controller Components in Kubernetes -This document serves as a proposal for high availability of the scheduler and controller components in kubernetes. This proposal is intended to provide a simple High Availability api for kubernetes components with the potential to extend to services running on kubernetes. Those services would be subject to their own constraints. +This document serves as a proposal for high availability of the scheduler and controller components in Kubernetes. This proposal is intended to provide a simple High Availability api for Kubernetes components with the potential to extend to services running on Kubernetes. Those services would be subject to their own constraints. ## Design Options For complete reference see [this](https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en) -1. Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. +1. 
Hot Standby: In this scenario, data and state are shared between the two components such that an immediate failure in one component causes the standby daemon to take over exactly where the failed component had left off. This would be an ideal solution for Kubernetes, however it poses a series of challenges in the case of controllers where component-state is cached locally and not persisted in a transactional way to a storage facility. This would also introduce additional load on the apiserver, which is not desirable. As a result, we are **NOT** planning on this approach at this time. 2. **Warm Standby**: In this scenario there is only one active component acting as the master and additional components running but not providing service or responding to requests. Data and state are not shared between the active and standby components. When a failure occurs, the standby component that becomes the master must determine the current state of the system before resuming functionality. This is the approach that this proposal will leverage. -- cgit v1.2.3 From e0554bbf167b4c0d315fda4a3ddd9511460064c1 Mon Sep 17 00:00:00 2001 From: Alex Robinson Date: Mon, 20 Jul 2015 13:45:36 -0700 Subject: Fix capitalization of Kubernetes in the documentation. --- README.md | 2 +- api-conventions.md | 4 ++-- client-libraries.md | 2 +- development.md | 4 ++-- writing-a-getting-started-guide.md | 2 +- 5 files changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 9a73d949..267bca23 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,7 @@ Documentation for other releases can be found at # Kubernetes Developer Guide The developer guide is for anyone wanting to either write code which directly accesses the -kubernetes API, or to contribute directly to the kubernetes project. +Kubernetes API, or to contribute directly to the Kubernetes project. 
It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md) and the [Cluster Admin Guide](../admin/README.md). diff --git a/api-conventions.md b/api-conventions.md index 8b2216cd..8889b721 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -35,8 +35,8 @@ API Conventions Updated: 4/16/2015 -*This document is oriented at users who want a deeper understanding of the kubernetes -API structure, and developers wanting to extend the kubernetes API. An introduction to +*This document is oriented at users who want a deeper understanding of the Kubernetes +API structure, and developers wanting to extend the Kubernetes API. An introduction to using resources with kubectl can be found in (working_with_resources.md).* **Table of Contents** diff --git a/client-libraries.md b/client-libraries.md index e41c6514..9e41688c 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -31,7 +31,7 @@ Documentation for other releases can be found at -## kubernetes API client libraries +## Kubernetes API client libraries ### Supported diff --git a/development.md b/development.md index f5233a0e..27cb034d 100644 --- a/development.md +++ b/development.md @@ -56,7 +56,7 @@ Below, we outline one of the more common git workflows that core developers use. ### Clone your fork -The commands below require that you have $GOPATH set ([$GOPATH docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put kubernetes' code into your GOPATH. Note: the commands below will not work if there is more than one directory in your `$GOPATH`. +The commands below require that you have $GOPATH set ([$GOPATH docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put Kubernetes' code into your GOPATH. Note: the commands below will not work if there is more than one directory in your `$GOPATH`. ```sh mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ @@ -207,7 +207,7 @@ godep go test ./... 
If you only want to run unit tests in one package, you could run ``godep go test`` under the package directory. For example, the following commands will run all unit tests in package kubelet: ```console -$ cd kubernetes # step into kubernetes' directory. +$ cd kubernetes # step into the kubernetes directory. $ cd pkg/kubelet $ godep go test # some output from unit tests diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index c22d9204..40f513be 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -66,7 +66,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. - We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your own repo. - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md). - - State the binary version of kubernetes that you tested clearly in your Guide doc. + - State the binary version of Kubernetes that you tested clearly in your Guide doc. - Setup a cluster and run the [conformance test](development.md#conformance-testing) against it, and report the results in your PR. - Versioned distros should typically not modify or add code in `cluster/`. That is just scripts for developer -- cgit v1.2.3 From 39c004737b4cd86da696231aa55c4d8eabb11994 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Wed, 22 Jul 2015 20:16:41 +0000 Subject: Update post-1.0 release versioning proposal. --- versioning.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/versioning.md b/versioning.md index 8a547242..5d1cec0e 100644 --- a/versioning.md +++ b/versioning.md @@ -40,14 +40,13 @@ Legend: ## Release Timeline -### Minor version timeline - -* Kube 1.0.0 -* Kube 1.0.x: We create a 1.0-patch branch and backport critical bugs and security issues to it. Patch releases occur as needed. -* Kube 1.1-alpha1: Cut from HEAD, smoke tested and released two weeks after Kube 1.0's release. 
Roughly every two weeks a new alpha is released from HEAD. The timeline is flexible; for example, if there is a critical bugfix, a new alpha can be released ahead of schedule. (This applies to the beta and rc releases as well.) -* Kube 1.1-beta1: When HEAD is feature complete, we create a 1.1-snapshot branch and release it as a beta. (The 1.1-snapshot branch may be created earlier if something that definitely won't be in 1.1 needs to be merged to HEAD.) This should occur 6-8 weeks after Kube 1.0. Development continues at HEAD and only fixes are backported to 1.1-snapshot. -* Kube 1.1-rc1: Released from 1.1-snapshot when it is considered stable and ready for testing. Most users should be able to upgrade to this version in production. -* Kube 1.1: Final release. Should occur between 3 and 4 months after 1.0. +### Minor version scheme and timeline + +* Kube 1.0.0, 1.0.1 -- DONE! +* Kube 1.0.X (X>1): Standard operating procedure. We patch the release-1.0 branch as needed and increment the patch number. +* Kube 1.1alpha.X: Released roughly every two weeks by cutting from HEAD. No cherrypick releases. If there is a critical bugfix, a new release from HEAD can be created ahead of schedule. (This applies to the beta releases as well.) +* Kube 1.1beta.X: When HEAD is feature-complete, we go into code freeze 2 weeks prior to the desired 1.1.0 date and only merge PRs essential to 1.1. Releases continue to be cut from HEAD until we're essentially done. +* Kube 1.1.0: Final release. Should occur between 3 and 4 months after 1.0. 
### Major version timeline -- cgit v1.2.3 From d6200d0d492377b82ca3afbec822230f857732d8 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Wed, 22 Jul 2015 17:16:28 -0700 Subject: Fix doc typos --- networking.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/networking.md b/networking.md index b1d5a460..dfe0f93e 100644 --- a/networking.md +++ b/networking.md @@ -87,7 +87,7 @@ whereas, in general, they don't control what pods land together on a host. ## Pod to pod Because every pod gets a "real" (not machine-private) IP address, pods can -communicate without proxies or translations. The can use well-known port +communicate without proxies or translations. The pod can use well-known port numbers and can avoid the use of higher-level service discovery systems like DNS-SD, Consul, or Etcd. -- cgit v1.2.3 From bd03d6d49788d5dd62e686dcaa3f641b964cea58 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Thu, 23 Jul 2015 00:42:03 +0000 Subject: Change to semantic versioning. --- versioning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/versioning.md b/versioning.md index 5d1cec0e..9009dc59 100644 --- a/versioning.md +++ b/versioning.md @@ -44,8 +44,8 @@ Legend: * Kube 1.0.0, 1.0.1 -- DONE! * Kube 1.0.X (X>1): Standard operating procedure. We patch the release-1.0 branch as needed and increment the patch number. -* Kube 1.1alpha.X: Released roughly every two weeks by cutting from HEAD. No cherrypick releases. If there is a critical bugfix, a new release from HEAD can be created ahead of schedule. (This applies to the beta releases as well.) -* Kube 1.1beta.X: When HEAD is feature-complete, we go into code freeze 2 weeks prior to the desired 1.1.0 date and only merge PRs essential to 1.1. Releases continue to be cut from HEAD until we're essentially done. +* Kube 1.1.0-alpha.X: Released roughly every two weeks by cutting from HEAD. No cherrypick releases. 
If there is a critical bugfix, a new release from HEAD can be created ahead of schedule. (This applies to the beta releases as well.) +* Kube 1.1.0-beta.X: When HEAD is feature-complete, we go into code freeze 2 weeks prior to the desired 1.1.0 date and only merge PRs essential to 1.1. Releases continue to be cut from HEAD until we're essentially done. * Kube 1.1.0: Final release. Should occur between 3 and 4 months after 1.0. ### Major version timeline -- cgit v1.2.3 From 523c886d8abfd0d15d6d43bd6a74bce4c1307357 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Thu, 2 Jul 2015 09:56:54 +0200 Subject: Watch in apiserver proposal --- apiserver_watch.md | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 184 insertions(+) create mode 100644 apiserver_watch.md diff --git a/apiserver_watch.md b/apiserver_watch.md new file mode 100644 index 00000000..10ae98f1 --- /dev/null +++ b/apiserver_watch.md @@ -0,0 +1,184 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver_watch.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +## Abstract + +In the current system, all watch requests sent to apiserver are in general +redirected to etcd. This means that for every watch request to apiserver, +apiserver opens a watch on etcd. + +The purpose of the proposal is to improve the overall performance of the system +by solving the following problems: + +- having too many open watches on etcd +- avoiding deserializing/converting the same objects multiple times in different +watch results + +In the future, we would also like to add an indexing mechanism to the watch. +Although Indexer is not part of this proposal, it is supposed to be compatible +with it - in the future Indexer should be incorporated into the proposed new +watch solution in apiserver without requiring any redesign. + + +## High level design + +We are going to solve those problems by allowing many clients to watch the same +storage in the apiserver, without being redirected to etcd. + +At the high level, apiserver will have a single watch open to etcd, watching all +the objects (of a given type) without any filtering. The changes delivered from +etcd will then be stored in a cache in apiserver. This cache is in fact a +"rolling history window" that will support clients having some amount of latency +between their list and watch calls. Thus it will have a limited capacity and +whenever a new change comes from etcd when the cache is full, the oldest change +will be removed to make room for the new one.
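
The "rolling history window" described above can be pictured as a fixed-capacity cyclic buffer keyed by resourceVersion. The sketch below is illustrative only — the type and field names are invented for the example and are not the actual Kubernetes implementation:

```go
package main

import "fmt"

// watchEvent pairs a change with the etcd resourceVersion at which it happened.
// A string stands in for the real deserialized API object.
type watchEvent struct {
	resourceVersion uint64
	object          string
}

// watchCache is the "rolling history window": a fixed-capacity cyclic buffer.
// When it is full, the oldest event is dropped to make room for the newest.
type watchCache struct {
	events []watchEvent
	start  int // index of the oldest cached event
	size   int
}

func newWatchCache(capacity int) *watchCache {
	return &watchCache{events: make([]watchEvent, capacity)}
}

func (c *watchCache) add(e watchEvent) {
	if c.size < len(c.events) {
		c.events[(c.start+c.size)%len(c.events)] = e
		c.size++
		return
	}
	c.events[c.start] = e                   // overwrite the oldest slot...
	c.start = (c.start + 1) % len(c.events) // ...and advance the window
}

// since returns all cached events newer than rv, or ok=false when rv has
// already fallen out of the window and the client must relist.
func (c *watchCache) since(rv uint64) (out []watchEvent, ok bool) {
	if c.size > 0 && rv+1 < c.events[c.start].resourceVersion {
		return nil, false // history starting at rv+1 is no longer available
	}
	for i := 0; i < c.size; i++ {
		if e := c.events[(c.start+i)%len(c.events)]; e.resourceVersion > rv {
			out = append(out, e)
		}
	}
	return out, true
}

func main() {
	c := newWatchCache(3)
	for rv := uint64(1); rv <= 5; rv++ {
		c.add(watchEvent{resourceVersion: rv, object: fmt.Sprintf("pod-change-%d", rv)})
	}
	// The window now holds rv 3..5, so a client at rv=3 can catch up...
	events, ok := c.since(3)
	fmt.Println(ok, len(events)) // true 2
	// ...but a client at rv=1 is too far behind and must relist.
	_, ok = c.since(1)
	fmt.Println(ok) // false
}
```

The capacity of the buffer is exactly the "amount of latency between list and watch calls" the design talks about: the larger it is, the older a client's resourceVersion may be before it is forced to relist.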
+ +When a client sends a watch request to apiserver, instead of redirecting it to +etcd, it will cause: + + - registering a handler to receive all new changes coming from etcd + - iterating through a watch window, starting at the requested resourceVersion + to the head and sending filtered changes directly to the client, blocking + the above until this iteration has caught up + +This will be done by creating a go-routine per watcher that will be responsible +for performing the above. + +The following section describes the proposal in more detail, analyzes some +corner cases and divides the whole design into more fine-grained steps. + + +## Proposal details + +We would like the cache to be __per-resource-type__ and __optional__. Thanks to +it we will be able to: + - have different cache sizes for different resources (e.g. bigger cache + [= longer history] for pods, which can significantly affect performance) + - avoid any overhead for objects that are watched very rarely (e.g. events + are almost not watched at all, but there are a lot of them) + - filter the cache for each watcher more effectively + +If we decide to support watches spanning different resources in the future and +we have an efficient indexing mechanism, it should be relatively simple to unify +the cache to be common for all the resources. + +The rest of this section describes the concrete steps that need to be done +to implement the proposal. + +1. Since we want the watch in apiserver to be optional for different resource +types, this needs to be self-contained and hidden behind a well defined API. +This should be a layer very close to etcd - in particular all registries: +"pkg/registry/generic/etcd" should be built on top of it. +We will solve this by extracting the interface of tools.EtcdHelper and +treating that interface as the API - the whole watch mechanism in +apiserver will be hidden behind that interface.
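
Step 1's extraction could be sketched roughly as follows. The method set, names, and signatures here are illustrative assumptions, not the actual tools.EtcdHelper interface:

```go
package main

import "fmt"

// Event and WatchInterface are simplified stand-ins for the real watch types.
type Event struct {
	Type   string      // "ADDED", "MODIFIED", "DELETED"
	Object interface{} // the deserialized API object
}

type WatchInterface interface {
	ResultChan() <-chan Event
	Stop()
}

// Storage is what step 1 proposes: the method set of tools.EtcdHelper pulled
// out into an interface, so the watch machinery can hide behind it.
type Storage interface {
	List(key string) ([]interface{}, error)
	Watch(key string, resourceVersion uint64) (WatchInterface, error)
}

// rawEtcd stands in for the existing etcd-backed implementation; the proposed
// caching layer would simply be a second implementation of the same interface.
type rawEtcd struct{}

func (rawEtcd) List(key string) ([]interface{}, error) { return nil, nil }

func (rawEtcd) Watch(key string, rv uint64) (WatchInterface, error) { return nil, nil }

func main() {
	// Registries depend only on Storage, so swapping in the cached
	// implementation later requires no changes elsewhere in the code.
	var s Storage = rawEtcd{}
	_, err := s.List("/registry/pods")
	fmt.Println(err == nil) // true
}
```

This is why the step claims an initial implementation "for free": the current helper already satisfies the extracted interface, and only Watch and List need to be reimplemented by the caching layer.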
+Thanks to it we will get an initial implementation for free and we will just +need to reimplement a few relevant functions (probably just Watch and List). +Moreover, this will not require any changes in other parts of the code. +This step is about extracting the interface of tools.EtcdHelper. + +2. Create a FIFO cache with a given capacity. In its "rolling history window" +we will store two things: + + - the resourceVersion of the object (being an etcdIndex) + - the object watched from etcd itself (in a deserialized form) + + This should be as simple as having an array and treating it as a cyclic buffer. + Obviously the resourceVersion of objects watched from etcd will be increasing, but + they are necessary for registering a new watcher that is interested in all the + changes since a given etcdIndex. + + Additionally, we should support the LIST operation, otherwise clients can never + start watching at "now". We may consider passing lists through etcd, however + this will not work once we have Indexer, so we will need that information + in memory anyway. + Thus, we should support the LIST operation from the "end of the history" - i.e. + from the moment just after the newest cached watched event. It should be + pretty simple to do, because we can incrementally update this list whenever + a new watch event is watched from etcd. + We may consider reusing the existing structures cache.Store or cache.Indexer + ("pkg/client/cache") but this is not a hard requirement. + +3. Create a new implementation of the EtcdHelper interface, that will internally +have a single watch open to etcd and will store data received from etcd in the +FIFO cache. This includes implementing registration of a new watcher that will +start a new go-routine responsible for iterating over the cache and sending +appropriately filtered objects to the watcher. + +4.
Create the new implementation of the API, that will internally have a +single watch open to etcd and will store the data received from etcd in +the FIFO cache - this includes implementing registration of a new watcher +which will start a new go-routine responsible for iterating over the cache +and sending all the objects the watcher is interested in (by applying a filtering +function) to the watcher. + +5. Add support for processing "error too old" from etcd, which will require: + - disconnect all the watchers + - clear the internal cache and relist all objects from etcd + - start accepting watchers again + +6. Enable watch in apiserver for some of the existing resource types - this +should require only changes at the initialization level. + +7. The next step will be to incorporate some indexing mechanism, but details +of it are TBD. + + + +### Future optimizations: + +1. The implementation of watch in apiserver internally will open a single +watch to etcd, responsible for watching all the changes of objects of a given +resource type. However, this watch can potentially expire at any time and +reconnecting can return "too old resource version". In that case relisting is +necessary. In such a case, to avoid LIST requests coming from all watchers at +the same time, we can introduce an additional etcd event type: +[EtcdResync](../../pkg/tools/etcd_helper_watch.go#L36) + + Whenever relisting is done to refresh the internal watch to etcd, + an EtcdResync event will be sent to all the watchers. It will contain the + full list of all the objects the watcher is interested in (appropriately + filtered) as the parameter of this watch event.
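
Step 5's recovery sequence (disconnect the watchers, clear the cache, relist) might look roughly like this sketch; `cacher` and all of its fields are hypothetical names invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// cacher owns the single etcd watch plus the event cache; registered watchers
// receive events over their channels.
type cacher struct {
	mu       sync.Mutex
	nextID   int
	watchers map[int]chan string
	cache    []string // stands in for the rolling history window
}

func (c *cacher) addWatcher() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	id := c.nextID
	c.nextID++
	c.watchers[id] = make(chan string, 16)
	return id
}

// onTooOldError performs step 5: disconnect every watcher, drop the stale
// cache, and relist from etcd before accepting watchers again.
func (c *cacher) onTooOldError(relist func() []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for id, ch := range c.watchers {
		close(ch) // clients see the closed channel and re-establish their watch
		delete(c.watchers, id)
	}
	c.cache = relist() // rebuild state with a fresh LIST against etcd
}

func main() {
	c := &cacher{watchers: map[int]chan string{}}
	c.addWatcher()
	c.addWatcher()
	c.onTooOldError(func() []string { return []string{"pod-a", "pod-b"} })
	fmt.Println(len(c.watchers), len(c.cache)) // 0 2
}
```

The proposed EtcdResync event would refine exactly this sequence: instead of forcibly closing every watcher channel and triggering a thundering herd of LISTs, the freshly relisted state could be delivered to each watcher in-band.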
+ Thus, we need to create the EtcdResync event, extend watch.Interface and + its implementations to support it and handle those events appropriately + in places like + [Reflector](../../pkg/client/cache/reflector.go) + + However, this might turn out to be unnecessary optimization if apiserver + will always keep up (which is possible in the new design). We will work + out all necessary details at that point. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/apiserver_watch.md?pixel)]() + -- cgit v1.2.3 From 14c0fa29d0751c01edc3889ee32e4dcd80fb4e70 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Mon, 27 Jul 2015 09:54:07 +0200 Subject: Factor out etcdWatcher to a separate file --- apiserver_watch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/apiserver_watch.md b/apiserver_watch.md index 10ae98f1..ce866b6d 100644 --- a/apiserver_watch.md +++ b/apiserver_watch.md @@ -163,7 +163,7 @@ resource type. However, this watch can potentially expire at any time and reconnecting can return "too old resource version". In that case relisting is necessary. In such case, to avoid LIST requests coming from all watchers at the same time, we can introduce an additional etcd event type: -[EtcdResync](../../pkg/tools/etcd_helper_watch.go#L36) +[EtcdResync](../../pkg/tools/etcd_watcher.go#L36) Whenever reslisting will be done to refresh the internal watch to etcd, EtcdResync event will be send to all the watchers. It will contain the -- cgit v1.2.3 From bf3a0097fc5da8c41e6488198cc629879f7bfc20 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Thu, 30 Jul 2015 13:27:18 +0200 Subject: Move etcd storage to pkg/storage/etcd --- apiserver_watch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/apiserver_watch.md b/apiserver_watch.md index ce866b6d..5610ccbc 100644 --- a/apiserver_watch.md +++ b/apiserver_watch.md @@ -163,7 +163,7 @@ resource type. 
However, this watch can potentially expire at any time and reconnecting can return "too old resource version". In that case relisting is necessary. In such case, to avoid LIST requests coming from all watchers at the same time, we can introduce an additional etcd event type: -[EtcdResync](../../pkg/tools/etcd_watcher.go#L36) +[EtcdResync](../../pkg/storage/etcd/etcd_watcher.go#L36) Whenever reslisting will be done to refresh the internal watch to etcd, EtcdResync event will be send to all the watchers. It will contain the -- cgit v1.2.3 From e605969e9a2636a2a1c4c1f86c7ea9596bbf3174 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Thu, 30 Jul 2015 15:11:38 -0700 Subject: Add a note on when to use commits --- development.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/development.md b/development.md index 27cb034d..87b4b5d0 100644 --- a/development.md +++ b/development.md @@ -99,6 +99,17 @@ git push -f origin myfeature 1. Visit http://github.com/$YOUR_GITHUB_USERNAME/kubernetes 2. Click the "Compare and pull request" button next to your "myfeature" branch. +### When to retain commits and when to squash + +Upon merge, all git commits should represent meaningful milestones or units of +work. Use commits to add clarity to the development and review process. + +Before merging a PR, squash any "fix review feedback", "typo", and "rebased" +sorts of commits. It is not imperative that every commit in a PR compile and +pass tests independently, but it is worth striving for. For mass automated +fixups (e.g. automated doc formatting), use one or more commits for the +changes to tooling and a final commit to apply the fixup en masse. This makes +reviews much easier. 
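
The squashing described above is typically done with an interactive rebase; the branch name, commit subjects, and history below are made up for illustration:

```sh
# History on the feature branch before cleanup:
#   fix review feedback      <- squash these two away
#   typo                     <-
#   Add frobnicator support
git rebase -i HEAD~3          # mark the cleanup commits as "fixup" in the todo list

# Or record the fixups as you go, and let git arrange the todo list for you:
git commit -a --fixup HEAD    # creates a "fixup! Add frobnicator support" commit
git rebase -i --autosquash HEAD~2
git push -f origin myfeature  # the PR branch now shows one clean commit
```

The force push is safe here because it only rewrites your own PR branch, never a shared one.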
## godep and dependency management -- cgit v1.2.3 From 39eedfac6dd5d287611b2b21d60af7a19560aae8 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Mon, 20 Jul 2015 11:45:58 -0500 Subject: Rewrite how the munger works The basic idea is that in the main mungedocs we run the entirefile and create an annotated set of lines about that file. All mungers then act on a struct mungeLines instead of on a bytes array. Making use of the metadata where appropriete. Helper functions exist to make updating a 'macro block' extremely easy. --- security_context.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security_context.md b/security_context.md index 03213927..8a6dd314 100644 --- a/security_context.md +++ b/security_context.md @@ -114,7 +114,7 @@ It is recommended that this design be implemented in two phases: 2. Implement a security context structure that is part of a service account. The default context provider can then be used to apply a security context based on the service account associated with the pod. - + ### Security Context Provider The Kubelet will have an interface that points to a `SecurityContextProvider`. The `SecurityContextProvider` is invoked before creating and running a given container: -- cgit v1.2.3 From 9a5c3748cc3469907bef1c8b053df544ed1d7f54 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Fri, 24 Jul 2015 17:52:18 -0400 Subject: Fix trailing whitespace in all docs --- api-conventions.md | 6 +++--- api_changes.md | 2 +- collab.md | 2 +- scheduler_algorithm.md | 4 ++-- writing-a-getting-started-guide.md | 14 +++++++------- 5 files changed, 14 insertions(+), 14 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index 8889b721..5a1bfe81 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -173,11 +173,11 @@ Objects that contain both spec and status should not contain additional top-leve ##### Typical status properties * **phase**: The phase is a simple, high-level summary of the phase of the lifecycle of an object. 
The phase should progress monotonically. Typical phase values are `Pending` (not yet fully physically realized), `Running` or `Active` (fully realized and active, but not necessarily operating correctly), and `Terminated` (no longer active), but may vary slightly for different types of objects. New phase values should not be added to existing objects in the future. Like other status fields, it must be possible to ascertain the lifecycle phase by observation. Additional details regarding the current phase may be contained in other fields. -* **conditions**: Conditions represent orthogonal observations of an object's current state. Objects may report multiple conditions, and new types of conditions may be added in the future. Condition status values may be `True`, `False`, or `Unknown`. Unlike the phase, conditions are not expected to be monotonic -- their values may change back and forth. A typical condition type is `Ready`, which indicates the object was believed to be fully operational at the time it was last probed. Conditions may carry additional information, such as the last probe time or last transition time. +* **conditions**: Conditions represent orthogonal observations of an object's current state. Objects may report multiple conditions, and new types of conditions may be added in the future. Condition status values may be `True`, `False`, or `Unknown`. Unlike the phase, conditions are not expected to be monotonic -- their values may change back and forth. A typical condition type is `Ready`, which indicates the object was believed to be fully operational at the time it was last probed. Conditions may carry additional information, such as the last probe time or last transition time. TODO(@vishh): Reason and Message. -Phases and conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects with behaviors associated with state transitions. The system is level-based and should assume an Open World. 
Additionally, new observations and details about these observations may be added over time. +Phases and conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects with behaviors associated with state transitions. The system is level-based and should assume an Open World. Additionally, new observations and details about these observations may be added over time. In order to preserve extensibility, in the future, we intend to explicitly convey properties that users and components care about rather than requiring those properties to be inferred from observations. @@ -376,7 +376,7 @@ Late-initializers should only make the following types of modifications: - Adding keys to maps - Adding values to arrays which have mergeable semantics (`patchStrategy:"merge"` attribute in the type definition). - + These conventions: 1. allow a user (with sufficient privilege) to override any system-default behaviors by setting the fields that would otherwise have been defaulted. diff --git a/api_changes.md b/api_changes.md index d8e20014..687af00a 100644 --- a/api_changes.md +++ b/api_changes.md @@ -309,7 +309,7 @@ a panic from the `serialization_test`. If so, look at the diff it produces (or the backtrace in case of a panic) and figure out what you forgot. Encode that into the fuzzer's custom fuzz functions. Hint: if you added defaults for a field, that field will need to have a custom fuzz function that ensures that the field is -fuzzed to a non-empty value. +fuzzed to a non-empty value. The fuzzer can be found in `pkg/api/testing/fuzzer.go`. diff --git a/collab.md b/collab.md index 96db64c8..624b3bcb 100644 --- a/collab.md +++ b/collab.md @@ -61,7 +61,7 @@ Maintainers will do merges of appropriately reviewed-and-approved changes during There may be discussion and even approvals granted outside of the above hours, but merges will generally be deferred.
-If a PR is considered complex or controversial, the merge of that PR should be delayed to give all interested parties in all timezones the opportunity to provide feedback. Concretely, this means that such PRs should be held for 24 +If a PR is considered complex or controversial, the merge of that PR should be delayed to give all interested parties in all timezones the opportunity to provide feedback. Concretely, this means that such PRs should be held for 24 hours before merging. Of course "complex" and "controversial" are left to the judgment of the people involved, but we trust that part of being a committer is the judgment required to evaluate such things honestly, and not be motivated by your desire (or your cube-mate's desire) to get their code merged. Also see "Holds" below; any reviewer can issue a "hold" to indicate that the PR is in fact complicated or complex and deserves further review. diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index c67bcdbf..ab8e69ef 100644 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -44,7 +44,7 @@ The purpose of filtering the nodes is to filter out the nodes that do not meet c - `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node. - `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. - `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field ([Here](../user-guide/node-selection/) is an example of how to use `nodeSelector` field). -- `CheckNodeLabelPresence`: Check if all the specified labels exist on a node or not, regardless of the value. +- `CheckNodeLabelPresence`: Check if all the specified labels exist on a node or not, regardless of the value. The details of the above predicates can be found in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/predicates/predicates.go).
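The predicate-based filtering of nodes can be sketched in Go. The types and the two predicates below are simplified, hypothetical stand-ins for the real API types and the functions in `predicates.go`; they only illustrate how pod-fits-node checks compose:

```go
package main

import "fmt"

// Simplified, illustrative types -- not the real api.Pod / api.Node.
type Pod struct {
	HostPort     int
	NodeSelector map[string]string
}

type Node struct {
	Name      string
	Labels    map[string]string
	UsedPorts map[int]bool
}

// A Predicate reports whether a pod fits on a node.
type Predicate func(pod Pod, node Node) bool

// PodFitsPorts: the requested HostPort must be free on the node.
func PodFitsPorts(pod Pod, node Node) bool {
	return pod.HostPort == 0 || !node.UsedPorts[pod.HostPort]
}

// PodSelectorMatches: every nodeSelector label must match the node's labels.
func PodSelectorMatches(pod Pod, node Node) bool {
	for k, v := range pod.NodeSelector {
		if node.Labels[k] != v {
			return false
		}
	}
	return true
}

// FilterNodes keeps only the nodes that pass every predicate.
func FilterNodes(pod Pod, nodes []Node, preds []Predicate) []Node {
	var fit []Node
	for _, n := range nodes {
		ok := true
		for _, p := range preds {
			if !p(pod, n) {
				ok = false
				break
			}
		}
		if ok {
			fit = append(fit, n)
		}
	}
	return fit
}

func main() {
	pod := Pod{HostPort: 80, NodeSelector: map[string]string{"disk": "ssd"}}
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{"disk": "ssd"}, UsedPorts: map[int]bool{80: true}},
		{Name: "n2", Labels: map[string]string{"disk": "ssd"}},
		{Name: "n3", Labels: map[string]string{"disk": "hdd"}},
	}
	for _, n := range FilterNodes(pod, nodes, []Predicate{PodFitsPorts, PodSelectorMatches}) {
		fmt.Println(n.Name) // n2
	}
}
```

Because each predicate is a pure pod-and-node-to-bool function, composing a filtering policy is just running every configured predicate against every candidate node.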
All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. Kubernetes uses some, but not all, of these predicates by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). @@ -53,7 +53,7 @@ The details of the above predicates can be found in [plugin/pkg/scheduler/algori The filtered nodes are considered suitable to host the Pod, and it is often the case that more than one node remains. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score which scales from 0-10, with 10 representing "most preferred" and 0 "least preferred". Each priority function is weighted by a positive number and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2` with weighting factors `weight1` and `weight2` respectively; the final score of some NodeA is: finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) - + After the scores of all nodes are calculated, the node with the highest score is chosen as the host of the Pod. If more than one node has the highest score, a random one among them is chosen. Currently, the Kubernetes scheduler provides some practical priority functions, including: diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 40f513be..04d0d67f 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -70,7 +70,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. - Setup a cluster and run the [conformance test](development.md#conformance-testing) against it, and report the results in your PR.
- Versioned distros should typically not modify or add code in `cluster/`. That is just scripts for developer - distros. + distros. - When a new major or minor release of Kubernetes comes out, we may also release a new conformance test, and require a new conformance test run to earn a conformance checkmark. @@ -82,20 +82,20 @@ Just file an issue or chat us on IRC and one of the committers will link to it f These guidelines say *what* to do. See the Rationale section for *why*. - the main reason to add a new development distro is to support a new IaaS provider (VM and - network management). This means implementing a new `pkg/cloudprovider/$IAAS_NAME`. + network management). This means implementing a new `pkg/cloudprovider/$IAAS_NAME`. - Development distros should use Saltstack for Configuration Management. - development distros need to support automated cluster creation, deletion, upgrading, etc. This means writing scripts in `cluster/$IAAS_NAME`. - all commits to the tip of this repo must not break any of the development distros - the author of the change is responsible for making changes necessary on all the cloud-providers if the change affects any of them, and reverting the change if it breaks any of the CIs. - - a development distro needs to have an organization which owns it. This organization needs to: + - a development distro needs to have an organization which owns it. This organization needs to: - Set up and maintain Continuous Integration that runs e2e frequently (multiple times per day) against the Distro at head, and which notifies all devs of breakage. - Be reasonably available for questions and assist with refactoring and feature additions that affect code for their IaaS. -## Rationale +## Rationale - We want people to create Kubernetes clusters with whatever IaaS, Node OS, configuration management tools, and so on, which they are familiar with. The
See the Rationale section for *why*. learning curve to understand our automated testing scripts. And it is considerable effort to fully automate setup and teardown of a cluster, which is needed for CI. And, not everyone has the time and money to run CI. We do not want to - discourage people from writing and sharing guides because of this. + discourage people from writing and sharing guides because of this. - Versioned distro authors are free to run their own CI and let us know if there is breakage, but we will not include them as commit hooks -- there cannot be so many commit checks that it is impossible to pass them all. - We prefer a single Configuration Management tool for development distros. If there were more than one, the core developers would have to learn multiple tools and update config in multiple places. **Saltstack** happens to be the one we picked when we started the project. We - welcome versioned distros that use any tool; there are already examples of + welcome versioned distros that use any tool; there are already examples of CoreOS Fleet, Ansible, and others. - You can still run code from head or your own branch if you use another Configuration Management tool -- you just have to do some manual steps during testing and deployment. 
- + -- cgit v1.2.3 From b15dad5066d0fb1bd39b514230bfc8b2328ea72c Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Fri, 24 Jul 2015 17:52:18 -0400 Subject: Fix trailing whitespace in all docs --- README.md | 2 +- admission_control_resource_quota.md | 6 +++--- architecture.md | 2 +- event_compression.md | 2 +- expansion.md | 8 ++++---- namespaces.md | 14 +++++++------- persistent-storage.md | 12 ++++++------ principles.md | 8 ++++---- resources.md | 10 +++++----- secrets.md | 6 +++--- security_context.md | 34 +++++++++++++++++----------------- simple-rolling-update.md | 4 ++-- 12 files changed, 54 insertions(+), 54 deletions(-) diff --git a/README.md b/README.md index 62946cb6..72d2c662 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at # Kubernetes Design Overview -Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications. +Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications. Kubernetes establishes robust declarative primitives for maintaining the desired state requested by the user. We see these primitives as the main value added by Kubernetes. Self-healing mechanisms, such as auto-restarting, re-scheduling, and replicating containers require active controllers, not just imperative orchestration. diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index c86577ac..136603d2 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -104,7 +104,7 @@ type ResourceQuotaList struct { ## AdmissionControl plugin: ResourceQuota -The **ResourceQuota** plug-in introspects all incoming admission requests. +The **ResourceQuota** plug-in introspects all incoming admission requests. 
It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied. @@ -125,7 +125,7 @@ Any resource that is not part of core Kubernetes must follow the resource naming This means the resource must have a fully-qualified name (i.e. mycompany.org/shinynewresource) If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a -**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read +**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read **ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally) into the system. @@ -184,7 +184,7 @@ resourcequotas 1 1 services 3 5 ``` -## More information +## More information See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota/) for more information. diff --git a/architecture.md b/architecture.md index f7c55171..5f829d68 100644 --- a/architecture.md +++ b/architecture.md @@ -47,7 +47,7 @@ Each node runs Docker, of course. Docker takes care of the details of downloadi ### Kubelet -The **Kubelet** manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc. +The **Kubelet** manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc. ### Kube-Proxy diff --git a/event_compression.md b/event_compression.md index bfa2c5d6..ce8d1ad4 100644 --- a/event_compression.md +++ b/event_compression.md @@ -49,7 +49,7 @@ Event compression should be best effort (not guaranteed). 
Meaning, in the worst ## Design Instead of a single Timestamp, each event object [contains](http://releases.k8s.io/HEAD/pkg/api/types.go#L1111) the following fields: - * `FirstTimestamp util.Time` + * `FirstTimestamp util.Time` * The date/time of the first occurrence of the event. * `LastTimestamp util.Time` * The date/time of the most recent occurrence of the event. diff --git a/expansion.md b/expansion.md index 75c748ca..24a07f0d 100644 --- a/expansion.md +++ b/expansion.md @@ -87,7 +87,7 @@ available to subsequent expansions. ### Use Case: Variable expansion in command -Users frequently need to pass the values of environment variables to a container's command. +Users frequently need to pass the values of environment variables to a container's command. Currently, Kubernetes does not perform any expansion of variables. The workaround is to invoke a shell in the container's command and have the shell perform the substitution, or to write a wrapper script that sets up the environment and runs the command. This has a number of drawbacks: @@ -130,7 +130,7 @@ The exact syntax for variable expansion has a large impact on how users perceive feature. We considered implementing a very restrictive subset of the shell `${var}` syntax. This syntax is an attractive option on some level, because many people are familiar with it. However, this syntax also has a large number of lesser known features such as the ability to provide -default values for unset variables, perform inline substitution, etc. +default values for unset variables, perform inline substitution, etc. In the interest of preventing conflation of the expansion feature in Kubernetes with the shell feature, we chose a different syntax similar to the one in Makefiles, `$(var)`. We also chose not @@ -239,7 +239,7 @@ The necessary changes to implement this functionality are: `ObjectReference` and an `EventRecorder` 2. Introduce `third_party/golang/expansion` package that provides: 1. 
An `Expand(string, func(string) string) string` function - 2. A `MappingFuncFor(ObjectEventRecorder, ...map[string]string) string` function + 2. A `MappingFuncFor(ObjectEventRecorder, ...map[string]string) string` function 3. Make the kubelet expand environment correctly 4. Make the kubelet expand command correctly @@ -311,7 +311,7 @@ func Expand(input string, mapping func(string) string) string { #### Kubelet changes -The Kubelet should be made to correctly expand variables references in a container's environment, +The Kubelet should be made to correctly expand variable references in a container's environment, command, and args. Changes will need to be made to: 1. The `makeEnvironmentVariables` function in the kubelet; this is used by diff --git a/namespaces.md b/namespaces.md index da3bb2c5..596f6f43 100644 --- a/namespaces.md +++ b/namespaces.md @@ -52,7 +52,7 @@ Each user community has its own: A cluster operator may create a Namespace for each unique user community. -The Namespace provides a unique scope for: +The Namespace provides a unique scope for: 1. named resources (to avoid basic naming collisions) 2. delegated management authority to trusted users @@ -142,7 +142,7 @@ type NamespaceSpec struct { A *FinalizerName* is a qualified name. -The API Server enforces that a *Namespace* can only be deleted from storage if and only if +The API Server enforces that a *Namespace* can be deleted from storage if and only if its *Namespace.Spec.Finalizers* is empty. A *finalize* operation is the only mechanism to modify the *Namespace.Spec.Finalizers* field post creation. @@ -189,12 +189,12 @@ are known to the cluster. The *namespace controller* enumerates each known resource type in that namespace and deletes it one by one.
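The `$(var)` expansion behavior proposed in the expansion design, an `Expand(string, func(string) string) string` function driven by a mapping, might look roughly like the sketch below. This is a hand-written illustration, not the actual `third_party/golang/expansion` code; in particular, how unmatched references are handled is an assumption here (unknown variables simply map to what the mapping returns):

```go
package main

import (
	"fmt"
	"strings"
)

// Expand replaces $(var) references in input using mapping, and
// treats $$ as an escape for a literal $. Sketch only -- the real
// implementation's edge-case handling may differ.
func Expand(input string, mapping func(string) string) string {
	var buf strings.Builder
	for i := 0; i < len(input); i++ {
		if input[i] == '$' && i+1 < len(input) {
			if input[i+1] == '$' { // $$ escapes a literal $
				buf.WriteByte('$')
				i++
				continue
			}
			if input[i+1] == '(' {
				if end := strings.IndexByte(input[i+2:], ')'); end >= 0 {
					buf.WriteString(mapping(input[i+2 : i+2+end]))
					i += 2 + end // skip past the closing ')'
					continue
				}
			}
		}
		buf.WriteByte(input[i])
	}
	return buf.String()
}

func main() {
	env := map[string]string{"VAR_A": "a-value"}
	mapping := func(name string) string { return env[name] }
	fmt.Println(Expand("prefix-$(VAR_A)-suffix", mapping)) // prefix-a-value-suffix
	fmt.Println(Expand("$$(VAR_A)", mapping))              // $(VAR_A)
}
```

The mapping callback is what `MappingFuncFor` would produce: a closure over one or more environment maps, optionally recording an event when a reference cannot be resolved.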
Admission control blocks creation of new resources in that namespace in order to prevent a race-condition -where the controller could believe all of a given resource type had been deleted from the namespace, +where the controller could believe all of a given resource type had been deleted from the namespace, when in fact some other rogue client agent had created new objects. Using admission control in this scenario allows each of the registry implementations for the individual objects to not need to take Namespace life-cycle into account. Once all objects known to the *namespace controller* have been deleted, the *namespace controller* -executes a *finalize* operation on the namespace that removes the *kubernetes* value from +executes a *finalize* operation on the namespace that removes the *kubernetes* value from the *Namespace.Spec.Finalizers* list. If the *namespace controller* sees a *Namespace* whose *ObjectMeta.DeletionTimestamp* is set, and @@ -245,13 +245,13 @@ In etcd, we want to continue to support efficient WATCH across namespaces. Resources that persist content in etcd will have storage paths as follows: -/{k8s_storage_prefix}/{resourceType}/{resource.Namespace}/{resource.Name} +/{k8s_storage_prefix}/{resourceType}/{resource.Namespace}/{resource.Name} This enables consumers to WATCH /registry/{resourceType} for changes across namespaces of a particular {resourceType}. ### Kubelet -The kubelet will register pod's it sources from a file or http source with a namespace associated with the +The kubelet will register pods it sources from a file or HTTP source with a namespace associated with the *cluster-id* ### Example: OpenShift Origin managing a Kubernetes Namespace @@ -362,7 +362,7 @@ This results in the following state: At this point, the Kubernetes *namespace controller* in its sync loop will see that the namespace has a deletion timestamp and that its list of finalizers is empty.
As a result, it knows all -content associated from that namespace has been purged. It performs a final DELETE action +content associated with that namespace has been purged. It performs a final DELETE action to remove that Namespace from the storage. At this point, all content associated with that Namespace, and the Namespace itself, are gone. diff --git a/persistent-storage.md b/persistent-storage.md index 51cfce89..bb200811 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -41,11 +41,11 @@ Two new API kinds: A `PersistentVolume` (PV) is a storage resource provisioned by an administrator. It is analogous to a node. See [Persistent Volume Guide](../user-guide/persistent-volumes/) for how to use it. -A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to use in a pod. It is analogous to a pod. +A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to use in a pod. It is analogous to a pod. One new system component: -`PersistentVolumeClaimBinder` is a singleton running in master that watches all PersistentVolumeClaims in the system and binds them to the closest matching available PersistentVolume. The volume manager watches the API for newly created volumes to manage. +`PersistentVolumeClaimBinder` is a singleton running in master that watches all PersistentVolumeClaims in the system and binds them to the closest matching available PersistentVolume. The volume manager watches the API for newly created volumes to manage. One new volume: @@ -69,7 +69,7 @@ Cluster administrators use the API to manage *PersistentVolumes*. A custom stor PVs are system objects and, thus, have no namespace. -Many means of dynamic provisioning will be eventually be implemented for various storage types. +Many means of dynamic provisioning will eventually be implemented for various storage types.
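One plausible reading of "binds them to the closest matching available PersistentVolume" is a smallest-fit search over unbound volumes. The sketch below uses made-up types and a single capacity dimension; the real binder matches on more than capacity, so treat this only as an illustration of the closest-match idea:

```go
package main

import "fmt"

// Illustrative types, not the real API kinds.
type Volume struct {
	Name     string
	Capacity int // bytes
	Bound    bool
}

type Claim struct {
	Name    string
	Request int // bytes
}

// BindClosest picks the smallest unbound volume that satisfies the
// claim's request, marks it bound, and returns it (nil if none fits).
func BindClosest(c Claim, vols []Volume) *Volume {
	var best *Volume
	for i := range vols {
		v := &vols[i]
		if v.Bound || v.Capacity < c.Request {
			continue
		}
		if best == nil || v.Capacity < best.Capacity {
			best = v
		}
	}
	if best != nil {
		best.Bound = true
	}
	return best
}

func main() {
	vols := []Volume{
		{Name: "pv-small", Capacity: 1 << 30},
		{Name: "pv-medium", Capacity: 5 << 30},
		{Name: "pv-large", Capacity: 100 << 30},
	}
	claim := Claim{Name: "myclaim-1", Request: 3 << 30}
	if v := BindClosest(claim, vols); v != nil {
		fmt.Println(v.Name) // pv-medium
	}
}
```

Smallest-fit keeps large volumes available for large claims, which is one reasonable interpretation of "closest matching"; the actual policy is up to the binder implementation.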
##### PersistentVolume API @@ -116,7 +116,7 @@ TBD #### Events -The implementation of persistent storage will not require events to communicate to the user the state of their claim. The CLI for bound claims contains a reference to the backing persistent volume. This is always present in the API and CLI, making an event to communicate the same unnecessary. +The implementation of persistent storage will not require events to communicate to the user the state of their claim. The CLI for bound claims contains a reference to the backing persistent volume. This is always present in the API and CLI, making an event to communicate the same unnecessary. Events that communicate the state of a mounted volume are left to the volume plugins. @@ -232,9 +232,9 @@ When a claim holder is finished with their data, they can delete their claim. $ kubectl delete pvc myclaim-1 ``` -The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim reference from the PV and change the PVs status to 'Released'. +The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim reference from the PV and changing the PV's status to 'Released'. -Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. +Admins can script the recycling of released volumes. Future dynamic provisioners will understand how a volume should be recycled. diff --git a/principles.md b/principles.md index c208fb6b..23a20349 100644 --- a/principles.md +++ b/principles.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at # Design Principles -Principles to follow when extending Kubernetes. +Principles to follow when extending Kubernetes. ## API @@ -44,14 +44,14 @@ See also the [API conventions](../devel/api-conventions.md). * The control plane should be transparent -- there are no hidden internal APIs. * The cost of API operations should be proportional to the number of objects intentionally operated upon.
Therefore, common filtered lookups must be indexed. Beware of patterns of multiple API calls that would incur quadratic behavior. * Object status must be 100% reconstructable by observation. Any history kept must be just an optimization and not required for correct operation. -* Cluster-wide invariants are difficult to enforce correctly. Try not to add them. If you must have them, don't enforce them atomically in master components, that is contention-prone and doesn't provide a recovery path in the case of a bug allowing the invariant to be violated. Instead, provide a series of checks to reduce the probability of a violation, and make every component involved able to recover from an invariant violation. +* Cluster-wide invariants are difficult to enforce correctly. Try not to add them. If you must have them, don't enforce them atomically in master components; that is contention-prone and doesn't provide a recovery path in the case of a bug allowing the invariant to be violated. Instead, provide a series of checks to reduce the probability of a violation, and make every component involved able to recover from an invariant violation. * Low-level APIs should be designed for control by higher-level systems. Higher-level APIs should be intent-oriented (think SLOs) rather than implementation-oriented (think control knobs). ## Control logic * Functionality must be *level-based*, meaning the system must operate correctly given the desired state and the current/observed state, regardless of how many intermediate state updates may have been missed. Edge-triggered behavior must be just an optimization. * Assume an open world: continually verify assumptions and gracefully adapt to external events and/or actors. Example: we allow users to kill pods under control of a replication controller; it just replaces them.
-* Do not define comprehensive state machines for objects with behaviors associated with state transitions and/or "assumed" states that cannot be ascertained by observation. +* Do not define comprehensive state machines for objects with behaviors associated with state transitions and/or "assumed" states that cannot be ascertained by observation. * Don't assume a component's decisions will not be overridden or rejected, nor that the component will always understand why. For example, etcd may reject writes. Kubelet may reject pods. The scheduler may not be able to schedule pods. Retry, but back off and/or make alternative decisions. * Components should be self-healing. For example, if you must keep some state (e.g., cache) the content needs to be periodically refreshed, so that if an item does get erroneously stored or a deletion event is missed, etc., it will soon be fixed, ideally on timescales that are shorter than what will attract attention from humans. * Component behavior should degrade gracefully. Prioritize actions so that the most important activities can continue to function even when overloaded and/or in states of partial failure. @@ -61,7 +61,7 @@ See also the [API conventions](../devel/api-conventions.md). * Only the apiserver should communicate with etcd/store, and not other components (scheduler, kubelet, etc.). * Compromising a single node shouldn't compromise the cluster. * Components should continue to do what they were last told in the absence of new instructions (e.g., due to network partition or component outage). -* All components should keep all relevant state in memory all the time.
The apiserver should write through to etcd/store, other components should write through to the apiserver, and they should watch for updates made by other clients. * Watch is preferred over polling. ## Extensibility diff --git a/resources.md b/resources.md index 7bcce84a..e006d44d 100644 --- a/resources.md +++ b/resources.md @@ -51,7 +51,7 @@ The resource model aims to be: A Kubernetes _resource_ is something that can be requested by, allocated to, or consumed by a pod or container. Examples include memory (RAM), CPU, disk-time, and network bandwidth. -Once resources on a node have been allocated to one pod, they should not be allocated to another until that pod is removed or exits. This means that Kubernetes schedulers should ensure that the sum of the resources allocated (requested and granted) to its pods never exceeds the usable capacity of the node. Testing whether a pod will fit on a node is called _feasibility checking_. +Once resources on a node have been allocated to one pod, they should not be allocated to another until that pod is removed or exits. This means that Kubernetes schedulers should ensure that the sum of the resources allocated (requested and granted) to its pods never exceeds the usable capacity of the node. Testing whether a pod will fit on a node is called _feasibility checking_. Note that the resource model currently prohibits over-committing resources; we will want to relax that restriction later. @@ -70,7 +70,7 @@ For future reference, note that some resources, such as CPU and network bandwidt ### Resource quantities -Initially, all Kubernetes resource types are _quantitative_, and have an associated _unit_ for quantities of the associated resource (e.g., bytes for memory, bytes per seconds for bandwidth, instances for software licences). 
The units will always be a resource type's natural base units (e.g., bytes, not MB), to avoid confusion between binary and decimal multipliers and the underlying unit multiplier (e.g., is memory measured in MiB, MB, or GB?). +Initially, all Kubernetes resource types are _quantitative_, and have an associated _unit_ for quantities of the associated resource (e.g., bytes for memory, bytes per second for bandwidth, instances for software licences). The units will always be a resource type's natural base units (e.g., bytes, not MB), to avoid confusion between binary and decimal multipliers and the underlying unit multiplier (e.g., is memory measured in MiB, MB, or GB?). Resource quantities can be added and subtracted: for example, a node has a fixed quantity of each resource type that can be allocated to pods/containers; once such an allocation has been made, the allocated resources cannot be made available to other pods/containers without over-committing the resources. @@ -110,7 +110,7 @@ resourceCapacitySpec: [ ``` Where: -* _total_: the total allocatable resources of a node. Initially, the resources at a given scope will bound the resources of the sum of inner scopes. +* _total_: the total allocatable resources of a node. Initially, the resources at a given scope will bound the resources of the sum of inner scopes. #### Notes @@ -194,7 +194,7 @@ The following are planned future extensions to the resource model, included here Because resource usage and related metrics change continuously, need to be tracked over time (i.e., historically), can be characterized in a variety of ways, and are fairly voluminous, we will not include usage in core API objects, such as [Pods](../user-guide/pods.md) and Nodes, but will provide separate APIs for accessing and managing that data. See the Appendix for possible representations of usage data, but the representation we'll use is TBD.
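The percentile entries the resource-usage structure encourages (50th, 90th, 95th, 99th, 99.5th, 99.9th) have to be computed from raw samples somehow. The proposal does not prescribe a method; the nearest-rank computation below is one common, deliberately simple choice:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile computes a nearest-rank percentile: the smallest sample
// such that at least p percent of the samples are <= it.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...) // don't mutate the caller's slice
	sort.Float64s(s)
	rank := int(math.Ceil(float64(len(s)) * p / 100))
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	// Hypothetical CPU usage samples in millicores.
	cpuMillis := []float64{100, 120, 110, 400, 130, 105, 115, 125, 95, 90}
	for _, p := range []float64{50, 90, 99} {
		fmt.Printf("p%g = %g\n", p, percentile(cpuMillis, p))
	}
	// p50 = 110, p90 = 130, p99 = 400
}
```

As the surrounding text notes, the percentiles alone are not enough in practice; the time window, confidence level, and dropped-sample counts matter just as much when interpreting them.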
-Singleton values for observed and predicted future usage will rapidly prove inadequate, so we will support the following structure for extended usage information: +Singleton values for observed and predicted future usage will rapidly prove inadequate, so we will support the following structure for extended usage information: ```yaml resourceStatus: [ @@ -223,7 +223,7 @@ where a `` or `` structure looks like this: ``` All parts of this structure are optional, although we strongly encourage including quantities for 50, 90, 95, 99, 99.5, and 99.9 percentiles. _[In practice, it will be important to include additional info such as the length of the time window over which the averages are calculated, the confidence level, and information-quality metrics such as the number of dropped or discarded data points.]_ -and predicted +and predicted ## Future resource types diff --git a/secrets.md b/secrets.md index f5793133..3adc57af 100644 --- a/secrets.md +++ b/secrets.md @@ -34,7 +34,7 @@ Documentation for other releases can be found at ## Abstract A proposal for the distribution of [secrets](../user-guide/secrets.md) (passwords, keys, etc) to the Kubelet and to -containers inside Kubernetes using a custom [volume](../user-guide/volumes.md#secrets) type. See the [secrets example](../user-guide/secrets/) for more information. +containers inside Kubernetes using a custom [volume](../user-guide/volumes.md#secrets) type. See the [secrets example](../user-guide/secrets/) for more information. ## Motivation @@ -117,7 +117,7 @@ which consumes this type of secret, the Kubelet may take a number of actions: 1. Expose the secret in a `.kubernetes_auth` file in a well-known location in the container's file system -2. Configure that node's `kube-proxy` to decorate HTTP requests from that pod to the +2. Configure that node's `kube-proxy` to decorate HTTP requests from that pod to the `kubernetes-master` service with the auth token, e. g. 
by adding a header to the request (see the [LOAS Daemon](https://github.com/GoogleCloudPlatform/kubernetes/issues/2209) proposal) @@ -146,7 +146,7 @@ We should consider what the best way to allow this is; there are a few different export MY_SECRET_ENV=MY_SECRET_VALUE The user could `source` the file at `/etc/secrets/my-secret` prior to executing the command for - the image either inline in the command or in an init script, + the image either inline in the command or in an init script, 2. Give secrets an attribute that allows users to express the intent that the platform should generate the above syntax in the file used to present a secret. The user could consume these diff --git a/security_context.md b/security_context.md index 8a6dd314..7a80c01d 100644 --- a/security_context.md +++ b/security_context.md @@ -48,55 +48,55 @@ The problem of securing containers in Kubernetes has come up [before](https://gi ### Container isolation -In order to improve container isolation from host and other containers running on the host, containers should only be -granted the access they need to perform their work. To this end it should be possible to take advantage of Docker -features such as the ability to [add or remove capabilities](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration) and [assign MCS labels](https://docs.docker.com/reference/run/#security-configuration) +In order to improve container isolation from host and other containers running on the host, containers should only be +granted the access they need to perform their work. To this end it should be possible to take advantage of Docker +features such as the ability to [add or remove capabilities](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration) and [assign MCS labels](https://docs.docker.com/reference/run/#security-configuration) to the container process. 
Support for user namespaces has recently been [merged](https://github.com/docker/libcontainer/pull/304) into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers. ### External integration with shared storage -In order to support external integration with shared storage, processes running in a Kubernetes cluster -should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established. +In order to support external integration with shared storage, processes running in a Kubernetes cluster +should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established. Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks. ## Constraints and Assumptions -* It is out of the scope of this document to prescribe a specific set +* It is out of the scope of this document to prescribe a specific set of constraints to isolate containers from their host. Different use cases need different settings. -* The concept of a security context should not be tied to a particular security mechanism or platform +* The concept of a security context should not be tied to a particular security mechanism or platform (ie. SELinux, AppArmor) * Applying a different security context to a scope (namespace or pod) requires a solution such as the one proposed for [service accounts](service_accounts.md). ## Use Cases -In order of increasing complexity, following are example use cases that would +In order of increasing complexity, following are example use cases that would be addressed with security contexts: 1. Kubernetes is used to run a single cloud application. 
In order to protect nodes from containers: * All containers run as a single non-root user * Privileged containers are disabled - * All containers run with a particular MCS label + * All containers run with a particular MCS label * Kernel capabilities like CHOWN and MKNOD are removed from containers - + 2. Just like case #1, except that I have more than one application running on the Kubernetes cluster. * Each application is run in its own namespace to avoid name collisions * For each application a different uid and MCS label is used - -3. Kubernetes is used as the base for a PAAS with - multiple projects, each project represented by a namespace. + +3. Kubernetes is used as the base for a PAAS with + multiple projects, each project represented by a namespace. * Each namespace is associated with a range of uids/gids on the node that - are mapped to uids/gids on containers using linux user namespaces. + are mapped to uids/gids on containers using linux user namespaces. * Certain pods in each namespace have special privileges to perform system actions such as talking back to the server for deployment, run docker builds, etc. * External NFS storage is assigned to each namespace and permissions set - using the range of uids/gids assigned to that namespace. + using the range of uids/gids assigned to that namespace. ## Proposed Design @@ -109,7 +109,7 @@ to mutate Docker API calls in order to apply the security context. It is recommended that this design be implemented in two phases: -1. Implement the security context provider extension point in the Kubelet +1. Implement the security context provider extension point in the Kubelet so that a default security context can be applied on container run and creation. 2. Implement a security context structure that is part of a service account. 
The default context provider can then be used to apply a security context based @@ -137,7 +137,7 @@ type SecurityContextProvider interface { } ``` -If the value of the SecurityContextProvider field on the Kubelet is nil, the kubelet will create and run the container as it does today. +If the value of the SecurityContextProvider field on the Kubelet is nil, the kubelet will create and run the container as it does today. ### Security Context diff --git a/simple-rolling-update.md b/simple-rolling-update.md index d99e7b25..720f4cbf 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -33,9 +33,9 @@ Documentation for other releases can be found at ## Simple rolling update -This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in `kubectl`. +This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in `kubectl`. -Complete execution flow can be found [here](#execution-details). See the [example of rolling update](../user-guide/update-demo/) for more information. +Complete execution flow can be found [here](#execution-details). See the [example of rolling update](../user-guide/update-demo/) for more information. ### Lightweight rollout -- cgit v1.2.3 From 87b792d6a3dc0aaaa598b64b51b837a281ed49e0 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Fri, 24 Jul 2015 17:52:18 -0400 Subject: Fix trailing whitespace in all docs --- autoscaling.md | 18 +++++++++--------- federation.md | 38 +++++++++++++++++++------------------- 2 files changed, 28 insertions(+), 28 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 86a9a819..ff50aa97 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -34,7 +34,7 @@ Documentation for other releases can be found at ## Abstract Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the -number of pods deployed within the system automatically. 
+number of pods deployed within the system automatically. ## Motivation @@ -49,7 +49,7 @@ done automatically based on statistical analysis and thresholds. * Scale verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) - * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) + * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) ## Constraints and Assumptions @@ -77,7 +77,7 @@ balanced or situated behind a proxy - the data from those proxies and load balan server traffic for applications. This is the primary, but not sole, source of data for making decisions. Within Kubernetes a [kube proxy](../user-guide/services.md#ips-and-vips) -running on each node directs service requests to the underlying implementation. +running on each node directs service requests to the underlying implementation. While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage traffic to backends. OpenShift, for instance, adds a "route" resource for defining external to internal traffic flow. @@ -87,7 +87,7 @@ data source for the number of backends. ### Scaling based on predictive analysis Scaling may also occur based on predictions of system state like anticipated load, historical data, etc. Hand in hand -with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically. +with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically. 
### Scaling based on arbitrary data @@ -113,7 +113,7 @@ use a client/cache implementation to receive watch data from the data aggregator scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to the namespace just as a `ReplicationController` or `Service`. -Since an auto-scaler is a durable object it is best represented as a resource. +Since an auto-scaler is a durable object it is best represented as a resource. ```go //The auto scaler interface @@ -241,7 +241,7 @@ be specified as "when requests per second fall below 25 for 30 seconds scale the ### Data Aggregator This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing -time series statistics. +time series statistics. Data aggregation is opaque to the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds` that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation @@ -257,7 +257,7 @@ potentially piggyback on this registry. If multiple scalable targets satisfy the `TargetSelector` criteria the auto-scaler should be configurable as to which target(s) are scaled. To begin with, if multiple targets are found the auto-scaler will scale the largest target up -or down as appropriate. In the future this may be more configurable. +or down as appropriate. In the future this may be more configurable. ### Interactions with a deployment @@ -266,12 +266,12 @@ there will be multiple replication controllers, with one scaling up and another auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector` is what provides this ability. 
By using a selector that spans the entire service the auto-scaler can monitor capacity of multiple replication controllers and check that capacity against the `AutoScalerSpec.MaxAutoScaleCount` and -`AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`. +`AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`. In the course of a deployment it is up to the deployment orchestration to decide how to manage the labels on the replication controllers if it needs to ensure that only specific replication controllers are targeted by the auto-scaler. By default, the auto-scaler will scale the largest replication controller that meets the target label -selector criteria. +selector criteria. During deployment orchestration the auto-scaler may be making decisions to scale its target up or down. In order to prevent the scaler from fighting with a deployment process that is scaling one replication controller up and scaling another one diff --git a/federation.md b/federation.md index 99dbe904..1845e9eb 100644 --- a/federation.md +++ b/federation.md @@ -31,17 +31,17 @@ Documentation for other releases can be found at -# Kubernetes Cluster Federation +# Kubernetes Cluster Federation ## (a.k.a. 
"Ubernetes") ## Requirements Analysis and Product Proposal -## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ +## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ -_Initial revision: 2015-03-05_ -_Last updated: 2015-03-09_ -This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) +_Initial revision: 2015-03-05_ +_Last updated: 2015-03-09_ +This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) ## Introduction @@ -89,11 +89,11 @@ loosely speaking, a cluster can be thought of as running in a single data center, or cloud provider availability zone, a more precise definition is that each cluster provides: -1. a single Kubernetes API entry point, +1. a single Kubernetes API entry point, 1. a consistent, cluster-wide resource naming scheme 1. a scheduling/container placement domain 1. a service network routing domain -1. (in future) an authentication and authorization model. +1. (in future) an authentication and authorization model. 1. .... The above in turn imply the need for a relatively performant, reliable @@ -220,7 +220,7 @@ the multi-cloud provider implementation should just work for a single cloud provider). Propose high-level design catering for both, with initial implementation targeting single cloud provider only. -**Clarifying questions:** +**Clarifying questions:** **How does global external service discovery work?** In the steady state, which external clients connect to which clusters? GeoDNS or similar? What is the tolerable failover latency if a cluster goes @@ -266,8 +266,8 @@ Doing nothing (i.e. forcing users to choose between 1 and 2 on their own) is probably an OK starting point. Kubernetes autoscaling can get us to 3 at some later date. -Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. 
It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). - +Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). + All of the above (regarding "Unavailibility Zones") refers primarily to already-running user-facing services, and minimizing the impact on end users of those services becoming unavailable in a given cluster. @@ -322,7 +322,7 @@ location affinity: (other than the source of YouTube videos, which is assumed to be equally remote from all clusters in this example). Each pod can be scheduled independently, in any cluster, and moved at any time. -1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). 
Most small to medium sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster). +1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). Most small to medium sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster). And then there's what I'll call _absolute_ location affinity. Some applications are required to run in bounded geographical or network @@ -341,7 +341,7 @@ of our users are in Western Europe, U.S. West Coast" etc). ## Cross-cluster service discovery -I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. +I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. _Aside:_ How do we avoid "tromboning" through an external VIP when DNS resolves to a public IP on the local cluster? Strictly speaking this would be an optimization, and probably only matters to high bandwidth, @@ -384,15 +384,15 @@ such events include: 1. 
A change of scheduling policy ("we no longer use cloud provider X"). 1. A change of resource pricing ("cloud provider Y dropped their prices - lets migrate there"). -Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters. -For Preferentially Decoupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster, within some bounded time period (and possibly within a predefined "maintenance" window). +Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters. +For Preferentially Decoupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster, within some bounded time period (and possibly within a predefined "maintenance" window). Strictly Coupled applications (with the exception of those deemed completely immovable) require the federation system to: 1. start up an entire replica application in the destination cluster 1. copy persistent data to the new application instance 1. switch traffic across -1. tear down the original application instance +1. tear down the original application instance It is proposed that support for automated migration of Strictly Coupled applications be deferred to a later date. @@ -422,11 +422,11 @@ TBD: All very hand-wavey still, but some initial thoughts to get the conversatio ## Ubernetes API -This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. +This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. 
-+ Clusters become first class objects, which can be registered, listed, described, deregistered etc via the API. -+ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine). -+ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work). ++ Clusters become first class objects, which can be registered, listed, described, deregistered etc via the API. ++ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine). ++ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work). + These federated replication controllers (and in fact all the services comprising the Ubernetes Control Plane) have to run somewhere. For high availability Ubernetes deployments, these -- cgit v1.2.3 From 4a1dcd958ef57876885631f8b19b8cc803e6316e Mon Sep 17 00:00:00 2001 From: Ananya Kumar Date: Thu, 30 Jul 2015 20:02:06 -0700 Subject: Update admission_control.md I tested out a Limit Ranger, and it seems like the admission happens *before* Validation. Please correct me if I'm wrong though, I didn't look at the code in detail. In any case, I think it makes sense for admission to happen before validation because code in admission can change containers. 
By the way I think it's pretty hard to find flows like this in the code, so it's useful if we add links to code in the design docs (for prospective developers) :) --- admission_control.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/admission_control.md b/admission_control.md index c75d5535..8cc6cf03 100644 --- a/admission_control.md +++ b/admission_control.md @@ -104,9 +104,9 @@ will ensure the following: 1. Incoming request 2. Authenticate user 3. Authorize user -4. If operation=create|update, then validate(object) -5. If operation=create|update|delete, then admission.Admit(requestAttributes) - a. invoke each admission.Interface object in sequence +4. If operation=create|update|delete, then admission.Admit(requestAttributes) + a. invoke each admission.Interface object in sequence +5. If operation=create|update, then validate(object) 6. Object is persisted If at any step, there is an error, the request is canceled. -- cgit v1.2.3 From 0a0fbb58fe67fbfb864145956bf3b8b86625d190 Mon Sep 17 00:00:00 2001 From: Ananya Kumar Date: Mon, 3 Aug 2015 23:00:48 -0700 Subject: Update admission_control.md --- admission_control.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/admission_control.md b/admission_control.md index 8cc6cf03..b84b2543 100644 --- a/admission_control.md +++ b/admission_control.md @@ -98,16 +98,17 @@ func init() { Invocation of admission control is handled by the **APIServer** and not individual **RESTStorage** implementations. -This design assumes that **Issue 297** is adopted, and as a consequence, the general framework of the APIServer request/response flow -will ensure the following: +This design assumes that **Issue 297** is adopted, and as a consequence, the general framework of the APIServer request/response flow will ensure the following: 1. Incoming request 2. Authenticate user 3. Authorize user -4. If operation=create|update|delete, then admission.Admit(requestAttributes) - a. 
invoke each admission.Interface object in sequence -5. If operation=create|update, then validate(object) -6. Object is persisted +4. If operation=create|update|delete|connect, then admission.Admit(requestAttributes) + - invoke each admission.Interface object in sequence +5. Case on the operation: + - If operation=create|update, then validate(object) and persist + - If operation=delete, delete the object + - If operation=connect, exec If at any step, there is an error, the request is canceled. -- cgit v1.2.3 From 3c23de245b41ab8b3d027af5ca9a4e7cf83fc4d3 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Tue, 4 Aug 2015 10:46:51 -0400 Subject: LimitRange documentation should be under admin --- admission_control_limit_range.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index b1baf1f0..595d72e9 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -142,7 +142,7 @@ For example, ```console $ kubectl namespace myspace -$ kubectl create -f docs/user-guide/limitrange/limits.yaml +$ kubectl create -f docs/admin/limitrange/limits.yaml $ kubectl get limits NAME limits @@ -166,7 +166,7 @@ To make a **LimitRangeItem** more restrictive, we will intend to add these addit ## Example -See the [example of Limit Range](../user-guide/limitrange/) for more information. +See the [example of Limit Range](../admin/limitrange/) for more information. 
-- cgit v1.2.3 From 94ec57fba832c57a013d9acc9bff51d8b4a42ce3 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Wed, 5 Aug 2015 14:06:36 -0400 Subject: Update resource quota design to align with requests and limits --- admission_control_resource_quota.md | 148 ++++++++++++++++++++++-------------- 1 file changed, 91 insertions(+), 57 deletions(-) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 136603d2..bb7c6e0a 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -35,13 +35,17 @@ Documentation for other releases can be found at ## Background -This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control. +This document describes a system for enforcing hard resource usage limits per namespace as part of admission control. -## Model Changes +## Use cases -A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace. +1. Ability to enumerate resource usage limits per namespace. +2. Ability to monitor resource usage for tracked resources. +3. Ability to reject resource usage exceeding hard quotas. -A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status. +## Data Model + +The **ResourceQuota** object is scoped to a **Namespace**. 
```go // The following identify resource constants for Kubernetes object types @@ -54,109 +58,139 @@ const ( ResourceReplicationControllers ResourceName = "replicationcontrollers" // ResourceQuotas, number ResourceQuotas ResourceName = "resourcequotas" + // ResourceSecrets, number + ResourceSecrets ResourceName = "secrets" + // ResourcePersistentVolumeClaims, number + ResourcePersistentVolumeClaims ResourceName = "persistentvolumeclaims" ) // ResourceQuotaSpec defines the desired hard limits to enforce for Quota type ResourceQuotaSpec struct { // Hard is the set of desired hard limits for each named resource - Hard ResourceList `json:"hard,omitempty"` + Hard ResourceList `json:"hard,omitempty" description:"hard is the set of desired hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"` } // ResourceQuotaStatus defines the enforced hard limits and observed use type ResourceQuotaStatus struct { // Hard is the set of enforced hard limits for each named resource - Hard ResourceList `json:"hard,omitempty"` + Hard ResourceList `json:"hard,omitempty" description:"hard is the set of enforced hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"` // Used is the current observed total usage of the resource in the namespace - Used ResourceList `json:"used,omitempty"` + Used ResourceList `json:"used,omitempty" description:"used is the current observed total usage of the resource in the namespace"` } // ResourceQuota sets aggregate quota restrictions enforced per namespace type ResourceQuota struct { TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty"` + ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` // Spec defines the desired quota - Spec ResourceQuotaSpec 
`json:"spec,omitempty"` - - // Status defines the actual enforced quota and its current usage - Status ResourceQuotaStatus `json:"status,omitempty"` -} - -// ResourceQuotaUsage captures system observed quota status per namespace -// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage -type ResourceQuotaUsage struct { - TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty"` + Spec ResourceQuotaSpec `json:"spec,omitempty" description:"spec defines the desired quota; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"` // Status defines the actual enforced quota and its current usage - Status ResourceQuotaStatus `json:"status,omitempty"` + Status ResourceQuotaStatus `json:"status,omitempty" description:"status defines the actual enforced quota and current usage; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"` } // ResourceQuotaList is a list of ResourceQuota items type ResourceQuotaList struct { TypeMeta `json:",inline"` - ListMeta `json:"metadata,omitempty"` + ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` // Items is a list of ResourceQuota objects - Items []ResourceQuota `json:"items"` + Items []ResourceQuota `json:"items" description:"items is a list of ResourceQuota objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"` } ``` -## AdmissionControl plugin: ResourceQuota +## Quota Tracked Resources -The **ResourceQuota** plug-in introspects all incoming admission requests. +The following resources are supported by the quota system. -It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request -namespace. 
If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied. - -The following resource limits are imposed as part of core Kubernetes at the namespace level: - -| ResourceName | Description | +| Resource | Description | | ------------ | ----------- | -| cpu | Total cpu usage | -| memory | Total memory usage | -| pods | Total number of pods | +| cpu | Total requested cpu usage | +| memory | Total requested memory usage | +| pods | Total number of active pods where phase is pending or active. | | services | Total number of services | | replicationcontrollers | Total number of replication controllers | | resourcequotas | Total number of resource quotas | +| secrets | Total number of secrets | +| persistentvolumeclaims | Total number of persistent volume claims | -Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes. +If a third-party wants to track additional resources, it must follow the resource naming conventions prescribed +by Kubernetes. This means the resource must have a fully-qualified name (i.e. mycompany.org/shinynewresource) -This means the resource must have a fully-qualified name (i.e. mycompany.org/shinynewresource) +## Resource Requirements: Requests vs Limits -If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a -**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read -**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally) -into the system. +If a resource supports the ability to distinguish between a request and a limit for a resource, +the quota tracking system will only cost the request value against the quota usage. 
If a resource +is tracked by quota, and no request value is provided, the associated entity is rejected as part of admission. -To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document. As a result, -its encouraged to actually impose a cap on the total number of individual quotas that are tracked in the **Namespace** to 1 by explicitly -capping it in **ResourceQuota** document. +For example, consider the following scenarios relative to tracking quota on CPU: -## kube-apiserver +| Pod | Container | Request CPU | Limit CPU | Result | +| --- | --------- | ----------- | --------- | ------ | +| X | C1 | 100m | 500m | The quota usage is incremented 100m | +| Y | C2 | 100m | none | The quota usage is incremented 100m | +| Y | C2 | none | 500m | The quota usage is incremented 500m since the request defaults to the limit | +| Z | C3 | none | none | The pod is rejected since it does not enumerate a request. | -The server is updated to be aware of **ResourceQuota** objects. +The rationale for accounting for the requested amount of a resource versus the limit is the belief +that a user should only be charged for what they are scheduled against in the cluster. In addition, +attempting to track usage against actual usage, where request < actual < limit, is considered highly +volatile. -The quota is only enforced if the kube-apiserver is started as follows: +As a consequence of this decision, a user is able to spread their usage of a resource across multiple tiers +of service. Let's demonstrate this via an example with a 4 cpu quota.
-```console -$ kube-apiserver -admission_control=ResourceQuota -``` +The quota may be allocated as follows: + +| Pod | Container | Request CPU | Limit CPU | Tier | Quota Usage | +| --- | --------- | ----------- | --------- | ---- | ----------- | +| X | C1 | 1 | 4 | Burstable | 1 | +| Y | C2 | 2 | 2 | Guaranteed | 2 | +| Z | C3 | 1 | 3 | Burstable | 1 | -## kube-controller-manager +It is possible that the pods may consume 9 cpu over a given time period depending on the available cpu of the nodes +that held pods X and Z, but since we scheduled X and Z relative to the request, we only track the requested +value against their allocated quota. If one wants to restrict the ratio between the request and limit, +it is encouraged that the user define a **LimitRange** with **LimitRequestRatio** to control burst-out behavior. +This would, in effect, let an administrator keep the difference between request and limit more in line with +tracked usage if desired. -A new controller is defined that runs a synch loop to calculate quota usage across the namespace. +## Status API -**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object. +A REST API endpoint to update the status section of the **ResourceQuota** is exposed. It requires an atomic compare-and-swap +in order to keep resource usage tracking consistent. -If the observed usage is different than the recorded usage, the controller sends a **ResourceQuotaUsage** resource -to the server to atomically update. +## Resource Quota Controller -The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down. +A resource quota controller monitors observed usage for tracked resources in the **Namespace**. + +If there is an observed difference between the current usage stats and the current **ResourceQuota.Status**, the controller +posts an update of the currently observed usage metrics to the **ResourceQuota** via the /status endpoint.
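The atomic compare-and-swap that the /status endpoint requires can be illustrated with a small sketch. This is a hypothetical in-memory model (the type and field names here are illustrative, not the real API machinery): an update to the status is accepted only if the caller presents the resourceVersion it last read, otherwise it must re-read and retry.

```go
package main

import (
	"errors"
	"fmt"
)

// quotaStatus is a simplified stand-in for ResourceQuota.Status.
type quotaStatus struct {
	ResourceVersion int
	Used            map[string]int64
}

type store struct {
	current quotaStatus
}

var errConflict = errors.New("conflict: resourceVersion does not match")

// updateStatus applies an update only if the caller read the latest
// resourceVersion, mimicking the compare-and-swap semantics that keep
// usage tracking consistent under concurrent writers.
func (s *store) updateStatus(readVersion int, used map[string]int64) error {
	if readVersion != s.current.ResourceVersion {
		return errConflict // caller must re-read and retry
	}
	s.current = quotaStatus{ResourceVersion: readVersion + 1, Used: used}
	return nil
}

func main() {
	s := &store{current: quotaStatus{ResourceVersion: 1}}
	// First writer read version 1 and succeeds; the version is bumped.
	fmt.Println(s.updateStatus(1, map[string]int64{"pods": 3}))
	// A second writer that still holds the stale version is rejected.
	fmt.Println(s.updateStatus(1, map[string]int64{"pods": 4}))
}
```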
+ +The resource quota controller is the only component capable of monitoring and recording usage updates after a DELETE operation +since admission control is incapable of guaranteeing a DELETE request actually succeeded. + +## AdmissionControl plugin: ResourceQuota + +The **ResourceQuota** plug-in introspects all incoming admission requests. + +To enable the plug-in and support for ResourceQuota, the kube-apiserver must be configured as follows: + +``` +$ kube-apiserver -admission_control=ResourceQuota +``` + +It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request +namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied. + +If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a +**ResourceQuota.Status** document to the server to atomically update the observed usage based on the previously read +**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally) +into the system. -To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate -usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate -this being the resource most closely running at the prescribed quota limits. +To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document in a **Namespace**. As a result, it is recommended to cap the total number of individual quotas that are tracked in the **Namespace** +at 1 in the **ResourceQuota** document.
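The request-vs-limit charging rules shown in the tables above can be sketched as a small helper. The function name and the plain millicore representation are illustrative assumptions, not the actual plug-in code: charge the request when one is set, fall back to the limit when only a limit is set, and reject when neither is set.

```go
package main

import (
	"errors"
	"fmt"
)

// chargeCPU returns the amount (in millicores) that quota should charge
// for a container: the request if specified, otherwise the limit (the
// request defaults to the limit), otherwise the container is rejected.
func chargeCPU(requestMilli, limitMilli int64) (int64, error) {
	switch {
	case requestMilli > 0:
		return requestMilli, nil // request is what the pod is scheduled against
	case limitMilli > 0:
		return limitMilli, nil // request defaults to limit
	default:
		return 0, errors.New("rejected: container enumerates no request")
	}
}

func main() {
	// Mirrors the scenarios in the table: (100m,500m), (100m,none),
	// (none,500m), and (none,none).
	for _, c := range []struct{ req, lim int64 }{
		{100, 500}, {100, 0}, {0, 500}, {0, 0},
	} {
		fmt.Println(chargeCPU(c.req, c.lim))
	}
}
```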
## kubectl -- cgit v1.2.3 From 1a50eb50808fbbac7e657ef47bf25cc6181f1e2c Mon Sep 17 00:00:00 2001 From: goltermann Date: Wed, 5 Aug 2015 14:34:52 -0700 Subject: Add post v1.0 PR merge details. --- development.md | 1 + pull-requests.md | 17 +++++++++-------- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/development.md b/development.md index 87b4b5d0..7fcd6a89 100644 --- a/development.md +++ b/development.md @@ -98,6 +98,7 @@ git push -f origin myfeature 1. Visit http://github.com/$YOUR_GITHUB_USERNAME/kubernetes 2. Click the "Compare and pull request" button next to your "myfeature" branch. +3. Check out the pull request [process](pull-requests.md) for more details. ### When to retain commits and when to squash diff --git a/pull-requests.md b/pull-requests.md index e42faa51..6d2eb597 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -47,18 +47,19 @@ We want to limit the total number of PRs in flight to: * Remove old PRs that would be difficult to rebase as the underlying code has changed over time * Encourage code velocity -RC to v1.0 Pull Requests ------------------------- +Life of a Pull Request +---------------------- -Between the first RC build (~6/22) and v1.0, we will adopt a higher bar for PR merges. For v1.0 to be a stable release, we need to ensure that any fixes going in are very well tested and have a low risk of breaking anything. Refactors and complex changes will be rejected in favor of more strategic and smaller workarounds. +Except during the last few weeks of a milestone, when we need to reduce churn and stabilize, we aim to always be accepting pull requests. -These PRs require: -* A risk assessment by the code author in the PR. This should outline which parts of the code are being touched, the risk of regression, and complexity of the code. -* Two LGTMs from experienced reviewers.
+PRs are merged either manually by the [on call](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Kubernetes-on-call-rotation) or automatically by the [submit queue](../../contrib/submit-queue/). -Once those requirements are met, they will be labeled [ok-to-merge](https://github.com/GoogleCloudPlatform/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+label%3Aok-to-merge) and can be merged. +There are several requirements for the submit queue to work: +* Author must have signed the CLA ("cla: yes" label added to PR) +* No changes can be made since the last lgtm label was applied +* k8s-bot must have reported the GCE E2E build and test steps passed (Travis, Shippable and Jenkins builds) -These restrictions will be relaxed after v1.0 is released. +Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](../../contrib/submit-queue/whitelist.txt). -- cgit v1.2.3 From 1e074e74ea7b0ddef2f6b1726babe7397510cf84 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Wed, 5 Aug 2015 15:16:36 -0700 Subject: fixup development doc for new vanity path --- development.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/development.md b/development.md index 7fcd6a89..45463293 100644 --- a/development.md +++ b/development.md @@ -59,8 +59,8 @@ Below, we outline one of the more common git workflows that core developers use. The commands below require that you have $GOPATH set ([$GOPATH docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put Kubernetes' code into your GOPATH. Note: the commands below will not work if there is more than one directory in your `$GOPATH`.
```sh -mkdir -p $GOPATH/src/github.com/GoogleCloudPlatform/ -cd $GOPATH/src/github.com/GoogleCloudPlatform/ +mkdir -p $GOPATH/src/k8s.io +cd $GOPATH/src/k8s.io # Replace "$YOUR_GITHUB_USERNAME" below with your github username git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git cd kubernetes @@ -147,8 +147,8 @@ Here's a quick walkthrough of one way to use godeps to add or update a Kubernete ```sh export KPATH=$HOME/code/kubernetes -mkdir -p $KPATH/src/github.com/GoogleCloudPlatform/kubernetes -cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +mkdir -p $KPATH/src/k8s.io/kubernetes +cd $KPATH/src/k8s.io/kubernetes git clone https://path/to/your/fork . # Or copy your existing local repo here. IMPORTANT: making a symlink doesn't work. ``` @@ -174,13 +174,13 @@ godep restore ```sh # To add a new dependency, do: -cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +cd $KPATH/src/k8s.io/kubernetes go get path/to/dependency # Change code in Kubernetes to use the dependency. godep save ./... # To update an existing dependency, do: -cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +cd $KPATH/src/k8s.io/kubernetes go get -u path/to/dependency # Change code in Kubernetes accordingly if necessary. 
godep update path/to/dependency @@ -224,7 +224,7 @@ $ cd pkg/kubelet $ godep go test # some output from unit tests PASS -ok github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet 0.317s +ok k8s.io/kubernetes/pkg/kubelet 0.317s ``` ## Coverage -- cgit v1.2.3 From a74ffb6a381cf9a7bd8282c8d9806bae41680f3d Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Wed, 5 Aug 2015 18:08:26 -0700 Subject: rewrite all links to issues to k8s links --- access.md | 8 ++++---- admission_control.md | 2 +- command_execution_port_forwarding.md | 6 +++--- event_compression.md | 10 +++++----- identifiers.md | 2 +- principles.md | 2 +- resources.md | 4 ++-- secrets.md | 6 +++--- security_context.md | 2 +- 9 files changed, 21 insertions(+), 21 deletions(-) diff --git a/access.md b/access.md index d2fe44ca..92840f73 100644 --- a/access.md +++ b/access.md @@ -118,8 +118,8 @@ Pods configs should be largely portable between Org-run and hosted configuration # Design Related discussion: -- https://github.com/GoogleCloudPlatform/kubernetes/issues/442 -- https://github.com/GoogleCloudPlatform/kubernetes/issues/443 +- http://issue.k8s.io/442 +- http://issue.k8s.io/443 This doc describes two security profiles: - Simple profile: like single-user mode. Make it easy to evaluate K8s without lots of configuring accounts and policies. Protects from unauthorized users, but does not partition authorized users. @@ -176,7 +176,7 @@ Initially: Improvements: - Kubelet allocates disjoint blocks of root-namespace uids for each container. This may provide some defense-in-depth against container escapes. (https://github.com/docker/docker/pull/4572) - requires docker to integrate user namespace support, and deciding what getpwnam() does for these uids. 
-- any features that help users avoid use of privileged containers (https://github.com/GoogleCloudPlatform/kubernetes/issues/391) +- any features that help users avoid use of privileged containers (http://issue.k8s.io/391) ### Namespaces @@ -253,7 +253,7 @@ Policy objects may be applicable only to a single namespace or to all namespaces ## Accounting -The API should have a `quota` concept (see https://github.com/GoogleCloudPlatform/kubernetes/issues/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources design doc](resources.md)). +The API should have a `quota` concept (see http://issue.k8s.io/442). A quota object relates a namespace (and optionally a label selector) to a maximum quantity of resources that may be used (see [resources design doc](resources.md)). Initially: - a `quota` object is immutable. diff --git a/admission_control.md b/admission_control.md index b84b2543..9245aa7d 100644 --- a/admission_control.md +++ b/admission_control.md @@ -37,7 +37,7 @@ Documentation for other releases can be found at | Topic | Link | | ----- | ---- | -| Separate validation from RESTStorage | https://github.com/GoogleCloudPlatform/kubernetes/issues/2977 | +| Separate validation from RESTStorage | http://issue.k8s.io/2977 | ## Background diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index 1d319adf..852e761e 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -44,9 +44,9 @@ This describes an approach for providing support for: There are several related issues/PRs: -- [Support attach](https://github.com/GoogleCloudPlatform/kubernetes/issues/1521) -- [Real container ssh](https://github.com/GoogleCloudPlatform/kubernetes/issues/1513) -- [Provide easy debug network access to services](https://github.com/GoogleCloudPlatform/kubernetes/issues/1863) +- [Support attach](http://issue.k8s.io/1521) +- [Real 
container ssh](http://issue.k8s.io/1513) +- [Provide easy debug network access to services](http://issue.k8s.io/1863) - [OpenShift container command execution proposal](https://github.com/openshift/origin/pull/576) ## Motivation diff --git a/event_compression.md b/event_compression.md index ce8d1ad4..b14d5206 100644 --- a/event_compression.md +++ b/event_compression.md @@ -38,7 +38,7 @@ This document captures the design of event compression. ## Background -Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate `image_not_existing` and `container_is_waiting` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)). +Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate `image_not_existing` and `container_is_waiting` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](http://issue.k8s.io/3853)). 
## Proposal @@ -109,10 +109,10 @@ This demonstrates what would have been 20 separate entries (indicating schedulin ## Related Pull Requests/Issues - * Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events - * PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API - * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event - * PR [#4306](https://github.com/GoogleCloudPlatform/kubernetes/issues/4306): Compress recurring events in to a single event to optimize etcd storage + * Issue [#4073](http://issue.k8s.io/4073): Compress duplicate events + * PR [#4157](http://issue.k8s.io/4157): Add "Update Event" to Kubernetes API + * PR [#4206](http://issue.k8s.io/4206): Modify Event struct to allow compressing multiple recurring events in to a single event + * PR [#4306](http://issue.k8s.io/4306): Compress recurring events in to a single event to optimize etcd storage * PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map diff --git a/identifiers.md b/identifiers.md index 9e269993..7deff9e9 100644 --- a/identifiers.md +++ b/identifiers.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at # Identifiers and Names in Kubernetes -A summarization of the goals and recommendations for identifiers in Kubernetes. Described in [GitHub issue #199](https://github.com/GoogleCloudPlatform/kubernetes/issues/199). +A summarization of the goals and recommendations for identifiers in Kubernetes. Described in [GitHub issue #199](http://issue.k8s.io/199). 
## Definitions diff --git a/principles.md b/principles.md index 23a20349..be3dff55 100644 --- a/principles.md +++ b/principles.md @@ -70,7 +70,7 @@ TODO: pluggability ## Bootstrapping -* [Self-hosting](https://github.com/GoogleCloudPlatform/kubernetes/issues/246) of all components is a goal. +* [Self-hosting](http://issue.k8s.io/246) of all components is a goal. * Minimize the number of dependencies, particularly those required for steady-state operation. * Stratify the dependencies that remain via principled layering. * Break any circular dependencies by converting hard dependencies to soft dependencies. diff --git a/resources.md b/resources.md index e006d44d..fe6f0ec7 100644 --- a/resources.md +++ b/resources.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at **Note: this is a design doc, which describes features that have not been completely implemented. User documentation of the current state is [here](../user-guide/compute-resources.md). The tracking issue for implementation of this model is -[#168](https://github.com/GoogleCloudPlatform/kubernetes/issues/168). Currently, only memory and +[#168](http://issue.k8s.io/168). Currently, only memory and cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in milli-cores.** @@ -134,7 +134,7 @@ The following resource types are predefined ("reserved") by Kubernetes in the `k * Units: Kubernetes Compute Unit seconds/second (i.e., CPU cores normalized to a canonical "Kubernetes CPU") * Internal representation: milli-KCUs * Compressible? 
yes - * Qualities: this is a placeholder for the kind of thing that may be supported in the future — see [#147](https://github.com/GoogleCloudPlatform/kubernetes/issues/147) + * Qualities: this is a placeholder for the kind of thing that may be supported in the future — see [#147](http://issue.k8s.io/147) * [future] `schedulingLatency`: as per lmctfy * [future] `cpuConversionFactor`: property of a node: the speed of a CPU core on the node's processor divided by the speed of the canonical Kubernetes CPU (a floating point value; default = 1.0). diff --git a/secrets.md b/secrets.md index 3adc57af..350d151b 100644 --- a/secrets.md +++ b/secrets.md @@ -119,7 +119,7 @@ which consumes this type of secret, the Kubelet may take a number of actions: file system 2. Configure that node's `kube-proxy` to decorate HTTP requests from that pod to the `kubernetes-master` service with the auth token, e. g. by adding a header to the request - (see the [LOAS Daemon](https://github.com/GoogleCloudPlatform/kubernetes/issues/2209) proposal) + (see the [LOAS Daemon](http://issue.k8s.io/2209) proposal) #### Example: service account consumes docker registry credentials @@ -263,11 +263,11 @@ the right storage size for their installation and configuring their Kubelets cor Configuring each Kubelet is not the ideal story for operator experience; it is more intuitive that the cluster-wide storage size be readable from a central configuration store like the one proposed -in [#1553](https://github.com/GoogleCloudPlatform/kubernetes/issues/1553). When such a store +in [#1553](http://issue.k8s.io/1553). When such a store exists, the Kubelet could be modified to read this configuration item from the store. 
When the Kubelet is modified to advertise node resources (as proposed in -[#4441](https://github.com/GoogleCloudPlatform/kubernetes/issues/4441)), the capacity calculation +[#4441](http://issue.k8s.io/4441)), the capacity calculation for available memory should factor in the potential size of the node-level tmpfs in order to avoid memory overcommit on the node. diff --git a/security_context.md b/security_context.md index 7a80c01d..4704caab 100644 --- a/security_context.md +++ b/security_context.md @@ -42,7 +42,7 @@ A security context is a set of constraints that are applied to a container in or ## Background -The problem of securing containers in Kubernetes has come up [before](https://github.com/GoogleCloudPlatform/kubernetes/issues/398) and the potential problems with container security are [well known](http://opensource.com/business/14/7/docker-security-selinux). Although it is not possible to completely isolate Docker containers from their hosts, new features like [user namespaces](https://github.com/docker/libcontainer/pull/304) make it possible to greatly reduce the attack surface. +The problem of securing containers in Kubernetes has come up [before](http://issue.k8s.io/398) and the potential problems with container security are [well known](http://opensource.com/business/14/7/docker-security-selinux). Although it is not possible to completely isolate Docker containers from their hosts, new features like [user namespaces](https://github.com/docker/libcontainer/pull/304) make it possible to greatly reduce the attack surface. 
## Motivation -- cgit v1.2.3 From 09d971bc58179999aea2545bd2b922a6f170a3ef Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Wed, 5 Aug 2015 18:09:50 -0700 Subject: rewrite all links to prs to k8s links --- event_compression.md | 2 +- security.md | 4 ++-- security_context.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/event_compression.md b/event_compression.md index b14d5206..1187edb6 100644 --- a/event_compression.md +++ b/event_compression.md @@ -113,7 +113,7 @@ This demonstrates what would have been 20 separate entries (indicating schedulin * PR [#4157](http://issue.k8s.io/4157): Add "Update Event" to Kubernetes API * PR [#4206](http://issue.k8s.io/4206): Modify Event struct to allow compressing multiple recurring events in to a single event * PR [#4306](http://issue.k8s.io/4306): Compress recurring events in to a single event to optimize etcd storage - * PR [#4444](https://github.com/GoogleCloudPlatform/kubernetes/pull/4444): Switch events history to use LRU cache instead of map + * PR [#4444](http://pr.k8s.io/4444): Switch events history to use LRU cache instead of map diff --git a/security.md b/security.md index 1d73a529..5c187d69 100644 --- a/security.md +++ b/security.md @@ -127,11 +127,11 @@ A pod runs in a *security context* under a *service account* that is defined by ### Related design discussion * [Authorization and authentication](access.md) -* [Secret distribution via files](https://github.com/GoogleCloudPlatform/kubernetes/pull/2030) +* [Secret distribution via files](http://pr.k8s.io/2030) * [Docker secrets](https://github.com/docker/docker/pull/6697) * [Docker vault](https://github.com/docker/docker/issues/10310) * [Service Accounts:](service_accounts.md) -* [Secret volumes](https://github.com/GoogleCloudPlatform/kubernetes/pull/4126) +* [Secret volumes](http://pr.k8s.io/4126) ## Specific Design Points diff --git a/security_context.md b/security_context.md index 4704caab..1d2b4f71 100644 --- a/security_context.md +++ 
b/security_context.md @@ -192,7 +192,7 @@ It is up to an admission plugin to determine if the security context is acceptab time of writing, the admission control plugin for security contexts will only allow a context that has defined capabilities or privileged. Contexts that attempt to define a UID or SELinux options will be denied by default. In the future the admission plugin will base this decision upon -configurable policies that reside within the [service account](https://github.com/GoogleCloudPlatform/kubernetes/pull/2297). +configurable policies that reside within the [service account](http://pr.k8s.io/2297). -- cgit v1.2.3 From 14eebfb1c97357b7f5defc2b9b14ae531bf58cef Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Wed, 5 Aug 2015 18:08:26 -0700 Subject: rewrite all links to issues to k8s links --- autoscaling.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index ff50aa97..9c5ec752 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -46,18 +46,18 @@ done automatically based on statistical analysis and thresholds. * Provide a concrete proposal for implementing auto-scaling pods within Kubernetes * Implementation proposal should be in line with current discussions in existing issues: - * Scale verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) + * Scale verb - [1629](http://issue.k8s.io/1629) * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) - * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) - * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) + * Rolling updates - [1353](http://issue.k8s.io/1353) + * Multiple scalable types - [1624](http://issue.k8s.io/1624) ## Constraints and Assumptions -* This proposal is for horizontal scaling only. 
Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072) +* This proposal is for horizontal scaling only. Vertical scaling will be handled in [issue 2072](http://issue.k8s.io/2072) * `ReplicationControllers` will not know about the auto-scaler, they are the target of the auto-scaler. The `ReplicationController` responsibilities are constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](../user-guide/replication-controller.md#responsibilities-of-the-replication-controller) * Auto-scalers will be loosely coupled with data gathering components in order to allow a wide variety of input sources -* Auto-scalable resources will support a scale verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)) +* Auto-scalable resources will support a scale verb ([1629](http://issue.k8s.io/1629)) such that the auto-scaler does not directly manipulate the underlying resource. * Initially, most thresholds will be set by application administrators. It should be possible for an autoscaler to be written later that sets thresholds automatically based on past behavior (CPU used vs incoming requests). @@ -120,7 +120,7 @@ Since an auto-scaler is a durable object it is best represented as a resource. type AutoScalerInterface interface { //ScaleApplication adjusts a resource's replica count. Calls scale endpoint. //Args to this are based on what the endpoint - //can support. See https://github.com/GoogleCloudPlatform/kubernetes/issues/1629 + //can support. 
See http://issue.k8s.io/1629 ScaleApplication(num int) error } -- cgit v1.2.3 From 4c0410dd60e9dbbfc4d71f5a2f2f7e00c0f1a0b2 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Wed, 5 Aug 2015 18:08:26 -0700 Subject: rewrite all links to issues to k8s links --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 5a1bfe81..bdd38830 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -195,7 +195,7 @@ References in the status of the referee to the referrer may be permitted, when t #### Lists of named subobjects preferred over maps -Discussed in [#2004](https://github.com/GoogleCloudPlatform/kubernetes/issues/2004) and elsewhere. There are no maps of subobjects in any API objects. Instead, the convention is to use a list of subobjects containing name fields. +Discussed in [#2004](http://issue.k8s.io/2004) and elsewhere. There are no maps of subobjects in any API objects. Instead, the convention is to use a list of subobjects containing name fields. For example: -- cgit v1.2.3 From ecf3f1ba5e7d8399acd5a631c810816d2c9b4fca Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Thu, 6 Aug 2015 00:53:01 -0400 Subject: Fix typo in security context proposal --- security_context.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security_context.md b/security_context.md index 7a80c01d..6f0b92b0 100644 --- a/security_context.md +++ b/security_context.md @@ -145,7 +145,7 @@ A security context resides on the container and represents the runtime parameter be used to create and run the container via container APIs. Following is an example of an initial implementation: ```go -type type Container struct { +type Container struct { ... other fields omitted ... 
// Optional: SecurityContext defines the security options the pod should be run with SecurityContext *SecurityContext -- cgit v1.2.3 From 2414459b8225d0b3702b3e232b5da3376631eddb Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Thu, 6 Aug 2015 10:58:55 -0400 Subject: Update design for LimitRange to handle requests --- admission_control_limit_range.md | 183 ++++++++++++++++++++++++--------------- 1 file changed, 114 insertions(+), 69 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 595d72e9..885ef664 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -35,139 +35,184 @@ Documentation for other releases can be found at ## Background -This document proposes a system for enforcing min/max limits per resource as part of admission control. +This document proposes a system for enforcing resource requirements constraints as part of admission control. -## Model Changes +## Use cases -A new resource, **LimitRange**, is introduced to enumerate min/max limits for a resource type scoped to a -Kubernetes namespace. +1. Ability to enumerate resource requirement constraints per namespace +2. Ability to enumerate min/max resource constraints for a pod +3. Ability to enumerate min/max resource constraints for a container +4. Ability to specify default resource limits for a container +5. Ability to specify default resource requests for a container +6. Ability to enforce a ratio between request and limit for a resource. + +## Data Model + +The **LimitRange** resource is scoped to a **Namespace**. 
+ +### Type ```go +// A type of object that is limited +type LimitType string + const ( // Limit that applies to all pods in a namespace - LimitTypePod string = "Pod" + LimitTypePod LimitType = "Pod" // Limit that applies to all containers in a namespace - LimitTypeContainer string = "Container" + LimitTypeContainer LimitType = "Container" ) // LimitRangeItem defines a min/max usage limit for any resource that matches on kind type LimitRangeItem struct { // Type of resource that this limit applies to - Type string `json:"type,omitempty"` + Type LimitType `json:"type,omitempty" description:"type of resource that this limit applies to"` // Max usage constraints on this kind by resource name - Max ResourceList `json:"max,omitempty"` + Max ResourceList `json:"max,omitempty" description:"max usage constraints on this kind by resource name"` // Min usage constraints on this kind by resource name - Min ResourceList `json:"min,omitempty"` - // Default usage constraints on this kind by resource name - Default ResourceList `json:"default,omitempty"` + Min ResourceList `json:"min,omitempty" description:"min usage constraints on this kind by resource name"` + // Default resource limits on this kind by resource name + Default ResourceList `json:"default,omitempty" description:"default resource limits values on this kind by resource name if omitted"` + // DefaultRequests resource requests on this kind by resource name + DefaultRequests ResourceList `json:"defaultRequests,omitempty" description:"default resource requests values on this kind by resource name if omitted"` + // LimitRequestRatio is the ratio of limit over request that is the maximum allowed burst for the named resource + LimitRequestRatio ResourceList `json:"limitRequestRatio,omitempty" description:"the ratio of limit over request that is the maximum allowed burst for the named resource. 
if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value"` } // LimitRangeSpec defines a min/max usage limit for resources that match on kind type LimitRangeSpec struct { // Limits is the list of LimitRangeItem objects that are enforced - Limits []LimitRangeItem `json:"limits"` + Limits []LimitRangeItem `json:"limits" description:"limits is the list of LimitRangeItem objects that are enforced"` } // LimitRange sets resource usage limits for each kind of resource in a Namespace type LimitRange struct { TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty"` + ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` // Spec defines the limits enforced - Spec LimitRangeSpec `json:"spec,omitempty"` + Spec LimitRangeSpec `json:"spec,omitempty" description:"spec defines the limits enforced; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"` } // LimitRangeList is a list of LimitRange items. type LimitRangeList struct { TypeMeta `json:",inline"` - ListMeta `json:"metadata,omitempty"` + ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` // Items is a list of LimitRange objects - Items []LimitRange `json:"items"` + Items []LimitRange `json:"items" description:"items is a list of LimitRange objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md"` } ``` -## AdmissionControl plugin: LimitRanger +### Validation -The **LimitRanger** plug-in introspects all incoming admission requests. +Validation of a **LimitRange** enforces that for a given named resource the following rules apply: -It makes decisions by evaluating the incoming object against all defined **LimitRange** objects in the request context namespace. 
+Min (if specified) <= DefaultRequests (if specified) <= Default (if specified) <= Max (if specified) -The following min/max limits are imposed: +### Default Value Behavior -**Type: Container** +The following default value behaviors are applied to a LimitRange for a given named resource. -| ResourceName | Description | -| ------------ | ----------- | -| cpu | Min/Max amount of cpu per container | -| memory | Min/Max amount of memory per container | +``` +if LimitRangeItem.Default[resourceName] is undefined + if LimitRangeItem.Max[resourceName] is defined + LimitRangeItem.Default[resourceName] = LimitRangeItem.Max[resourceName] +``` -**Type: Pod** +``` +if LimitRangeItem.DefaultRequests[resourceName] is undefined + if LimitRangeItem.Default[resourceName] is defined + LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Default[resourceName] + else if LimitRangeItem.Min[resourceName] is defined + LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Min[resourceName] +``` -| ResourceName | Description | -| ------------ | ----------- | -| cpu | Min/Max amount of cpu per pod | -| memory | Min/Max amount of memory per pod | +## AdmissionControl plugin: LimitRanger -If a resource specifies a default value, it may get applied on the incoming resource. For example, if a default -value is provided for container cpu, it is set on the incoming container if and only if the incoming container -does not specify a resource requirements limit field. +The **LimitRanger** plug-in introspects all incoming pod requests and evaluates the constraints defined on a LimitRange. -If a resource specifies a min value, it may get applied on the incoming resource. For example, if a min -value is provided for container cpu, it is set on the incoming container if and only if the incoming container does -not specify a resource requirements requests field. +If a constraint is not specified for an enumerated resource, it is not enforced or tracked. 
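The defaulting rules above can be sketched in Go. Note this is an illustrative sketch only: the simplified `ResourceList` (a map of resource name to milli-unit quantities) and the `applyDefaults` helper are assumptions for readability, not the real API types, which use `resource.Quantity`.

```go
package main

import "fmt"

// ResourceList is simplified to a map of resource name to quantity in
// milli-units; the real API uses resource.Quantity values.
type ResourceList map[string]int64

// LimitRangeItem carries only the fields the defaulting rules touch.
type LimitRangeItem struct {
	Min, Max, Default, DefaultRequests ResourceList
}

// applyDefaults fills in Default and DefaultRequests for one named
// resource, following the "Default Value Behavior" rules above.
func applyDefaults(item *LimitRangeItem, name string) {
	if _, ok := item.Default[name]; !ok {
		if max, ok := item.Max[name]; ok {
			item.Default[name] = max
		}
	}
	if _, ok := item.DefaultRequests[name]; !ok {
		if def, ok := item.Default[name]; ok {
			item.DefaultRequests[name] = def
		} else if min, ok := item.Min[name]; ok {
			item.DefaultRequests[name] = min
		}
	}
}

func main() {
	// cpu has only a Max: Default falls back to Max, and
	// DefaultRequests then falls back to the computed Default.
	item := &LimitRangeItem{
		Max:             ResourceList{"cpu": 2000},
		Default:         ResourceList{},
		DefaultRequests: ResourceList{},
	}
	applyDefaults(item, "cpu")
	fmt.Println(item.Default["cpu"], item.DefaultRequests["cpu"]) // prints: 2000 2000
}
```

The branch order matters: a computed `Default` takes precedence over `Min` when filling `DefaultRequests`, matching the pseudocode above.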
-If the incoming object would cause a violation of the enumerated constraints, the request is denied with a set of -messages explaining what constraints were the source of the denial. +To enable the plug-in and support for LimitRange, the kube-apiserver must be configured as follows: -If a constraint is not enumerated by a **LimitRange** it is not tracked. +```console +$ kube-apiserver -admission_control=LimitRanger +``` -## kube-apiserver +### Enforcement of constraints -The server is updated to be aware of **LimitRange** objects. +**Type: Container** -The constraints are only enforced if the kube-apiserver is started as follows: +Supported Resources: -```console -$ kube-apiserver -admission_control=LimitRanger -``` +1. memory +2. cpu -## kubectl +Supported Constraints: -kubectl is modified to support the **LimitRange** resource. +Per container, the following must hold true -`kubectl describe` provides a human-readable output of limits. +| Constraint | Behavior | +| ---------- | -------- | +| Min | Min <= Request (required) <= Limit (optional) | +| Max | Limit (required) <= Max | +| LimitRequestRatio | LimitRequestRatio <= ( Limit (required, non-zero) / Request (required, non-zero)) | -For example, +Supported Defaults: -```console -$ kubectl namespace myspace -$ kubectl create -f docs/admin/limitrange/limits.yaml -$ kubectl get limits -NAME -limits -$ kubectl describe limits limits -Name: limits -Type Resource Min Max Default ----- -------- --- --- --- -Pod memory 1Mi 1Gi - -Pod cpu 250m 2 - -Container memory 1Mi 1Gi 1Mi -Container cpu 250m 250m 250m -``` +1. Default - if the named resource has no enumerated value, the Limit is equal to the Default +2. DefaultRequest - if the named resource has no enumerated value, the Request is equal to the DefaultRequest + +**Type: Pod** + +Supported Resources: -## Future Enhancements: Define limits for a particular pod or container. +1. memory +2. 
cpu -In the current proposal, the **LimitRangeItem** matches purely on **LimitRangeItem.Type** +Supported Constraints: -It is expected we will want to define limits for particular pods or containers by name/uid and label/field selector. +Across all containers in a pod, the following must hold true: -To make a **LimitRangeItem** more restrictive, we will intend to add these additional restrictions at a future point in time. +| Constraint | Behavior | +| ---------- | -------- | +| Min | Min <= Request (required) <= Limit (optional) | +| Max | Limit (required) <= Max | +| LimitRequestRatio | LimitRequestRatio <= ( Limit (required, non-zero) / Request (non-zero) ) | + +## Run-time configuration + +The default ```LimitRange``` that is applied via Salt configuration will be updated as follows: + +``` +apiVersion: "v1" +kind: "LimitRange" +metadata: + name: "limits" + namespace: default +spec: + limits: + - type: "Container" + defaultRequests: + cpu: "100m" +``` ## Example -See the [example of Limit Range](../admin/limitrange/) for more information. +An example LimitRange configuration: + +| Type | Resource | Min | Max | Default | DefaultRequest | LimitRequestRatio | +| ---- | -------- | --- | --- | ------- | -------------- | ----------------- | +| Container | cpu | .1 | 1 | 500m | 250m | 4 | +| Container | memory | 250Mi | 1Gi | 500Mi | 250Mi | | + +Assuming an incoming container that specified no resource requirements, +the following would happen. +1. The incoming container cpu would request 250m with a limit of 500m. +2. The incoming container memory would request 250Mi with a limit of 500Mi. +3. If the container is later resized, its cpu would be constrained to between .1 and 1 and the ratio of limit to request could not exceed 4.
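A rough Go sketch of the per-container checks in the constraint tables above, using milli-unit `int64` quantities where zero means "unspecified". The function name and error messages are assumptions for illustration, not the actual plug-in code.

```go
package main

import "fmt"

// checkContainer enforces Min <= Request <= Limit <= Max and the
// LimitRequestRatio bound for a single named resource. A zero value
// means the corresponding quantity was not specified.
func checkContainer(min, max, ratio, request, limit int64) error {
	if min > 0 {
		if request == 0 {
			return fmt.Errorf("request is required when Min is specified")
		}
		if request < min {
			return fmt.Errorf("request %d below min %d", request, min)
		}
	}
	if max > 0 {
		if limit == 0 {
			return fmt.Errorf("limit is required when Max is specified")
		}
		if limit > max {
			return fmt.Errorf("limit %d exceeds max %d", limit, max)
		}
	}
	if ratio > 0 {
		if request == 0 || limit == 0 {
			return fmt.Errorf("request and limit must be non-zero when LimitRequestRatio is specified")
		}
		if float64(limit)/float64(request) > float64(ratio) {
			return fmt.Errorf("limit/request ratio exceeds %d", ratio)
		}
	}
	if request > 0 && limit > 0 && request > limit {
		return fmt.Errorf("request %d exceeds limit %d", request, limit)
	}
	return nil
}

func main() {
	// cpu constraints Min=100m, Max=1000m, ratio=4; an incoming
	// container with request=250m, limit=500m passes all checks.
	fmt.Println(checkContainer(100, 1000, 4, 250, 500)) // prints: <nil>
}
```

With the example table above, a container defaulted to request 250m / limit 500m satisfies every constraint, while a limit of 2000m would fail the Max check.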
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_limit_range.md?pixel)]() -- cgit v1.2.3 From 56b54ec64f3062926424ddf36cac20ebdc983b37 Mon Sep 17 00:00:00 2001 From: Ben McCann Date: Fri, 7 Aug 2015 00:13:15 -0700 Subject: Fix the architecture diagram such that the arrow from the api server to the node doesn't go through/under etcd --- architecture.dia | Bin 6522 -> 6519 bytes architecture.png | Bin 222407 -> 223860 bytes architecture.svg | 98 +++++++++++++++++++++++++++---------------------------- 3 files changed, 49 insertions(+), 49 deletions(-) diff --git a/architecture.dia b/architecture.dia index 26e0eed2..441e3563 100644 Binary files a/architecture.dia and b/architecture.dia differ diff --git a/architecture.png b/architecture.png index fa39039a..b03cfe88 100644 Binary files a/architecture.png and b/architecture.png differ diff --git a/architecture.svg b/architecture.svg index 825c0ace..cacc7fbf 100644 --- a/architecture.svg +++ b/architecture.svg @@ -153,9 +153,9 @@ - - - + + + @@ -168,7 +168,7 @@ - + @@ -181,24 +181,24 @@ - - - - replication controller + + + + replication controller - - - - Scheduler + + + + Scheduler - - - - Scheduler + + + + Scheduler @@ -206,11 +206,11 @@ Colocated, or spread across machines, as dictated by cluster size. - - + + - - + + @@ -241,19 +241,19 @@ APIs - - - + + + - - - + + + - - - + + + @@ -261,11 +261,11 @@ - - - - - + + + + + @@ -295,10 +295,10 @@ - + .. - + ... @@ -311,7 +311,7 @@ - + @@ -447,7 +447,7 @@ - + @@ -459,10 +459,10 @@ - + .. - + ... 
@@ -475,7 +475,7 @@ - + @@ -486,14 +486,14 @@ - - - - Distributed - Watchable - Storage - - (implemented via etcd) + + + + Distributed + Watchable + Storage + + (implemented via etcd) -- cgit v1.2.3 From 7749c3d1b6d8b100c48448c871e857a97faf3ea4 Mon Sep 17 00:00:00 2001 From: Ananya Kumar Date: Wed, 22 Jul 2015 14:43:42 -0700 Subject: Add qos proposal --- resource-qos.md | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 resource-qos.md diff --git a/resource-qos.md b/resource-qos.md new file mode 100644 index 00000000..6d7ddcce --- /dev/null +++ b/resource-qos.md @@ -0,0 +1,192 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/resource-qos.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Resource Quality of Service in Kubernetes + +**Author**: Ananya Kumar (@AnanyaKumar) + +**Status**: Draft proposal; prototype in progress. + +*This document presents the design of resource quality of service for containers in Kubernetes, and describes use cases and implementation details.* + +## Motivation + +Kubernetes allocates resources to containers in a simple way. Users can specify resource limits for containers. For example, a user can specify a 1gb memory limit for a container. The scheduler uses resource limits to schedule containers (technically, the scheduler schedules pods comprised of containers). For example, the scheduler will not place 5 containers with a 1gb memory limit onto a machine with 4gb memory. Currently, Kubernetes does not have robust mechanisms to ensure that containers run reliably on an overcommitted system. + +In the current implementation, **if users specify limits for every container, cluster utilization is poor**. Containers often don’t use all the resources that they request which leads to a lot of wasted resources. For example, we might have 4 containers, each reserving 1GB of memory in a node with 4GB memory but only using 500MB of memory. Theoretically, we could fit more containers on the node, but Kubernetes will not schedule new pods (with specified limits) on the node. + +A possible solution is to launch containers without specified limits - containers that don't ask for any resource guarantees. But **containers with limits specified are not very well protected from containers without limits specified**. 
If a container without a specified memory limit goes overboard and uses lots of memory, other containers (with specified memory limits) might be killed. This is bad, because users often want a way to launch containers that have resource guarantees, and that stay up reliably. + +This proposal provides mechanisms for oversubscribing nodes while maintaining resource guarantees, by allowing containers to specify levels of resource guarantees. Containers will be able to *request* a minimum resource guarantee. The *request* is different from the *limit* - containers will not be allowed to exceed resource limits. With this change, users can launch *best-effort* containers with 0 request. Best-effort containers use resources only if not being used by other containers, and can be used for resource-scavenging. Supporting best-effort containers in Borg increased utilization by about 20%, and we hope to see similar improvements in Kubernetes. + +## Requests and Limits + +Note: this section describes the functionality that QoS should eventually provide. Due to implementation issues, providing some of these guarantees, while maintaining our broader goals of efficient cluster utilization, is difficult. Later sections will go into the nuances of how the functionality will be achieved, and limitations of the initial implementation. + +For each resource, containers can specify a resource request and limit, 0 <= request <= limit <= Infinity. If the container is successfully scheduled, the container is guaranteed the amount of resource requested. The container will not be allowed to exceed the specified limit. How the request and limit are enforced depends on whether the resource is [compressible or incompressible](../../docs/design/resources.md). + +### Compressible Resource Guarantees + +- For now, we are only supporting CPU. +- Containers are guaranteed to get the amount of CPU they request; they may or may not get additional CPU time (depending on the other jobs running).
+- Excess CPU resources will be distributed based on the amount of CPU requested. For example, suppose container A requests 60% of the CPU, and container B requests 30% of the CPU. Suppose that both containers are trying to use as much CPU as they can. Then the extra 10% of CPU will be distributed to A and B in a 2:1 ratio (implementation discussed in later sections). +- Containers will be throttled if they exceed their limit. If limit is unspecified, then the containers can use excess CPU when available. + +### Incompressible Resource Guarantees + +- For now, we are only supporting memory. +- Containers will get the amount of memory they request; if they exceed their memory request, they could be killed (if some other container needs memory), but if containers consume fewer resources than requested, they will not be killed (except in cases where system tasks or daemons need more memory). +- Containers will be killed if they use more memory than their limit. + +### Kubelet Admission Policy + +- Pods will be admitted by Kubelet based on the sum of requests of its containers. The Kubelet will ensure that the sum of requests of all containers (over all pods) is within the system’s resources (for both memory and CPU). + +## QoS Classes + +In an overcommitted system (where sum of requests > machine capacity) containers might eventually have to be killed, for example if the system runs out of CPU or memory resources. Ideally, we should kill containers that are less important. For each resource, we divide containers into 3 QoS classes: *Guaranteed*, *Burstable*, and *Best-Effort*, in decreasing order of priority. + +The relationship between "Requests and Limits" and "QoS Classes" is subtle. Theoretically, the policy of classifying containers into QoS classes is orthogonal to the requests and limits specified for the container. Hypothetically, users could use a (currently unplanned) API to specify whether a container is guaranteed or best-effort.
However, in this proposal, the policy of classifying containers into QoS classes is intimately tied to "Requests and Limits" - in fact, QoS classes are used to implement some of the memory guarantees described in the previous section. + +For each resource, containers will be split into 3 different classes: +- For now, we will only focus on memory. Containers will not be killed if CPU guarantees cannot be met (for example if system tasks or daemons take up lots of CPU); they will be temporarily throttled. +- Containers with a 0 memory request are classified as memory *Best-Effort*. These containers are not requesting resource guarantees, and will be treated as lowest priority (processes in these containers are the first to get killed if the system runs out of memory). +- Containers with the same request and limit and non-zero request are classified as memory *Guaranteed*. These containers ask for a well-defined amount of the resource and are considered top-priority (with respect to memory usage). +- All other containers are memory *Burstable* - middle priority containers that have some form of minimal resource guarantee, but can use more resources when available. +- In the current policy and implementation, best-effort containers are technically a subset of Burstable containers (where the request is 0), but they are a very important special case. Memory best-effort containers don't ask for any resource guarantees so they can utilize unused resources in a cluster (resource scavenging). + +### Alternative QoS Class Policy + +An alternative is to have user-specified numerical priorities that guide Kubelet on which tasks to kill (if the node runs out of memory, lower priority tasks will be killed). A strict hierarchy of user-specified numerical priorities is not desirable because: + +1. Achieved behavior would be emergent based on how users assigned priorities to their containers.
No particular SLO could be delivered by the system, and usage would be subject to gaming if not restricted administratively. +2. Changes to desired priority bands would require changes to all user container configurations. + +## Implementation + +### To implement requests (PR #12035): + +API changes for request +- Default request to limit, if limit is specified but request is not (api/v1/defaults.go) +- Add validation code that checks request <= limit, and validation test cases (api/validation/validation.go) + +Scheduler Changes +- Use requests instead of limits in CheckPodsExceedingCapacity and PodFitsResources (scheduler/algorithm/predicates.go) + +Container Manager Changes +- Use requests to assign CPU shares for Docker (kubelet/dockertools/container_manager.go) +- RKT changes will be implemented in a later iteration + +### QoS Classes (PR #12182): + +For now, we will be implementing QoS classes using OOM scores. However, system OOM kills are expensive, and without kernel modifications we cannot rely on system OOM kills to enforce burstable class guarantees. Eventually, we will need to layer control loops on top of OOM score assignment. + +Add kubelet/qos/policy.go +- Decides which memory QoS class a container is in (based on the policy described above) +- Decides what OOM score all processes in a container should get + +Change memory overcommit mode +- Right now overcommit mode is off on the machines we set up, so if there isn’t enough memory malloc will return null. This prevents QoS, because best-effort containers won’t be killed. Instead, when there isn’t enough memory, and guaranteed containers call malloc, they may not get the memory they want. We want memory guaranteed containers to get the memory they request, and force out memory best-effort containers. +- Change the memory overcommit mode to 1, so that using excess memory starts the OOM killer. The implication is that malloc won't return null; a process will be killed instead.
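The memory classification policy described in the QoS Classes section could be sketched as follows. The function name and string class names are assumptions for illustration; the real kubelet/qos/policy.go may differ.

```go
package main

import "fmt"

// qosClass returns the memory QoS class for a container, following the
// classification policy above. Quantities are in bytes; a zero request
// means no resource guarantee was requested.
func qosClass(request, limit int64) string {
	switch {
	case request == 0:
		// No guarantee requested: lowest priority, pure scavenging.
		return "Best-Effort"
	case request == limit:
		// Well-defined, fully reserved amount: top priority.
		return "Guaranteed"
	default:
		// Some minimal guarantee, but may burst above it.
		return "Burstable"
	}
}

func main() {
	fmt.Println(qosClass(0, 0))           // prints: Best-Effort
	fmt.Println(qosClass(1<<30, 1<<30))   // prints: Guaranteed
	fmt.Println(qosClass(512<<20, 1<<30)) // prints: Burstable
}
```

As the text notes, Best-Effort is technically the request-is-zero special case of Burstable; it is split out here because the two classes are treated very differently when memory runs out.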
+ +Container OOM score configuration +- We’re focusing on Docker in this implementation (not RKT) +- OOM scores + - Note that the OOM score of a process is 10 times the % of memory the process consumes, adjusted by OOM_SCORE_ADJ, barring exceptions (e.g. process is launched by root). Processes with higher OOM scores are killed. + - The base OOM score is between 0 and 1000, so if process A’s OOM_SCORE_ADJ - process B’s OOM_SCORE_ADJ is over a 1000, then process A will always be OOM killed before B. + - The final OOM score of a process is also between 0 and 1000 +- Memory best-effort + - Set OOM_SCORE_ADJ: 1000 + - So processes in best-effort containers will have an OOM_SCORE of 1000 +- Memory guaranteed + - Set OOM_SCORE_ADJ: -999 + - So processes in guaranteed containers will have an OOM_SCORE of 0 or 1 +- Memory burstable + - If total memory request > 99.8% of available memory, OOM_SCORE_ADJ: 2 + - Otherwise, set OOM_SCORE_ADJ to 1000 - 10 * (% of memory requested) + - This ensures that the OOM_SCORE of burstable containers is > 1 + - So burstable containers will be killed if they conflict with guaranteed containers + - If a burstable container uses less memory than requested, its OOM_SCORE < 1000 + - So best-effort containers will be killed if they conflict with burstable containers using less than requested memory + - If a process in a burstable container uses more memory than the container requested, its OOM_SCORE will be 1000, if not its OOM_SCORE will be < 1000 + - Assuming that a container typically has a single big process, if a burstable container that uses more memory than requested conflicts with a burstable container using less memory than requested, the former will be killed + - If burstable containers with multiple processes conflict, then the formula for OOM scores is a heuristic, it will not ensure "Request and Limit" guarantees. This is one reason why control loops will be added in subsequent iterations. 
+- Pod infrastructure container + - OOM_SCORE_ADJ: -999 +- Kubelet, Docker, Kube-Proxy + - OOM_SCORE_ADJ: -999 (won’t be OOM killed) + - Hack, because these critical tasks might die if they conflict with guaranteed containers. In the future, we should place all user-pods into a separate cgroup, and set a limit on the memory they can consume. + +Setting OOM_SCORE_ADJ for a container +- Refactor existing ApplyOomScoreAdj to util/oom.go +- To set OOM_SCORE_ADJ of a container, we loop through all processes in the container, and set OOM_SCORE_ADJ +- We keep looping until the list of processes in the container stabilizes. This is sufficient because child processes inherit OOM_SCORE_ADJ. + +## Implementation Issues and Extensions + +The above implementation provides for basic oversubscription with protection, but there are a number of issues. Below is a list of issues and TODOs for each of them. The first iteration of QoS will not solve these problems, but we aim to solve them in subsequent iterations of QoS. This list is not exhaustive. We expect to add issues to the list, and reference issues and PRs associated with items on this list. + +Supporting other platforms: +- **RKT**: The proposal focuses on Docker. TODO: add support for RKT. +- **Systemd**: Systemd platforms need to be handled in a different way. Handling distributions of Linux based on systemd is critical, because major Linux distributions like Debian and Ubuntu are moving to systemd. TODO: Add code to handle systemd based operating systems. + +Protecting containers and guarantees: +- **Control loops**: The OOM score assignment is not perfect for burstable containers, and system OOM kills are expensive. TODO: Add a control loop to reduce memory pressure, while ensuring guarantees for various containers. +- **Kubelet, Kube-proxy, Docker daemon protection**: If a system is overcommitted with memory guaranteed containers, then all processes will have an OOM_SCORE of 0.
So the Docker daemon could be killed instead of a container or pod being killed. TODO: Place all user-pods into a separate cgroup, and set a limit on the memory they can consume. Initially, the limits can be based on estimated memory usage of Kubelet, Kube-proxy, and CPU limits, eventually we can monitor the resources they consume. +- **OOM Assignment Races**: We cannot set OOM_SCORE_ADJ of a process until it has launched. This could lead to races. For example, suppose that a memory burstable container is using 70% of the system’s memory, and another burstable container is using 30% of the system’s memory. A best-effort burstable container attempts to launch on the Kubelet. Initially the best-effort container is using 2% of memory, and has an OOM_SCORE_ADJ of 20. So its OOM_SCORE is lower than the burstable pod using 70% of system memory. The burstable pod will be evicted by the best-effort pod. Short-term TODO: Implement a restart policy where best-effort pods are immediately evicted if OOM killed, but burstable pods are given a few retries. Long-term TODO: push support for OOM scores in cgroups to the upstream Linux kernel. +- **Swap Memory**: The QoS proposal assumes that swap memory is disabled. If swap is enabled, then resource guarantees (for pods that specify resource requirements) will not hold. For example, suppose 2 guaranteed pods have reached their memory limit. They can start allocating memory on swap space. Eventually, if there isn’t enough swap space, processes in the pods might get killed. TODO: ensure that swap space is disabled in our cluster setup scripts. + +Killing and eviction mechanics: +- **Killing Containers**: Usually, containers cannot function properly if one of the constituent processes in the container is killed. TODO: When a process in a container is out of resource killed (e.g. OOM killed), kill the entire container.
+- **Out of Resource Eviction**: If a container in a multi-container pod fails, we might want to restart the entire pod instead of just restarting the container. In some cases (e.g. if a memory best-effort container is out of resource killed), we might change pods to "failed" phase and pods might need to be evicted. TODO: Draft a policy for out of resource eviction and implement it. + +Maintaining CPU performance: +- **CPU-sharing Issues**: Suppose that a node is running 2 containers: a container A requesting 50% of CPU (but without a CPU limit), and a container B not requesting resources. Suppose that both pods try to use as much CPU as possible. After the proposal is implemented, A will get 100% of the CPU, and B will get around 0% of the CPU. However, a fairer scheme would give the Burstable container 75% of the CPU and the Best-Effort container 25% of the CPU (since resources past the Burstable container’s request are not guaranteed). TODO: think about whether this issue needs to be solved, and implement a solution. +- **CPU kills**: System tasks or daemons like the Kubelet could consume more CPU, and we won't be able to guarantee containers the CPU amount they requested. If the situation persists, we might want to kill the container. TODO: Draft a policy for CPU usage killing and implement it. +- **CPU limits**: Enabling CPU limits can be problematic, because processes might be hard capped and might stall for a while. TODO: Enable CPU limits intelligently using CPU quota and core allocation. + +Documentation: +- **QoS Class Status**: TODO: Add code to ContainerStatus in the API, so that it shows which memory and CPU classes a container is in. +- **Documentation**: TODO: Add user docs for resource QoS. + +## Demo and Tests + +Possible demos/E2E tests: +- Launch a couple of memory guaranteed containers on a node. Barrage the node with memory best-effort containers. The memory guaranteed containers should survive the onslaught of memory best-effort containers.
+- Fill up a node with memory best-effort containers. Barrage the node with memory guaranteed containers. All memory best-effort containers should be evicted. This is a hard test, because the Kubelet, Kube-proxy, etc. need to be well protected. +- Launch a container with 0 CPU request. The container, when run in isolation, should get to use the entire CPU. Then add a container with non-zero request that tries to use up CPU. The 0-request containers should be throttled, and given a small number of CPU shares. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-qos.md?pixel)]() + -- cgit v1.2.3 From 57ff799db24ba9e16a95fcc8ad0558b8623058c9 Mon Sep 17 00:00:00 2001 From: Veres Lajos Date: Sat, 8 Aug 2015 22:29:57 +0100 Subject: typofix - https://github.com/vlajos/misspell_fixer --- development.md | 2 +- making-release-notes.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/development.md b/development.md index 45463293..2929f281 100644 --- a/development.md +++ b/development.md @@ -87,7 +87,7 @@ Note: If you have write access to the main repository at github.com/GoogleCloudP git remote set-url --push upstream no_push ``` -### Commiting changes to your fork +### Committing changes to your fork ```sh git commit diff --git a/making-release-notes.md b/making-release-notes.md index d4ec6ccf..1efab1ac 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -40,7 +40,7 @@ This documents the process for making release notes for a release. Find the most-recent PR that was merged with the previous .0 release. Remember this as $LASTPR. _TODO_: Figure out a way to record this somewhere to save the next release engineer time. -Find the most-recent PR that was merged with the current .0 release. Remeber this as $CURRENTPR. +Find the most-recent PR that was merged with the current .0 release. Remember this as $CURRENTPR.
### 2) Run the release-notes tool -- cgit v1.2.3 From 54575568d1fb2e91b038e65917531c61209d3047 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Sun, 9 Aug 2015 14:18:06 -0400 Subject: Copy edits for typos --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index 45463293..2929f281 100644 --- a/development.md +++ b/development.md @@ -87,7 +87,7 @@ Note: If you have write access to the main repository at github.com/GoogleCloudP git remote set-url --push upstream no_push ``` -### Commiting changes to your fork +### Committing changes to your fork ```sh git commit -- cgit v1.2.3 From 046cbbe86d69854b2bc3de273202732802c7957f Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Sun, 9 Aug 2015 14:18:06 -0400 Subject: Copy edits for typos --- apiserver_watch.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/apiserver_watch.md b/apiserver_watch.md index 5610ccbc..a731c7f4 100644 --- a/apiserver_watch.md +++ b/apiserver_watch.md @@ -67,14 +67,14 @@ When a client sends a watch request to apiserver, instead of redirecting it to etcd, it will cause: - registering a handler to receive all new changes coming from etcd - - iteratiting though a watch window, starting at the requested resourceVersion - to the head and sending filetered changes directory to the client, blocking + - iterating though a watch window, starting at the requested resourceVersion + to the head and sending filtered changes directory to the client, blocking the above until this iteration has caught up This will be done be creating a go-routine per watcher that will be responsible for performing the above. -The following section describes the proposal in more details, analizes some +The following section describes the proposal in more details, analyzes some corner cases and divides the whole design in more fine-grained steps. 
-- cgit v1.2.3 From 0aacb902f7cfe1a18f8ec0e4eeeb74fc62141055 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Sat, 8 Aug 2015 00:59:32 -0700 Subject: Rescheduler design space doc. --- rescheduler.md | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 130 insertions(+) create mode 100644 rescheduler.md diff --git a/rescheduler.md b/rescheduler.md new file mode 100644 index 00000000..d459679c --- /dev/null +++ b/rescheduler.md @@ -0,0 +1,130 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/rescheduler.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Rescheduler design space + +@davidopp, @erictune, @briangrant + +July 2015 + +A rescheduler is an agent that proactively causes currently-running +Pods to be moved, so as to optimize some objective function for +goodness of the layout of Pods in the cluster. (The objective function +doesn't have to be expressed mathematically; it may just be a +collection of ad-hoc rules, but in principle there is an objective +function. Implicitly an objective function is described by the +scheduler's predicate and priority functions.) It might be triggered +to run every N minutes, or whenever some event happens that is known +to make the objective function worse (for example, whenever a Pod goes +PENDING for a long time.) + +A rescheduler is useful because without a rescheduler, scheduling +decisions are only made at the time Pods are created. But as the +cluster layout changes over time, free "holes" are often produced that +were not available when a Pod was initially scheduled. These holes are +produced by run-to-completion Pods terminating, empty nodes being +added by a node auto-scaler, etc. Moving already-running Pods into +these holes may lead to a better cluster layout. A rescheduler might +not just exploit existing holes, but also create holes by evicting +Pods (assuming it knows they can reschedule elsewhere), as in free +space defragmentation. 
+ +[Although alluded to above, it's worth emphasizing that rescheduling +is the only way to make use of new nodes added by a cluster +auto-scaler (unless Pods were already PENDING; but even then, it's +likely advantageous to put more than just the previously PENDING Pods +on the new nodes.)] + +Because rescheduling is disruptive--it causes one or more +already-running Pods to die when they otherwise wouldn't--a key +constraint on rescheduling is that it must be done subject to +disruption SLOs. There are a number of ways to specify these SLOs--a +global rate limit across all Pods, a rate limit across a set of Pods +defined by some particular label selector, a maximum number of Pods +that can be down at any one time among a set defined by some +particular label selector, etc. These policies are presumably part of +the Rescheduler's configuration. + +There are a lot of design possibilities for a rescheduler. To explain +them, it's easiest to start with the description of a baseline +rescheduler, and then describe possible modifications. The Baseline +rescheduler +* only kicks in when there are one or more PENDING Pods for some period of time; its objective function is binary: completely happy if there are no PENDING Pods, and completely unhappy if there are PENDING Pods; it does not try to optimize for any other aspect of cluster layout +* is not a scheduler -- it simply identifies a node where a PENDING Pod could fit if one or more Pods on that node were moved out of the way, and then kills those Pods to make room for the PENDING Pod, which will then be scheduled there by the regular scheduler(s). [obviously this killing operation must be able to specify "don't allow the killed Pod to reschedule back to whence it was killed" otherwise the killing is pointless] Of course it should only do this if it is sure the killed Pods will be able to reschedule into already-free space in the cluster. 
Note that although it is not a scheduler, the Rescheduler needs to be linked with the predicate functions of the scheduling algorithm(s) so that it can know (1) that the PENDING Pod would actually schedule into the hole it has identified once the hole is created, and (2) that the evicted Pod(s) will be able to schedule somewhere else in the cluster.
+
+Possible variations on this Baseline rescheduler are:
+
+1. it can kill the Pod(s) whose space it wants **and also schedule the Pod that will take that space and reschedule the Pod(s) that were killed**, rather than just killing the Pod(s) whose space it wants and relying on the regular scheduler(s) to schedule the Pod that will take that space (and to reschedule the Pod(s) that were evicted)
+1. it can run continuously in the background to optimize general cluster layout instead of just trying to get a PENDING Pod to schedule
+1. it can try to move groups of Pods instead of using a one-at-a-time / greedy approach
+1. it can formulate multi-hop plans instead of single-hop
+
+A key design question for a Rescheduler is how much knowledge it needs about the scheduling policies used by the cluster's scheduler(s).
+* For the Baseline rescheduler, it needs to know the predicate functions used by the cluster's scheduler(s), or else it can't know how to create a hole that the PENDING Pod will fit into, nor be sure that the evicted Pod(s) will be able to reschedule elsewhere.
+* If it is going to run continuously in the background to optimize cluster layout but is still only going to kill Pods, then it still needs to know the predicate functions for the reason mentioned above. In principle it doesn't need to know the priority functions; it could just randomly kill Pods and rely on the regular scheduler to put them back in better places. However, this is a rather inexact approach.
Thus it is useful for the rescheduler to know the priority functions, or at least some subset of them, so it can be sure that an action it takes will actually improve the cluster layout.
+* If it is going to run continuously in the background to optimize cluster layout and is going to act as a scheduler rather than just killing Pods, then it needs to know the predicate functions and some compatible (but not necessarily identical) priority functions. One example of a case where "compatible but not identical" might be useful is if the main scheduler(s) has a very simple scheduling policy optimized for low scheduling latency, and the Rescheduler has a more sophisticated/optimal scheduling policy that requires more computation time. The main thing to avoid is for the scheduler(s) and rescheduler to have incompatible priority functions, as this will cause them to "fight" (though it still can't lead to an infinite loop, since the scheduler(s) only ever touches a Pod once).
+
+The vast majority of users probably only care about rescheduling for three scenarios:
+
+1. Redistribute Pods onto new nodes added by a cluster auto-scaler
+1. Move Pods around to get a PENDING Pod to schedule
+1. Move Pods around when CPU starvation is detected on a node
+
+**Addendum: How a rescheduler might trigger cluster auto-scaling (to
+scale up).** Instead of moving Pods around to free up space, it might
+just add a new node (and then move some Pods onto the new node). More
+generally, it might be useful to integrate the rescheduler and cluster
+auto-scaler. For scaling up the cluster, a reasonable workflow might
+be:
+1. pod horizontal auto-scaler decides to add one or more Pods to a service, based on the metrics it is observing
+1. the Pod goes PENDING due to lack of a suitable node with sufficient resources
+1. rescheduler notices the PENDING Pod and determines that the Pod cannot schedule just by rearranging existing Pods (while respecting SLOs)
+1.
rescheduler triggers cluster auto-scaler to add a node of the appropriate type for the PENDING Pod
+1. the PENDING Pod schedules onto the new node (and possibly the rescheduler also moves other Pods onto that node)
+
+**Addendum: Role of simulation.** Knowing what the effect of different
+rearrangements of Pods will be requires a form of simulation of the
+scheduling algorithm (see also discussion in the previous entry about
+what the rescheduler needs to know about the predicate and priority
+functions of the cluster's scheduler(s)). For cluster auto-scaling
+down, you could do a simulation to see whether, after removing a node
+from the cluster, the Pods that were on that node would be able to
+reschedule, either directly or with the help of the rescheduler; if
+the answer is yes, then you can safely auto-scale down (assuming
+services would still meet their application-level SLOs).
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/rescheduler.md?pixel)]()
+
-- cgit v1.2.3 From 3f99d83ca6a809139b36e0da71ee2339a2336ae4 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Fri, 31 Jul 2015 09:54:05 +0200 Subject: Fixes to watch in apiserver proposal --- apiserver-watch.md | 178 +++++++++++++++++++++++++++++++++++++++++++++++++++ apiserver_watch.md | 184 ----------------------------------------------------- 2 files changed, 178 insertions(+), 184 deletions(-) create mode 100644 apiserver-watch.md delete mode 100644 apiserver_watch.md diff --git a/apiserver-watch.md b/apiserver-watch.md new file mode 100644 index 00000000..02a6e6c8 --- /dev/null +++ b/apiserver-watch.md @@ -0,0 +1,178 @@
+
+
+
+
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver-watch.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+## Abstract
+
+In the current system, all watch requests sent to apiserver are in general
+redirected to etcd. This means that for every watch request to apiserver,
+apiserver opens a watch on etcd.
+
+The purpose of this proposal is to improve the overall performance of the system
+by solving the following problems:
+
+- having too many open watches on etcd
+- avoiding deserializing/converting the same objects multiple times in different
+watch results
+
+In the future, we would also like to add an indexing mechanism to the watch.
+Although Indexer is not part of this proposal, it is supposed to be compatible
+with it - in the future Indexer should be incorporated into the proposed new
+watch solution in apiserver without requiring any redesign.
+
+
+## High level design
+
+We are going to solve those problems by allowing many clients to watch the same
+storage in the apiserver, without being redirected to etcd.
+
+At the high level, apiserver will have a single watch open to etcd, watching all
+the objects (of a given type) without any filtering. The changes delivered from
+etcd will then be stored in a cache in apiserver. This cache is in fact a
+"rolling history window" that will support clients having some amount of latency
+between their list and watch calls. Thus it will have a limited capacity, and
+whenever a new change comes from etcd while the cache is full, the oldest change
+will be removed to make room for the new one.
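The "rolling history window" described above can be sketched as a fixed-capacity cyclic buffer. The Go below is purely illustrative; the type and field names are invented for this sketch and are not the actual apiserver code:

```go
package main

import "fmt"

type watchEvent struct {
	resourceVersion uint64 // etcdIndex at which the change was observed
	object          string // the deserialized object (a string stands in here)
}

// watchCache is a fixed-capacity "rolling history window": when full, adding
// a new event evicts the oldest one.
type watchCache struct {
	buf   []watchEvent
	start int // index of the oldest cached event
	count int
}

func newWatchCache(capacity int) *watchCache {
	return &watchCache{buf: make([]watchEvent, capacity)}
}

func (c *watchCache) add(e watchEvent) {
	if c.count < len(c.buf) {
		c.buf[(c.start+c.count)%len(c.buf)] = e
		c.count++
		return
	}
	// Full: overwrite the oldest slot and advance the window.
	c.buf[c.start] = e
	c.start = (c.start + 1) % len(c.buf)
}

// since returns all cached events newer than rv, or ok=false if rv has
// already fallen out of the window (the caller must then relist).
func (c *watchCache) since(rv uint64) (events []watchEvent, ok bool) {
	if c.count > 0 && c.buf[c.start].resourceVersion > rv+1 {
		return nil, false // history gap: "too old resource version"
	}
	for i := 0; i < c.count; i++ {
		e := c.buf[(c.start+i)%len(c.buf)]
		if e.resourceVersion > rv {
			events = append(events, e)
		}
	}
	return events, true
}

func main() {
	c := newWatchCache(3)
	for rv := uint64(1); rv <= 5; rv++ {
		c.add(watchEvent{resourceVersion: rv, object: fmt.Sprintf("pod-v%d", rv)})
	}
	evs, ok := c.since(3)     // window now holds versions 3..5
	fmt.Println(ok, len(evs)) // prints "true 2" (the events at versions 4 and 5)
}
```

A watcher that reconnects with a resourceVersion that has already fallen out of the window gets `ok=false` and must relist, matching the "error too old" handling discussed later in the proposal.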
+
+When a client sends a watch request to apiserver, instead of redirecting it to
+etcd, it will cause:
+
+ - registering a handler to receive all new changes coming from etcd
+ - iterating through the watch window, starting at the requested resourceVersion
+ up to the head, and sending filtered changes directly to the client, blocking
+ the above until this iteration has caught up
+
+This will be done by creating a go-routine per watcher that will be responsible
+for performing the above.
+
+The following section describes the proposal in more detail, analyzes some
+corner cases and divides the whole design into more fine-grained steps.
+
+
+## Proposal details
+
+We would like the cache to be __per-resource-type__ and __optional__. Thanks to
+it we will be able to:
+ - have different cache sizes for different resources (e.g. bigger cache
+ [= longer history] for pods, which can significantly affect performance)
+ - avoid any overhead for objects that are watched very rarely (e.g. events
+ are almost not watched at all, but there are a lot of them)
+ - filter the cache for each watcher more effectively
+
+If we decide to support watches spanning different resources in the future and
+we have an efficient indexing mechanism, it should be relatively simple to unify
+the cache to be common for all the resources.
+
+The rest of this section describes the concrete steps that need to be done
+to implement the proposal.
+
+1. Since we want the watch in apiserver to be optional for different resource
+types, this needs to be self-contained and hidden behind a well defined API.
+This should be a layer very close to etcd - in particular all registries in
+"pkg/registry/generic/etcd" should be built on top of it.
+We will solve it by extracting the interface of tools.EtcdHelper and
+treating this interface as this API - the whole watch mechanism in
+apiserver will be hidden behind that interface.
+Thanks to it we will get an initial implementation for free and we will just
+need to reimplement a few relevant functions (probably just Watch and List).
+Moreover, this will not require any changes in other parts of the code.
+This step is about extracting the interface of tools.EtcdHelper.
+
+2. Create a FIFO cache with a given capacity. In its "rolling history window"
+we will store two things:
+
+ - the resourceVersion of the object (being an etcdIndex)
+ - the object watched from etcd itself (in a deserialized form)
+
+ This should be as simple as having an array and treating it as a cyclic buffer.
+ Obviously the resourceVersions of objects watched from etcd will be increasing, but
+ they are necessary for registering a new watcher that is interested in all the
+ changes since a given etcdIndex.
+
+ Additionally, we should support the LIST operation, otherwise clients can never
+ start watching from "now". We may consider passing lists through etcd, however
+ this will not work once we have Indexer, so we will need that information
+ in memory anyway.
+ Thus, we should support the LIST operation from the "end of the history" - i.e.
+ from the moment just after the newest cached watch event. It should be
+ pretty simple to do, because we can incrementally update this list whenever
+ a new watch event arrives from etcd.
+ We may consider reusing the existing structures cache.Store or cache.Indexer
+ ("pkg/client/cache") but this is not a hard requirement.
+
+3. Create the new implementation of the API, that will internally have a
+single watch open to etcd and will store the data received from etcd in
+the FIFO cache - this includes implementing registration of a new watcher
+which will start a new go-routine responsible for iterating over the cache
+and sending all the objects the watcher is interested in (by applying the
+filtering function) to the watcher.
+
+4.
Add support for processing "error too old" from etcd, which will require:
+ - disconnecting all the watchers
+ - clearing the internal cache and relisting all objects from etcd
+ - starting to accept watchers again
+
+5. Enable watch in apiserver for some of the existing resource types - this
+should require only changes at the initialization level.
+
+6. The next step will be to incorporate some indexing mechanism, but the details
+of it are TBD.
+
+
+
+### Future optimizations:
+
+1. The implementation of watch in apiserver will internally open a single
+watch to etcd, responsible for watching all the changes of objects of a given
+resource type. However, this watch can potentially expire at any time and
+reconnecting can return "too old resource version". In that case relisting is
+necessary. In such a case, to avoid LIST requests coming from all watchers at
+the same time, we can introduce an additional etcd event type:
+[EtcdResync](../../pkg/storage/etcd/etcd_watcher.go#L36)
+
+ Whenever relisting is done to refresh the internal watch to etcd, an
+ EtcdResync event will be sent to all the watchers. It will contain the
+ full list of all the objects the watcher is interested in (appropriately
+ filtered) as the parameter of this watch event.
+ Thus, we need to create the EtcdResync event, extend watch.Interface and
+ its implementations to support it, and handle those events appropriately
+ in places like
+ [Reflector](../../pkg/client/cache/reflector.go)
+
+ However, this might turn out to be an unnecessary optimization if the apiserver
+ always keeps up (which is possible in the new design). We will work
+ out all necessary details at that point.
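As a rough illustration of the per-watcher go-routine described above (first replay cached changes past the requested resourceVersion, then stream new ones, applying the watcher's filter to both), here is a hedged Go sketch; all names are invented for this sketch, not the real apiserver types:

```go
package main

import "fmt"

type event struct {
	rv  uint64
	obj string
}

// serveWatch replays history events with rv > fromRV, then forwards live
// events, applying the watcher's filter to both phases. It returns once the
// live channel is closed, and closes out so the client sees end-of-stream.
func serveWatch(history []event, live <-chan event, fromRV uint64,
	filter func(event) bool, out chan<- event) {
	defer close(out)
	// Phase 1: catch up from the rolling history window.
	for _, e := range history {
		if e.rv > fromRV && filter(e) {
			out <- e
		}
		if e.rv > fromRV {
			fromRV = e.rv
		}
	}
	// Phase 2: stream new changes as they arrive from the etcd watch.
	for e := range live {
		if e.rv > fromRV && filter(e) {
			out <- e
		}
	}
}

func main() {
	history := []event{{1, "pod-a"}, {2, "pod-b"}, {3, "pod-a"}}
	live := make(chan event, 2)
	live <- event{4, "pod-a"}
	live <- event{5, "pod-b"}
	close(live)

	out := make(chan event, 8)
	onlyA := func(e event) bool { return e.obj == "pod-a" }
	serveWatch(history, live, 1, onlyA, out)

	for e := range out {
		fmt.Println(e.rv, e.obj) // prints "3 pod-a" then "4 pod-a"
	}
}
```

In the real design the replay and the live stream would run in a dedicated go-routine per watcher; this sketch runs them synchronously only to keep the example self-contained.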
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/apiserver-watch.md?pixel)]() + diff --git a/apiserver_watch.md b/apiserver_watch.md deleted file mode 100644 index a731c7f4..00000000 --- a/apiserver_watch.md +++ /dev/null @@ -1,184 +0,0 @@ - - - - -WARNING -WARNING -WARNING -WARNING -WARNING - -

PLEASE NOTE: This document applies to the HEAD of the source tree

- -If you are using a released version of Kubernetes, you should -refer to the docs that go with that version. - - -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver_watch.md). - -Documentation for other releases can be found at -[releases.k8s.io](http://releases.k8s.io). - --- - - - - - -## Abstract - -In the current system, all watch requests send to apiserver are in general -redirected to etcd. This means that for every watch request to apiserver, -apiserver opens a watch on etcd. - -The purpose of the proposal is to improve the overall performance of the system -by solving the following problems: - -- having too many open watches on etcd -- avoiding deserializing/converting the same objects multiple times in different -watch results - -In the future, we would also like to add an indexing mechanism to the watch. -Although Indexer is not part of this proposal, it is supposed to be compatible -with it - in the future Indexer should be incorporated into the proposed new -watch solution in apiserver without requiring any redesign. - - -## High level design - -We are going to solve those problems by allowing many clients to watch the same -storage in the apiserver, without being redirected to etcd. - -At the high level, apiserver will have a single watch open to etcd, watching all -the objects (of a given type) without any filtering. The changes delivered from -etcd will then be stored in a cache in apiserver. This cache is in fact a -"rolling history window" that will support clients having some amount of latency -between their list and watch calls. Thus it will have a limited capacity and -whenever a new change comes from etcd when a cache is full, othe oldest change -will be remove to make place for the new one. 
- -When a client sends a watch request to apiserver, instead of redirecting it to -etcd, it will cause: - - - registering a handler to receive all new changes coming from etcd - - iterating though a watch window, starting at the requested resourceVersion - to the head and sending filtered changes directory to the client, blocking - the above until this iteration has caught up - -This will be done be creating a go-routine per watcher that will be responsible -for performing the above. - -The following section describes the proposal in more details, analyzes some -corner cases and divides the whole design in more fine-grained steps. - - -## Proposal details - -We would like the cache to be __per-resource-type__ and __optional__. Thanks to -it we will be able to: - - have different cache sizes for different resources (e.g. bigger cache - [= longer history] for pods, which can significantly affect performance) - - avoid any overhead for objects that are watched very rarely (e.g. events - are almost not watched at all, but there are a lot of them) - - filter the cache for each watcher more effectively - -If we decide to support watches spanning different resources in the future and -we have an efficient indexing mechanisms, it should be relatively simple to unify -the cache to be common for all the resources. - -The rest of this section describes the concrete steps that need to be done -to implement the proposal. - -1. Since we want the watch in apiserver to be optional for different resource -types, this needs to be self-contained and hidden behind a well defined API. -This should be a layer very close to etcd - in particular all registries: -"pkg/registry/generic/etcd" should be build on top of it. -We will solve it by turning tools.EtcdHelper by extracting its interface -and treating this interface as this API - the whole watch mechanisms in -apiserver will be hidden behind that interface. 
-Thanks to it we will get an initial implementation for free and we will just -need to reimplement few relevant functions (probably just Watch and List). -Mover, this will not require any changes in other parts of the code. -This step is about extracting the interface of tools.EtcdHelper. - -2. Create a FIFO cache with a given capacity. In its "rolling history windown" -we will store two things: - - - the resourceVersion of the object (being an etcdIndex) - - the object watched from etcd itself (in a deserialized form) - - This should be as simple as having an array an treating it as a cyclic buffer. - Obviously resourceVersion of objects watched from etcd will be increasing, but - they are necessary for registering a new watcher that is interested in all the - changes since a given etcdIndec. - - Additionally, we should support LIST operation, otherwise clients can never - start watching at now. We may consider passing lists through etcd, however - this will not work once we have Indexer, so we will need that information - in memory anyway. - Thus, we should support LIST operation from the "end of the history" - i.e. - from the moment just after the newest cached watched event. It should be - pretty simple to do, because we can incrementally update this list whenever - the new watch event is watched from etcd. - We may consider reusing existing structures cache.Store or cache.Indexer - ("pkg/client/cache") but this is not a hard requirement. - -3. Create a new implementation of the EtcdHelper interface, that will internally -have a single watch open to etcd and will store data received from etcd in the -FIFO cache. This includes implementing registration of a new watcher that will -start a new go-routine responsible for iterating over the cache and sending -appropriately filtered objects to the watcher. - -4. 
Create the new implementation of the API, that will internally have a -single watch open to etcd and will store the data received from etcd in -the FIFO cache - this includes implementing registration of a new watcher -which will start a new go-routine responsible for iterating over the cache -and sending all the objects watcher is interested in (by applying filtering -function) to the watcher. - -5. Add a support for processing "error too old" from etcd, which will require: - - disconnect all the watchers - - clear the internal cache and relist all objects from etcd - - start accepting watchers again - -6. Enable watch in apiserver for some of the existing resource types - this -should require only changes at the initialization level. - -7. The next step will be to incorporate some indexing mechanism, but details -of it are TBD. - - - -### Future optimizations: - -1. The implementation of watch in apiserver internally will open a single -watch to etcd, responsible for watching all the changes of objects of a given -resource type. However, this watch can potentially expire at any time and -reconnecting can return "too old resource version". In that case relisting is -necessary. In such case, to avoid LIST requests coming from all watchers at -the same time, we can introduce an additional etcd event type: -[EtcdResync](../../pkg/storage/etcd/etcd_watcher.go#L36) - - Whenever reslisting will be done to refresh the internal watch to etcd, - EtcdResync event will be send to all the watchers. It will contain the - full list of all the objects the watcher is interested in (appropriately - filtered) as the parameter of this watch event. 
- Thus, we need to create the EtcdResync event, extend watch.Interface and - its implementations to support it and handle those events appropriately - in places like - [Reflector](../../pkg/client/cache/reflector.go) - - However, this might turn out to be unnecessary optimization if apiserver - will always keep up (which is possible in the new design). We will work - out all necessary details at that point. - - - -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/apiserver_watch.md?pixel)]() - -- cgit v1.2.3 From 171fb6ecc2d2ba72d78b8c1440ec68ebb1aa5bcb Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Fri, 31 Jul 2015 09:52:36 +0200 Subject: Kubmark proposal --- scalability-testing.md | 105 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 105 insertions(+) create mode 100644 scalability-testing.md diff --git a/scalability-testing.md b/scalability-testing.md new file mode 100644 index 00000000..cf87d84d --- /dev/null +++ b/scalability-testing.md @@ -0,0 +1,105 @@ + + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/scalability-testing.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+## Background
+
+We have a goal to be able to scale to 1000-node clusters by the end of 2015.
+As a result, we need to be able to run some kind of regression tests and deliver
+a mechanism so that developers can test their changes with respect to performance.
+
+Ideally, we would like to run performance tests also on PRs - although it might
+be impossible to run them on every single PR, we may introduce a possibility for
+a reviewer to trigger them if the change has a non-obvious impact on performance
+(something like "k8s-bot run scalability tests please" should be feasible).
+
+However, running performance tests on 1000-node clusters (or even bigger ones
+in the future) is a non-starter. Thus, we need some more sophisticated
+infrastructure to simulate big clusters on a relatively small number of machines
+and/or cores.
+
+This document describes two approaches to tackling this problem.
+Once we have a better understanding of their consequences, we may want to
+decide to drop one of them, but we are not yet in that position.
+
+
+## Proposal 1 - Kubmark
+
+In this proposal we are focusing on scalability testing of master components.
+We do NOT focus on node scalability - that issue should be handled separately.
+
+Since we do not focus on node performance, we don't need a real Kubelet or
+KubeProxy - in fact we don't even need to start real containers.
+All we actually need is to have some Kubelet-like and KubeProxy-like components
+that will be simulating the load on apiserver that their real equivalents are
+generating (e.g.
sending NodeStatus updates, watching for pods, watching for
+endpoints (KubeProxy), etc.).
+
+What needs to be done:
+
+1. Determine what requests both KubeProxy and Kubelet are sending to apiserver.
+2. Create a KubeletSim that generates the same load on apiserver as the
+ real Kubelet, but does not start any containers. In the initial version we
+ can assume that pods never die, so it is enough to just react to the state
+ changes read from apiserver.
+ TBD: Maybe we can reuse a real Kubelet for it by just injecting some "fake"
+ interfaces into it?
+3. Similarly, create a KubeProxySim that generates the same load on apiserver
+ as a real KubeProxy. Again, since we are not planning to talk to those
+ containers, it basically doesn't need to do anything apart from that.
+ TBD: Maybe we can reuse a real KubeProxy for it by just injecting some "fake"
+ interfaces into it?
+4. Refactor kube-up/kube-down scripts (or create new ones) to allow starting
+ a cluster with KubeletSim and KubeProxySim instead of the real ones, and put
+ a bunch of them on a single machine.
+5. Create a load generator for it (probably initially it would be enough to
+ reuse the tests that we use in the gce-scalability suite).
+
+
+## Proposal 2 - Oversubscribing
+
+The other method we are proposing is to oversubscribe resources,
+or in essence enable a single node to look like many separate nodes even though
+they reside on a single host. This is a well established pattern in many different
+cluster managers (for more details see
+http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.prd/index.html ).
+There are a couple of different ways to accomplish this, but the most viable method
+is to run privileged kubelet pods under a host's kubelet process. These pods then
+register back with the master via the introspective service, using modified names
+so as not to collide.
+
+Complications may currently exist around container tracking and ownership in docker.
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/scalability-testing.md?pixel)]() + -- cgit v1.2.3 From b30bb494d08c39e197dbc4f63743e689e5fa306a Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Tue, 11 Aug 2015 01:31:30 -0700 Subject: Address reviewer comments. --- rescheduler.md | 105 +++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 65 insertions(+), 40 deletions(-) diff --git a/rescheduler.md b/rescheduler.md index d459679c..b27b9bfe 100644 --- a/rescheduler.md +++ b/rescheduler.md @@ -37,6 +37,8 @@ Documentation for other releases can be found at July 2015 +## Introduction and definition + A rescheduler is an agent that proactively causes currently-running Pods to be moved, so as to optimize some objective function for goodness of the layout of Pods in the cluster. (The objective function @@ -45,25 +47,70 @@ collection of ad-hoc rules, but in principle there is an objective function. Implicitly an objective function is described by the scheduler's predicate and priority functions.) It might be triggered to run every N minutes, or whenever some event happens that is known -to make the objective function worse (for example, whenever a Pod goes +to make the objective function worse (for example, whenever any Pod goes PENDING for a long time.) +## Motivation and use cases + A rescheduler is useful because without a rescheduler, scheduling -decisions are only made at the time Pods are created. But as the -cluster layout changes over time, free "holes" are often produced that -were not available when a Pod was initially scheduled. These holes are -produced by run-to-completion Pods terminating, empty nodes being -added by a node auto-scaler, etc. Moving already-running Pods into -these holes may lead to a better cluster layout. 
A rescheduler might
-not just exploit existing holes, but also create holes by evicting
-Pods (assuming it knows they can reschedule elsewhere), as in free
-space defragmentation.
-
-[Although alluded to above, it's worth emphasizing that rescheduling
-is the only way to make use of new nodes added by a cluster
-auto-scaler (unless Pods were already PENDING; but even then, it's
-likely advantageous to put more than just the previously PENDING Pods
-on the new nodes.)]
+decisions are only made at the time Pods are created. But later on,
+the state of the cluster may have changed in some way such that it would
+be better to move the Pod to another node.
+
+There are two categories of movements a rescheduler might trigger: coalescing
+and spreading.
+
+### Coalesce Pods
+
+This is the most common use case. Cluster layout changes over time. For
+example, run-to-completion Pods terminate, producing free space in their wake, but that space
+is fragmented. This fragmentation might prevent a PENDING Pod from scheduling
+(there are enough free resources for the Pod in aggregate across the cluster,
+but not on any single node). A rescheduler can coalesce free space like a
+disk defragmenter, thereby producing enough free space on a node for a PENDING
+Pod to schedule. In some cases it can do this just by moving Pods into existing
+holes, but often it will need to evict (and reschedule) running Pods in order to
+create a large enough hole.
+
+A second use case for a rescheduler to coalesce pods is when it becomes possible
+to support the running Pods on fewer nodes. The rescheduler can
+gradually move Pods off of some set of nodes to make those nodes empty so
+that they can then be shut down/removed.
More specifically,
+the system could do a simulation to see whether, after removing a node from the
+cluster, the Pods that were on that node would be able to reschedule,
+either directly or with the help of the rescheduler; if the answer is
+yes, then you can safely auto-scale down (assuming services would still
+meet their application-level SLOs).
+
+### Spread Pods
+
+The main use cases for spreading Pods revolve around relieving congestion on (a) highly
+utilized node(s). For example, some process might suddenly start receiving a significantly
+above-normal amount of external requests, leading to starvation of best-effort
+Pods on the node. We can use the rescheduler to move the best-effort Pods off of the
+node. (They are likely to have generous eviction SLOs, so are more likely to be movable
+than the Pod that is experiencing the higher load, but in principle we might move either.)
+Or even before any node becomes overloaded, we might proactively re-spread Pods from nodes
+with high utilization, to give them some buffer against future utilization spikes. In either
+case, the nodes we move the Pods onto might have been in the system for a long time or might
+have been added by the cluster auto-scaler specifically to allow the rescheduler to
+rebalance utilization.
+
+A second spreading use case is to separate antagonists.
+Sometimes the processes running in two different Pods on the same node
+may have unexpected antagonistic
+behavior towards one another. A system component might monitor for such
+antagonism and ask the rescheduler to move one of the antagonists to a new node.
+
+### Ranking the use cases
+
+The vast majority of users probably only care about rescheduling for three scenarios:
+
+1. Move Pods around to get a PENDING Pod to schedule
+1. Redistribute Pods onto new nodes added by a cluster auto-scaler when there are no PENDING Pods
+1.
Move Pods around when CPU starvation is detected on a node
+
+## Design considerations and design space

 Because rescheduling is disruptive--it causes one or more
already-running Pods to die when they otherwise wouldn't--a key
@@ -94,37 +141,15 @@ A key design question for a Rescheduler is how much knowledge it needs about the

 * If it is going to run continuously in the background to optimize cluster layout but is still only going to kill Pods, then it still needs to know the predicate functions for the reason mentioned above. In principle it doesn't need to know the priority functions; it could just randomly kill Pods and rely on the regular scheduler to put them back in better places. However, this is a rather inexact approach. Thus it is useful for the rescheduler to know the priority functions, or at least some subset of them, so it can be sure that an action it takes will actually improve the cluster layout.
 * If it is going to run continuously in the background to optimize cluster layout and is going to act as a scheduler rather than just killing Pods, then it needs to know the predicate functions and some compatible (but not necessarily identical) priority functions. One example of a case where "compatible but not identical" might be useful is if the main scheduler(s) has a very simple scheduling policy optimized for low scheduling latency, while the Rescheduler has a more sophisticated/optimal scheduling policy that requires more computation time. The main thing to avoid is for the scheduler(s) and rescheduler to have incompatible priority functions, as this will cause them to "fight" (though it still can't lead to an infinite loop, since the scheduler(s) only ever touches a Pod once).

-The vast majority of users probably only care about rescheduling for three scenarios:
+## Appendix: Integrating rescheduler with cluster auto-scaler (scale up)

-1. Redistribute Pods onto new nodes added by a cluster auto-scaler
-1.
Move Pods around to get a PENDING Pod to schedule -1. Move Pods around when CPU starvation is detected on a node - -**Addendum: How a rescheduler might trigger cluster auto-scaling (to -scale up).** Instead of moving Pods around to free up space, it might -just add a new node (and then move some Pods onto the new node). More -generally, it might be useful to integrate the rescheduler and cluster -auto-scaler. For scaling up the cluster a reasonable workflow might -be: +For scaling up the cluster, a reasonable workflow might be: 1. pod horizontal auto-scaler decides to add one or more Pods to a service, based on the metrics it is observing 1. the Pod goes PENDING due to lack of a suitable node with sufficient resources 1. rescheduler notices the PENDING Pod and determines that the Pod cannot schedule just by rearranging existing Pods (while respecting SLOs) 1. rescheduler triggers cluster auto-scaler to add a node of the appropriate type for the PENDING Pod 1. the PENDING Pod schedules onto the new node (and possibly the rescheduler also moves other Pods onto that node) -**Addendum: Role of simulation.** Things like knowing what will be the -effect of different rearrangements of Pods requires a form of -simulation of the scheduling algorithm (see also discussion in -previous entry about what the rescheduler needs to know about the -predicate and priority functions of the cluster's scheduler(s)). For -cluster auto-scaling down, you could do a -simulation to see whether after removing a node from the cluster, will -the Pods that were on that node be able to reschedule, either directly -or with the help of the rescheduler; if the answer is yes, then you -can safely auto-scale down (assuming services will still meeting their -application-level SLOs). 
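The check in step 3 of the workflow above — deciding whether a PENDING Pod could schedule just by rearranging existing Pods — and the fragmentation condition described under "Coalesce Pods" both come down to comparing aggregate free capacity against per-node free capacity. The following is a minimal single-resource sketch of that condition; the function names, the milli-core units, and the reduction to CPU only are illustrative assumptions, not part of this proposal:

```go
package main

import "fmt"

// fitsAnywhere reports whether a request fits on at least one node,
// given the free CPU (in illustrative milli-cores) of every node.
func fitsAnywhere(freeCPU []int64, request int64) bool {
	for _, free := range freeCPU {
		if free >= request {
			return true
		}
	}
	return false
}

// fragmented reports whether a pending request is schedulable in aggregate
// but not on any single node -- the situation a rescheduler could fix by
// coalescing free space (or that could otherwise trigger a scale-up).
func fragmented(freeCPU []int64, request int64) bool {
	var total int64
	for _, free := range freeCPU {
		total += free
	}
	return total >= request && !fitsAnywhere(freeCPU, request)
}

func main() {
	// Three nodes with 300m free each: 900m in aggregate, but a 500m Pod
	// cannot fit on any one node -- a rescheduler candidate.
	free := []int64{300, 300, 300}
	fmt.Println(fragmented(free, 500)) // true
	fmt.Println(fragmented(free, 250)) // false: fits on a single node as-is
}
```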
- - [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/rescheduler.md?pixel)]() -- cgit v1.2.3 From de3a07b932c811b768c513b72d3917718c01c3ab Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Mon, 20 Jul 2015 08:24:20 -0500 Subject: Split hack/{verify,update}-* files so we don't always go build Right now some of the hack/* tools use `go run` and build almost every time. There are some which expect you to have already run `go install`. And in all cases the pre-commit hook, which runs a full build wouldn't want to do either, since it just built! This creates a new hack/after-build/ directory and has the scripts which REQUIRE that the binary already be built. It doesn't test and complain. It just fails miserably. Users should not be in this directory. Users should just use hack/verify-* which will just do the build and then call the "after-build" version. The pre-commit hook or anything which KNOWS the binaries have been built can use the fast version. --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index 2929f281..294f825a 100644 --- a/development.md +++ b/development.md @@ -345,7 +345,7 @@ See [conformance-test.sh](http://releases.k8s.io/HEAD/hack/conformance-test.sh). 
## Regenerating the CLI documentation ```sh -hack/run-gendocs.sh +hack/update-generated-docs.sh ``` -- cgit v1.2.3 From c89196ac7341fdfbfbed5d27bb568bb66d4eafec Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 11 Aug 2015 16:29:50 -0400 Subject: Update code to use - in flag names instead of _ --- admission_control.md | 4 ++-- admission_control_limit_range.md | 2 +- admission_control_resource_quota.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/admission_control.md b/admission_control.md index 9245aa7d..a2b5700b 100644 --- a/admission_control.md +++ b/admission_control.md @@ -63,8 +63,8 @@ The kube-apiserver takes the following OPTIONAL arguments to enable admission co | Option | Behavior | | ------ | -------- | -| admission_control | Comma-delimited, ordered list of admission control choices to invoke prior to modifying or deleting an object. | -| admission_control_config_file | File with admission control configuration parameters to boot-strap plug-in. | +| admission-control | Comma-delimited, ordered list of admission control choices to invoke prior to modifying or deleting an object. | +| admission-control-config-file | File with admission control configuration parameters to boot-strap plug-in. 
| An **AdmissionControl** plug-in is an implementation of the following interface: diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 885ef664..621fd564 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -137,7 +137,7 @@ If a constraint is not specified for an enumerated resource, it is not enforced To enable the plug-in and support for LimitRange, the kube-apiserver must be configured as follows: ```console -$ kube-apiserver -admission_control=LimitRanger +$ kube-apiserver --admission-control=LimitRanger ``` ### Enforcement of constraints diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index bb7c6e0a..86fae451 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -178,7 +178,7 @@ The **ResourceQuota** plug-in introspects all incoming admission requests. To enable the plug-in and support for ResourceQuota, the kube-apiserver must be configured as follows: ``` -$ kube-apiserver -admission_control=ResourceQuota +$ kube-apiserver --admission-control=ResourceQuota ``` It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request -- cgit v1.2.3 From a577fe59554100f7339e4784089095734875afd5 Mon Sep 17 00:00:00 2001 From: Bryan Stenson Date: Tue, 11 Aug 2015 22:36:51 -0700 Subject: create cloudprovider "providers" package move all providers into new package update all references to old package path --- writing-a-getting-started-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 04d0d67f..7441474a 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -82,7 +82,7 @@ Just file an issue or chat us on IRC and one of the committers will link to it f These guidelines say *what* to do. See the Rationale section for *why*. 
- the main reason to add a new development distro is to support a new IaaS provider (VM and - network management). This means implementing a new `pkg/cloudprovider/$IAAS_NAME`. + network management). This means implementing a new `pkg/cloudprovider/providers/$IAAS_NAME`. - Development distros should use Saltstack for Configuration Management. - development distros need to support automated cluster creation, deletion, upgrading, etc. This mean writing scripts in `cluster/$IAAS_NAME`. -- cgit v1.2.3 From 36a2019fe707de032c352ffe1375cf476564ef53 Mon Sep 17 00:00:00 2001 From: Robert Bailey Date: Wed, 12 Aug 2015 13:12:32 -0700 Subject: Update repository links in development.md. --- development.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/development.md b/development.md index 294f825a..db74adaf 100644 --- a/development.md +++ b/development.md @@ -51,7 +51,7 @@ Below, we outline one of the more common git workflows that core developers use. ### Fork the main repository -1. Go to https://github.com/GoogleCloudPlatform/kubernetes +1. Go to https://github.com/kubernetes/kubernetes 2. 
Click the "Fork" button (at the top right) ### Clone your fork @@ -64,7 +64,7 @@ cd $GOPATH/src/k8s.io # Replace "$YOUR_GITHUB_USERNAME" below with your github username git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git cd kubernetes -git remote add upstream 'https://github.com/GoogleCloudPlatform/kubernetes.git' +git remote add upstream 'https://github.com/kubernetes/kubernetes.git' ``` ### Create a branch and make changes @@ -81,7 +81,7 @@ git fetch upstream git rebase upstream/master ``` -Note: If you have write access to the main repository at github.com/GoogleCloudPlatform/kubernetes, you should modify your git configuration so that you can't accidentally push to upstream: +Note: If you have write access to the main repository at github.com/kubernetes/kubernetes, you should modify your git configuration so that you can't accidentally push to upstream: ```sh git remote set-url --push upstream no_push @@ -166,7 +166,7 @@ export GOPATH=$KPATH 3) Populate your new GOPATH. ```sh -cd $KPATH/src/github.com/GoogleCloudPlatform/kubernetes +cd $KPATH/src/github.com/kubernetes/kubernetes godep restore ``` -- cgit v1.2.3 From 4fa0f3a7b2c2d43815967c6f4671713a0c2ffa40 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Mon, 27 Jul 2015 12:49:06 -0700 Subject: Add initial storage types to the Kubernetes API --- extending-api.md | 222 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 extending-api.md diff --git a/extending-api.md b/extending-api.md new file mode 100644 index 00000000..cca257bd --- /dev/null +++ b/extending-api.md @@ -0,0 +1,222 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/design/extending-api.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Adding custom resources to the Kubernetes API server
+
+This document describes the design for implementing the storage of custom API types in the Kubernetes API Server.
+
+
+## Resource Model
+
+### The ThirdPartyResource
+
+The `ThirdPartyResource` resource describes the multiple versions of a custom resource that the user wants to add
+to the Kubernetes API. `ThirdPartyResource` is a non-namespaced resource; attempting to place it in a namespace
+will return an error.
+
+Each `ThirdPartyResource` resource has the following:
+ * Standard Kubernetes object metadata.
+ * ResourceKind - The kind of the resources described by this third party resource.
+ * Description - A free text description of the resource.
+ * APIGroup - An API group that this resource should be placed into.
+ * Versions - One or more `Version` objects.
+
+### The `Version` Object
+
+The `Version` object describes a single concrete version of a custom resource. The `Version` object currently
+only specifies:
+ * The `Name` of the version.
+ * The `APIGroup` this version should belong to.
+
+## Expectations about third party objects
+
+Every object that is added to a third-party Kubernetes object store is expected to contain Kubernetes
+compatible [object metadata](../devel/api-conventions.md#metadata). This requirement enables the
+Kubernetes API server to provide the following features:
+ * Filtering lists of objects via LabelQueries
+ * `resourceVersion`-based optimistic concurrency via compare-and-swap
+ * Versioned storage
+ * Event recording
+ * Integration with basic `kubectl` command line tooling.
+ * Watch for resource changes.
+
+The `Kind` for an instance of a third-party object (e.g. CronTab) below is expected to be
+programmatically convertible to the name of the resource using
+the following conversion. Kinds are expected to be of the form `<CamelCaseKind>`, and the
+`APIVersion` for the object is expected to be `<domain-name>/<api-group>/<api-version>`.
+
+For example, `example.com/stable/v1`
+
+`domain-name` is expected to be a fully qualified domain name.
+
+'CamelCaseKind' is the specific type name.
+
+To convert this into the `metadata.name` for the `ThirdPartyResource` resource instance,
+the `<domain-name>` is copied verbatim, and the `CamelCaseKind` is
+then converted
+using '-' instead of capitalization ('camel-case'), with the first character being assumed to be
+capitalized. In pseudo code:
+
+```go
+var result string
+for ix := range kindName {
+    if isCapital(kindName[ix]) {
+        if ix != 0 {
+            result = append(result, '-')
+        }
+    }
+    result = append(result, toLowerCase(kindName[ix]))
+}
+```
+
+As a concrete example, the resource named `camel-case-kind.example.com` defines resources of Kind `CamelCaseKind`, in
+the APIGroup with the prefix `example.com/...`.
+
+The reason for this is to enable rapid lookup of a `ThirdPartyResource` object given the kind information.
+This is also the reason why `ThirdPartyResource` is not namespaced.
+
+## Usage
+
+When a user creates a new `ThirdPartyResource`, the Kubernetes API Server reacts by creating a new, namespaced
+RESTful resource path. For now, non-namespaced objects are not supported. As with existing built-in objects,
+deleting a namespace deletes all third party resources in that namespace.
+
+For example, if a user creates:
+
+```yaml
+metadata:
+  name: cron-tab.example.com
+apiVersion: experimental/v1
+kind: ThirdPartyResource
+description: "A specification of a Pod to run on a cron style schedule"
+versions:
+  - name: stable/v1
+  - name: experimental/v2
+```
+
+Then the API server will program in two new RESTful resource paths:
+ * `/thirdparty/example.com/stable/v1/namespaces//crontabs/...`
+ * `/thirdparty/example.com/experimental/v2/namespaces//crontabs/...`
+
+
+Now that this schema has been created, a user can `POST`:
+
+```json
+{
+  "metadata": {
+    "name": "my-new-cron-object"
+  },
+  "apiVersion": "example.com/stable/v1",
+  "kind": "CronTab",
+  "cronSpec": "* * * * /5",
+  "image": "my-awesome-chron-image"
+}
+```
+
+to: `/third-party/example.com/stable/v1/namespaces/default/crontabs/my-new-cron-object`
+
+and the corresponding data will be stored into etcd by the APIServer, so that when the user issues:
+
+```
+GET /third-party/example.com/stable/v1/namespaces/default/crontabs/my-new-cron-object
+```
+
+they will get back the same data, but with additional Kubernetes metadata
+(e.g. `resourceVersion`, `creationTimestamp`) filled in.
+
+Likewise, to list all resources, a user can issue:
+
+```
+GET /third-party/example.com/stable/v1/namespaces/default/crontabs
+```
+
+and get back:
+
+```json
+{
+  "apiVersion": "example.com/stable/v1",
+  "kind": "CronTabList",
+  "items": [
+    {
+      "metadata": {
+        "name": "my-new-cron-object"
+      },
+      "apiVersion": "example.com/stable/v1",
+      "kind": "CronTab",
+      "cronSpec": "* * * * /5",
+      "image": "my-awesome-chron-image"
+    }
+  ]
+}
+```
+
+Because all objects are expected to contain standard Kubernetes metadata fields, these
+list operations can also use `Label` queries to filter requests down to specific subsets.
+
+Likewise, clients can use watch endpoints to watch for changes to stored objects.
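The kind-to-name conversion and the RESTful paths described above can be sketched as runnable Go. Everything here is illustrative: the helper names are ours, and the real API server derives these values from the `ThirdPartyResource` object rather than from string parameters.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// kindToName implements the Kind -> resource-name conversion described
// earlier: each capital becomes a lower-case letter preceded by '-',
// except for the leading capital, which gets no dash.
func kindToName(kind string) string {
	var b strings.Builder
	for i, r := range kind {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteRune('-')
			}
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

// objectPath assembles the per-object RESTful path used in the examples
// above from its parts.
func objectPath(domain, version, namespace, resource, name string) string {
	return strings.Join([]string{
		"/third-party", domain, version, "namespaces", namespace, resource, name,
	}, "/")
}

func main() {
	// metadata.name of the ThirdPartyResource: converted kind + domain name.
	fmt.Println(kindToName("CronTab") + ".example.com") // cron-tab.example.com
	fmt.Println(objectPath("example.com", "stable/v1", "default", "crontabs", "my-new-cron-object"))
	// /third-party/example.com/stable/v1/namespaces/default/crontabs/my-new-cron-object
}
```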
+
+
+## Storage
+
+In order to store custom user data in a versioned fashion inside of etcd, we also need to introduce a
+`Codec`-compatible object for persistent storage in etcd. This object is `ThirdPartyResourceData` and it contains:
+ * Standard API Metadata
+ * `Data`: The raw JSON data for this custom object.
+
+### Storage key specification
+
+Each custom object stored by the API server needs a custom key in storage; this is described below:
+
+#### Definitions
+
+ * `resource-namespace` : the namespace of the particular resource that is being stored
+ * `resource-name`: the name of the particular resource being stored
+ * `third-party-resource-namespace`: the namespace of the `ThirdPartyResource` resource that represents the type for the specific instance being stored.
+ * `third-party-resource-name`: the name of the `ThirdPartyResource` resource that represents the type for the specific instance being stored.
+
+#### Key
+
+Given the definitions above, the key for a specific third-party object is:
+
+```
+${standard-k8s-prefix}/third-party-resources/${third-party-resource-namespace}/${third-party-resource-name}/${resource-namespace}/${resource-name}
+```
+
+Thus, listing a third-party resource can be achieved by listing the directory:
+
+```
+${standard-k8s-prefix}/third-party-resources/${third-party-resource-namespace}/${third-party-resource-name}/${resource-namespace}/
+```
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/extending-api.md?pixel)]()
+
-- 
cgit v1.2.3 

From 0bdf3f5728d61604c591d4a280ae6412321e4811 Mon Sep 17 00:00:00 2001
From: Daniel Martí
Date: Tue, 28 Jul 2015 13:45:36 -0700
Subject: Add compute resource metrics API proposal

---
 compute-resource-metrics-api.md | 177 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 177 insertions(+)
 create mode 100644 compute-resource-metrics-api.md

diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md
new file
mode 100644 index 00000000..472e6a37 --- /dev/null +++ b/compute-resource-metrics-api.md @@ -0,0 +1,177 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/compute-resource-metrics-api.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Kubernetes compute resource metrics API
+
+## Goals
+
+Provide resource usage metrics on pods and nodes on the API server to be used
+by the scheduler to improve job placement, utilization, etc., and by end users
+to understand the resource utilization of their jobs. Horizontal and vertical
+auto-scaling are also near-term uses.
+
+## Current state
+
+Right now, the Kubelet exports container metrics via an API endpoint. This
+information is neither gathered nor served by the Kubernetes API server.
+
+## Use cases
+
+The first user will be kubectl. The resource usage data can be shown to the
+user via a periodically refreshing interface similar to `top` on Unix-like
+systems. This info could let users assign resource limits more efficiently.
+
+```
+$ kubectl top kubernetes-minion-abcd
+POD                        CPU         MEM
+monitoring-heapster-abcde  0.12 cores  302 MB
+kube-ui-v1-nd7in           0.07 cores  130 MB
+```
+
+A second user will be the scheduler. To assign pods to nodes efficiently, the
+scheduler needs to know the current free resources on each node.
+
+## Proposed endpoints
+
+    /api/v1/namespaces/myns/podMetrics/mypod
+    /api/v1/nodeMetrics/myNode
+
+The derived metrics include the mean, max and a few percentiles of the list of
+values.
+
+We are not adding new methods to pods and nodes, e.g.
+`/api/v1/namespaces/myns/pods/mypod/metrics`, for a number of reasons. For
+example, having a separate endpoint allows fetching all the pod metrics in a
+single request. The rate of change of the data is also too high to include in
+the pod resource.
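The derived values mentioned above (mean, max, and a few percentiles) could be computed from a window of raw samples along these lines. This is only a sketch: the milli-core units, the nearest-rank percentile rule, and the choice of the 95th percentile are our assumptions, not something the proposal pins down.

```go
package main

import (
	"fmt"
	"sort"
)

// derive computes mean, max and 95th-percentile summaries from a window of
// raw CPU samples (illustrative milli-cores), using the nearest-rank rule
// for the percentile.
func derive(samples []int64) (mean, max, p95 int64) {
	if len(samples) == 0 {
		return 0, 0, 0
	}
	sorted := append([]int64(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	var sum int64
	for _, s := range sorted {
		sum += s
	}
	mean = sum / int64(len(sorted))
	max = sorted[len(sorted)-1]
	// Nearest-rank 95th percentile: ceil(0.95 * n), converted to a
	// zero-based index.
	rank := (95*len(sorted) + 99) / 100
	p95 = sorted[rank-1]
	return mean, max, p95
}

func main() {
	// Ten samples with one spike; the spike dominates max and p95.
	samples := []int64{100, 120, 110, 500, 130, 115, 105, 125, 140, 135}
	mean, max, p95 := derive(samples)
	fmt.Println(mean, max, p95) // 158 500 500
}
```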
+
+In the future, if any use cases are found that would benefit from RC,
+namespace or service aggregation, metrics at those levels could also be
+exposed, taking advantage of the fact that Heapster already does aggregation
+and metrics for them.
+
+Initially, this proposal included raw metrics alongside the derived metrics.
+After revising the use cases, it was clear that raw metrics could be left out
+of this proposal. They can be dealt with in a separate proposal, exposing them
+in the Kubelet API via proper versioned endpoints for Heapster to poll
+periodically.
+
+This also means that the amount of data pushed by each Kubelet to the API
+server will be much smaller.
+
+## Data gathering
+
+We will use a push-based system. Each kubelet will periodically - every 10s -
+POST its derived metrics to the API server. Then, any users of the metrics can
+register as watchers to receive the new metrics when they are available.
+
+Users of the metrics may also periodically poll the API server instead of
+registering as a watcher, keeping in mind that new data may only be available
+every 10 seconds. If any user requires metrics that are either more specific
+(e.g. last 1s) or updated more often, they should use the metrics pipeline via
+Heapster.
+
+The API server will not hold any of this data directly. For our initial
+purposes, it will hold the most recent metrics obtained from each node in
+etcd. Then, when polled for metrics, the API server would only serve said most
+recent data per node.
+
+Benchmarks will be run with etcd to see if it can keep up with the frequent
+writes of data. If it turns out that etcd doesn't scale well enough, we will
+have to switch to a different storage system.
+
+If a pod gets deleted, the API server will get rid of any metrics it may
+currently be holding for it.
+
+The clients watching the metrics data may cache it for longer periods of time.
+The clearest example would be Heapster.
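The "hold only the most recent metrics per node" behavior described above can be sketched as a simple keyed store. This is purely illustrative — the proposal stores this state in etcd, not in process memory, and the type and field names here are ours, not the proposal's:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nodeMetrics is a stand-in for the real derived-metrics object; only the
// fields needed for the sketch are shown.
type nodeMetrics struct {
	Node     string
	CPUMean  float64
	Received time.Time
}

// metricsStore keeps only the latest push per node, so a later POST from a
// kubelet simply overwrites the previous entry.
type metricsStore struct {
	mu     sync.Mutex
	latest map[string]nodeMetrics
}

func newMetricsStore() *metricsStore {
	return &metricsStore{latest: make(map[string]nodeMetrics)}
}

// Put records a push from a kubelet, replacing older data for that node.
func (s *metricsStore) Put(m nodeMetrics) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.latest[m.Node] = m
}

// Get returns the most recent metrics for a node, if any were pushed.
func (s *metricsStore) Get(node string) (nodeMetrics, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	m, ok := s.latest[node]
	return m, ok
}

func main() {
	store := newMetricsStore()
	store.Put(nodeMetrics{Node: "kubernetes-minion-abcd", CPUMean: 0.12})
	store.Put(nodeMetrics{Node: "kubernetes-minion-abcd", CPUMean: 0.19}) // newer push wins
	m, _ := store.Get("kubernetes-minion-abcd")
	fmt.Println(m.CPUMean) // 0.19
}
```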
+
+In the future, we might want to store the metrics differently:
+
+* via heapster - Since heapster keeps data for a period of time, we could
+  redirect requests to the API server to heapster instead of using etcd. This
+  would also allow serving metrics other than the latest ones.
+
+An edge case that this proposal doesn't take into account is kubelets being
+restarted. If any of them are, with a simple implementation they would lose
+historical data and thus take hours to gather enough information to provide
+relevant metrics again. We might want to use persistent storage, now or in
+the future, to improve that situation.
+
+More information on kubelet checkpoints can be found at
+[#489](https://issues.k8s.io/489).
+
+## Data structure
+
+```Go
+type DerivedPodMetrics struct {
+	TypeMeta
+	ObjectMeta // should have pod name
+	// one entry per container in the pod
+	Containers []struct {
+		ContainerReference *Container
+		Metrics            MetricsWindows
+	}
+}
+
+type DerivedNodeMetrics struct {
+	TypeMeta
+	ObjectMeta // should have node name
+	NodeMetrics      MetricsWindows
+	SystemContainers []struct {
+		ContainerReference *Container
+		Metrics            MetricsWindows
+	}
+}
+
+// Last overlapping 10s, 1m, 1h and 1d as a start
+// Updated every 10s, so the 10s window is sequential and the rest are
+// rolling.
+type MetricsWindows map[time.Duration]DerivedMetrics
+
+type DerivedMetrics struct {
+	// End time of all the time windows in Metrics
+	EndTime util.Time `json:"endtime"`
+
+	Mean       ResourceUsage `json:"mean"`
+	Max        ResourceUsage `json:"max"`
+	NinetyFive ResourceUsage `json:"95th"`
+}
+
+type ResourceUsage map[resource.Type]resource.Quantity
+```
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/compute-resource-metrics-api.md?pixel)]()
+
-- 
cgit v1.2.3 

From 00ce437841e772d61fc732f99860df17bb01d3ea Mon Sep 17 00:00:00 2001
From: goltermann
Date: Thu, 13 Aug 2015 11:29:59 -0700
Subject: Adding teams lists to faster_reviews.
--- faster_reviews.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/faster_reviews.md b/faster_reviews.md index d28e9b55..3ea030d3 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -187,6 +187,9 @@ things you can do that might help kick a stalled process along: * Ping the assignee by email (many of us have email addresses that are well published or are the same as our GitHub handle @google.com or @redhat.com). + * Ping the [team](https://github.com/orgs/kubernetes/teams) (via @team-name) + that works in the area you're submitting code. + If you think you have fixed all the issues in a round of review, and you haven't heard back, you should ping the reviewer (assignee) on the comment stream with a "please take another look" (PTAL) or similar comment indicating you are done and -- cgit v1.2.3 From 9eee95bef404c2f9d3fc20dd95527897877f49b4 Mon Sep 17 00:00:00 2001 From: nikhiljindal Date: Tue, 4 Aug 2015 17:05:48 -0700 Subject: Adding a proposal for deployment --- deployment.md | 269 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 269 insertions(+) create mode 100644 deployment.md diff --git a/deployment.md b/deployment.md new file mode 100644 index 00000000..0a79ca86 --- /dev/null +++ b/deployment.md @@ -0,0 +1,269 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/deployment.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Deployment
+
+## Abstract
+
+A proposal for implementing a new resource - Deployment - which will enable
+declarative config updates for Pods and ReplicationControllers.
+
+Users will be able to create a Deployment, which will spin up
+a ReplicationController to bring up the desired pods.
+Users can also target the Deployment at existing ReplicationControllers, in
+which case the new RC will replace the existing ones. The exact mechanics of
+replacement depend on the DeploymentStrategy chosen by the user.
+DeploymentStrategies are explained in detail in a later section.
+
+## Implementation
+
+### API Object
+
+The `Deployment` API object will have the following structure:
+
+```go
+type Deployment struct {
+	TypeMeta
+	ObjectMeta
+
+	// Specification of the desired behavior of the Deployment.
+	Spec DeploymentSpec
+
+	// Most recently observed status of the Deployment.
+	Status DeploymentStatus
+}
+
+type DeploymentSpec struct {
+	// Number of desired pods. This is a pointer to distinguish between explicit
+	// zero and not specified. Defaults to 1.
+	Replicas *int
+
+	// Label selector for pods. Existing ReplicationControllers whose pods are
+	// selected by this will be scaled down.
+	Selector map[string]string
+
+	// Describes the pods that will be created.
+	Template *PodTemplateSpec
+
+	// The deployment strategy to use to replace existing pods with new ones.
+	Strategy DeploymentStrategy
+
+	// Key of the selector that is added to existing RCs (and label key that is
+	// added to its pods) to prevent the existing RCs from selecting new pods (and old
+	// pods from being selected by the new RC).
+	// Users can set this to an empty string to indicate that the system should
+	// not add any selector and label. If unspecified, the system uses
+	// "deployment.kubernetes.io/podTemplateHash".
+	// Value of this key is hash of DeploymentSpec.PodTemplateSpec.
+	UniqueLabelKey *string
+}
+
+type DeploymentStrategy struct {
+	// Type of deployment. Can be "Recreate" or "RollingUpdate".
+	Type DeploymentType
+
+	// TODO: Update this to follow our convention for oneOf, whatever we decide it
+	// to be.
+	// Rolling update config params. Present only if DeploymentType =
+	// RollingUpdate.
+	RollingUpdate *RollingUpdateDeploymentSpec
+}
+
+type DeploymentType string
+
+const (
+	// Kill all existing pods before creating new ones.
+	DeploymentRecreate DeploymentType = "Recreate"
+
+	// Replace the old RCs by a new one using rolling update, i.e. gradually scale down the old RCs and scale up the new one.
+	DeploymentRollingUpdate DeploymentType = "RollingUpdate"
+)
+
+// Spec to control the desired behavior of rolling update.
+type RollingUpdateDeploymentSpec struct {
+	// The maximum number of pods that can be unavailable during the update.
+	// Value can be an absolute number (ex: 5) or a percentage of total pods at the start of update (ex: 10%).
+	// Absolute number is calculated from percentage by rounding up.
+	// This can not be 0 if MaxSurge is 0.
+	// By default, a fixed value of 1 is used.
+	// Example: when this is set to 30%, the old RC can be scaled down by 30%
+	// immediately when the rolling update starts. Once new pods are ready, old RC
+	// can be scaled down further, followed by scaling up the new RC, ensuring
+	// that at least 70% of original number of pods are available at all times
+	// during the update.
+	MaxUnavailable IntOrString
+
+	// The maximum number of pods that can be scheduled above the original number of
+	// pods.
+	// Value can be an absolute number (ex: 5) or a percentage of total pods at
+	// the start of the update (ex: 10%).
This can not be 0 if MaxUnavailable is 0.
+	// Absolute number is calculated from percentage by rounding up.
+	// By default, a value of 1 is used.
+	// Example: when this is set to 30%, the new RC can be scaled up by 30%
+	// immediately when the rolling update starts. Once old pods have been killed,
+	// new RC can be scaled up further, ensuring that total number of pods running
+	// at any time during the update is at most 130% of original pods.
+	MaxSurge IntOrString
+
+	// Minimum number of seconds for which a newly created pod should be ready
+	// without any of its containers crashing, for it to be considered available.
+	// Defaults to 0 (pod will be considered available as soon as it is ready)
+	MinReadySeconds int
+}
+
+type DeploymentStatus struct {
+	// Total number of ready pods targeted by this deployment (this
+	// includes both the old and new pods).
+	Replicas int
+
+	// Total number of new ready pods with the desired template spec.
+	UpdatedReplicas int
+}
+
+```
+
+### Controller
+
+#### Deployment Controller
+
+The DeploymentController will make Deployments happen.
+It will watch Deployment objects in etcd.
+For each pending deployment, it will:
+
+1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
+   - For now, we will do this in the client - list all RCs and then filter the
+     ones we want. Eventually, we want to expose this in the API.
+2. The new RC can have the same selector as the old RC and hence we add a unique
+   selector to all these RCs (and the corresponding label to their pods) to ensure
+   that they do not select the newly created pods (or old pods get selected by
+   new RC).
+   - The label key will be "deployment.kubernetes.io/podTemplateHash".
+   - The label value will be hash of the podTemplateSpec for that RC without
+     this label. This value will be unique for all RCs, since PodTemplateSpec should be unique.
+   - If the RCs and pods don't already have this label and selector:
+     - We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
+       ensure that all new pods that they create will have this label.
+     - Then we will add this label to their existing pods and then add this as a selector
+       to that RC.
+3. Find if there exists an RC for which value of "deployment.kubernetes.io/podTemplateHash" label
+   is same as hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
+   this is the RC that will be ramped up. If there is no such RC, then we create
+   a new one using DeploymentSpec and then add a "deployment.kubernetes.io/podTemplateHash" label
+   to it. RCSpec.replicas = 0 for a newly created RC.
+4. Scale up the new RC and scale down the old ones as per the DeploymentStrategy.
+   - Raise an event if we detect an error, like new pods failing to come up.
+5. Go back to step 1 unless the new RC has been ramped up to desired replicas
+   and the old RCs have been ramped down to 0.
+6. Cleanup.
+
+DeploymentController is stateless so that it can recover in case it crashes during a deployment.
+
+### MinReadySeconds
+
+We will implement MinReadySeconds using the Ready condition in Pod. We will add
+a LastTransitionTime to PodCondition and update kubelet to set Ready to false
+each time any container crashes. Kubelet will set Ready condition back to true once
+all containers are ready. For containers without a readiness probe, we will
+assume that they are ready as soon as they are up.
+https://github.com/kubernetes/kubernetes/issues/11234 tracks updating kubelet
+and https://github.com/kubernetes/kubernetes/issues/12615 tracks adding
+LastTransitionTime to PodCondition.
+
+## Changing Deployment mid-way
+
+### Updating
+
+Users can update an ongoing deployment before it is completed.
+In this case, the existing deployment will be stalled and the new one will
+begin.
+For example, consider the following case: +- User creates a deployment to rolling-update 10 pods with image:v1 to + pods with image:v2. +- User then updates this deployment to create pods with image:v3, + when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC + had been ramped down to 5 pods. +- When Deployment Controller observes the new deployment, it will create + a new RC for creating pods with image:v3. It will then start ramping up this + new RC to 10 pods and will ramp down both the existing RCs to 0. + +### Deleting + +Users can pause/cancel a deployment by deleting it before it is completed. +Recreating the same deployment will resume it. +For example, consider the following case: +- User creates a deployment to rolling-update 10 pods with image:v1 to + pods with image:v2. +- User then deletes this deployment while the old and new RCs are at 5 replicas each. + The user will end up with 2 RCs with 5 replicas each. +The user can then create the same deployment again, in which case DeploymentController will +notice that the second RC exists already, which it can ramp up while ramping down +the first one. + +### Rollback + +We want to allow the user to roll back a deployment. To roll back a +completed (or ongoing) deployment, the user can create (or update) a deployment with +DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec. + +## Deployment Strategies + +DeploymentStrategy specifies how the new RC should replace existing RCs. +To begin with, we will support 2 types of deployment: +* Recreate: We kill all existing RCs and then bring up the new one. This results + in quick deployment but there is a downtime when old pods are down but + the new ones have not come up yet. +* Rolling update: We gradually scale down old RCs while scaling up the new one. + This results in a slower deployment, but there is no downtime. At all times + during the deployment, there are a few pods available (old or new).
The number + of available pods, and when a pod is considered "available", can be configured + using RollingUpdateDeploymentSpec. + +In the future, we want to support more deployment types. + +## Future + +Apart from the above, we want to add support for the following: +* Running the deployment process in a pod: In the future, we can run the deployment process in a pod. Then users can define their own custom deployments, and we can run them using the image name. +* More DeploymentTypes: https://github.com/openshift/origin/blob/master/examples/deployment/README.md#deployment-types lists the most commonly used ones. +* Triggers: Deployment will have a trigger field to identify what triggered the deployment. Options are: Manual/UserTriggered, Autoscaler, NewImage. +* Automatic rollback on error: We want to support automatic rollback on error or timeout. + +## References + +- https://github.com/GoogleCloudPlatform/kubernetes/issues/1743 has most of the + discussion that resulted in this proposal. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/deployment.md?pixel)]() + -- cgit v1.2.3 From 818f69e30b787f90b8678eef776039bc015b44bc Mon Sep 17 00:00:00 2001 From: He Simei Date: Thu, 30 Jul 2015 14:09:15 +0800 Subject: fix service-account related doc --- secrets.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/secrets.md b/secrets.md index 350d151b..895d9448 100644 --- a/secrets.md +++ b/secrets.md @@ -321,9 +321,9 @@ type Secret struct { type SecretType string const ( - SecretTypeOpaque SecretType = "Opaque" // Opaque (arbitrary data; default) - SecretTypeKubernetesAuthToken SecretType = "KubernetesAuth" // Kubernetes auth token - SecretTypeDockerRegistryAuth SecretType = "DockerRegistryAuth" // Docker registry auth + SecretTypeOpaque SecretType = "Opaque" // Opaque (arbitrary data; default) + SecretTypeServiceAccountToken SecretType = "kubernetes.io/service-account-token" // Kubernetes auth token + SecretTypeDockercfg
SecretType = "kubernetes.io/dockercfg" // Docker registry auth // FUTURE: other type values ) -- cgit v1.2.3 From bc21d6f1f433a0fdcc8696b1c65a4908cf0b27d5 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Tue, 11 Aug 2015 06:30:48 +0000 Subject: Update API conventions. Add kubectl conventions. Ref #12322. Fixes #6797. --- api-conventions.md | 122 ++++++++++++++++++++++++++++++++++++------------- kubectl-conventions.md | 115 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 32 deletions(-) create mode 100644 kubectl-conventions.md diff --git a/api-conventions.md b/api-conventions.md index bdd38830..75612820 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -33,11 +33,11 @@ Documentation for other releases can be found at API Conventions =============== -Updated: 4/16/2015 +Updated: 8/12/2015 *This document is oriented at users who want a deeper understanding of the Kubernetes API structure, and developers wanting to extend the Kubernetes API. An introduction to -using resources with kubectl can be found in (working_with_resources.md).* +using resources with kubectl can be found in [Working with resources](../user-guide/working-with-resources.md).* **Table of Contents** @@ -65,11 +65,14 @@ using resources with kubectl can be found in (working_with_resources.md).* - [Serialization Format](#serialization-format) - [Units](#units) - [Selecting Fields](#selecting-fields) + - [Object references](#object-references) - [HTTP Status codes](#http-status-codes) - [Success codes](#success-codes) - [Error codes](#error-codes) - [Response Status Kind](#response-status-kind) - [Events](#events) + - [Naming conventions](#naming-conventions) + - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) @@ -84,7 +87,7 @@ The following terms are defined: * Collections - a list of resources of the same type, which may be queryable * Elements - an individual resource, addressable via a URL -Each resource typically 
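To make the naming rule above concrete, here is a toy helper (hypothetical, not part of any Kubernetes library) that derives a lowercase, plural resource collection name from a CamelCase singular kind; real pluralization has more special cases than appending "s":

```go
package main

import (
	"fmt"
	"strings"
)

// kindToResource naively derives a resource collection name from a kind:
// lowercase the kind and append "s". This is only an illustration of the
// convention, not how an API server actually registers resources.
func kindToResource(kind string) string {
	return strings.ToLower(kind) + "s"
}

func main() {
	for _, k := range []string{"Pod", "ReplicationController", "Service", "Node"} {
		fmt.Printf("%s -> /%s\n", k, kindToResource(k))
	}
}
```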
accepts and returns data of a single kind. A kind may be accepted or returned by multiple resources that reflect specific use cases. For instance, the kind "pod" is exposed as a "pods" resource that allows end users to create, update, and delete pods, while a separate "pod status" resource (that acts on "pod" kind) allows automated processes to update a subset of the fields in that resource. A "restart" resource might be exposed for a number of different resources to allow the same action to have different results for each object. +Each resource typically accepts and returns data of a single kind. A kind may be accepted or returned by multiple resources that reflect specific use cases. For instance, the kind "Pod" is exposed as a "pods" resource that allows end users to create, update, and delete pods, while a separate "pod status" resource (that acts on "Pod" kind) allows automated processes to update a subset of the fields in that resource. Resource collections should be all lowercase and plural, whereas kinds are CamelCase and singular. @@ -99,7 +102,7 @@ Kinds are grouped into three categories: An object may have multiple resources that clients can use to perform specific actions that create, update, delete, or get. - Examples: `Pods`, `ReplicationControllers`, `Services`, `Namespaces`, `Nodes` + Examples: `Pod`, `ReplicationController`, `Service`, `Namespace`, `Node`. 2. **Lists** are collections of **resources** of one (usually) or more (occasionally) kinds. @@ -117,9 +120,15 @@ Kinds are grouped into three categories: Given their limited scope, they have the same set of limited common metadata as lists. - The "size" action may accept a simple resource that has only a single field as input (the number of things). The "status" kind is returned when errors occur and is not persisted in the system. + For instance, the "Status" kind is returned when errors occur and is not persisted in the system. 
- Examples: Binding, Status + Many simple resources are "subresources", which are rooted at API paths of specific resources. When resources wish to expose alternative actions or views that are closely coupled to a single resource, they should do so using new sub-resources. Common subresources include: + + * `/binding`: Used to bind a resource representing a user request (e.g., Pod, PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node, PersistentVolume). + * `/status`: Used to write just the status portion of a resource. For example, the `/pods` endpoint only allows updates to `metadata` and `spec`, since those reflect end-user intent. An automated process should be able to modify status for users to see by sending an updated Pod kind to the server to the "/pods/<name>/status" endpoint - the alternate endpoint allows different rules to be applied to the update, and access to be appropriately restricted. + * `/scale`: Used to read and write the count of a resource in a manner that is independent of the specific resource schema. + + Two additional subresources, `proxy` and `portforward`, provide access to cluster resources as described in [docs/user-guide/accessing-the-cluster.md](../user-guide/accessing-the-cluster.md). The standard REST verbs (defined below) MUST return singular JSON objects. Some API endpoints may deviate from the strict REST pattern and return resources that are not singular JSON objects, such as streams of JSON objects or unstructured text log data. @@ -147,6 +156,7 @@ Every object kind MUST have the following metadata in a nested object field call Every object SHOULD have the following metadata in a nested object field called "metadata": * resourceVersion: a string that identifies the internal version of this object that can be used by clients to determine when objects have changed. This value MUST be treated as opaque by clients and passed unmodified back to the server. 
Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. (see [concurrency control](#concurrency-control-and-consistency), below, for more details) +* generation: a sequence number representing a specific generation of the desired state. Set by the system and monotonically increasing, per-resource. May be compared, such as for RAW and WAW consistency. * creationTimestamp: a string representing an RFC 3339 date of the date and time an object was created * deletionTimestamp: a string representing an RFC 3339 date of the date and time after which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user, and is not directly settable by a client. The resource will be deleted (no longer visible from resource lists, and not reachable by name) after the time in this field. Once set, this value may not be unset or be set further into the future, although it may be shortened or the resource may be deleted prior to this time. * labels: a map of string keys and values that can be used to organize and categorize objects (see [docs/user-guide/labels.md](../user-guide/labels.md)) @@ -172,18 +182,38 @@ Objects that contain both spec and status should not contain additional top-leve ##### Typical status properties -* **phase**: The phase is a simple, high-level summary of the phase of the lifecycle of an object. The phase should progress monotonically. Typical phase values are `Pending` (not yet fully physically realized), `Running` or `Active` (fully realized and active, but not necessarily operating correctly), and `Terminated` (no longer active), but may vary slightly for different types of objects. New phase values should not be added to existing objects in the future. Like other status fields, it must be possible to ascertain the lifecycle phase by observation. Additional details regarding the current phase may be contained in other fields. 
-* **conditions**: Conditions represent orthogonal observations of an object's current state. Objects may report multiple conditions, and new types of conditions may be added in the future. Condition status values may be `True`, `False`, or `Unknown`. Unlike the phase, conditions are not expected to be monotonic -- their values may change back and forth. A typical condition type is `Ready`, which indicates the object was believed to be fully operational at the time it was last probed. Conditions may carry additional information, such as the last probe time or last transition time. +**Conditions** represent the latest available observations of an object's current state. Objects may report multiple conditions, and new types of conditions may be added in the future. Therefore, conditions are represented using a list/slice, where all have similar structure. + +The `FooCondition` type for some resource type `Foo` may include a subset of the following fields, but must contain at least `type` and `status` fields: + +```golang + Type FooConditionType `json:"type" description:"type of Foo condition"` + Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` + LastHeartbeatTime util.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` + LastTransitionTime util.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"` + Reason string `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"` + Message string `json:"message,omitempty" description:"human-readable message indicating details about last transition"` +``` + +Additional fields may be added in the future. + +Conditions should be added to explicitly convey properties that users and components care about rather than requiring those properties to be inferred from other observations. 
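A reader-side sketch of how a client might consume conditions of this shape (simplified types, and a hypothetical `conditionStatus` helper rather than an actual API function; note it reports a missing condition as `Unknown`):

```go
package main

import "fmt"

type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// FooCondition is a pared-down condition carrying just the two
// required fields, type and status.
type FooCondition struct {
	Type   string
	Status ConditionStatus
}

// conditionStatus scans a condition list for condType; an absent
// condition is reported as Unknown rather than False.
func conditionStatus(conds []FooCondition, condType string) ConditionStatus {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return ConditionUnknown
}

func main() {
	conds := []FooCondition{{Type: "Ready", Status: ConditionTrue}}
	fmt.Println(conditionStatus(conds, "Ready"))     // True
	fmt.Println(conditionStatus(conds, "Succeeded")) // Unknown
}
```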
+ +Condition status values may be `True`, `False`, or `Unknown`. The absence of a condition should be interpreted the same as `Unknown`. + +In general, condition values may change back and forth, but some condition transitions may be monotonic, depending on the resource and condition type. However, conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects, nor behaviors associated with state transitions. The system is level-based rather than edge-triggered, and should assume an Open World. -TODO(@vishh): Reason and Message. +A typical oscillating condition type is `Ready`, which indicates the object was believed to be fully operational at the time it was last probed. A possible monotonic condition could be `Succeeded`. A `False` status for `Succeeded` would imply failure. An object that was still active would not have a `Succeeded` condition, or its status would be `Unknown`. -Phases and conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects with behaviors associated with state transitions. The system is level-based and should assume an Open World. Additionally, new observations and details about these observations may be added over time. +Some resources in the v1 API contain fields called **`phase`**, and associated `message`, `reason`, and other status fields. The pattern of using `phase` is deprecated. Newer API types should use conditions instead. Phase was essentially a state-machine enumeration field, that contradicted [system-design principles](../design/principles.md#control-logic) and hampered evolution, since [adding new enum values breaks backward compatibility](api_changes.md). Rather than encouraging clients to infer implicit properties from phases, we intend to explicitly expose the conditions that clients need to monitor. 
Conditions also have the benefit that it is possible to create some conditions with uniform meaning across all resource types, while still exposing others that are unique to specific resource types. See [#7856](http://issues.k8s.io/7856) for more details and discussion. -In order to preserve extensibility, in the future, we intend to explicitly convey properties that users and components care about rather than requiring those properties to be inferred from observations. +In condition types, and everywhere else they appear in the API, **`Reason`** is intended to be a one-word, CamelCase representation of the category of cause of the current status, and **`Message`** is intended to be a human-readable phrase or sentence, which may contain specific details of the individual occurrence. `Reason` is intended to be used in concise output, such as one-line `kubectl get` output, and in summarizing occurrences of causes, whereas `Message` is intended to be presented to users in detailed status explanations, such as `kubectl describe` output. -Note that historical information status (e.g., last transition time, failure counts) is only provided at best effort, and is not guaranteed to not be lost. +Historical information status (e.g., last transition time, failure counts) is only provided with reasonable effort, and is not guaranteed to not be lost. -Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](../design/resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data. 
+Status information that may be large (especially proportional in size to collections of other resources, such as lists of references to other objects -- see below) and/or rapidly changing, such as [resource usage](../design/resources.md#usage-data), should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data. + +Some resources report the `observedGeneration`, which is the `generation` most recently observed by the component responsible for acting upon changes to the desired state of the resource. This can be used, for instance, to ensure that the reported status reflects the most recent desired status. #### References to related objects @@ -213,7 +243,7 @@ ports: containerPort: 80 ``` -This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently, labels, selectors, and annotations), as opposed to sets of subobjects. +This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently, labels, selectors, annotations, data), as opposed to sets of subobjects. #### Constants @@ -249,19 +279,7 @@ API resources should use the traditional REST pattern: * DELETE /<resourceNamePlural>/<name> - Delete the single resource with the given name. DeleteOptions may specify gracePeriodSeconds, the optional duration in seconds before the object should be deleted. Individual kinds may declare fields which provide a default grace period, and different kinds may have differing kind-wide default grace periods. A user provided grace period overrides a default grace period, including the zero grace period ("now"). * PUT /<resourceNamePlural>/<name> - Update or create the resource with the given name with the JSON object provided by the client. 
* PATCH /<resourceNamePlural>/<name> - Selectively modify the specified fields of the resource. See more information [below](#patch). - -Kubernetes by convention exposes additional verbs as new root endpoints with singular names. Examples: - -* GET /watch/<resourceNamePlural> - Receive a stream of JSON objects corresponding to changes made to any resource of the given kind over time. -* GET /watch/<resourceNamePlural>/<name> - Receive a stream of JSON objects corresponding to changes made to the named resource of the given kind over time. - -These are verbs which change the fundamental type of data returned (watch returns a stream of JSON instead of a single JSON object). Support of additional verbs is not required for all object types. - -Two additional verbs `redirect` and `proxy` provide access to cluster resources as described in [docs/user-guide/accessing-the-cluster.md](../user-guide/accessing-the-cluster.md). - -When resources wish to expose alternative actions that are closely coupled to a single resource, they should do so using new sub-resources. An example is allowing automated processes to update the "status" field of a Pod. The `/pods` endpoint only allows updates to "metadata" and "spec", since those reflect end-user intent. An automated process should be able to modify status for users to see by sending an updated Pod kind to the server to the "/pods/<name>/status" endpoint - the alternate endpoint allows different rules to be applied to the update, and access to be appropriately restricted. Likewise, some actions like "stop" or "scale" are best represented as REST sub-resources that are POSTed to. The POST action may require a simple kind to be provided if the action requires parameters, or function without a request body. - -TODO: more documentation of Watch +* GET /<resourceNamePlural>?watch=true - Receive a stream of JSON objects corresponding to changes made to any resource of the given kind over time.
### PATCH operations @@ -423,7 +441,6 @@ APIs may return alternative representations of any resource in response to an Ac All dates should be serialized as RFC3339 strings. - ## Units Units must either be explicit in the field name (e.g., `timeoutSeconds`), or must be specified as part of the value (e.g., `resource.Quantity`). Which approach is preferred is TBD. @@ -431,11 +448,16 @@ Units must either be explicit in the field name (e.g., `timeoutSeconds`), or mus ## Selecting Fields -Some APIs may need to identify which field in a JSON object is invalid, or to reference a value to extract from a separate resource. The current recommendation is to use standard JavaScript syntax for accessing that field, assuming the JSON object was transformed into a JavaScript object. +Some APIs may need to identify which field in a JSON object is invalid, or to reference a value to extract from a separate resource. The current recommendation is to use standard JavaScript syntax for accessing that field, assuming the JSON object was transformed into a JavaScript object, without the leading dot, such as `metadata.name`. Examples: -* Find the field "current" in the object "state" in the second item in the array "fields": `fields[0].state.current` +* Find the field "current" in the object "state" in the second item in the array "fields": `fields[1].state.current` + +## Object references + +Object references should either be called `fooName` if referring to an object of kind `Foo` by just the name (within the current namespace, if a namespaced resource), or should be called `fooRef`, and should contain a subset of the fields of the `ObjectReference` type. + TODO: Plugins, extensions, nested kinds, headers @@ -561,7 +583,7 @@ $ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https:/ `message` may contain human-readable description of the error -`reason` may contain a machine-readable description of why this operation is in the `Failure` status. 
If this value is empty there is no information available. The `reason` clarifies an HTTP status code but does not override it. +`reason` may contain a machine-readable, one-word, CamelCase description of why this operation is in the `Failure` status. If this value is empty there is no information available. The `reason` clarifies an HTTP status code but does not override it. `details` may contain extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type. @@ -646,7 +668,43 @@ Possible values for the `reason` and `details` fields: ## Events -TODO: Document events (refer to another doc for details) +Events are complementary to status information, since they can provide some historical information about status and occurrences in addition to current or previous status. Generate events for situations users or administrators should be alerted about. + +Choose a unique, specific, short, CamelCase reason for each event category. For example, `FreeDiskSpaceInvalid` is a good event reason because it is likely to refer to just one situation, but `Started` is not a good reason because it doesn't sufficiently indicate what started, even when combined with other event fields. + +`Error creating foo` or `Error creating foo %s` would be appropriate for an event message, with the latter being preferable, since it is more informational. + +Accumulate repeated events in the client, especially for frequent events, to reduce data volume, load on the system, and noise exposed to users. + +## Naming conventions + +* `Minion` has been deprecated in favor of `Node`. Use `Node` where referring to the node resource in the context of the cluster. Use `Host` where referring to properties of the individual physical/virtual system, such as `hostname`, `hostPath`, `hostNetwork`, etc. 
+* `FooController` is a deprecated kind naming convention. Name the kind after the thing being controlled instead (e.g., `Job` rather than `JobController`). +* The name of a field that specifies the time at which `something` occurs should be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). +* Do not use abbreviations in the API, except where they are extremely commonly used, such as "id", "args", or "stdin". +* Acronyms should similarly only be used when extremely commonly known. All letters in the acronym should have the same case, using the appropriate case for the situation. For example, at the beginning of a field name, the acronym should be all lowercase, such as "httpGet". Where used as a constant, all letters should be uppercase, such as "TCP" or "UDP". + +## Label, selector, and annotation conventions + +Labels are the domain of users. They are intended to facilitate organization and management of API resources using attributes that are meaningful to users, as opposed to meaningful to the system. Think of them as user-created mp3 or email inbox labels, as opposed to the directory structure used by a program to store its data. The former enables the user to apply an arbitrary ontology, whereas the latter is implementation-centric and inflexible. Users will use labels to select resources to operate on, display label values in CLI/UI columns, etc. Users should always retain full power and flexibility over the label schemas they apply to labels in their namespaces. + +However, we should support conveniences for common cases by default. For example, what we now do in ReplicationController is automatically set the RC's selector and labels to the labels in the pod template by default, if they are not already set. That ensures that the selector will match the template, and that the RC can be managed using the same labels as the pods it creates.
Note that once we generalize selectors, it won't necessarily be possible to unambiguously generate labels that match an arbitrary selector. + +If the user wants to apply additional labels to the pods that they don't select upon, such as to facilitate adoption of pods or in the expectation that some label values will change, they can set the selector to a subset of the pod labels. Similarly, the RC's labels could be initialized to a subset of the pod template's labels, or could include additional/different labels. + +For disciplined users managing resources within their own namespaces, it's not that hard to consistently apply schemas that ensure uniqueness. One just needs to ensure that at least one value of some label key in common differs compared to all other comparable resources. We could/should provide a verification tool to check that. However, development of conventions similar to the examples in [Labels](../user-guide/labels.md) makes uniqueness straightforward. Furthermore, relatively narrowly used namespaces (e.g., per environment, per application) can be used to reduce the set of resources that could potentially cause overlap. + +In cases where users could be running misc. examples with inconsistent schemas, or where tooling or components need to programmatically generate new objects to be selected, there needs to be a straightforward way to generate unique label sets. A simple way to ensure uniqueness of the set is to ensure uniqueness of a single label value, such as by using a resource name, uid, resource hash, or generation number. + +Problems with uids and hashes, however, include that they have no semantic meaning to the user, are not memorable nor readily recognizable, and are not predictable.
Lack of predictability obstructs use cases such as creation of a replication controller from a pod, as people want to do when exploring the system, bootstrapping a self-hosted cluster, or deletion and re-creation of a new RC that adopts the pods of the previous one, such as to rename it. Generation numbers are more predictable and much clearer, assuming there is a logical sequence. Fortunately, for deployments that's the case. For jobs, use of creation timestamps is common internally. Users should always be able to turn off auto-generation, in order to permit some of the scenarios described above. Note that auto-generated labels will also become one more field that needs to be stripped out when cloning a resource, within a namespace, in a new namespace, in a new cluster, etc., and will need to be ignored when updating a resource via patch or read-modify-write sequence. + +Inclusion of a system prefix in a label key is fairly hostile to UX. A prefix is only necessary in the case that the user cannot choose the label key, in order to avoid collisions with user-defined labels. However, I firmly believe that the user should always be allowed to select the label keys to use on their resources, so it should always be possible to override default label keys. + +Therefore, resources supporting auto-generation of unique labels should have a `uniqueLabelKey` field, so that the user could specify the key if they wanted to, but if unspecified, it could be set by default, such as to the resource type, like job, deployment, or replicationController. The value would need to be at least spatially unique, and perhaps temporally unique in the case of job. + +Annotations have very different intended usage from labels. We expect them to be primarily generated and consumed by tooling and system extensions. I'm inclined to generalize annotations to permit them to directly store arbitrary json. Rigid names and name prefixes make sense, since they are analogous to API fields.
+ +In fact, experimental API fields, including to represent fields of newer alpha/beta API versions in the older, stable storage version, may be represented as annotations with the prefix `experimental.kubernetes.io/`. diff --git a/kubectl-conventions.md b/kubectl-conventions.md new file mode 100644 index 00000000..e5d1df75 --- /dev/null +++ b/kubectl-conventions.md @@ -0,0 +1,115 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/kubectl-conventions.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +Kubectl Conventions +=================== + +Updated: 8/12/2015 + +**Table of Contents** + + + - [Principles](#principles) + - [Command conventions](#command-conventions) + - [Flag conventions](#flag-conventions) + - [Output conventions](#output-conventions) + - [Documentation conventions](#documentation-conventions) + + + +## Principles + +* Strive for consistency across commands +* Explicit should always override implicit + * Environment variables should override default values + * Command-line flags should override default values and environment variables + * --namespace should also override the value specified in a specified resource + +## Command conventions + +* Command names are all lowercase, and hyphenated if multiple words. +* kubectl VERB NOUNs for commands that apply to multiple resource types +* NOUNs may be specified as TYPE name1 name2 ... or TYPE/name1 TYPE/name2; TYPE is omitted when only a single type is expected +* Resource types are all lowercase, with no hyphens; both singular and plural forms are accepted +* NOUNs may also be specified by one or more file arguments: -f file1 -f file2 ... +* Resource types may have 2- or 3-letter aliases. +* Business logic should be decoupled from the command framework, so that it can be reused independently of kubectl, cobra, etc. 
+ * Ideally, commonly needed functionality would be implemented server-side in order to avoid problems typical of "fat" clients and to make it readily available to non-Go clients +* Commands that generate resources, such as `run` or `expose`, should obey the following conventions: + * Flags should be converted to a parameter Go map or json map prior to invoking the generator + * The generator must be versioned so that users depending on a specific behavior may pin to that version, via `--generator=` + * Generation should be decoupled from creation + * `--dry-run` should output the resource that would be created, without creating it +* A command group (e.g., `kubectl config`) may be used to group related non-standard commands, such as custom generators, mutations, and computations + +## Flag conventions + +* Flags are all lowercase, with words separated by hyphens +* Flag names and single-character aliases should have the same meaning across all commands +* Command-line flags corresponding to API fields should accept API enums exactly (e.g., --restart=Always) + +## Output conventions + +* By default, output is intended for humans rather than programs + * However, affordances are made for simple parsing of `get` output +* Only errors should be directed to stderr +* `get` commands should output one row per resource, and one resource per row + * Column titles and values should not contain spaces in order to facilitate commands that break lines into fields: cut, awk, etc. 
+ * By default, `get` output should fit within about 80 columns
+ * Eventually we could perhaps auto-detect width
+ * `-o wide` may be used to display additional columns
+ * The first column should be the resource name, titled `NAME` (may change this to an abbreviation of resource type)
+ * NAMESPACE should be displayed as the first column when --all-namespaces is specified
+ * The last default column should be time since creation, titled `AGE`
+ * `-Lkey` should append a column containing the value of label with key `key`, with `` if not present
+ * json, yaml, Go template, and jsonpath template formats should be supported and encouraged for subsequent processing
+ * Users should use --api-version or --output-version to ensure the output uses the version they expect
+* `describe` commands may output on multiple lines and may include information from related resources, such as events. Describe should add additional information from related resources that a normal user may need to know - if a user would always run "describe resource1" and then immediately want to run "get type2" or "describe resource2", consider including that info. Examples: persistent volume claims for pods that reference claims, events for most resources, nodes and the pods scheduled on them. When fetching related resources, a targeted field selector should be used in favor of client-side filtering of related resources.
+* Mutations should output TYPE/name verbed by default, where TYPE is singular; `-o name` may be used to just display TYPE/name, which may be used to specify resources in other commands
+
+## Documentation conventions
+
+* Commands are documented using Cobra; docs are then auto-generated by hack/run-gendocs.sh.
+ * Use should contain a short usage string for the most common use case(s), not an exhaustive specification
+ * Short should contain a one-line explanation of what the command does
+ * Long may contain multiple lines, including additional information about input, output, commonly used flags, etc.
+ * Example should contain examples
+ * Start commands with `$`
+ * A comment should precede each example command, and should begin with `#`
+* Use "FILENAME" for filenames
+* Use "TYPE" for the particular flavor of resource type accepted by kubectl, rather than "RESOURCE" or "KIND"
+* Use "NAME" for resource names
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/kubectl-conventions.md?pixel)]()
+
-- 
cgit v1.2.3 

From d4d6d71afde5f59e7098c12c14c154cd62930531 Mon Sep 17 00:00:00 2001
From: Mike Danese
Date: Fri, 14 Aug 2015 13:54:04 -0700
Subject: remove contrib/submit-queue as it is moving to the contrib repo

---
 pull-requests.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pull-requests.md b/pull-requests.md
index 6d2eb597..126b8996 100644
--- a/pull-requests.md
+++ b/pull-requests.md
@@ -52,14 +52,14 @@ Life of a Pull Request
 Unless in the last few weeks of a milestone when we need to reduce churn and stabilize, we aim to be always accepting pull requests.
 
-Either the [on call](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](../../contrib/submit-queue/) automatically will manage merging PRs.
+Either the [on call](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](https://github.com/kubernetes/contrib/tree/master/submit-queue) automatically will manage merging PRs.
There are several requirements for the submit queue to work:
* Author must have signed CLA ("cla: yes" label added to PR)
* No changes can be made since the last lgtm label was applied
* k8s-bot must have reported the GCE E2E build and test steps passed (Travis, Shippable and Jenkins build)

-Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](../../contrib/submit-queue/whitelist.txt).
+Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](https://github.com/kubernetes/contrib/tree/master/submit-queue/whitelist.txt).
-- 
cgit v1.2.3 

From 0eb6b6ec3d1bba2e956e63e6309b3d23b9e6c8c4 Mon Sep 17 00:00:00 2001
From: Eric Paris
Date: Fri, 14 Aug 2015 18:50:03 -0400
Subject: TYPO: fix documentation to point at update-generated-docs.sh

---
 kubectl-conventions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kubectl-conventions.md b/kubectl-conventions.md
index e5d1df75..5739708c 100644
--- a/kubectl-conventions.md
+++ b/kubectl-conventions.md
@@ -99,7 +99,7 @@ Updated: 8/12/2015
 ## Documentation conventions
 
-* Commands are documented using Cobra; docs are then auto-generated by hack/run-gendocs.sh.
+* Commands are documented using Cobra; docs are then auto-generated by `hack/update-generated-docs.sh`.
 * Use should contain a short usage string for the most common use case(s), not an exhaustive specification
 * Short should contain a one-line explanation of what the command does
 * Long may contain multiple lines, including additional information about input, output, commonly used flags, etc.
-- cgit v1.2.3 From abb5b4de722f05ed7c24a3cfc5017c71bb2252f2 Mon Sep 17 00:00:00 2001 From: Patrick Flor Date: Mon, 17 Aug 2015 09:17:03 -0700 Subject: Update dev docs to note new coveralls URL (also noting old URL for interested parties and future historians) --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index db74adaf..a266f7cb 100644 --- a/development.md +++ b/development.md @@ -249,7 +249,7 @@ KUBE_COVER=y hack/test-go.sh pkg/kubectl Multiple arguments can be passed, in which case the coverage results will be combined for all tests run. -Coverage results for the project can also be viewed on [Coveralls](https://coveralls.io/r/GoogleCloudPlatform/kubernetes), and are continuously updated as commits are merged. Additionally, all pull requests which spawn a Travis build will report unit test coverage results to Coveralls. +Coverage results for the project can also be viewed on [Coveralls](https://coveralls.io/r/kubernetes/kubernetes), and are continuously updated as commits are merged. Additionally, all pull requests which spawn a Travis build will report unit test coverage results to Coveralls. Coverage reports from before the Kubernetes Github organization was created can be found [here](https://coveralls.io/r/GoogleCloudPlatform/kubernetes). ## Integration tests -- cgit v1.2.3 From 9e11b1b1a6283d95d29aaf8979ece898eabc2cfd Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Thu, 23 Jul 2015 14:01:38 +0200 Subject: Job controller proposal --- job.md | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 191 insertions(+) create mode 100644 job.md diff --git a/job.md b/job.md new file mode 100644 index 00000000..627f2a05 --- /dev/null +++ b/job.md @@ -0,0 +1,191 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/job.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Job Controller + +## Abstract + +A proposal for implementing a new controller - Job controller - which will be responsible +for managing pod(s) that require running once to completion even if the machine +the pod is running on fails, in contrast to what ReplicationController currently offers. + +Several existing issues and PRs were already created regarding that particular subject: +* Job Controller [#1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) +* New Job resource [#7380](https://github.com/GoogleCloudPlatform/kubernetes/pull/7380) + + +## Use Cases + +1. Be able to start one or several pods tracked as a single entity. +1. Be able to run batch-oriented workloads on Kubernetes. +1. Be able to get the job status. +1. Be able to specify the number of instances performing a job at any one time. +1. Be able to specify the number of successfully finished instances required to finish a job. + + +## Motivation + +Jobs are needed for executing multi-pod computation to completion; a good example +here would be the ability to implement any type of batch oriented tasks. + + +## Implementation + +Job controller is similar to replication controller in that they manage pods. +This implies they will follow the same controller framework that replication +controllers already defined. The biggest difference between a `Job` and a +`ReplicationController` object is the purpose; `ReplicationController` +ensures that a specified number of Pods are running at any one time, whereas +`Job` is responsible for keeping the desired number of Pods to a completion of +a task. 
This difference will be represented by the `RestartPolicy` which is
+required to always take the value `RestartPolicyNever` or `RestartPolicyOnFailure`.
+
+
+The new `Job` object will have the following content:
+
+```go
+// Job represents the configuration of a single job.
+type Job struct {
+	TypeMeta
+	ObjectMeta
+
+	// Spec is a structure defining the expected behavior of a job.
+	Spec JobSpec
+
+	// Status is a structure describing the current status of a job.
+	Status JobStatus
+}
+
+// JobList is a collection of jobs.
+type JobList struct {
+	TypeMeta
+	ListMeta
+
+	Items []Job
+}
+```
+
+`JobSpec` structure is defined to contain all the information about how the actual job execution
+will look.
+
+```go
+// JobSpec describes how the job execution will look.
+type JobSpec struct {
+
+	// Parallelism specifies the maximum desired number of pods the job should
+	// run at any given time. The actual number of pods running in steady state will
+	// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
+	// i.e. when the work left to do is less than max parallelism.
+	Parallelism *int
+
+	// Completions specifies the desired number of successfully finished pods the
+	// job should be run with. Defaults to 1.
+	Completions *int
+
+	// Selector is a label query over pods running a job.
+	Selector map[string]string
+
+	// Template is the object that describes the pod that will be created when
+	// executing a job.
+	Template *PodTemplateSpec
+}
+```
+
+`JobStatus` structure is defined to contain information about the pods currently executing
+the job.
+
+```go
+// JobStatus represents the current state of a Job.
+type JobStatus struct {
+	Conditions []JobCondition
+
+	// CreationTime represents time when the job was created
+	CreationTime util.Time
+
+	// StartTime represents time when the job was started
+	StartTime util.Time
+
+	// CompletionTime represents time when the job was completed
+	CompletionTime util.Time
+
+	// Active is the number of actively running pods.
+	Active int
+
+	// Successful is the number of pods which successfully completed their job.
+	Successful int
+
+	// Unsuccessful is the number of pod failures; this applies only to jobs
+	// created with RestartPolicyNever, otherwise this value will always be 0.
+	Unsuccessful int
+}
+
+type JobConditionType string
+
+// These are valid conditions of a job.
+const (
+	// JobSucceeded means the job has successfully completed its execution.
+	JobSucceeded JobConditionType = "Complete"
+)
+
+// JobCondition describes current state of a job.
+type JobCondition struct {
+	Type               JobConditionType
+	Status             ConditionStatus
+	LastHeartbeatTime  util.Time
+	LastTransitionTime util.Time
+	Reason             string
+	Message            string
+}
+```
+
+## Events
+
+Job controller will emit the following events:
+* JobStart
+* JobFinish
+
+## Future evolution
+
+Below are the possible future extensions to the Job controller:
+* Be able to limit the execution time for a job, similarly to ActiveDeadlineSeconds for Pods.
+* Be able to create a chain of jobs dependent on one another.
+* Be able to specify the work each of the workers should execute (see type 1 from
+  [this comment](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624#issuecomment-97622142))
+* Be able to inspect Pods running a Job, especially after a Job has finished, e.g.
+  by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)).
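As an illustration of the `Parallelism`/`Completions` semantics in the `JobSpec` comment above, the steady-state number of active pods can be sketched with simplified stand-in types (these are trimmed placeholders for the proposal's structs, not the actual API types):

```go
package main

import "fmt"

// Simplified stand-ins for the proposal's JobSpec and JobStatus.
type jobSpec struct {
	parallelism int // maximum desired number of pods at any one time
	completions int // desired number of successful completions
}

type jobStatus struct {
	successful int // pods that have already completed successfully
}

// desiredActive returns the number of pods the controller would keep
// running: the remaining work, capped by the parallelism limit.
func desiredActive(spec jobSpec, status jobStatus) int {
	remaining := spec.completions - status.successful
	if remaining < 0 {
		remaining = 0
	}
	if remaining > spec.parallelism {
		return spec.parallelism
	}
	return remaining
}

func main() {
	spec := jobSpec{parallelism: 3, completions: 5}
	fmt.Println(desiredActive(spec, jobStatus{successful: 0})) // 3: capped by parallelism
	fmt.Println(desiredActive(spec, jobStatus{successful: 4})) // 1: only one completion left
}
```

When `(.spec.completions - .status.successful) < .spec.parallelism`, the number of running pods drops below the parallelism ceiling, exactly as the field comment describes.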
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/job.md?pixel)]() + -- cgit v1.2.3 From 4434a3aca668a7dbdef7fc9d7787b3fdf6e69819 Mon Sep 17 00:00:00 2001 From: Kris Rousey Date: Wed, 12 Aug 2015 10:35:07 -0700 Subject: Moving client libs to unversioned dir --- event_compression.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/event_compression.md b/event_compression.md index 1187edb6..4525c097 100644 --- a/event_compression.md +++ b/event_compression.md @@ -60,7 +60,7 @@ Instead of a single Timestamp, each event object [contains](http://releases.k8s. Each binary that generates events: * Maintains a historical record of previously generated events: - * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/record/events_cache.go`](../../pkg/client/record/events_cache.go). + * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/unversioned/record/events_cache.go`](../../pkg/client/unversioned/record/events_cache.go). 
* The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following events fields are used to construct a unique key for an event: * `event.Source.Component` * `event.Source.Host` -- cgit v1.2.3 From 8d1a87b242d600f94d939e8b2e9b897be1de2430 Mon Sep 17 00:00:00 2001 From: Kris Rousey Date: Wed, 12 Aug 2015 10:35:07 -0700 Subject: Moving client libs to unversioned dir --- apiserver-watch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/apiserver-watch.md b/apiserver-watch.md index 02a6e6c8..6bc2d33f 100644 --- a/apiserver-watch.md +++ b/apiserver-watch.md @@ -166,7 +166,7 @@ the same time, we can introduce an additional etcd event type: Thus, we need to create the EtcdResync event, extend watch.Interface and its implementations to support it and handle those events appropriately in places like - [Reflector](../../pkg/client/cache/reflector.go) + [Reflector](../../pkg/client/unversioned/cache/reflector.go) However, this might turn out to be unnecessary optimization if apiserver will always keep up (which is possible in the new design). We will work -- cgit v1.2.3 From d02af2b80ce483b79cd57e2b5bde3040f12de0d4 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Tue, 18 Aug 2015 23:29:40 +0000 Subject: Add duration naming conventions. --- api-conventions.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 75612820..1730ada8 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -443,7 +443,7 @@ All dates should be serialized as RFC3339 strings. ## Units -Units must either be explicit in the field name (e.g., `timeoutSeconds`), or must be specified as part of the value (e.g., `resource.Quantity`). Which approach is preferred is TBD. +Units must either be explicit in the field name (e.g., `timeoutSeconds`), or must be specified as part of the value (e.g., `resource.Quantity`). 
Which approach is preferred is TBD, though currently we use the `fooSeconds` convention for durations. ## Selecting Fields @@ -681,6 +681,10 @@ Accumulate repeated events in the client, especially for frequent events, to red * `Minion` has been deprecated in favor of `Node`. Use `Node` where referring to the node resource in the context of the cluster. Use `Host` where referring to properties of the individual physical/virtual system, such as `hostname`, `hostPath`, `hostNetwork`, etc. * `FooController` is a deprecated kind naming convention. Name the kind after the thing being controlled instead (e.g., `Job` rather than `JobController`). * The name of a field that specifies the time at which `something` occurs should be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). +* We use the `fooSeconds` convention for durations, as discussed in the [units subsection](#units). + * `fooPeriodSeconds` is preferred for periodic intervals and other waiting periods (e.g., over `fooIntervalSeconds`). + * `fooTimeoutSeconds` is preferred for inactivity/unresponsiveness deadlines. + * `fooDeadlineSeconds` is preferred for activity completion deadlines. * Do not use abbreviations in the API, except where they are extremely commonly used, such as "id", "args", or "stdin". * Acronyms should similarly only be used when extremely commonly known. All letters in the acronym should have the same case, using the appropriate case for the situation. For example, at the beginning of a field name, the acronym should be all lowercase, such as "httpGet". Where used as a constant, all letters should be uppercase, such as "TCP" or "UDP". 
-- cgit v1.2.3 From e9f50fabe0f2a1e869344f955e3063bb12cfe186 Mon Sep 17 00:00:00 2001 From: Ilya Dmitrichenko Date: Wed, 19 Aug 2015 12:01:50 +0100 Subject: Make typography more consistent --- architecture.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/architecture.md b/architecture.md index 5f829d68..b17345ef 100644 --- a/architecture.md +++ b/architecture.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at # Kubernetes architecture -A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable. +A running Kubernetes cluster contains node agents (`kubelet`) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making `kubelet` itself (all our components, really) run within containers, and making the scheduler 100% pluggable. ![Architecture Diagram](architecture.png?raw=true "Architecture overview") @@ -45,21 +45,21 @@ The Kubernetes node has the services necessary to run application containers and Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers. -### Kubelet +### `kubelet` -The **Kubelet** manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc. +The `kubelet` manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc. -### Kube-Proxy +### `kube-proxy` Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). 
This reflects `services` (see [the services doc](../user-guide/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. -Service endpoints are currently found via [DNS](../admin/dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy. +Service endpoints are currently found via [DNS](../admin/dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes `{FOO}_SERVICE_HOST` and `{FOO}_SERVICE_PORT` variables are supported). These variables resolve to ports managed by the service proxy. ## The Kubernetes Control Plane The Kubernetes control plane is split into a set of components. Currently they all run on a single _master_ node, but that is expected to change soon in order to support high-availability clusters. These components work together to provide a unified view of the cluster. -### etcd +### `etcd` All persistent master state is stored in an instance of `etcd`. This provides a great way to store configuration data reliably. With `watch` support, coordinating components can be notified very quickly of changes. -- cgit v1.2.3 From fa761e592f8951cb57d99946e792fb69540978c3 Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Thu, 20 Aug 2015 10:51:38 +0200 Subject: Changed JobConditionType from JobSucceded to JobComplete to more accurately reflect job's final state --- job.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/job.md b/job.md index 627f2a05..57717ea5 100644 --- a/job.md +++ b/job.md @@ -154,8 +154,8 @@ type JobConditionType string // These are valid conditions of a job. 
const ( - // JobSucceeded means the job has successfully completed its execution. - JobSucceeded JobConditionType = "Complete" + // JobComplete means the job has completed its execution. + JobComplete JobConditionType = "Complete" ) // JobCondition describes current state of a job. -- cgit v1.2.3 From bf039663a4243c1ad4a75a83dec8dc4fe7c6a824 Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Mon, 10 Aug 2015 13:59:03 +0200 Subject: Initial Resources proposal --- initial-resources.md | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 initial-resources.md diff --git a/initial-resources.md b/initial-resources.md new file mode 100644 index 00000000..efd7e2e1 --- /dev/null +++ b/initial-resources.md @@ -0,0 +1,110 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/initial-resources.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+## Abstract
+
+Initial Resources is a data-driven feature that, based on historical data, tries to estimate the resource usage of a container without Resources specified,
+and to set them before the container is run. This document describes the design of the component.
+
+## Motivation
+
+Since we want to make Kubernetes as simple as possible for its users, we don’t want to require setting
+[Resources](https://github.com/GoogleCloudPlatform/kubernetes/blob/7c9bbef96ed7f2a192a1318aa312919b861aee00/pkg/api/v1/types.go#L696)
+for a container by its owner. On the other hand, having Resources filled in is critical for scheduling decisions.
+The current solution of setting Resources to a hardcoded value has obvious drawbacks. We need to implement a component
+which will set initial Resources to a reasonable value.
+
+## Design
+
+The InitialResources component will be implemented as an [admission plugin](../../plugin/pkg/admission/) and invoked right before
+[LimitRanger](https://github.com/GoogleCloudPlatform/kubernetes/blob/7c9bbef96ed7f2a192a1318aa312919b861aee00/cluster/gce/config-default.sh#L91).
+For every container without Resources specified it will try to predict the amount of resources that should be sufficient for it,
+so that a pod without specified resources will be treated as
+[Burstable](https://github.com/GoogleCloudPlatform/kubernetes/blob/be5e224a0f1c928d49c48aa6a6539d22c47f9238/docs/proposals/resource-qos.md#qos-classes).
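The admission step described above can be sketched as follows. The types and the fixed estimate are hypothetical placeholders for this example only; the real plugin would operate on `api.Pod` objects, use `resource.Quantity` values, and query the monitoring backend instead of returning a constant:

```go
package main

import "fmt"

// container is a minimal stand-in for an api.Container with its
// resource requests; empty strings mean "not specified by the user".
type container struct {
	name       string
	cpuRequest string
	memRequest string
}

// estimate is a placeholder for the historical-data lookup; here it
// just returns a fixed guess.
func estimate(c container) (cpu, mem string) {
	return "100m", "64Mi"
}

// admit mimics the InitialResources admission step: only containers
// with no explicit requests get an estimated value filled in.
func admit(containers []container) []container {
	for i, c := range containers {
		if c.cpuRequest == "" && c.memRequest == "" {
			cpu, mem := estimate(c)
			containers[i].cpuRequest = cpu
			containers[i].memRequest = mem
		}
	}
	return containers
}

func main() {
	pods := admit([]container{
		{name: "web"}, // no requests: gets an estimate
		{name: "db", cpuRequest: "500m", memRequest: "1Gi"}, // explicit: left untouched
	})
	fmt.Println(pods[0].cpuRequest, pods[1].cpuRequest) // prints "100m 500m"
}
```

The key property this illustrates is that the plugin never overrides user-specified values; it only fills in missing ones before LimitRanger runs.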
+
+InitialResources will set only the [Request](https://github.com/GoogleCloudPlatform/kubernetes/blob/3d2d99c6fd920386eea4ec050164839ec6db38f0/pkg/api/v1/types.go#L665)
+(independently for each resource type: cpu, memory)
+field in the first version, to avoid killing containers due to OOM (however, the container still may be killed if it exceeds requested resources).
+To make the component work with LimitRanger, the estimated value will be capped by the min and max possible values if defined.
+This will prevent the situation where the pod is rejected due to a too low or too high estimation.
+
+The container won’t be marked as managed by this component in any way; however, an appropriate event will be exported.
+The predicting algorithm should have very low latency so as not to significantly increase e2e pod startup latency
+[#3954](https://github.com/GoogleCloudPlatform/kubernetes/pull/3954).
+
+### Predicting algorithm details
+
+In the first version the estimation will be made based on historical data for the Docker image being run in the container (both the name and the tag matter).
+CPU/memory usage of each container is exported periodically (by default with 1 minute resolution) to the backend (see more in [Monitoring pipeline](#monitoring-pipeline)).
+
+InitialResources will set the Request for both cpu/mem as the 90th percentile of the first (in the following order) set of samples defined in the following way:
+
+* 7 days same image:tag, assuming there are at least 60 samples (1 hour)
+* 30 days same image:tag, assuming there are at least 60 samples (1 hour)
+* 30 days same image, assuming there is at least 1 sample
+
+If there is still no data, the default value will be set by LimitRanger. These parameters will be configurable with appropriate flags.
+
+#### Example
+
+For example, if we have at least 60 samples from image:tag over the past 7 days, we will use the 90th percentile of all of the samples of image:tag over the past 7 days.
+Otherwise, if we have at least 60 samples from image:tag over the past 30 days, we will use the 90th percentile of all of the samples of image:tag over the past 30 days.
+Otherwise, if we have at least 1 sample from image over the past 30 days, we will use the 90th percentile of all of the samples of image over the past 30 days.
+Otherwise we will use the default value.
+
+### Monitoring pipeline
+
+In the first version there will be 2 backend options available for the predicting algorithm:
+
+* [InfluxDB](../../docs/user-guide/monitoring.md#influxdb-and-grafana) - aggregation will be made in a SQL query
+* [GCM](../../docs/user-guide/monitoring.md#google-cloud-monitoring) - since GCM is not as powerful as InfluxDB, some aggregation will be made on the client side
+
+Both will be hidden under an abstraction layer, so it would be easy to add another option.
+The code will be a part of the Initial Resources component so as not to block development; however, in the future it should be a part of Heapster.
+
+
+## Next steps
+
+The first version will be quite simple, so there are a lot of possible improvements. Some of them seem to have high priority
+and should be introduced shortly after the first version is done:
+
+* observe OOM and then react to it by increasing the estimation
+* add other features to the model like *namespace*
+* dry mode, which allows to ask system for resource recommendation for a container without running it +* add estimation as annotations for those containers that already has resources set +* support for other data sources like [Hawkular](http://www.hawkular.org/) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/initial-resources.md?pixel)]() + -- cgit v1.2.3 From aa50296498a9f565bb7a6d55b06cdb1a6aa26a00 Mon Sep 17 00:00:00 2001 From: Jerzy Szczepkowski Date: Tue, 18 Aug 2015 15:25:57 +0200 Subject: Design proposal: Horizontal Pod Autoscaler. Added design proposal for Horizontal Pod Autoscaler. Related to #12087. --- horizontal-pod-autoscaler.md | 272 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 272 insertions(+) create mode 100644 horizontal-pod-autoscaler.md diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md new file mode 100644 index 00000000..91211793 --- /dev/null +++ b/horizontal-pod-autoscaler.md @@ -0,0 +1,272 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/horizontal-pod-autoscaler.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Horizontal Pod Autoscaling
+
+**Author**: Jerzy Szczepkowski (@jszczepkowski)
+
+## Preface
+
+This document briefly describes the design of the horizontal autoscaler for pods.
+The autoscaler (implemented as a Kubernetes control loop) will be responsible for automatically
+choosing and setting the number of pods of a given type that run in a Kubernetes cluster.
+
+This proposal supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md).
+
+## Overview
+
+The usage of a serving application usually varies over time: sometimes the demand for the application rises,
+and sometimes it drops.
+In version 1.0, a user can only manually set the number of serving pods.
+Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on usage statistics.
+
+## Scale Subresource
+
+We are going to introduce a Scale subresource and implement horizontal autoscaling of pods on top of it.
+The Scale subresource will be supported for replication controllers and deployments.
+A HorizontalPodAutoscaler object will be bound to exactly one Scale subresource and will be
+autoscaling the associated replication controller/deployment through it.
+
+The Scale subresource will be present for a replication controller or deployment under the following paths:
+
+```api/vX/replicationcontrollers/myrc/scale```
+
+```api/vX/deployments/mydeployment/scale```
+
+It will have the following structure:
+
+```go
+// Scale subresource, applicable to ReplicationControllers and (in future) Deployments.
+type Scale struct {
+	api.TypeMeta
+	api.ObjectMeta
+
+	// Spec defines the behavior of the scale.
+	Spec ScaleSpec
+
+	// Status represents the current status of the scale.
+	Status ScaleStatus
+}
+
+// ScaleSpec describes the attributes of a Scale subresource.
+type ScaleSpec struct {
+	// Replicas is the number of desired replicas.
+	Replicas int
+}
+
+// ScaleStatus represents the current status of a Scale subresource.
+type ScaleStatus struct {
+	// Replicas is the number of actual replicas.
+	Replicas int
+
+	// Selector is a label query over pods that should match the replicas count.
+	Selector map[string]string
+}
+
+```
+
+Writing ```ScaleSpec.Replicas``` will resize the replication controller/deployment associated with
+the given Scale subresource.
+```ScaleStatus.Replicas``` will report how many pods are currently running in the replication controller/deployment,
+and ```ScaleStatus.Selector``` will return the selector for the pods.
+
+## HorizontalPodAutoscaler Object
+
+We will introduce a HorizontalPodAutoscaler object. It will be accessible under:
+
+```
+api/vX/horizontalpodautoscalers/myautoscaler
+```
+
+It will have the following structure:
+
+```go
+// HorizontalPodAutoscaler represents the configuration of a horizontal pod autoscaler.
+type HorizontalPodAutoscaler struct {
+	api.TypeMeta
+	api.ObjectMeta
+
+	// Spec defines the behaviour of the autoscaler.
+	Spec HorizontalPodAutoscalerSpec
+
+	// Status represents the current information about the autoscaler.
+	Status HorizontalPodAutoscalerStatus
+}
+
+// HorizontalPodAutoscalerSpec is the specification of a horizontal pod autoscaler.
+type HorizontalPodAutoscalerSpec struct {
+	// ScaleRef is a reference to the Scale subresource. HorizontalPodAutoscaler will learn the current
+	// resource consumption from its status, and will set the desired number of pods by modifying its spec.
+	ScaleRef *SubresourceReference
+	// MinCount is the lower limit for the number of pods that can be set by the autoscaler.
    MinCount int
    // MaxCount is the upper limit for the number of pods that can be set by the autoscaler.
    // It cannot be smaller than MinCount.
    MaxCount int
    // Target is the target average consumption of the given resource that the autoscaler will try
    // to maintain by adjusting the desired number of pods.
    // Currently two types of resources are supported: "cpu" and "memory".
    Target ResourceConsumption
}

// HorizontalPodAutoscalerStatus contains the current status of a horizontal pod autoscaler.
type HorizontalPodAutoscalerStatus struct {
    // CurrentReplicas is the number of replicas of pods managed by this autoscaler.
    CurrentReplicas int

    // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler.
    // The number may be different because pod downscaling is sometimes delayed to keep the number
    // of pods stable.
    DesiredReplicas int

    // CurrentConsumption is the current average consumption of the given resource that the autoscaler will
    // try to maintain by adjusting the desired number of pods.
    // Two types of resources are supported: "cpu" and "memory".
    CurrentConsumption ResourceConsumption

    // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods.
    // This is used by the autoscaler to control how often the number of pods is changed.
    LastScaleTimestamp *util.Time
}

// ResourceConsumption is an object for specifying average resource consumption of a particular resource.
type ResourceConsumption struct {
    Resource api.ResourceName
    Quantity resource.Quantity
}
```

```ScaleRef``` will be a reference to the Scale subresource.
```MinCount```, ```MaxCount``` and ```Target``` will define the autoscaler configuration.
We will also introduce a HorizontalPodAutoscalerList object to enable listing all autoscalers in the cluster:

```go
// HorizontalPodAutoscalerList is a collection of pod autoscalers.
type HorizontalPodAutoscalerList struct {
    api.TypeMeta
    api.ListMeta

    Items []HorizontalPodAutoscaler
}
```

## Autoscaling Algorithm

The autoscaler will be implemented as a control loop.
It will periodically (e.g.: every 1 minute) query pods described by ```Status.Selector``` of the Scale subresource,
and check their average CPU or memory usage from the last 1 minute
(there will be an API on the master for this purpose; see
[#11951](https://github.com/GoogleCloudPlatform/kubernetes/issues/11951)).
Then, it will compare the current CPU or memory consumption with the Target,
and adjust the replica count of the Scale if needed to match the target
(preserving the condition: MinCount <= Replicas <= MaxCount).

The target number of pods will be calculated from the following formula:

```
TargetNumOfPods = sum(CurrentPodsConsumption) / Target
```

To make scaling more stable, scale-up will happen only when the floor of ```TargetNumOfPods``` is higher than
the current number, while scale-down will happen only when the ceiling of ```TargetNumOfPods``` is lower than
the current number.

The decision to scale up will be executed instantly.
However, we will execute scale-down only if sufficient time has passed since the last scale-up (e.g.: 10 minutes).
Such an approach has two benefits:

* The autoscaler works in a conservative way.
  If new user load appears, it is important for us to rapidly increase the number of pods,
  so that user requests will not be rejected.
  Lowering the number of pods is not that urgent.

* The autoscaler avoids thrashing, i.e.: it prevents rapid execution of conflicting decisions if the load is not stable.


As the CPU consumption of a pod immediately after start may be highly variable due to initialization/startup,
the autoscaler will skip metrics from the first minute of the pod's lifecycle.

## Relative vs.
absolute metrics

The question arises whether the values of the target metrics should be absolute (e.g.: 0.6 core, 100MB of RAM)
or relative (e.g.: 110% of resource request, 90% of resource limit).
The argument for the relative metrics is that when a user changes resources for a pod,
she will not have to change the definition of the autoscaler object, as the relative metric will still be valid.
However, we want to be able to base autoscaling on custom metrics in the future.
Such metrics are more likely to be absolute (e.g.: the number of queries-per-second).
Therefore, we decided to give absolute values for the target metrics in the initial version.

Please note that when custom metrics are supported, it will be possible to create additional metrics
in heapster that will divide CPU/memory consumption by resource request/limit.
From the autoscaler's point of view the metrics will be absolute,
although such metrics will bring the benefits of relative metrics to the user.


## Support in kubectl

To make manipulation of HorizontalPodAutoscaler objects simpler, we will add support for creating/updating/deleting/listing of HorizontalPodAutoscaler to kubectl.
In addition, we will add kubectl support for the following use cases:
* When running an image with ```kubectl run```, there should be an additional option to create
  an autoscaler for it.
* When creating a replication controller or deployment with ```kubectl create [-f]```, there should be
  a possibility to specify an additional autoscaler object.
* We will add a new command ```kubectl autoscale``` that will allow for easy creation of an autoscaler object
  for an already existing replication controller/deployment.

## Future Features

We list here some features that will not be supported in the initial version of the autoscaler.
However, we want to keep them in mind, as they will most probably be needed in the future.
Our design is in general compatible with them.
* Autoscale pods based on metrics other than CPU & memory (e.g.: network traffic, qps).
  This includes scaling based on a custom metric.
* Autoscale pods based on multiple metrics.
  If the target numbers of pods for different metrics are different, choose the largest target number of pods.
* Scale the number of pods starting from 0: all pods can be turned off,
  and then turned on when there is a demand for them.
  When a request to a service with no pods arrives, kube-proxy will generate an event for the autoscaler
  to create a new pod.
  Discussed in [#3247](https://github.com/GoogleCloudPlatform/kubernetes/issues/3247).
* When scaling down, make a more educated decision about which pods to kill (e.g.: kill pods that doubled-up first).
  Discussed in [#4301](https://github.com/GoogleCloudPlatform/kubernetes/issues/4301).
* Allow rule-based autoscaling: instead of specifying the target value for a metric,
  specify a rule, e.g.: “if average CPU consumption of a pod is higher than 80%, add two more replicas”.
  This approach was initially suggested in the
  [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md) proposal.
  Before doing this, we need to evaluate why the target-based scaling described in this proposal is not sufficient.




[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/horizontal-pod-autoscaler.md?pixel)]()
--
cgit v1.2.3


From cb694b865dbcfd424d2bc2408c20e9008af51dc8 Mon Sep 17 00:00:00 2001
From: Quinton Hoole
Date: Tue, 18 Aug 2015 13:32:02 -0700
Subject: Address feedback on Cluster Federation proposal doc.
--- federation.md | 351 ++++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 279 insertions(+), 72 deletions(-) diff --git a/federation.md b/federation.md index 1845e9eb..7b642bb3 100644 --- a/federation.md +++ b/federation.md @@ -40,9 +40,10 @@ Documentation for other releases can be found at ## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ _Initial revision: 2015-03-05_ -_Last updated: 2015-03-09_ +_Last updated: 2015-08-20_ This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) -Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) +Original slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) +Updated slides: [tinyurl.com/ubernetes-whereto](http://tinyurl.com/ubernetes-whereto) ## Introduction @@ -80,7 +81,15 @@ informally become known as_ "Ubernetes"_. ## Summary/TL;DR -TBD +Four primary customer-driven use cases are explored in more detail. +The two highest priority ones relate to High Availability and +Application Portability (between cloud providers, and between +on-premise and cloud providers). + +Four primary federation primitives are identified (location affinity, +cross-cluster scheduling, service discovery and application +migration). Fortunately not all four of these primitives are required +for each primary use case, so incremental development is feasible. ## What exactly is a Kubernetes Cluster? @@ -93,8 +102,7 @@ definition is that each cluster provides: 1. a consistent, cluster-wide resource naming scheme 1. a scheduling/container placement domain 1. a service network routing domain -1. (in future) an authentication and authorization model. -1. .... +1. an authentication and authorization model. The above in turn imply the need for a relatively performant, reliable and cheap network within each cluster. @@ -156,6 +164,9 @@ It seems that most of this boils down to: 1. 
**cross-cluster migration** (how do compute
   and storage resources, and the distributed applications to which
   they belong, move from one cluster to another)
+1. **cross-cluster load-balancing** (how is user traffic directed
+   to an appropriate cluster?)
+1. **cross-cluster monitoring and auditing** (a.k.a. Unified Visibility)

## 2. Sensitive Workloads

@@ -165,23 +176,27 @@ automatically diverted to run in my secure, on-premise cluster(s).
The list of privacy-sensitive workloads changes over time, and they're
subject to external auditing."_

-**Clarifying questions:** What kinds of rules determine which
-  workloads go where? Is a static mapping from container (or more
-  typically, replication controller) to cluster maintained and
-  enforced? If so, is it only enforced on startup, or are things
-  migrated between clusters when the mappings change? This starts to
-  look quite similar to "1. Capacity Overflow", and again seems to
-  boil down to:
+**Clarifying questions:**
+1. What kinds of rules determine which
+workloads go where?
+  1. Is there in fact a requirement to have these rules be
+     declaratively expressed and automatically enforced, or is it
+     acceptable/better to have users manually select where to run
+     their workloads when starting them?
+  1. Is a static mapping from container (or more typically,
+     replication controller) to cluster maintained and enforced?
+  1. If so, is it only enforced on startup, or are things migrated
+     between clusters when the mappings change?
+
+This starts to look quite similar to "1. Capacity Overflow", and again
+seems to boil down to:

1. location affinity
1. cross-cluster scheduling
1. cross-cluster service discovery
1. cross-cluster migration

-with the possible addition of:
-
-+ cross-cluster monitoring and auditing (which is conveniently deemed
-  to be outside the scope of this document, for the time being at
-  least)
+1. cross-cluster monitoring and auditing
+1. cross-cluster load balancing

## 3.
Vendor lock-in avoidance @@ -193,12 +208,22 @@ enforce these policy changes across the organization every time this happens. She wants it centrally and automatically enforced, monitored and audited."_ -**Clarifying questions:** Again, I think that this can potentially be +**Clarifying questions:** + +1. How does this relate to other use cases (high availability, +capacity overflow etc), as they may all be across multiple vendors. +It's probably not strictly speaking a separate +use case, but it's brought up so often as a requirement, that it's +worth calling out explicitly. +1. Is a useful intermediate step to make it as simple as possible to + migrate an application from one vendor to another in a one-off fashion? + +Again, I think that this can probably be reformulated as a Capacity Overflow problem - the fundamental principles seem to be the same or substantially similar to those above. -## 4. "Unavailability Zones" +## 4. "High Availability" _"I want to be immune to any single data centre or cloud availability zone outage, so I want to spread my service across multiple such zones @@ -206,14 +231,20 @@ zone outage, so I want to spread my service across multiple such zones service remain available even if one of the availability zones or cloud providers "goes down"_. -It seems useful to split this into two sub use cases: +It seems useful to split this into multiple sets of sub use cases: 1. Multiple availability zones within a single cloud provider (across which feature sets like private networks, load balancing, persistent disks, data snapshots etc are typically consistent and explicitly designed to inter-operate). -1. Multiple cloud providers (typically with inconsistent feature sets - and more limited interoperability). + 1.1. within the same geographical region (e.g. metro) within which network + is fast and cheap enough to be almost analogous to a single data + center. + 1.1. 
across multiple geographical regions, where high network cost and + poor network performance may be prohibitive. +1. Multiple cloud providers (typically with inconsistent feature sets, + more limited interoperability, and typically no cheap inter-cluster + networking described above). The single cloud provider case might be easier to implement (although the multi-cloud provider implementation should just work for a single @@ -251,20 +282,15 @@ initial implementation targeting single cloud provider only. traffic? Either: 1. I constantly over-provision all clusters by 1/n (potentially expensive), or -1. I "manually" update my replica count configurations in the +1. I "manually" (or automatically) update my replica count configurations in the remaining clusters by 1/n when the failure occurs, and Kubernetes takes care of the rest for me, or -1. Auto-scaling (not yet available) in the remaining clusters takes +1. Auto-scaling in the remaining clusters takes care of it for me automagically as the additional failed-over - traffic arrives (with some latency). -1. I manually specify "additional resources to be provisioned" per - remaining cluster, possibly proportional to both the remaining functioning resources - and the unavailable resources in the failed cluster(s). - (All the benefits of over-provisioning, without expensive idle resources.) - -Doing nothing (i.e. forcing users to choose between 1 and 2 on their -own) is probably an OK starting point. Kubernetes autoscaling can get -us to 3 at some later date. + traffic arrives (with some latency). Note that this implies that + the cloud provider keeps the necessary resources on hand to + accommodate such auto-scaling (e.g. via something similar to AWS reserved + and spot instances) Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. 
It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). @@ -322,7 +348,37 @@ location affinity: (other than the source of YouTube videos, which is assumed to be equally remote from all clusters in this example). Each pod can be scheduled independently, in any cluster, and moved at any time. -1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). Most small to medium sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster). +1. **"Preferentially Coupled"**: Somewhere between Coupled and + Decoupled. These applications prefer to have all of their pods + located in the same cluster (e.g. for failure correlation, network + latency or bandwidth cost reasons), but can tolerate being + partitioned for "short" periods of time (for example while + migrating the application from one cluster to another). Most small + to medium sized LAMP stacks with not-very-strict latency goals + probably fall into this category (provided that they use sane + service discovery and reconnect-on-fail, which they need to do + anyway to run effectively, even in a single Kubernetes cluster). 
+
+From a fault isolation point of view, there are also opposites of the
+above. For example, a master database and its slave replica might
+need to be in different availability zones. We'll refer to this as
+anti-affinity, although it is largely outside the scope of this
+document.
+
+Note that there is somewhat of a continuum with respect to network
+cost and quality between any two nodes, ranging from two nodes on the
+same L2 network segment (lowest latency and cost, highest bandwidth)
+to two nodes on different continents (highest latency and cost, lowest
+bandwidth). One interesting point on that continuum relates to
+multiple availability zones within a well-connected metro or region
+and single cloud provider. Despite being in different data centers,
+or areas within a mega data center, network in this case is often very fast
+and effectively free or very cheap. For the purposes of this network location
+affinity discussion, this case is considered analogous to a single
+availability zone. Furthermore, if a given application doesn't fit
+cleanly into one of the above, shoe-horn it into the best fit,
+defaulting to the "Strictly Coupled and Immovable" bucket if you're
+not sure.

And then there's what I'll call _absolute_ location affinity. Some
applications are required to run in bounded geographical or network
@@ -341,14 +397,23 @@ of our users are in Western Europe, U.S. West Coast" etc).

## Cross-cluster service discovery

-I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs.
+I propose having pods use standard discovery methods used by external
+clients of Kubernetes applications (i.e. DNS). DNS might resolve to a
+public endpoint in the local or a remote cluster.
Other than Strictly +Coupled applications, software should be largely oblivious of which of +the two occurs. + _Aside:_ How do we avoid "tromboning" through an external VIP when DNS resolves to a public IP on the local cluster? Strictly speaking this -would be an optimization, and probably only matters to high bandwidth, -low latency communications. We could potentially eliminate the -trombone with some kube-proxy magic if necessary. More detail to be -added here, but feel free to shoot down the basic DNS idea in the mean -time. +would be an optimization for some cases, and probably only matters to +high-bandwidth, low-latency communications. We could potentially +eliminate the trombone with some kube-proxy magic if necessary. More +detail to be added here, but feel free to shoot down the basic DNS +idea in the mean time. In addition, some applications rely on private +networking between clusters for security (e.g. AWS VPC or more +generally VPN). It should not be necessary to forsake this in +order to use Ubernetes, for example by being forced to use public +connectivity between clusters. ## Cross-cluster Scheduling @@ -367,10 +432,23 @@ to be able to: controller to sanely split the request. Similarly, knowledge of the properties of the application (Location Affinity class -- Strictly Coupled, Strictly Decoupled etc, privacy class etc) will - be required. + be required. It is also conceivable that knowledge of service + SLAs and monitoring thereof might provide an input into + scheduling/placement algorithms. 1. Multiplex the responses from the individual clusters into an aggregate response. +There is of course a lot of detail still missing from this section, +including discussion of: +1. admission control, +1. initial placement of instances of a new +service vs scheduling new instances of an existing service in response +to auto-scaling, +1. 
rescheduling pods due to failure (response might be
+different depending on whether it is a failure of a node, rack, or whole AZ),
+1. data placement relative to compute capacity,
+etc.
+

## Cross-cluster Migration

Again this is closely related to location affinity discussed above,
@@ -382,20 +460,30 @@ such events include:

1. A low capacity event in a cluster (or a cluster failure).
1. A change of scheduling policy ("we no longer use cloud provider X").
-1. A change of resource pricing ("cloud provider Y dropped their prices
-   lets migrate there").
-
-Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters.
-For Preferentially Decoupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster, within some bounded time period (and possibly within a predefined "maintenance" window).
-Strictly Coupled applications (with the exception of those deemed
-completely immovable) require the federation system to:
+1. A change of resource pricing ("cloud provider Y dropped their
+   prices - let's migrate there").
+
+Strictly Decoupled applications can be trivially moved, in part or in
+whole, one pod at a time, to one or more clusters (within applicable
+policy constraints, for example "PrivateCloudOnly").
+
+For Preferentially Decoupled applications, the federation system must
+first locate a single cluster with sufficient capacity to accommodate
+the entire application, then reserve that capacity, and incrementally
+move the application, one (or more) resources at a time, over to the
+new cluster, within some bounded time period (and possibly within a
+predefined "maintenance" window). Strictly Coupled applications (with
+the exception of those deemed completely immovable) require the
+federation system to:

1.
start up an entire replica application in the destination cluster -1. copy persistent data to the new application instance -1. switch traffic across +1. copy persistent data to the new application instance (possibly + before starting pods) +1. switch user traffic across 1. tear down the original application instance -It is proposed that support for automated migration of Strictly Coupled applications be -deferred to a later date. +It is proposed that support for automated migration of Strictly +Coupled applications be deferred to a later date. ## Other Requirements @@ -404,36 +492,123 @@ These are often left implicit by customers, but are worth calling out explicitly 1. Software failure isolation between Kubernetes clusters should be retained as far as is practically possible. The federation system should not materially increase the failure correlation across - clusters. For this reason the federation system should ideally be - completely independent of the Kubernetes cluster control software, - and look just like any other Kubernetes API client, with no special - treatment. If the federation system fails catastrophically, the - underlying Kubernetes clusters should remain independently usable. + clusters. For this reason the federation control plane software + should ideally be completely independent of the Kubernetes cluster + control software, and look just like any other Kubernetes API + client, with no special treatment. If the federation control plane + software fails catastrophically, the underlying Kubernetes clusters + should remain independently usable. 1. Unified monitoring, alerting and auditing across federated Kubernetes clusters. 1. Unified authentication, authorization and quota management across clusters (this is in direct conflict with failure isolation above, so there are some tough trade-offs to be made here). -## Proposed High-Level Architecture - -TBD: All very hand-wavey still, but some initial thoughts to get the conversation going... 
+## Proposed High-Level Architectures
+
+Two distinct potential architectural approaches have emerged from discussions
+thus far:
+
+1. An explicitly decoupled and hierarchical architecture, where the
+   Federation Control Plane sits logically above a set of independent
+   Kubernetes clusters, each of which is (potentially) unaware of the
+   other clusters, and of the Federation Control Plane itself (other
+   than to the extent that it is an API client much like any other).
+   One possible example of this general architecture is illustrated
+   below, and will be referred to as the "Decoupled, Hierarchical"
+   approach.
+1. A more monolithic architecture, where a single instance of the
+   Kubernetes control plane itself manages a single logical cluster
+   composed of nodes in multiple availability zones and cloud
+   providers.
+
+A very brief, non-exhaustive list of pros and cons of the two
+approaches follows. (In the interest of full disclosure, the author
+prefers the Decoupled Hierarchical model for the reasons stated below).
+
+1. **Failure isolation:** The Decoupled Hierarchical approach provides
+   better failure isolation than the Monolithic approach, as each
+   underlying Kubernetes cluster, and the Federation Control Plane,
+   can operate and fail completely independently of each other. In
+   particular, their software and configurations can be updated
+   independently. Such updates are, in our experience, the primary
+   cause of control-plane failures, in general.
+1. **Failure probability:** The Decoupled Hierarchical model incorporates
+   numerically more independent pieces of software and configuration
+   than the Monolithic one. But the complexity of each of these
+   decoupled pieces is arguably better contained in the Decoupled
+   model (per standard arguments for modular rather than monolithic
+   software design). Which of the two models presents higher
+   aggregate complexity and consequent failure probability remains
+   somewhat of an open question.
+1.
**Scalability:** Conceptually the Decoupled Hierarchical model wins
+   here, as each underlying Kubernetes cluster can be scaled
+   completely independently w.r.t. scheduling, node state management,
+   monitoring, network connectivity etc. It is even potentially
+   feasible to stack "Ubernetes" federated clusters (i.e. create
+   federations of federations) should scalability of the independent
+   Federation Control Plane become an issue (although the author does
+   not envision this being a problem worth solving in the short
+   term).
+1. **Code complexity:** I think that an argument can be made both ways
+   here. It depends on whether you prefer to weave the logic for
+   handling nodes in multiple availability zones and cloud providers
+   within a single logical cluster into the existing Kubernetes
+   control plane code base (which was explicitly not designed for
+   this), or separate it into a decoupled Federation system (with
+   possible code sharing between the two via shared libraries). The
+   author prefers the latter because it:
+   1. Promotes better code modularity and interface design.
+   1. Allows the code
+      bases of Kubernetes and the Federation system to progress
+      largely independently (different sets of developers, different
+      release schedules etc).
+1. **Administration complexity:** Again, I think that this could be argued
+   both ways. Superficially it would seem that administration of a
+   single Monolithic multi-zone cluster might be simpler by virtue of
+   being only "one thing to manage", however in practice each of the
+   underlying availability zones (and possibly cloud providers) has
+   its own capacity, pricing, hardware platforms, and possibly
+   bureaucratic boundaries (e.g. "our EMEA IT department manages those
So explicitly allowing for (but not + mandating) completely independent administration of each + underlying Kubernetes cluster, and the Federation system itself, + in the Decoupled Hierarchical model seems to have real practical + benefits that outweigh the superficial simplicity of the + Monolithic model. +1. **Application development and deployment complexity:** It's not clear + to me that there is any significant difference between the two + models in this regard. Presumably the API exposed by the two + different architectures would look very similar, as would the + behavior of the deployed applications. It has even been suggested + to write the code in such a way that it could be run in either + configuration. It's not clear that this makes sense in practise + though. +1. **Control plane cost overhead:** There is a minimum per-cluster + overhead -- two possibly virtual machines, or more for redundant HA + deployments. For deployments of very small Kubernetes + clusters with the Decoupled Hierarchical approach, this cost can + become significant. + +### The Decoupled, Hierarchical Approach - Illustrated ![image](federation-high-level-arch.png) ## Ubernetes API -This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. - -+ Clusters become first class objects, which can be registered, listed, described, deregistered etc via the API. -+ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine). -+ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work). -+ These federated replication controllers (and in fact all the - services comprising the Ubernetes Control Plane) have to run - somewhere. 
For high availability Ubernetes deployments, these
-  services may run in a dedicated Kubernetes cluster, not physically
-  co-located with any of the federated clusters. But for simpler
-  deployments, they may be run in one of the federated clusters (but
-  when that cluster goes down, Ubernetes is down, obviously).
+It is proposed that this look a lot like the existing Kubernetes API
+but be explicitly multi-cluster.
+
++ Clusters become first class objects, which can be registered,
+  listed, described, deregistered etc via the API.
++ Compute resources can be explicitly requested in specific clusters,
+  or automatically scheduled to the "best" cluster by Ubernetes (by a
+  pluggable Policy Engine).
++ There is a federated equivalent of a replication controller type (or
+  perhaps a [deployment](deployment.md)),
+  which is multicluster-aware, and delegates to cluster-specific
+  replication controllers/deployments as required (e.g. a federated RC for n
+  replicas might simply spawn multiple replication controllers in
+  different clusters to do the hard work).

## Policy Engine and Migration/Replication Controllers

@@ -453,6 +628,37 @@ Either that, or we end up with multilevel auth. Local readonly
eventually consistent auth slaves in each cluster and in Ubernetes
could potentially cache auth, to mitigate an SPOF auth system.

+## Data consistency, failure and availability characteristics
+
+The services comprising the Ubernetes Control Plane have to run
+  somewhere. Several options exist here:
+* For high availability Ubernetes deployments, these
+  services may run in either:
+  * a dedicated Kubernetes cluster, not co-located in the same
+    availability zone with any of the federated clusters (for fault
+    isolation reasons). If that cluster/availability zone, and hence the Federation
+    system, fails catastrophically, the underlying pods and
+    applications continue to run correctly, albeit temporarily
+    without the Federation system.
+ * across multiple Kubernetes availability zones, probably with + some sort of cross-AZ quorum-based store. This provides + theoretically higher availability, at the cost of some + complexity related to data consistency across multiple + availability zones. + * For simpler, less highly available deployments, just co-locate the + Federation control plane in/on/with one of the underlying + Kubernetes clusters. The downside of this approach is that if + that specific cluster fails, all automated failover and scaling + logic which relies on the federation system will also be + unavailable at the same time (i.e. precisely when it is needed). + But if one of the other federated clusters fails, everything + should work just fine. + +There is some further thinking to be done around the data consistency + model upon which the Federation system is based, and its impact + on the detailed semantics, failure and availability + characteristics of the system. + ## Proposed Next Steps Identify concrete applications of each use case and configure a proof @@ -463,7 +669,8 @@ Load Balancer or Google Cloud Load Balancer pointing at them? What does the zookeeper config look like for N=3 across 3 AZs -- and how does each replica find the other replicas and how do clients find their primary zookeeper replica? And now how do I do a shared, highly -available redis database? +available redis database? Use a few common specific use cases like +this to flesh out the detailed API and semantics of Ubernetes. -- cgit v1.2.3 From 26ce48d939eaaf9d7c894ee5d0e3793d86b2ef1d Mon Sep 17 00:00:00 2001 From: Jerzy Szczepkowski Date: Wed, 19 Aug 2015 11:15:36 +0200 Subject: Comments applied.
--- horizontal-pod-autoscaler.md | 39 ++++++++++++++++++++++++--------------- 1 file changed, 24 insertions(+), 15 deletions(-) diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 91211793..47a69b2d 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -47,17 +47,23 @@ This proposal supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/doc The usage of a serving application usually varies over time: sometimes the demand for the application rises, and sometimes it drops. -In the version 1.0, a user can only manually set the number of serving pods. -Our aim is to provide a mechanism for the automatic adjustment of the number of pods basing on usage statistics. +In Kubernetes version 1.0, a user can only manually set the number of serving pods. +Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on usage statistics. ## Scale Subresource -We are going to introduce Scale subresource and implement horizontal autoscaling of pods on a base of it. +We are going to introduce a Scale subresource and implement horizontal autoscaling of pods based on it. Scale subresource will be supported for replication controllers and deployments. +Scale subresource will be a Virtual Resource (will not be stored in etcd as a separate object). +It will only be present in the API as an interface for accessing a replication controller or deployment, +and the values of Scale fields will be inferred from the corresponding replication controller/deployment object. HorizontalPodAutoscaler object will be bound with exactly one Scale subresource and will be autoscaling the associated replication controller/deployment through it. +The main advantage of such an approach is that whenever we introduce another type we want to auto-scale, +we just need to implement Scale subresource for it (w/o modifying autoscaler code or API).
+The wider discussion regarding Scale took place in [#1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629). -Scale subresource will be present for replication controller or deployment under the following paths: +Scale subresource will be present in API for replication controller or deployment under the following paths: ```api/vX/replicationcontrollers/myrc/scale``` @@ -95,10 +101,10 @@ type ScaleStatus struct { ``` -Writing ```ScaleSpec.Count``` will resize the replication controller/deployment associated with +Writing ```ScaleSpec.Replicas``` will resize the replication controller/deployment associated with the given Scale subresource. -```ScaleStatus.Count``` will report how many pods are currently running in the replication controller/deployment, -and ```ScaleStatus.PodSelector``` will return selector for the pods. +```ScaleStatus.Replicas``` will report how many pods are currently running in the replication controller/deployment, +and ```ScaleStatus.Selector``` will return the selector for the pods. ## HorizontalPodAutoscaler Object @@ -135,7 +141,7 @@ type HorizontalPodAutoscalerSpec struct { MaxCount int // Target is the target average consumption of the given resource that the autoscaler will try // to maintain by adjusting the desired number of pods. - // Currently two types of resources are supported: "cpu" and "memory". + // Currently this can be either "cpu" or "memory". Target ResourceConsumption } @@ -202,7 +208,7 @@ the current number, while scale-down will happen only when the ceiling of ```Tar the current number. The decision to scale-up will be executed instantly. -However, we will execute scale-down only if the sufficient time has past from the last scale-up (e.g.: 10 minutes). +However, we will execute scale-down only if sufficient time has passed since the last scale-up (e.g.: 10 minutes). Such an approach has two benefits: * Autoscaler works in a conservative way.
@@ -229,35 +235,38 @@ Therefore, we decided to give absolute values for the target metrics in the init Please note that when custom metrics are supported, it will be possible to create additional metrics in heapster that will divide CPU/memory consumption by resource request/limit. From autoscaler point of view the metrics will be absolute, -althoug such metrics will be bring the benefits of relative metrics to the user. +although such metrics will bring the benefits of relative metrics to the user. ## Support in kubectl -To make manipulation on HorizontalPodAutoscaler object simpler, we will add support for creating/updating/deletion/listing of HorizontalPodAutoscaler to kubectl. +To make manipulation on HorizontalPodAutoscaler object simpler, we will add support for +creating/updating/deletion/listing of HorizontalPodAutoscaler to kubectl. In addition, we will add kubectl support for the following use-cases: * When running an image with ```kubectl run```, there should be an additional option to create an autoscaler for it. * When creating a replication controller or deployment with ```kubectl create [-f]```, there should be a possibility to specify an additional autoscaler object. + (This should work out-of-the-box when creation of autoscaler is supported by kubectl as we may include + multiple objects in the same config file). * We will add a new command ```kubectl autoscale``` that will allow for easy creation of an autoscaler object for already existing replication controller/deployment. -## Future Features +## Next steps We list here some features that will not be supported in the initial version of autoscaler. However, we want to keep them in mind, as they will most probably be needed in future. Our design is in general compatible with them. -* Autoscale pods on a base of metrics different than CPU & memory (e.g.: network traffic, qps). +* Autoscale pods based on metrics different than CPU & memory (e.g.: network traffic, qps).
This includes scaling based on a custom metric. -* Autoscale pods on a base of multiple metrics. +* Autoscale pods based on multiple metrics. If the target numbers of pods for different metrics are different, choose the largest target number of pods. * Scale the number of pods starting from 0: all pods can be turned-off, and then turned-on when there is a demand for them. When a request to service with no pods arrives, kube-proxy will generate an event for autoscaler to create a new pod. Discussed in [#3247](https://github.com/GoogleCloudPlatform/kubernetes/issues/3247). -* When scaling down, make more educated decision which pods to kill (e.g.: kill pods that doubled-up first). +* When scaling down, make a more educated decision about which pods to kill (e.g.: if two or more pods are on the same node, kill one of them). Discussed in [#4301](https://github.com/GoogleCloudPlatform/kubernetes/issues/4301). * Allow rule based autoscaling: instead of specifying the target value for a metric, specify a rule, e.g.: “if average CPU consumption of pod is higher than 80% add two more replicas”. -- cgit v1.2.3 From 08be1edd38f5795fc767e551b7e0606f835eca99 Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Mon, 24 Aug 2015 11:00:00 +0200 Subject: Added possible improvement to Initial Resources proposal --- initial-resources.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/initial-resources.md b/initial-resources.md index efd7e2e1..8ab2a46c 100644 --- a/initial-resources.md +++ b/initial-resources.md @@ -77,7 +77,7 @@ If there is still no data the default value will be set by LimitRanger. Same par #### Example -For example, if we have at least 60 samples from image:tag over the past 7 days, we will use the 90th percentile of all of the samples of image:tag over the past 7 days. +If we have at least 60 samples from image:tag over the past 7 days, we will use the 90th percentile of all of the samples of image:tag over the past 7 days.
Otherwise, if we have at least 60 samples from image:tag over the past 30 days, we will use the 90th percentile of all of the samples of image:tag over the past 30 days. Otherwise, if we have at least 1 sample from image over the past 30 days, we will use the 90th percentile of all of the samples of image over the past 30 days. Otherwise we will use the default value. @@ -99,6 +99,7 @@ The first version will be quite simple so there is a lot of possible improvement and should be introduced shortly after the first version is done: * observe OOM and then react to it by increasing estimation +* add possibility to specify if estimation should be made, possibly as ```InitialResourcesPolicy``` with options: *always*, *if-not-set*, *never* * add other features to the model like *namespace* * remember predefined values for the most popular images like *mysql*, *nginx*, *redis*, etc. * dry mode, which allows one to ask the system for a resource recommendation for a container without running it -- cgit v1.2.3 From 66a9ff2d9b98120b3c9afe832f72e35dc22d301a Mon Sep 17 00:00:00 2001 From: dinghaiyang Date: Thu, 20 Aug 2015 22:15:21 +0800 Subject: Replace limits with requests in scheduler documentation. Due to #11713 --- scheduler.md | 6 +++--- scheduler_algorithm.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) mode change 100644 => 100755 scheduler.md mode change 100644 => 100755 scheduler_algorithm.md diff --git a/scheduler.md b/scheduler.md old mode 100644 new mode 100755 index b2a137d5..c9d32aa4 --- a/scheduler.md +++ b/scheduler.md @@ -42,13 +42,13 @@ indicating where the Pod should be scheduled. The scheduler tries to find a node for each Pod, one at a time, as it notices these Pods via watch. There are three steps. First it applies a set of "predicates" that filter out -inappropriate nodes.
For example, if the PodSpec specifies resource requests, then the scheduler will filter out nodes that don't have at least that many resources available (computed -as the capacity of the node minus the sum of the resource limits of the containers that +as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node). Second, it applies a set of "priority functions" that rank the nodes that weren't filtered out by the predicate check. For example, it tries to spread Pods across nodes while at the same time favoring the least-loaded -nodes (where "load" here is sum of the resource limits of the containers running on the node, +nodes (where "load" here is the sum of the resource requests of the containers running on the node, divided by the node's capacity). Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md old mode 100644 new mode 100755 index ab8e69ef..7964ab33 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -37,10 +37,10 @@ For each unscheduled Pod, the Kubernetes scheduler tries to find a node across t ## Filtering the nodes -The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod.
For example, if the free resource on a node (measured by the capacity minus the sum of the resource requests of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: - `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. -- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of limits of all Pods on the node. +- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../proposals/resource-qos.md). - `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node. - `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. - `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field ([Here](../user-guide/node-selection/) is an example of how to use `nodeSelector` field). @@ -58,7 +58,7 @@ After the scores of all nodes are calculated, the node with highest score is cho Currently, Kubernetes scheduler provides some practical priority functions, including: -- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of limits of all Pods already on the node - limit of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. 
Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. +- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of requests of all Pods already on the node - request of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. - `CalculateNodeLabelPriority`: Prefer nodes that have the specified label. - `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed. - `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. -- cgit v1.2.3 From 96988acedbfedc32087610a04d4b8fb6ead25b4e Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Mon, 10 Aug 2015 12:22:44 -0400 Subject: Document need to run generated deep copy --- api_changes.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/api_changes.md b/api_changes.md index 687af00a..5c2c4a2a 100644 --- a/api_changes.md +++ b/api_changes.md @@ -297,6 +297,22 @@ generator to create it from scratch. Unsurprisingly, adding manually written conversion also requires you to add tests to `pkg/api//conversion_test.go`. +## Edit deep copy files + +At this point you have both the versioned API changes and the internal +structure changes done. You now need to generate code to handle deep copy +of your versioned api objects. 
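For intuition, a generated deep-copy function typically has a shape like the sketch below. The `PodSpec` type here is a stripped-down stand-in invented for illustration; the real functions are produced by the generator, not written by hand.

```go
package main

import "fmt"

// PodSpec is a hypothetical API type with the reference-typed fields
// (slice, map) that make a plain struct assignment insufficient.
type PodSpec struct {
	Containers []string
	Labels     map[string]string
}

// DeepCopy copies every reference type element by element, so mutating
// the copy can never affect the original object.
func (in *PodSpec) DeepCopy() *PodSpec {
	out := new(PodSpec)
	if in.Containers != nil {
		out.Containers = make([]string, len(in.Containers))
		copy(out.Containers, in.Containers)
	}
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
	return out
}

func main() {
	orig := &PodSpec{Containers: []string{"app"}, Labels: map[string]string{"env": "dev"}}
	cp := orig.DeepCopy()
	cp.Labels["env"] = "prod"       // mutating the copy...
	fmt.Println(orig.Labels["env"]) // ...leaves the original untouched: dev
}
```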
+ +The deep copy code resides with each versioned API: + - `pkg/api//deep_copy_generated.go` containing auto-generated copy functions + +To regenerate them: + - run + +```sh +hack/update-generated-deep-copies.sh +``` + ## Update the fuzzer Part of our testing regimen for APIs is to "fuzz" (fill with random values) API -- cgit v1.2.3 From 15509db93f1f3ac79e50bd5e18e34216cbd369c3 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Tue, 11 Aug 2015 11:23:56 -0400 Subject: Remove trailing commas --- namespaces.md | 48 ++++++++++++++++++++++++------------------------ 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/namespaces.md b/namespaces.md index 596f6f43..bb907c67 100644 --- a/namespaces.md +++ b/namespaces.md @@ -268,16 +268,16 @@ OpenShift creates a Namespace in Kubernetes "kind": "Namespace", "metadata": { "name": "development", + "labels": { + "name": "development" + } }, "spec": { - "finalizers": ["openshift.com/origin", "kubernetes"], + "finalizers": ["openshift.com/origin", "kubernetes"] }, "status": { - "phase": "Active", - }, - "labels": { - "name": "development" - }, + "phase": "Active" + } } ``` @@ -294,16 +294,16 @@ User deletes the Namespace in Kubernetes, and Namespace now has following state: "metadata": { "name": "development", "deletionTimestamp": "..." + "labels": { + "name": "development" + } }, "spec": { - "finalizers": ["openshift.com/origin", "kubernetes"], + "finalizers": ["openshift.com/origin", "kubernetes"] }, "status": { - "phase": "Terminating", - }, - "labels": { - "name": "development" - }, + "phase": "Terminating" + } } ``` @@ -319,16 +319,16 @@ removing *kubernetes* from the list of finalizers: "metadata": { "name": "development", "deletionTimestamp": "..." 
+ "labels": { + "name": "development" + } }, "spec": { - "finalizers": ["openshift.com/origin"], + "finalizers": ["openshift.com/origin"] }, "status": { - "phase": "Terminating", - }, - "labels": { - "name": "development" - }, + "phase": "Terminating" + } } ``` @@ -347,16 +347,16 @@ This results in the following state: "metadata": { "name": "development", "deletionTimestamp": "..." + "labels": { + "name": "development" + } }, "spec": { - "finalizers": [], + "finalizers": [] }, "status": { - "phase": "Terminating", - }, - "labels": { - "name": "development" - }, + "phase": "Terminating" + } } ``` -- cgit v1.2.3 From a80aba14e93201b5dc674e2f0db56cd8aae91772 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Mon, 24 Aug 2015 15:17:34 -0400 Subject: Use singular, make LimitRequestRatio MaxLimitRequestRatio --- admission_control_limit_range.md | 64 ++++++++++++++++++++++------------------ 1 file changed, 35 insertions(+), 29 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 621fd564..e7c706ef 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -53,7 +53,7 @@ The **LimitRange** resource is scoped to a **Namespace**. ### Type ```go -// A type of object that is limited +// LimitType is a type of object that is limited type LimitType string const ( @@ -63,44 +63,50 @@ const ( LimitTypeContainer LimitType = "Container" ) -// LimitRangeItem defines a min/max usage limit for any resource that matches on kind +// LimitRangeItem defines a min/max usage limit for any resource that matches on kind. 
type LimitRangeItem struct { - // Type of resource that this limit applies to - Type LimitType `json:"type,omitempty" description:"type of resource that this limit applies to"` - // Max usage constraints on this kind by resource name - Max ResourceList `json:"max,omitempty" description:"max usage constraints on this kind by resource name"` - // Min usage constraints on this kind by resource name - Min ResourceList `json:"min,omitempty" description:"min usage constraints on this kind by resource name"` - // Default resource limits on this kind by resource name - Default ResourceList `json:"default,omitempty" description:"default resource limits values on this kind by resource name if omitted"` - // DefaultRequests resource requests on this kind by resource name - DefaultRequests ResourceList `json:"defaultRequests,omitempty" description:"default resource requests values on this kind by resource name if omitted"` - // LimitRequestRatio is the ratio of limit over request that is the maximum allowed burst for the named resource - LimitRequestRatio ResourceList `json:"limitRequestRatio,omitempty" description:"the ratio of limit over request that is the maximum allowed burst for the named resource. if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value"` + // Type of resource that this limit applies to. + Type LimitType `json:"type,omitempty"` + // Max usage constraints on this kind by resource name. + Max ResourceList `json:"max,omitempty"` + // Min usage constraints on this kind by resource name. + Min ResourceList `json:"min,omitempty"` + // Default resource requirement limit value by resource name if resource limit is omitted. + Default ResourceList `json:"default,omitempty"` + // DefaultRequest is the default resource requirement request value by resource name if resource request is omitted. 
+ DefaultRequest ResourceList `json:"defaultRequest,omitempty"` + // MaxLimitRequestRatio if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value; this represents the max burst for the named resource. + MaxLimitRequestRatio ResourceList `json:"maxLimitRequestRatio,omitempty"` } -// LimitRangeSpec defines a min/max usage limit for resources that match on kind +// LimitRangeSpec defines a min/max usage limit for resources that match on kind. type LimitRangeSpec struct { - // Limits is the list of LimitRangeItem objects that are enforced - Limits []LimitRangeItem `json:"limits" description:"limits is the list of LimitRangeItem objects that are enforced"` + // Limits is the list of LimitRangeItem objects that are enforced. + Limits []LimitRangeItem `json:"limits"` } -// LimitRange sets resource usage limits for each kind of resource in a Namespace +// LimitRange sets resource usage limits for each kind of resource in a Namespace. type LimitRange struct { - TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` + TypeMeta `json:",inline"` + // Standard object's metadata. + // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata + ObjectMeta `json:"metadata,omitempty"` - // Spec defines the limits enforced - Spec LimitRangeSpec `json:"spec,omitempty" description:"spec defines the limits enforced; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"` + // Spec defines the limits enforced. + // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status + Spec LimitRangeSpec `json:"spec,omitempty"` } // LimitRangeList is a list of LimitRange items. 
type LimitRangeList struct { TypeMeta `json:",inline"` - ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` + // Standard list metadata. + // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#types-kinds + ListMeta `json:"metadata,omitempty"` - // Items is a list of LimitRange objects - Items []LimitRange `json:"items" description:"items is a list of LimitRange objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md"` + // Items is a list of LimitRange objects. + // More info: http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md + Items []LimitRange `json:"items"` } ``` @@ -108,7 +114,7 @@ type LimitRangeList struct { Validation of a **LimitRange** enforces that for a given named resource the following rules apply: -Min (if specified) <= DefaultRequests (if specified) <= Default (if specified) <= Max (if specified) +Min (if specified) <= DefaultRequest (if specified) <= Default (if specified) <= Max (if specified) ### Default Value Behavior @@ -121,11 +127,11 @@ if LimitRangeItem.Default[resourceName] is undefined ``` ``` -if LimitRangeItem.DefaultRequests[resourceName] is undefined +if LimitRangeItem.DefaultRequest[resourceName] is undefined if LimitRangeItem.Default[resourceName] is defined - LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Default[resourceName] + LimitRangeItem.DefaultRequest[resourceName] = LimitRangeItem.Default[resourceName] else if LimitRangeItem.Min[resourceName] is defined - LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Min[resourceName] + LimitRangeItem.DefaultRequest[resourceName] = LimitRangeItem.Min[resourceName] ``` ## AdmissionControl plugin: LimitRanger -- cgit v1.2.3 From c90e062aec5f1d1deb2f2c384c9dc6e65845e5b2 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Wed, 19 Aug 2015 18:27:54 +0000 Subject: Added more API conventions. 
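The LimitRange defaulting rules quoted earlier can be sketched as a small helper. This is illustrative only: plain string maps stand in for the real `ResourceList` type, and the function name is an assumption.

```go
package main

import "fmt"

// applyDefaultRequest resolves a container's missing resource request the
// way the defaulting pseudocode describes: use DefaultRequest if set,
// fall back to Default, then to Min. A missing map key means "undefined".
func applyDefaultRequest(defaultRequest, def, min map[string]string, resource string) (string, bool) {
	if v, ok := defaultRequest[resource]; ok {
		return v, true
	}
	if v, ok := def[resource]; ok {
		return v, true
	}
	if v, ok := min[resource]; ok {
		return v, true
	}
	return "", false // no defaulting possible for this resource
}

func main() {
	def := map[string]string{"cpu": "100m"}
	min := map[string]string{"cpu": "50m", "memory": "32Mi"}
	// DefaultRequest is unset, so cpu falls back to Default and memory to Min.
	fmt.Println(applyDefaultRequest(nil, def, min, "cpu"))    // 100m true
	fmt.Println(applyDefaultRequest(nil, def, min, "memory")) // 32Mi true
}
```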
--- api-conventions.md | 7 +++-- api_changes.md | 75 +++++++++++++++++++++++++++++++++++++++++------------- 2 files changed, 63 insertions(+), 19 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index 75612820..e68f53c7 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at API Conventions =============== -Updated: 8/12/2015 +Updated: 8/24/2015 *This document is oriented at users who want a deeper understanding of the Kubernetes API structure, and developers wanting to extend the Kubernetes API. An introduction to @@ -219,7 +219,7 @@ Some resources report the `observedGeneration`, which is the `generation` most r References to loosely coupled sets of objects, such as [pods](../user-guide/pods.md) overseen by a [replication controller](../user-guide/replication-controller.md), are usually best referred to using a [label selector](../user-guide/labels.md). In order to ensure that GETs of individual objects remain bounded in time and space, these sets may be queried via separate API queries, but will not be expanded in the referring object's status. -References to specific objects, especially specific resource versions and/or specific fields of those objects, are specified using the `ObjectReference` type. Unlike partial URLs, the ObjectReference type facilitates flexible defaulting of fields from the referring object or other contextual information. +References to specific objects, especially specific resource versions and/or specific fields of those objects, are specified using the `ObjectReference` type (or other types representing strict subsets of it). Unlike partial URLs, the ObjectReference type facilitates flexible defaulting of fields from the referring object or other contextual information. 
References in the status of the referee to the referrer may be permitted, when the references are one-to-one and do not need to be frequently updated, particularly in an edge-based manner. @@ -678,11 +678,14 @@ Accumulate repeated events in the client, especially for frequent events, to red ## Naming conventions +* Go field names must be CamelCase. JSON field names must be camelCase. Other than capitalization of the initial letter, the two should almost always match. No underscores nor dashes in either. +* Field and resource names should be declarative, not imperative (DoSomething, SomethingDoer). * `Minion` has been deprecated in favor of `Node`. Use `Node` where referring to the node resource in the context of the cluster. Use `Host` where referring to properties of the individual physical/virtual system, such as `hostname`, `hostPath`, `hostNetwork`, etc. * `FooController` is a deprecated kind naming convention. Name the kind after the thing being controlled instead (e.g., `Job` rather than `JobController`). * The name of a field that specifies the time at which `something` occurs should be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). * Do not use abbreviations in the API, except where they are extremely commonly used, such as "id", "args", or "stdin". * Acronyms should similarly only be used when extremely commonly known. All letters in the acronym should have the same case, using the appropriate case for the situation. For example, at the beginning of a field name, the acronym should be all lowercase, such as "httpGet". Where used as a constant, all letters should be uppercase, such as "TCP" or "UDP". +* The name of a field referring to another resource of kind `Foo` by name should be called `fooName`. The name of a field referring to another resource of kind `Foo` by ObjectReference (or subset thereof) should be called `fooRef`. 
## Label, selector, and annotation conventions diff --git a/api_changes.md b/api_changes.md index 687af00a..72c38b7f 100644 --- a/api_changes.md +++ b/api_changes.md @@ -33,6 +33,13 @@ Documentation for other releases can be found at # So you want to change the API? +Before attempting a change to the API, you should familiarize yourself +with a number of existing API types and with the [API +conventions](api-conventions.md). If creating a new API +type/resource, we also recommend that you first send a PR containing +just a proposal for the new API types, and that you initially target +the experimental API (pkg/expapi). + The Kubernetes API has two major components - the internal structures and the versioned APIs. The versioned APIs are intended to be stable, while the internal structures are implemented to best reflect the needs of the Kubernetes @@ -92,9 +99,12 @@ backward-compatibly. Before talking about how to make API changes, it is worthwhile to clarify what we mean by API compatibility. An API change is considered backward-compatible if it: - * adds new functionality that is not required for correct behavior - * does not change existing semantics - * does not change existing defaults + * adds new functionality that is not required for correct behavior (e.g., + does not add a new required field) + * does not change existing semantics, including: + * default values and behavior + * interpretation of existing API types, fields, and values + * which fields are required and which are not Put another way: @@ -104,11 +114,11 @@ Put another way: degrade behavior) when issued against servers that do not include your change. 3. It must be possible to round-trip your change (convert to different API versions and back) with no loss of information. +4. 
Existing clients need not be aware of your change in order for them to continue + to function as they did previously, even when your change is utilized If your change does not meet these criteria, it is not considered strictly -compatible. There are times when this might be OK, but mostly we want changes -that meet this definition. If you think you need to break compatibility, you -should talk to the Kubernetes team first. +compatible. Let's consider some examples. In a hypothetical API (assume we're at version v6), the `Frobber` struct looks something like this: @@ -179,14 +189,43 @@ API call might POST an object in API v7beta1 format, which uses the cleaner form (since v7beta1 is "beta"). When the user reads the object back in the v7beta1 API it would be unacceptable to have lost all but `Params[0]`. This means that, even though it is ugly, a compatible change must be made to the v6 -API. +API. However, this is very challenging to do correctly. It generally requires +multiple representations of the same information in the same API resource, which +need to be kept in sync in the event that either is changed. However, if +the new representation is more expressive than the old, this breaks +backward compatibility, since clients that only understood the old representation +would not be aware of the new representation nor its semantics. Examples of +proposals that have run into this challenge include [generalized label +selectors](http://issues.k8s.io/341) and [pod-level security +context](http://prs.k8s.io/12823). As another interesting example, enumerated values provide a unique challenge. Adding a new value to an enumerated set is *not* a compatible change. Clients which assume they know how to handle all possible values of a given field will not be able to handle the new values. However, removing value from an enumerated set *can* be a compatible change, if handled properly (treat the -removed value as deprecated but allowed). 
+removed value as deprecated but allowed). This is actually a special case of +a new representation, discussed above. + +## Incompatible API changes + +There are times when an incompatible change might be OK, but mostly we want changes that +meet the compatibility definition above. If you think you need to break compatibility, +you should talk to the Kubernetes team first. + +Breaking compatibility of a beta or stable API version, such as v1, is unacceptable. +Compatibility for experimental or alpha APIs is not strictly required, but +breaking compatibility should not be done lightly, as it disrupts all users of the +feature. Experimental APIs may be removed. Alpha and beta API versions may be deprecated +and eventually removed wholesale, as described in the [versioning document](../design/versioning.md). +Document incompatible changes across API versions under the [conversion tips](../api.md). + +If your change is going to be backward incompatible or might be a breaking change for API +consumers, please send an announcement to `kubernetes-dev@googlegroups.com` before +the change gets in. If you are unsure, ask. Also make sure that the change gets documented in +the release notes for the next release by labeling the PR with the "release-note" GitHub label. + +If you find that your change accidentally broke clients, it should be reverted. ## Changing versioned APIs @@ -199,10 +238,13 @@ before starting "all the rest". ### Edit types.go The struct definitions for each API are in `pkg/api//types.go`. Edit -those files to reflect the change you want to make. Note that all non-online -fields in versioned APIs must have description tags - these are used to generate +those files to reflect the change you want to make. Note that all types and non-inline +fields in versioned APIs must be preceded by descriptive comments - these are used to generate documentation. +Optional fields should have the `,omitempty` json tag; fields are interpreted as being +required otherwise.
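The `,omitempty` convention just described is easy to see in a small, self-contained sketch. The `FrobberSpec` type below is hypothetical (named after the Frobber example used earlier in this document, with made-up fields), but the tag behavior it demonstrates is standard Go `encoding/json` semantics:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FrobberSpec is a hypothetical versioned API type illustrating the
// conventions above: the type and each non-inline field carry descriptive
// comments, and optional fields use the `,omitempty` json tag while
// required fields do not.
type FrobberSpec struct {
	// Height is required: no `,omitempty`, so it always serializes.
	Height int `json:"height"`
	// Param is optional: with `,omitempty` it is dropped from the wire
	// form when empty, so clients need not provide it.
	Param string `json:"param,omitempty"`
}

// encode marshals a spec to its JSON wire form.
func encode(s FrobberSpec) string {
	b, _ := json.Marshal(s)
	return string(b)
}

func main() {
	fmt.Println(encode(FrobberSpec{Height: 10})) // {"height":10} (Param omitted)
}
```

A client inspecting the serialized form can thus tell which fields the convention treats as optional, without consulting the schema.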
+ ### Edit defaults.go If your change includes new fields for which you will need default values, you @@ -228,6 +270,12 @@ incompatible change you might or might not want to do this now, but you will have to do more later. The files you want are `pkg/api//conversion.go` and `pkg/api//conversion_test.go`. +Note that the conversion machinery doesn't generically handle conversion of values, +such as various kinds of field references and API constants. [The client +library](../../pkg/client/unversioned/request.go) has custom conversion code for +field references. You also need to add a call to api.Scheme.AddFieldLabelConversionFunc +with a mapping function that understands supported translations. + ## Changing the internal structures Now it is time to change the internal structs so your versioned changes can be @@ -365,13 +413,6 @@ hack/update-swagger-spec.sh The API spec changes should be in a commit separate from your other changes. -## Incompatible API changes - -If your change is going to be backward incompatible or might be a breaking change for API -consumers, please send an announcement to `kubernetes-dev@googlegroups.com` before -the change gets in. If you are unsure, ask. Also make sure that the change gets documented in -`CHANGELOG.md` for the next release. - ## Adding new REST objects TODO(smarterclayton): write this. -- cgit v1.2.3 From 8c06052254007660e92112f188cbcdbfb2023eac Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Tue, 25 Aug 2015 10:47:58 -0400 Subject: Copy edits for typos (resubmitted) --- apiserver-watch.md | 4 ++-- federation.md | 6 +++--- horizontal-pod-autoscaler.md | 8 ++++---- rescheduler.md | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/apiserver-watch.md b/apiserver-watch.md index 6bc2d33f..b8069030 100644 --- a/apiserver-watch.md +++ b/apiserver-watch.md @@ -60,7 +60,7 @@ the objects (of a given type) without any filtering. The changes delivered from etcd will then be stored in a cache in apiserver. 
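The bounded apiserver-side cache of etcd changes mentioned above can be sketched as a fixed-capacity buffer that evicts its oldest entry when full. This is an assumption-laden illustration (a plain slice of strings, not the real apiserver implementation):

```go
package main

import "fmt"

// rollingWindow is a minimal sketch of a bounded change cache: it has a
// limited capacity, and when a new change arrives while the cache is
// full, the oldest change is evicted to make room for the new one.
type rollingWindow struct {
	capacity int
	changes  []string
}

// add appends a change, evicting the oldest one first if at capacity.
func (w *rollingWindow) add(change string) {
	if len(w.changes) == w.capacity {
		w.changes = w.changes[1:] // drop the oldest change
	}
	w.changes = append(w.changes, change)
}

func main() {
	w := &rollingWindow{capacity: 3}
	for _, c := range []string{"add pod-a", "add pod-b", "del pod-a", "add pod-c"} {
		w.add(c)
	}
	fmt.Println(w.changes) // the oldest change ("add pod-a") has been evicted
}
```

The practical consequence, discussed next, is that a client whose last-seen resource version has already fallen out of this window must relist rather than resume watching.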
This cache is in fact a "rolling history window" that will support clients having some amount of latency between their list and watch calls. Thus it will have a limited capacity and -whenever a new change comes from etcd when a cache is full, othe oldest change +whenever a new change comes from etcd when a cache is full, the oldest change will be removed to make room for the new one. When a client sends a watch request to apiserver, instead of redirecting it to @@ -159,7 +159,7 @@ necessary. In such a case, to avoid LIST requests coming from all watchers at the same time, we can introduce an additional etcd event type: [EtcdResync](../../pkg/storage/etcd/etcd_watcher.go#L36) - Whenever reslisting will be done to refresh the internal watch to etcd, + Whenever relisting will be done to refresh the internal watch to etcd, EtcdResync event will be sent to all the watchers. It will contain the full list of all the objects the watcher is interested in (appropriately filtered) as the parameter of this watch event. diff --git a/federation.md b/federation.md index 7b642bb3..34df0aee 100644 --- a/federation.md +++ b/federation.md @@ -518,7 +518,7 @@ thus far: approach. 1. A more monolithic architecture, where a single instance of the Kubernetes control plane itself manages a single logical cluster - composed of nodes in multiple availablity zones and cloud + composed of nodes in multiple availability zones and cloud providers. A very brief, non-exhaustive list of pros and cons of the two @@ -563,12 +563,12 @@ prefers the Decoupled Hierarchical model for the reasons stated below). largely independently (different sets of developers, different release schedules etc). 1. **Administration complexity:** Again, I think that this could be argued - both ways. Superficially it woud seem that administration of a
Superficially it would seem that administration of a single Monolithic multi-zone cluster might be simpler by virtue of being only "one thing to manage", however in practice each of the underlying availability zones (and possibly cloud providers) has its own capacity, pricing, hardware platforms, and possibly - bureaucratic boudaries (e.g. "our EMEA IT department manages those + bureaucratic boundaries (e.g. "our EMEA IT department manages those European clusters"). So explicitly allowing for (but not mandating) completely independent administration of each underlying Kubernetes cluster, and the Federation system itself, diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 47a69b2d..c10f54f7 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -56,7 +56,7 @@ We are going to introduce Scale subresource and implement horizontal autoscaling Scale subresource will be supported for replication controllers and deployments. Scale subresource will be a Virtual Resource (will not be stored in etcd as a separate object). It will only be present in the API as an interface for accessing a replication controller or deployment, -and the values of Scale fields will be inferred from the corresponing replication controller/deployment object. +and the values of Scale fields will be inferred from the corresponding replication controller/deployment object. HorizontalPodAutoscaler object will be bound with exactly one Scale subresource and will be autoscaling associated replication controller/deployment through it. The main advantage of such an approach is that whenever we introduce another type we want to auto-scale, @@ -132,7 +132,7 @@ type HorizontalPodAutoscaler struct { // HorizontalPodAutoscalerSpec is the specification of a horizontal pod autoscaler. type HorizontalPodAutoscalerSpec struct { // ScaleRef is a reference to Scale subresource. HorizontalPodAutoscaler will learn the current
HorizontalPodAutoscaler will learn the current - // resource consumption from its status, and will set the desired number of pods by modyfying its spec. + // resource consumption from its status, and will set the desired number of pods by modifying its spec. ScaleRef *SubresourceReference // MinCount is the lower limit for the number of pods that can be set by the autoscaler. MinCount int @@ -151,7 +151,7 @@ type HorizontalPodAutoscalerStatus struct { CurrentReplicas int // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler. - // The number may be different because pod downscaling is someteimes delayed to keep the number + // The number may be different because pod downscaling is sometimes delayed to keep the number // of pods stable. DesiredReplicas int @@ -161,7 +161,7 @@ type HorizontalPodAutoscalerStatus struct { CurrentConsumption ResourceConsumption // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods. - // This is used by the autoscaler to controll how often the number of pods is changed. + // This is used by the autoscaler to control how often the number of pods is changed. LastScaleTimestamp *util.Time } diff --git a/rescheduler.md b/rescheduler.md index b27b9bfe..88747d08 100644 --- a/rescheduler.md +++ b/rescheduler.md @@ -96,7 +96,7 @@ case, the nodes we move the Pods onto might have been in the system for a long t have been added by the cluster auto-scaler specifically to allow the rescheduler to rebalance utilization. -A second spreading use case is to separate antagnosits. +A second spreading use case is to separate antagonists. Sometimes the processes running in two different Pods on the same node may have unexpected antagonistic behavior towards one another. 
A system component might monitor for such -- cgit v1.2.3 From b003d62099bfbaed2bf801ed16048ae3a9a57117 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Tue, 25 Aug 2015 10:47:58 -0400 Subject: Copy edits for typos (resubmitted) --- extending-api.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/extending-api.md b/extending-api.md index cca257bd..bbd02a54 100644 --- a/extending-api.md +++ b/extending-api.md @@ -71,7 +71,7 @@ Kubernetes API server to provide the following features: * Watch for resource changes. The `Kind` for an instance of a third-party object (e.g. CronTab) below is expected to be -programnatically convertible to the name of the resource using +programmatically convertible to the name of the resource using the following conversion. Kinds are expected to be of the form ``, the `APIVersion` for the object is expected to be `//`. @@ -178,7 +178,7 @@ and get back: } ``` -Because all objects are expected to contain standard Kubernetes metdata fileds, these +Because all objects are expected to contain standard Kubernetes metadata fields, these list operations can also use `Label` queries to filter requests down to specific subsets. Likewise, clients can use watch endpoints to watch for changes to stored objects. 
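The Kind-to-resource conversion described earlier for third-party objects (e.g. `CronTab`) can be sketched as a camel-case-to-hyphenated mapping. This is one plausible reading of the conversion, not a definitive implementation; the exact rule applied by the API server may differ:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// kindToResource sketches the programmatic conversion of a camel-cased
// Kind (e.g. "CronTab") to a hyphenated, lower-cased resource name
// (e.g. "cron-tab"): each upper-case rune after the first starts a new
// hyphen-separated segment.
func kindToResource(kind string) string {
	var b strings.Builder
	for i, r := range kind {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteRune('-')
			}
			b.WriteRune(unicode.ToLower(r))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(kindToResource("CronTab")) // cron-tab
}
```

Under this reading, the resource name is then combined with the group and version (per the `//` form noted above) to address the stored objects.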
-- cgit v1.2.3 From 1f3791a4b008ee13c4003ce044f1faee6ab88197 Mon Sep 17 00:00:00 2001 From: Jimmi Dyson Date: Wed, 26 Aug 2015 10:59:03 +0100 Subject: Update fabric8 client library location --- client-libraries.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/client-libraries.md b/client-libraries.md index 9e41688c..b63e2d44 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -41,8 +41,8 @@ Documentation for other releases can be found at *Note: Libraries provided by outside parties are supported by their authors, not the core Kubernetes team* - * [Java (OSGI)](https://bitbucket.org/amdatulabs/amdatu-kubernetes) - * [Java (Fabric8)](https://github.com/fabric8io/fabric8/tree/master/components/kubernetes-api) + * [Java (OSGi)](https://bitbucket.org/amdatulabs/amdatu-kubernetes) + * [Java (Fabric8, OSGi)](https://github.com/fabric8io/kubernetes-client) * [Ruby](https://github.com/Ch00k/kuber) * [Ruby](https://github.com/abonas/kubeclient) * [PHP](https://github.com/devstub/kubernetes-api-php-client) -- cgit v1.2.3 From d2d04300c597902f646e846bbdb762080b6b7eba Mon Sep 17 00:00:00 2001 From: Phillip Wittrock Date: Wed, 26 Aug 2015 17:22:27 -0700 Subject: Update development godep instructions to work for cadvisor and changing transitive deps --- development.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/development.md b/development.md index a266f7cb..fc1de093 100644 --- a/development.md +++ b/development.md @@ -145,6 +145,8 @@ Here's a quick walkthrough of one way to use godeps to add or update a Kubernete 1) Devote a directory to this endeavor: +_Devoting a separate directory is not required, but it is helpful to separate dependency updates from other changes._ + ```sh export KPATH=$HOME/code/kubernetes mkdir -p $KPATH/src/k8s.io/kubernetes @@ -183,10 +185,17 @@ godep save ./... cd $KPATH/src/k8s.io/kubernetes go get -u path/to/dependency # Change code in Kubernetes accordingly if necessary. 
-godep update path/to/dependency +godep update path/to/dependency/... ``` -5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by re-restoring: `godep restore` +_If `go get -u path/to/dependency` fails with compilation errors, instead try `go get -d -u path/to/dependency` +to fetch the dependencies without compiling them. This can happen when updating the cadvisor dependency._ + + +5) Before sending your PR, it's a good idea to sanity check that your Godeps.json file is ok by running hack/verify-godeps.sh + +_If hack/verify-godeps.sh fails after a `godep update`, it is possible that a transitive dependency was added or removed but not +updated by godeps. It then may be necessary to perform a `godep save ./...` to pick up the transitive dependency changes._ It is sometimes expedient to manually fix the /Godeps/godeps.json file to minimize the changes. -- cgit v1.2.3 From 816629623e6870d81ca1c41e6f6b61ab78b71fb7 Mon Sep 17 00:00:00 2001 From: Max Forbes Date: Wed, 26 Aug 2015 10:31:58 -0700 Subject: Add patch notes to versioning doc. --- versioning.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/versioning.md b/versioning.md index 9009dc59..ede6b450 100644 --- a/versioning.md +++ b/versioning.md @@ -68,6 +68,14 @@ Here is an example major release cycle: It may seem a bit strange to complete the v2 API before v2.0 is released, but *adding* a v2 API is not a breaking change. *Removing* the v2beta\* APIs *is* a breaking change, which is what necessitates the major version bump. There are other ways to do this, but having the major release be the fresh start of that release's API without the baggage of its beta versions seems most intuitive out of the available options. 
+# Patches + +Patch releases are intended for critical bug fixes to the latest minor version, such as addressing security vulnerabilities, fixes to problems affecting a large number of users, severe problems with no workaround, and blockers for products based on Kubernetes. + +They should not contain miscellaneous feature additions or improvements, and especially no incompatibilities should be introduced between patch versions of the same minor version (or even major version). + +Dependencies, such as Docker or Etcd, should also not be changed unless absolutely necessary, and also just to fix critical bugs (so, at most patch version changes, not new major nor minor versions). + # Upgrades * Users can upgrade from any Kube 1.x release to any other Kube 1.x release as a rolling upgrade across their cluster. (Rolling upgrade means being able to upgrade the master first, then one node at a time. See #4855 for details.) -- cgit v1.2.3 From f78bbabe8344f2f52bc3945c65fee5e4717688da Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Thu, 27 Aug 2015 10:50:50 +0200 Subject: Revert "LimitRange updates for Resource Requirements Requests" --- admission_control_limit_range.md | 64 ++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 35 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index e7c706ef..621fd564 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -53,7 +53,7 @@ The **LimitRange** resource is scoped to a **Namespace**. ### Type ```go -// LimitType is a type of object that is limited +// A type of object that is limited type LimitType string const ( @@ -63,50 +63,44 @@ const ( LimitTypeContainer LimitType = "Container" ) -// LimitRangeItem defines a min/max usage limit for any resource that matches on kind. 
+// LimitRangeItem defines a min/max usage limit for any resource that matches on kind type LimitRangeItem struct { - // Type of resource that this limit applies to. - Type LimitType `json:"type,omitempty"` - // Max usage constraints on this kind by resource name. - Max ResourceList `json:"max,omitempty"` - // Min usage constraints on this kind by resource name. - Min ResourceList `json:"min,omitempty"` - // Default resource requirement limit value by resource name if resource limit is omitted. - Default ResourceList `json:"default,omitempty"` - // DefaultRequest is the default resource requirement request value by resource name if resource request is omitted. - DefaultRequest ResourceList `json:"defaultRequest,omitempty"` - // MaxLimitRequestRatio if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value; this represents the max burst for the named resource. - MaxLimitRequestRatio ResourceList `json:"maxLimitRequestRatio,omitempty"` + // Type of resource that this limit applies to + Type LimitType `json:"type,omitempty" description:"type of resource that this limit applies to"` + // Max usage constraints on this kind by resource name + Max ResourceList `json:"max,omitempty" description:"max usage constraints on this kind by resource name"` + // Min usage constraints on this kind by resource name + Min ResourceList `json:"min,omitempty" description:"min usage constraints on this kind by resource name"` + // Default resource limits on this kind by resource name + Default ResourceList `json:"default,omitempty" description:"default resource limits values on this kind by resource name if omitted"` + // DefaultRequests resource requests on this kind by resource name + DefaultRequests ResourceList `json:"defaultRequests,omitempty" description:"default resource requests values on this kind by resource name if omitted"` + // LimitRequestRatio is the ratio of limit over 
request that is the maximum allowed burst for the named resource + LimitRequestRatio ResourceList `json:"limitRequestRatio,omitempty" description:"the ratio of limit over request that is the maximum allowed burst for the named resource. if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value"` } -// LimitRangeSpec defines a min/max usage limit for resources that match on kind. +// LimitRangeSpec defines a min/max usage limit for resources that match on kind type LimitRangeSpec struct { - // Limits is the list of LimitRangeItem objects that are enforced. - Limits []LimitRangeItem `json:"limits"` + // Limits is the list of LimitRangeItem objects that are enforced + Limits []LimitRangeItem `json:"limits" description:"limits is the list of LimitRangeItem objects that are enforced"` } -// LimitRange sets resource usage limits for each kind of resource in a Namespace. +// LimitRange sets resource usage limits for each kind of resource in a Namespace type LimitRange struct { - TypeMeta `json:",inline"` - // Standard object's metadata. - // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata - ObjectMeta `json:"metadata,omitempty"` + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` - // Spec defines the limits enforced. - // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status - Spec LimitRangeSpec `json:"spec,omitempty"` + // Spec defines the limits enforced + Spec LimitRangeSpec `json:"spec,omitempty" description:"spec defines the limits enforced; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"` } // LimitRangeList is a list of LimitRange items. type LimitRangeList struct { TypeMeta `json:",inline"` - // Standard list metadata. 
- // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#types-kinds - ListMeta `json:"metadata,omitempty"` + ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` - // Items is a list of LimitRange objects. - // More info: http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md - Items []LimitRange `json:"items"` + // Items is a list of LimitRange objects + Items []LimitRange `json:"items" description:"items is a list of LimitRange objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md"` } ``` @@ -114,7 +108,7 @@ type LimitRangeList struct { Validation of a **LimitRange** enforces that for a given named resource the following rules apply: -Min (if specified) <= DefaultRequest (if specified) <= Default (if specified) <= Max (if specified) +Min (if specified) <= DefaultRequests (if specified) <= Default (if specified) <= Max (if specified) ### Default Value Behavior @@ -127,11 +121,11 @@ if LimitRangeItem.Default[resourceName] is undefined ``` ``` -if LimitRangeItem.DefaultRequest[resourceName] is undefined +if LimitRangeItem.DefaultRequests[resourceName] is undefined if LimitRangeItem.Default[resourceName] is defined - LimitRangeItem.DefaultRequest[resourceName] = LimitRangeItem.Default[resourceName] + LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Default[resourceName] else if LimitRangeItem.Min[resourceName] is defined - LimitRangeItem.DefaultRequest[resourceName] = LimitRangeItem.Min[resourceName] + LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Min[resourceName] ``` ## AdmissionControl plugin: LimitRanger -- cgit v1.2.3 From cda298a5a0c75419481754b80077d77fbf23b7e8 Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Thu, 27 Aug 2015 10:50:50 +0200 Subject: Revert "LimitRange updates for Resource Requirements Requests" --- api_changes.md | 16 ---------------- 1 file changed, 
16 deletions(-) diff --git a/api_changes.md b/api_changes.md index 709f8c2c..72c38b7f 100644 --- a/api_changes.md +++ b/api_changes.md @@ -345,22 +345,6 @@ generator to create it from scratch. Unsurprisingly, adding manually written conversion also requires you to add tests to `pkg/api//conversion_test.go`. -## Edit deep copy files - -At this point you have both the versioned API changes and the internal -structure changes done. You now need to generate code to handle deep copy -of your versioned api objects. - -The deep copy code resides with each versioned API: - - `pkg/api//deep_copy_generated.go` containing auto-generated copy functions - -To regenerate them: - - run - -```sh -hack/update-generated-deep-copies.sh -``` - ## Update the fuzzer Part of our testing regimen for APIs is to "fuzz" (fill with random values) API -- cgit v1.2.3 From ddba4b452707662f2fbd8d05c236c552c15cb377 Mon Sep 17 00:00:00 2001 From: qiaolei Date: Fri, 28 Aug 2015 16:40:59 +0800 Subject: Update quota example in admission_control_resource_quota.md Two modifications: 1, The example used in this document is outdated so update it 2, Delete the old `kubectl namespace myspace` since it produces an error `error: namespace has been superceded by the context.namespace field of .kubeconfig files` --- admission_control_resource_quota.md | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 86fae451..1931143c 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -201,21 +201,25 @@ kubectl is modified to support the **ResourceQuota** resource. 
For example, ```console -$ kubectl namespace myspace -$ kubectl create -f docs/user-guide/resourcequota/quota.yaml -$ kubectl get quota -NAME -quota -$ kubectl describe quota quota -Name: quota -Resource Used Hard --------- ---- ---- -cpu 0m 20 -memory 0 1Gi -pods 5 10 -replicationcontrollers 5 20 -resourcequotas 1 1 -services 3 5 +$ kubectl create -f docs/user-guide/resourcequota/namespace.yaml +namespace "quota-example" created +$ kubectl create -f docs/user-guide/resourcequota/quota.yaml --namespace=quota-example +resourcequota "quota" created +$ kubectl describe quota quota --namespace=quota-example +Name: quota +Namespace: quota-example +Resource Used Hard +-------- ---- ---- +cpu 0 20 +memory 0 1Gi +persistentvolumeclaims 0 10 +pods 0 10 +replicationcontrollers 0 20 +resourcequotas 1 1 +secrets 1 10 +services 0 5 + + ``` ## More information -- cgit v1.2.3 From 76238cf01e317aa95009ab78cd5f12756346cb57 Mon Sep 17 00:00:00 2001 From: Prashanth B Date: Fri, 28 Aug 2015 09:26:36 -0700 Subject: Revert "Revert "LimitRange updates for Resource Requirements Requests"" --- admission_control_limit_range.md | 64 ++++++++++++++++++++++------------------ 1 file changed, 35 insertions(+), 29 deletions(-) diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index 621fd564..e7c706ef 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -53,7 +53,7 @@ The **LimitRange** resource is scoped to a **Namespace**. ### Type ```go -// A type of object that is limited +// LimitType is a type of object that is limited type LimitType string const ( @@ -63,44 +63,50 @@ const ( LimitTypeContainer LimitType = "Container" ) -// LimitRangeItem defines a min/max usage limit for any resource that matches on kind +// LimitRangeItem defines a min/max usage limit for any resource that matches on kind. 
type LimitRangeItem struct { - // Type of resource that this limit applies to - Type LimitType `json:"type,omitempty" description:"type of resource that this limit applies to"` - // Max usage constraints on this kind by resource name - Max ResourceList `json:"max,omitempty" description:"max usage constraints on this kind by resource name"` - // Min usage constraints on this kind by resource name - Min ResourceList `json:"min,omitempty" description:"min usage constraints on this kind by resource name"` - // Default resource limits on this kind by resource name - Default ResourceList `json:"default,omitempty" description:"default resource limits values on this kind by resource name if omitted"` - // DefaultRequests resource requests on this kind by resource name - DefaultRequests ResourceList `json:"defaultRequests,omitempty" description:"default resource requests values on this kind by resource name if omitted"` - // LimitRequestRatio is the ratio of limit over request that is the maximum allowed burst for the named resource - LimitRequestRatio ResourceList `json:"limitRequestRatio,omitempty" description:"the ratio of limit over request that is the maximum allowed burst for the named resource. if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value"` + // Type of resource that this limit applies to. + Type LimitType `json:"type,omitempty"` + // Max usage constraints on this kind by resource name. + Max ResourceList `json:"max,omitempty"` + // Min usage constraints on this kind by resource name. + Min ResourceList `json:"min,omitempty"` + // Default resource requirement limit value by resource name if resource limit is omitted. + Default ResourceList `json:"default,omitempty"` + // DefaultRequest is the default resource requirement request value by resource name if resource request is omitted. 
+ DefaultRequest ResourceList `json:"defaultRequest,omitempty"` + // MaxLimitRequestRatio if specified, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value; this represents the max burst for the named resource. + MaxLimitRequestRatio ResourceList `json:"maxLimitRequestRatio,omitempty"` } -// LimitRangeSpec defines a min/max usage limit for resources that match on kind +// LimitRangeSpec defines a min/max usage limit for resources that match on kind. type LimitRangeSpec struct { - // Limits is the list of LimitRangeItem objects that are enforced - Limits []LimitRangeItem `json:"limits" description:"limits is the list of LimitRangeItem objects that are enforced"` + // Limits is the list of LimitRangeItem objects that are enforced. + Limits []LimitRangeItem `json:"limits"` } -// LimitRange sets resource usage limits for each kind of resource in a Namespace +// LimitRange sets resource usage limits for each kind of resource in a Namespace. type LimitRange struct { - TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` + TypeMeta `json:",inline"` + // Standard object's metadata. + // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata + ObjectMeta `json:"metadata,omitempty"` - // Spec defines the limits enforced - Spec LimitRangeSpec `json:"spec,omitempty" description:"spec defines the limits enforced; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"` + // Spec defines the limits enforced. + // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status + Spec LimitRangeSpec `json:"spec,omitempty"` } // LimitRangeList is a list of LimitRange items. 
type LimitRangeList struct { TypeMeta `json:",inline"` - ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"` + // Standard list metadata. + // More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#types-kinds + ListMeta `json:"metadata,omitempty"` - // Items is a list of LimitRange objects - Items []LimitRange `json:"items" description:"items is a list of LimitRange objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md"` + // Items is a list of LimitRange objects. + // More info: http://releases.k8s.io/HEAD/docs/design/admission_control_limit_range.md + Items []LimitRange `json:"items"` } ``` @@ -108,7 +114,7 @@ type LimitRangeList struct { Validation of a **LimitRange** enforces that for a given named resource the following rules apply: -Min (if specified) <= DefaultRequests (if specified) <= Default (if specified) <= Max (if specified) +Min (if specified) <= DefaultRequest (if specified) <= Default (if specified) <= Max (if specified) ### Default Value Behavior @@ -121,11 +127,11 @@ if LimitRangeItem.Default[resourceName] is undefined ``` ``` -if LimitRangeItem.DefaultRequests[resourceName] is undefined +if LimitRangeItem.DefaultRequest[resourceName] is undefined if LimitRangeItem.Default[resourceName] is defined - LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Default[resourceName] + LimitRangeItem.DefaultRequest[resourceName] = LimitRangeItem.Default[resourceName] else if LimitRangeItem.Min[resourceName] is defined - LimitRangeItem.DefaultRequests[resourceName] = LimitRangeItem.Min[resourceName] + LimitRangeItem.DefaultRequest[resourceName] = LimitRangeItem.Min[resourceName] ``` ## AdmissionControl plugin: LimitRanger -- cgit v1.2.3 From 52a0abcbe29b4c98c3d3a6274dd2dc7a9a1b27ed Mon Sep 17 00:00:00 2001 From: Prashanth B Date: Fri, 28 Aug 2015 09:26:36 -0700 Subject: Revert "Revert "LimitRange updates for 
Resource Requirements Requests"" --- api_changes.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/api_changes.md b/api_changes.md index 72c38b7f..709f8c2c 100644 --- a/api_changes.md +++ b/api_changes.md @@ -345,6 +345,22 @@ generator to create it from scratch. Unsurprisingly, adding manually written conversion also requires you to add tests to `pkg/api//conversion_test.go`. +## Edit deep copy files + +At this point you have both the versioned API changes and the internal +structure changes done. You now need to generate code to handle deep copy +of your versioned api objects. + +The deep copy code resides with each versioned API: + - `pkg/api//deep_copy_generated.go` containing auto-generated copy functions + +To regenerate them: + - run + +```sh +hack/update-generated-deep-copies.sh +``` + ## Update the fuzzer Part of our testing regimen for APIs is to "fuzz" (fill with random values) API -- cgit v1.2.3 From 4fe08d305fc8a8b23b5ade90901d097b2a67abaa Mon Sep 17 00:00:00 2001 From: qiaolei Date: Sat, 29 Aug 2015 23:54:36 +0800 Subject: Amend some markdown errors in federation.md --- federation.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/federation.md b/federation.md index 34df0aee..371d9c30 100644 --- a/federation.md +++ b/federation.md @@ -237,10 +237,10 @@ It seems useful to split this into multiple sets of sub use cases: which feature sets like private networks, load balancing, persistent disks, data snapshots etc are typically consistent and explicitly designed to inter-operate). - 1.1. within the same geographical region (e.g. metro) within which network + 1. within the same geographical region (e.g. metro) within which network is fast and cheap enough to be almost analogous to a single data center. - 1.1. across multiple geographical regions, where high network cost and + 1. across multiple geographical regions, where high network cost and poor network performance may be prohibitive. 1. 
Multiple cloud providers (typically with inconsistent feature sets, more limited interoperability, and typically no cheap inter-cluster @@ -440,12 +440,13 @@ to be able to: There is of course a lot of detail still missing from this section, including discussion of: -1. admission control, + +1. admission control 1. initial placement of instances of a new service vs scheduling new instances of an existing service in response -to auto-scaling, +to auto-scaling 1. rescheduling pods due to failure (response might be -different depending on if it's failure of a node, rack, or whole AZ), +different depending on if it's failure of a node, rack, or whole AZ) 1. data placement relative to compute capacity, etc. -- cgit v1.2.3 From 29834b4bb59b41ba2a609cef1d459ccb47e595c7 Mon Sep 17 00:00:00 2001 From: qiaolei Date: Mon, 31 Aug 2015 14:39:44 +0800 Subject: Fix dead link in event_compression.md Where `pkg/client/record/event.go` should be `pkg/client/unversioned/record/event.go` --- event_compression.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/event_compression.md b/event_compression.md index 4525c097..e8f9775b 100644 --- a/event_compression.md +++ b/event_compression.md @@ -72,7 +72,7 @@ Each binary that generates events: * `event.Reason` * `event.Message` * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache. - * When an event is generated, the previously generated events cache is checked (see [`pkg/client/record/event.go`](http://releases.k8s.io/HEAD/pkg/client/record/event.go)). + * When an event is generated, the previously generated events cache is checked (see [`pkg/client/unversioned/record/event.go`](http://releases.k8s.io/HEAD/pkg/client/unversioned/record/event.go)). 
* If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd: * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. * The event is also updated in the previously generated events cache with an incremented count, updated last seen timestamp, name, and new resource version (all required to issue a future event update). -- cgit v1.2.3 From 2acd635b4749add50f4e5b9753fe77e14180eb6e Mon Sep 17 00:00:00 2001 From: Harry Zhang Date: Mon, 31 Aug 2015 12:15:05 +0800 Subject: Fix inconsistency path in GOPATH doc we set up $KPATH/src/k8s.io/kubernetes directory, but ask user to `cd` into $KPATH/src/github.com/kubernetes Close this if I made mistaken this --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index fc1de093..65ab981b 100644 --- a/development.md +++ b/development.md @@ -168,7 +168,7 @@ export GOPATH=$KPATH 3) Populate your new GOPATH. ```sh -cd $KPATH/src/github.com/kubernetes/kubernetes +cd $KPATH/src/k8s.io/kubernetes godep restore ``` -- cgit v1.2.3 From c4ed1dfa7803b1edcbb2dbdfe16dfc706d790e6d Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Fri, 28 Aug 2015 10:42:30 +0200 Subject: Fixed links in initial-resources proposal --- initial-resources.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/initial-resources.md b/initial-resources.md index 8ab2a46c..514c03ff 100644 --- a/initial-resources.md +++ b/initial-resources.md @@ -39,7 +39,7 @@ and set them before the container is run. 
This document describes design of the ## Motivation Since we want to make Kubernetes as simple as possible for its users we don’t want to require setting -[Resources](https://github.com/GoogleCloudPlatform/kubernetes/blob/7c9bbef96ed7f2a192a1318aa312919b861aee00/pkg/api/v1/types.go#L696) +[Resources](resource-qos.md#resource-specifications) for container by its owner. On the other hand having Resources filled is critical for scheduling decisions. Current solution to set up Resources to hardcoded value has obvious drawbacks. We need to implement a component which will set initial Resources to a reasonable value. @@ -47,12 +47,12 @@ which will set initial Resources to a reasonable value. ## Design InitialResources component will be implemented as an [admission plugin](../../plugin/pkg/admission/) and invoked right before -[LimitRanger](https://github.com/GoogleCloudPlatform/kubernetes/blob/7c9bbef96ed7f2a192a1318aa312919b861aee00/cluster/gce/config-default.sh#L91). +[LimitRanger](https://github.com/kubernetes/kubernetes/blob/7c9bbef96ed7f2a192a1318aa312919b861aee00/cluster/gce/config-default.sh#L91). For every container without Resources specified it will try to predict amount of resources that should be sufficient for it. So that a pod without specified resources will be treated as -[Burstable](https://github.com/GoogleCloudPlatform/kubernetes/blob/be5e224a0f1c928d49c48aa6a6539d22c47f9238/docs/proposals/resource-qos.md#qos-classes). +[Burstable](resource-qos.md#qos-classes). -InitialResources will set only [Request](https://github.com/GoogleCloudPlatform/kubernetes/blob/3d2d99c6fd920386eea4ec050164839ec6db38f0/pkg/api/v1/types.go#L665) +InitialResources will set only [request](resource-qos.md#resource-specifications) (independently for each resource type: cpu, memory) field in the first version to avoid killing containers due to OOM (however the container still may be killed if exceeds requested resources). 
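The request-only behavior of the InitialResources design above (fill in the request, never the limit, and only for resources the container owner left unset) can be sketched roughly as follows. This is an illustrative model, not the actual plugin code: the `Container` type, the plain `int64` quantities, and the `estimate` callback are all stand-ins for the real API types and predicting algorithm.

```go
package main

import "fmt"

// Container is a pared-down stand-in for the real API type; only the
// fields needed for this sketch are shown. Quantities are plain int64
// values (e.g. millicores, bytes) for illustration.
type Container struct {
	Name    string
	Request map[string]int64
}

// setInitialRequests fills in only the request (never the limit) for each
// resource a container has not specified. The estimator is a placeholder
// for the component's predicting algorithm.
func setInitialRequests(containers []Container, estimate func(name, resource string) int64) {
	for i := range containers {
		if containers[i].Request == nil {
			containers[i].Request = map[string]int64{}
		}
		for _, res := range []string{"cpu", "memory"} {
			if _, ok := containers[i].Request[res]; !ok {
				containers[i].Request[res] = estimate(containers[i].Name, res)
			}
		}
	}
}

func main() {
	cs := []Container{
		{Name: "app"},                                          // nothing specified
		{Name: "sidecar", Request: map[string]int64{"cpu": 250}}, // cpu set by owner
	}
	setInitialRequests(cs, func(name, resource string) int64 { return 100 })
	fmt.Println(cs[0].Request["cpu"], cs[1].Request["cpu"], cs[1].Request["memory"])
	// 100 250 100 -- owner-specified values are left untouched
}
```

Note how the owner-specified `cpu` request on the second container survives: the component only estimates values the owner omitted.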
To make the component work with LimitRanger the estimated value will be capped by min and max possible values if defined. @@ -60,7 +60,7 @@ It will prevent from situation when the pod is rejected due to too low or too hi The container won’t be marked as managed by this component in any way, however appropriate event will be exported. The predicting algorithm should have very low latency to not increase significantly e2e pod startup latency -[#3954](https://github.com/GoogleCloudPlatform/kubernetes/pull/3954). +[#3954](https://github.com/kubernetes/kubernetes/pull/3954). ### Predicting algorithm details -- cgit v1.2.3 From ca9f771cf90bd88378a3e6b0cee9f1dcfeea58c7 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Thu, 27 Aug 2015 21:12:06 +0000 Subject: Start on expanding code expectations (aka "The bar") --- api-conventions.md | 7 ++++ api_changes.md | 90 +++++++++++++++++++++++++++++++++++++++++++++++--- coding-conventions.md | 51 ++++++++++++++++++++++++++-- development.md | 2 ++ faster_reviews.md | 32 +++++++++++++++--- kubectl-conventions.md | 27 ++++++++++++++- 6 files changed, 195 insertions(+), 14 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index f00dde1e..746d56cb 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -713,6 +713,13 @@ Annotations have very different intended usage from labels. We expect them to be In fact, experimental API fields, including to represent fields of newer alpha/beta API versions in the older, stable storage version, may be represented as annotations with the prefix `experimental.kubernetes.io/`. 
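The LimitRange capping step in the InitialResources design above — the estimated value is capped by the min and max possible values, if defined — can be sketched as below. This is a toy illustration: the real component works with `resource.Quantity` values, not raw `int64`s, and a `nil` bound here stands for "not defined in the LimitRange".

```go
package main

import "fmt"

// clampToLimitRange caps an estimated request by the LimitRange min/max,
// mirroring the capping step described above. A nil bound means the
// LimitRange does not define that bound.
func clampToLimitRange(estimate int64, min, max *int64) int64 {
	if min != nil && estimate < *min {
		return *min
	}
	if max != nil && estimate > *max {
		return *max
	}
	return estimate
}

func main() {
	min, max := int64(100), int64(1000)
	fmt.Println(clampToLimitRange(50, &min, &max))   // 100 -- raised to min
	fmt.Println(clampToLimitRange(5000, &min, &max)) // 1000 -- lowered to max
	fmt.Println(clampToLimitRange(500, &min, &max))  // 500 -- within bounds
}
```

This is exactly what prevents a pod from being rejected by LimitRanger because the predicted value fell outside the allowed range.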
+Other advice regarding use of labels, annotations, and other generic map keys by Kubernetes components and tools: + - Key names should be all lowercase, with words separated by dashes, such as `desired-replicas` + - Prefix the key with `kubernetes.io/` or `foo.kubernetes.io/`, preferably the latter if the label/annotation is specific to `foo` + - For instance, prefer `service-account.kubernetes.io/name` over `kubernetes.io/service-account.name` + - Use annotations to store API extensions that the controller responsible for the resource doesn't need to know about, experimental fields that aren't intended to be generally used API fields, etc. Beware that annotations aren't automatically handled by the API conversion machinery. + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api-conventions.md?pixel)]() diff --git a/api_changes.md b/api_changes.md index 72c38b7f..289123d5 100644 --- a/api_changes.md +++ b/api_changes.md @@ -189,17 +189,82 @@ API call might POST an object in API v7beta1 format, which uses the cleaner form (since v7beta1 is "beta"). When the user reads the object back in the v7beta1 API it would be unacceptable to have lost all but `Params[0]`. This means that, even though it is ugly, a compatible change must be made to the v6 -API. However, this is very challenging to do correctly. It generally requires +API. + +However, this is very challenging to do correctly. It often requires multiple representations of the same information in the same API resource, which -need to be kept in sync in the event that either is changed. However, if -the new representation is more expressive than the old, this breaks -backward compatibility, since clients that only understood the old representation +need to be kept in sync in the event that either is changed. For example, +let's say you decide to rename a field within the same API version. In this case, +you add units to `height` and `width`. 
You implement this by adding duplicate +fields: + +```go +type Frobber struct { + Height *int `json:"height"` + Width *int `json:"width"` + HeightInInches *int `json:"heightInInches"` + WidthInInches *int `json:"widthInInches"` +} +``` + +You convert all of the fields to pointers in order to distinguish between unset and +set to 0, and then set each corresponding field from the other in the defaulting +pass (e.g., `heightInInches` from `height`, and vice versa), which runs just prior +to conversion. That works fine when the user creates a resource from a hand-written +configuration -- clients can write either field and read either field, but what about +creation or update from the output of GET, or update via PATCH (see +[In-place updates](../user-guide/managing-deployments.md#in-place-updates-of-resources))? +In this case, the two fields will conflict, because only one field would be updated +in the case of an old client that was only aware of the old field (e.g., `height`). + +Say the client creates: + +```json +{ + "height": 10, + "width": 5 +} +``` + +and GETs: + +```json +{ + "height": 10, + "heightInInches": 10, + "width": 5, + "widthInInches": 5 +} +``` + +then PUTs back: + +```json +{ + "height": 13, + "heightInInches": 10, + "width": 5, + "widthInInches": 5 +} +``` + +The update should not fail, because it would have worked before `heightInInches` was added. + +Therefore, when there are duplicate fields, the old field MUST take precedence +over the new, and the new field should be set to match by the server upon write. +A new client would be aware of the old field as well as the new, and so can ensure +that the old field is either unset or is set consistently with the new field. However, +older clients would be unaware of the new field. Please avoid introducing duplicate +fields due to the complexity they incur in the API. 
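The precedence rule above — when duplicate fields conflict, the old field MUST win, and the server sets the new field to match on write — can be sketched for the `Frobber` example like this. It is a sketch of the idea, not actual defaulting code from the tree:

```go
package main

import "fmt"

// Frobber mirrors the duplicated-field example above (width omitted for brevity).
type Frobber struct {
	Height         *int `json:"height"`
	HeightInInches *int `json:"heightInInches"`
}

// syncHeight applies the rule on write: the old field (`height`) takes
// precedence when both are set, and the new field is made to match it.
// If only the new field is set (a new client), it backfills the old one.
func syncHeight(f *Frobber) {
	switch {
	case f.Height != nil:
		v := *f.Height
		f.HeightInInches = &v // old field wins; new field set to match
	case f.HeightInInches != nil:
		v := *f.HeightInInches
		f.Height = &v // backfill the old field for old clients
	}
}

func main() {
	// The conflicting PUT from the example above: an old client bumped
	// height to 13 but left heightInInches at the stale value 10.
	h, hi := 13, 10
	f := &Frobber{Height: &h, HeightInInches: &hi}
	syncHeight(f)
	fmt.Println(*f.Height, *f.HeightInInches) // 13 13
}
```

Under this rule the old client's PUT succeeds and the stale `heightInInches` is silently corrected, which is why the update "would have worked before `heightInInches` was added".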
+ +A new representation, even in a new API version, that is more expressive than an old one +breaks backward compatibility, since clients that only understood the old representation would not be aware of the new representation nor its semantics. Examples of proposals that have run into this challenge include [generalized label selectors](http://issues.k8s.io/341) and [pod-level security context](http://prs.k8s.io/12823). -As another interesting example, enumerated values provide a unique challenge. +As another interesting example, enumerated values cause similar challenges. Adding a new value to an enumerated set is *not* a compatible change. Clients which assume they know how to handle all possible values of a given field will not be able to handle the new values. However, removing value from an @@ -227,6 +292,21 @@ the release notes for the next release by labeling the PR with the "release-note If you found that your change accidentally broke clients, it should be reverted. +In short, the expected API evolution is as follows: +* `experimental/v1alpha1` -> +* `newapigroup/v1alpha1` -> ... -> `newapigroup/v1alphaN` -> +* `newapigroup/v1beta1` -> ... -> `newapigroup/v1betaN` -> +* `newapigroup/v1` -> +* `newapigroup/v2alpha1` -> ... + +While in experimental we have no obligation to move forward with the API at all and may delete or break it at any time. + +While in alpha we expect to move forward with it, but may break it. + +Once in beta we will preserve forward compatibility, but may introduce new versions and delete old ones. + +v1 must be backward-compatible for an extended length of time. 
+ ## Changing versioned APIs For most changes, you will probably find it easiest to change the versioned diff --git a/coding-conventions.md b/coding-conventions.md index ac3d353f..1569d1aa 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -30,12 +30,57 @@ Documentation for other releases can be found at -Coding style advice for contributors +Code conventions - Bash - https://google-styleguide.googlecode.com/svn/trunk/shell.xml + - Ensure that build, release, test, and cluster-management scripts run on OS X - Go - - https://github.com/golang/go/wiki/CodeReviewComments - - https://gist.github.com/lavalamp/4bd23295a9f32706a48f + - Ensure your code passes the [presubmit checks](development.md#hooks) + - [Go Code Review Comments](https://github.com/golang/go/wiki/CodeReviewComments) + - [Effective Go](https://golang.org/doc/effective_go.html) + - Comment your code. + - [Go's commenting conventions](http://blog.golang.org/godoc-documenting-go-code) + - If reviewers ask questions about why the code is the way it is, that's a sign that comments might be helpful. + - Command-line flags should use dashes, not underscores + - Naming + - Please consider package name when selecting an interface name, and avoid redundancy. + - e.g.: `storage.Interface` is better than `storage.StorageInterface`. + - Do not use uppercase characters, underscores, or dashes in package names. + - Please consider parent directory name when choosing a package name. + - so pkg/controllers/autoscaler/foo.go should say `package autoscaler` not `package autoscalercontroller`. + - Unless there's a good reason, the `package foo` line should match the name of the directory in which the .go file exists. + - Importers can use a different name if they need to disambiguate. 
+ - API conventions + - [API changes](api_changes.md) + - [API conventions](api-conventions.md) + - [Kubectl conventions](kubectl-conventions.md) + - [Logging conventions](logging.md) + +Testing conventions + - All new packages and most new significant functionality must come with unit tests + - Table-driven tests are preferred for testing multiple scenarios/inputs; for example, see [TestNamespaceAuthorization](../../test/integration/auth_test.go) + - Significant features should come with integration (test/integration) and/or end-to-end (test/e2e) tests + - Including new kubectl commands and major features of existing commands + - Unit tests must pass on OS X and Windows platforms - if you use Linux specific features, your test case must either be skipped on windows or compiled out (skipped is better when running Linux specific commands, compiled out is required when your code does not compile on Windows). + +Directory and file conventions + - Avoid package sprawl. Find an appropriate subdirectory for new packages. (See [#4851](http://issues.k8s.io/4851) for discussion.) + - Libraries with no more appropriate home belong in new package subdirectories of pkg/util + - Avoid general utility packages. Packages called "util" are suspect. Instead, derive a name that describes your desired function. For example, the utility functions dealing with waiting for operations are in the "wait" package and include functionality like Poll. So the full name is wait.Poll + - Go source files and directories use underscores, not dashes + - Package directories should generally avoid using separators as much as possible (when packages are multiple words, they usually should be in nested subdirectories). 
+ - Document directories and filenames should use dashes rather than underscores + - Contrived examples that illustrate system features belong in /docs/user-guide or /docs/admin, depending on whether it is a feature primarily intended for users that deploy applications or cluster administrators, respectively. Actual application examples belong in /examples. + - Examples should also illustrate [best practices for using the system](../user-guide/config-best-practices.md) + - Third-party code + - Third-party Go code is managed using Godeps + - Other third-party code belongs in /third_party + - Third-party code must include licenses + - This includes modified third-party code and excerpts, as well + +Coding advice + - Go + - [Go landmines](https://gist.github.com/lavalamp/4bd23295a9f32706a48f) diff --git a/development.md b/development.md index a266f7cb..44ceee1c 100644 --- a/development.md +++ b/development.md @@ -112,6 +112,8 @@ fixups (e.g. automated doc formatting), use one or more commits for the changes to tooling and a final commit to apply the fixup en masse. This makes reviews much easier. +See [Faster Reviews](faster_reviews.md) for more details. + ## godep and dependency management Kubernetes uses [godep](https://github.com/tools/godep) to manage dependencies. It is not strictly required for building Kubernetes but it is required when managing dependencies under the Godeps/ tree, and is required by a number of the build and test scripts. Please make sure that ``godep`` is installed and in your ``$PATH``. diff --git a/faster_reviews.md b/faster_reviews.md index 3ea030d3..0c70e435 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -53,15 +53,24 @@ later, just as soon as they have more free time (ha!). Let's talk about how to avoid this. +## 0. 
Familiarize yourself with project conventions + +* [Development guide](development.md) +* [Coding conventions](coding-conventions.md) +* [API conventions](api-conventions.md) +* [Kubectl conventions](kubectl-conventions.md) + ## 1. Don't build a cathedral in one PR Are you sure FeatureX is something the Kubernetes team wants or will accept, or that it is implemented to fit with other changes in flight? Are you willing to bet a few days or weeks of work on it? If you have any doubt at all about the -usefulness of your feature or the design - make a proposal doc or a sketch PR -or both. Write or code up just enough to express the idea and the design and -why you made those choices, then get feedback on this. Now, when we ask you to -change a bunch of facets of the design, you don't have to re-write it all. +usefulness of your feature or the design - make a proposal doc (in docs/proposals; +for example [the QoS proposal](http://prs.k8s.io/11713)) or a sketch PR (e.g., just +the API or Go interface) or both. Write or code up just enough to express the idea +and the design and why you made those choices, then get feedback on this. Be clear +about what type of feedback you are asking for. Now, if we ask you to change a +bunch of facets of the design, you won't have to re-write it all. ## 2. Smaller diffs are exponentially better @@ -154,7 +163,20 @@ commit and re-push. Your reviewer can then look at that commit on its own - so much faster to review than starting over. We might still ask you to clean up your commits at the very end, for the sake -of a more readable history. +of a more readable history, but don't do this until asked, typically at the point +where the PR would otherwise be tagged LGTM. + +General squashing guidelines: + +* Sausage => squash + + When there are several commits to fix bugs in the original commit(s), address reviewer feedback, etc. Really we only want to see the end state and commit message for the whole PR. 
+ +* Layers => don't squash + + When there are independent changes layered upon each other to achieve a single goal. For instance, writing a code munger could be one commit, applying it could be another, and adding a precommit check could be a third. One could argue they should be separate PRs, but there's really no way to test/review the munger without seeing it applied, and there needs to be a precommit check to ensure the munged output doesn't immediately get out of date. + +A commit, as much as possible, should be a single logical change. Each commit should always have a good title line (<70 characters) and include an additional description paragraph describing in more detail the change intended. Do not link pull requests by `#` in a commit description, because GitHub creates lots of spam. Instead, reference other PRs via the PR your commit is in. ## 8. KISS, YAGNI, MVP, etc diff --git a/kubectl-conventions.md b/kubectl-conventions.md index 5739708c..a37e5899 100644 --- a/kubectl-conventions.md +++ b/kubectl-conventions.md @@ -34,7 +34,7 @@ Documentation for other releases can be found at Kubectl Conventions =================== -Updated: 8/12/2015 +Updated: 8/27/2015 **Table of Contents** @@ -77,6 +77,31 @@ Updated: 8/12/2015 * Flags are all lowercase, with words separated by hyphens * Flag names and single-character aliases should have the same meaning across all commands * Command-line flags corresponding to API fields should accept API enums exactly (e.g., --restart=Always) +* Do not reuse flags for different semantic purposes, and do not use different flag names for the same semantic purpose -- grep for `"Flags()"` before adding a new flag +* Use short flags sparingly, only for the most frequently used options, prefer lowercase over uppercase for the most common cases, try to stick to well known conventions for UNIX commands and/or Docker, where they exist, and update this list when adding new short flags + * `-f`: Resource file + * also used for `--follow` 
in `logs`, but should be deprecated in favor of `-F` + * `-l`: Label selector + * also used for `--labels` in `expose`, but should be deprecated + * `-L`: Label columns + * `-c`: Container + * also used for `--client` in `version`, but should be deprecated + * `-i`: Attach stdin + * `-t`: Allocate TTY + * also used for `--template`, but deprecated + * `-w`: Watch (currently also used for `--www` in `proxy`, but should be deprecated) + * `-p`: Previous + * also used for `--pod` in `exec`, but deprecated + * also used for `--patch` in `patch`, but should be deprecated + * also used for `--port` in `proxy`, but should be deprecated + * `-P`: Static file prefix in `proxy`, but should be deprecated + * `-r`: Replicas + * `-u`: Unix socket + * `-v`: Verbose logging level +* `--dry-run`: Don't modify the live state; simulate the mutation and display the output +* `--local`: Don't contact the server; just do local read, transformation, generation, etc. and display the output +* `--output-version=...`: Convert the output to a different API group/version +* `--validate`: Validate the resource schema ## Output conventions -- cgit v1.2.3 From e3bbb1eb035637aad7118bf5ec21ced68b5284f5 Mon Sep 17 00:00:00 2001 From: qiaolei Date: Wed, 2 Sep 2015 15:11:22 +0800 Subject: Update quota example Update quota example to track latest changes --- admission_control_resource_quota.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 1931143c..4b417ead 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -218,8 +218,6 @@ replicationcontrollers 0 20 resourcequotas 1 1 secrets 1 10 services 0 5 - - ``` ## More information -- cgit v1.2.3 From 8c4c1cb764238293cb3805074b78c70327258865 Mon Sep 17 00:00:00 2001 From: hw-qiaolei Date: Sat, 29 Aug 2015 21:08:46 +0000 Subject: Adjust the architecture diagram Some modifications of the architecture diagram: 1. 
adjust the order of authz and authn; since the API server usually first authenticate user, if it is a valid user then authorize it 2. adjust the arrow to point to kubelet instead of to node of the second node 3. change `replication controller` to `controller manager(replication controller etc.)` which connects to the REST API Server 4. some tiny adjustments of the arrow position 5. affected files: architecture.svg, architecture.png and architecture.dia --- architecture.dia | Bin 6519 -> 6523 bytes architecture.png | Bin 223860 -> 268126 bytes architecture.svg | 2220 ++++++++++++++++++++++++++++++++++++++++++++---------- 3 files changed, 1832 insertions(+), 388 deletions(-) diff --git a/architecture.dia b/architecture.dia index 441e3563..5c87409f 100644 Binary files a/architecture.dia and b/architecture.dia differ diff --git a/architecture.png b/architecture.png index b03cfe88..0ee8bceb 100644 Binary files a/architecture.png and b/architecture.png differ diff --git a/architecture.svg b/architecture.svg index cacc7fbf..d6b6aab0 100644 --- a/architecture.svg +++ b/architecture.svg @@ -1,499 +1,1943 @@ - - - - - - - - - - - - Node + + + + + image/svg+xml + + + + + + + + + + + + + + + + Node - - - - - kubelet + + + + + kubelet - - - + + + - - - - - container + + + + + container - - - - - container + + + + + container - - - - - cAdvisor + + + + + cAdvisor - - + + - - Pod + + Pod - - - - + + + + - - - - - container + + + + + container - - - - - container + + + + + container - - - - - container + + + + + container - - + + - - Pod + + Pod - - - - + + + + - - - - - container + + + + + container - - - - - container + + + + + container - - - - - container + + + + + container - - + + - - Pod + + Pod - - - - - Proxy + + + + + Proxy - - - - - kubectl (user commands) + + + + + kubectl (user commands) - - + + - - - - - - - - - - Firewall + + + + + + + + + + Firewall - - - - - Internet + + + + + Internet - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - replication 
controller + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + controller manager + (replication controller etc.) - - - - - Scheduler + + + + + Scheduler - - - - - Scheduler + + + + + Scheduler - - Master components - Colocated, or spread across machines, - as dictated by cluster size. + + Master components + Colocated, or spread across machines, + as dictated by cluster size. - - + + - - + + - - - - - REST - (pods, services, - rep. controllers) + + + + + REST + (pods, services, + rep. controllers) - - - - - authorization - authentication + + + + + authentication + authorization - - - - - scheduling - actuator + + + + + scheduling + actuator - - APIs + + APIs - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - docker + + docker - - - - + + + + - - .. + + .. - - ... + + ... - - - - - - - - - - - - + + + + + + + + + + + + - - - + + + - - - + + + - - Node + + Node - - - - - kubelet + + + + + kubelet - - - + + + - - - - - container + + + + + container - - - - - container + + + + + container - - - - - cAdvisor + + + + + cAdvisor - - + + - - Pod + + Pod - - - - + + + + - - - - - container + + + + + container - - - - - container + + + + + container - - - - - container + + + + + container - - + + - - Pod + + Pod - - - - + + + + - - - - - container + + + + + container - - - - - container + + + + + container - - - - - container + + + + + container - - + + - - Pod + + Pod - - - - - Proxy + + + + + Proxy - - - - + + + + - - - - + + + + - - - - + + + + - - docker + + docker - - - - + + + + - - .. + + .. - - ... + + ... 
- - - - - - - - - - - - + + + + + + + + + + + + - - - - - - - - - - - - Distributed - Watchable - Storage - - (implemented via etcd) + + + + + + + + + + + + Distributed + Watchable + Storage + + (implemented via etcd) -- cgit v1.2.3 From f8a0e45ebb98f2d15446906c651962a395f38dbe Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Wed, 2 Sep 2015 18:00:52 -0400 Subject: Fix the link to the submit-queue whitelist --- pull-requests.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pull-requests.md b/pull-requests.md index 126b8996..157646c0 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -59,7 +59,7 @@ There are several requirements for the submit queue to work: * No changes can be made since last lgtm label was applied * k8s-bot must have reported the GCE E2E build and test steps passed (Travis, Shippable and Jenkins build) -Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](https://github.com/contrib/tree/master/submit-queue/whitelist.txt). +Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](https://github.com/kubernetes/contrib/tree/master/submit-queue/whitelist.txt). -- cgit v1.2.3 From d5abea115d0aef5aae87565c2e31165da70c96da Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 3 Sep 2015 10:10:11 -0400 Subject: s|github.com/GoogleCloudPlatform/kubernetes|github.com/kubernetes/kubernetes| --- cherry-picks.md | 2 +- cli-roadmap.md | 6 +++--- flaky-tests.md | 2 +- instrumentation.md | 12 ++++++------ issues.md | 2 +- making-release-notes.md | 2 +- pull-requests.md | 2 +- 7 files changed, 14 insertions(+), 14 deletions(-) diff --git a/cherry-picks.md b/cherry-picks.md index 519c73c3..7cb60465 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -62,7 +62,7 @@ conflict***. 
Now that we've structured cherry picks as PRs, searching for all cherry-picks against a release is a GitHub query: For example, -[this query is all of the v0.21.x cherry-picks](https://github.com/GoogleCloudPlatform/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr+%22automated+cherry+pick%22+base%3Arelease-0.21) +[this query is all of the v0.21.x cherry-picks](https://github.com/kubernetes/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr+%22automated+cherry+pick%22+base%3Arelease-0.21) diff --git a/cli-roadmap.md b/cli-roadmap.md index 69084555..42784dbc 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -34,9 +34,9 @@ Documentation for other releases can be found at # Kubernetes CLI/Configuration Roadmap See github issues with the following labels: -* [area/app-config-deployment](https://github.com/GoogleCloudPlatform/kubernetes/labels/area/app-config-deployment) -* [component/CLI](https://github.com/GoogleCloudPlatform/kubernetes/labels/component/CLI) -* [component/client](https://github.com/GoogleCloudPlatform/kubernetes/labels/component/client) +* [area/app-config-deployment](https://github.com/kubernetes/kubernetes/labels/area/app-config-deployment) +* [component/CLI](https://github.com/kubernetes/kubernetes/labels/component/CLI) +* [component/client](https://github.com/kubernetes/kubernetes/labels/component/client) diff --git a/flaky-tests.md b/flaky-tests.md index 9db9e15c..3a7af51e 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -64,7 +64,7 @@ spec: - name: TEST_PACKAGE value: pkg/tools - name: REPO_SPEC - value: https://github.com/GoogleCloudPlatform/kubernetes + value: https://github.com/kubernetes/kubernetes ``` Note that we omit the labels and the selector fields of the replication controller, because they will be populated from the labels field of the pod template by default. 
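The defaulting noted just above — a replication controller that omits its labels and selector gets them populated from the pod template's labels — can be sketched as follows. The maps-of-strings model is a simplification of the real label machinery:

```go
package main

import "fmt"

// defaultSelector mirrors the defaulting described above: when a replication
// controller omits its selector, it is filled in from the pod template's
// labels; an explicitly set selector is left alone.
func defaultSelector(selector, templateLabels map[string]string) map[string]string {
	if len(selector) != 0 {
		return selector // explicit selector wins
	}
	out := map[string]string{}
	for k, v := range templateLabels {
		out[k] = v
	}
	return out
}

func main() {
	// Omitted selector: defaulted from the pod template's labels.
	sel := defaultSelector(nil, map[string]string{"name": "flakecontroller"})
	fmt.Println(sel["name"]) // flakecontroller
}
```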
diff --git a/instrumentation.md b/instrumentation.md index 8cc9e2b2..683f9d93 100644 --- a/instrumentation.md +++ b/instrumentation.md @@ -44,18 +44,18 @@ We use the Prometheus monitoring system's golang client library for instrumentin 2. Give the metric a name and description. 3. Pick whether you want to distinguish different categories of things using labels on the metric. If so, add "Vec" to the name of the type of metric you want and add a slice of the label names to the definition. - https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53 - https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L31 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L31 3. Register the metric so that prometheus will know to export it. - https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L74 - https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L74 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78 4. 
Use the metric by calling the appropriate method for your metric type (Set, Inc/Add, or Observe, respectively for Gauge, Counter, or Histogram/Summary), first calling WithLabelValues if your metric has any labels - https://github.com/GoogleCloudPlatform/kubernetes/blob/3ce7fe8310ff081dbbd3d95490193e1d5250d2c9/pkg/kubelet/kubelet.go#L1384 - https://github.com/GoogleCloudPlatform/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87 + https://github.com/kubernetes/kubernetes/blob/3ce7fe8310ff081dbbd3d95490193e1d5250d2c9/pkg/kubelet/kubelet.go#L1384 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87 These are the metric type definitions if you're curious to learn about them or need more information: diff --git a/issues.md b/issues.md index 46beb9ce..c7bda07b 100644 --- a/issues.md +++ b/issues.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at GitHub Issues for the Kubernetes Project ======================================== -A list quick overview of how we will review and prioritize incoming issues at https://github.com/GoogleCloudPlatform/kubernetes/issues +A list quick overview of how we will review and prioritize incoming issues at https://github.com/kubernetes/kubernetes/issues Priorities ---------- diff --git a/making-release-notes.md b/making-release-notes.md index 1efab1ac..871e65b4 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -66,7 +66,7 @@ With the final markdown all set, cut and paste it to the top of `CHANGELOG.md` ### 5) Update the Release page - * Switch to the [releases](https://github.com/GoogleCloudPlatform/kubernetes/releases) page. + * Switch to the [releases](https://github.com/kubernetes/kubernetes/releases) page. * Open up the release you are working on. * Cut and paste the final markdown from above into the release notes * Press Save. 
diff --git a/pull-requests.md b/pull-requests.md index 157646c0..a81c01c5 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -52,7 +52,7 @@ Life of a Pull Request Unless in the last few weeks of a milestone when we need to reduce churn and stabilize, we aim to be always accepting pull requests. -Either the [on call](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](https://github.com/contrib/tree/master/submit-queue) automatically will manage merging PRs. +Either the [on call](https://github.com/kubernetes/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](https://github.com/contrib/tree/master/submit-queue) automatically will manage merging PRs. There are several requirements for the submit queue to work: * Author must have signed CLA ("cla: yes" label added to PR) -- cgit v1.2.3 From a7118ba1b290cd97dcc65b3d906dede726955432 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 3 Sep 2015 10:10:11 -0400 Subject: s|github.com/GoogleCloudPlatform/kubernetes|github.com/kubernetes/kubernetes| --- autoscaling.md | 2 +- deployment.md | 2 +- horizontal-pod-autoscaler.md | 8 ++++---- job.md | 6 +++--- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/autoscaling.md b/autoscaling.md index 9c5ec752..ea60af74 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -47,7 +47,7 @@ done automatically based on statistical analysis and thresholds. 
* Provide a concrete proposal for implementing auto-scaling pods within Kubernetes * Implementation proposal should be in line with current discussions in existing issues: * Scale verb - [1629](http://issue.k8s.io/1629) - * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) + * Config conflicts - [Config](https://github.com/kubernetes/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) * Rolling updates - [1353](http://issue.k8s.io/1353) * Multiple scalable types - [1624](http://issue.k8s.io/1624) diff --git a/deployment.md b/deployment.md index 0a79ca86..6819acee 100644 --- a/deployment.md +++ b/deployment.md @@ -260,7 +260,7 @@ Apart from the above, we want to add support for the following: ## References -- https://github.com/GoogleCloudPlatform/kubernetes/issues/1743 has most of the +- https://github.com/kubernetes/kubernetes/issues/1743 has most of the discussion that resulted in this proposal. diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index c10f54f7..6ae84532 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -61,7 +61,7 @@ HorizontalPodAutoscaler object will be bound with exactly one Scale subresource autoscaling associated replication controller/deployment through it. The main advantage of such approach is that whenever we introduce another type we want to auto-scale, we just need to implement Scale subresource for it (w/o modifying autoscaler code or API). -The wider discussion regarding Scale took place in [#1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629). +The wider discussion regarding Scale took place in [#1629](https://github.com/kubernetes/kubernetes/issues/1629). 
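Not part of the proposal text, but as a rough illustration of the Scale subresource shape described above — a spec holding the desired replica count and a status reporting the current count plus the pod selector. All field names here are hypothetical sketches, not the actual Kubernetes API:

```go
package main

import "fmt"

// ScaleSpec sketches the desired state carried by a Scale subresource.
type ScaleSpec struct {
	Replicas int
}

// ScaleStatus sketches the observed state, including the PodSelector
// the autoscaler would use to find the pods it is managing.
type ScaleStatus struct {
	Replicas    int
	PodSelector map[string]string
}

// Scale ties the two together; the autoscaler only ever touches this
// subresource, never the underlying controller type.
type Scale struct {
	Spec   ScaleSpec
	Status ScaleStatus
}

func main() {
	s := Scale{
		Spec:   ScaleSpec{Replicas: 5},
		Status: ScaleStatus{Replicas: 3, PodSelector: map[string]string{"app": "web"}},
	}
	// Desired count above observed count means a scale-up is pending.
	fmt.Println(s.Spec.Replicas > s.Status.Replicas) // true
}
```

The point of the indirection is the one the proposal makes: any new scalable type only needs to serve this subresource, and the autoscaler code stays unchanged.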
Scale subresource will be present in API for replication controller or deployment under the following paths: @@ -192,7 +192,7 @@ The autoscaler will be implemented as a control loop. It will periodically (e.g.: every 1 minute) query pods described by ```Status.PodSelector``` of Scale subresource, and check their average CPU or memory usage from the last 1 minute (there will be API on master for this purpose, see -[#11951](https://github.com/GoogleCloudPlatform/kubernetes/issues/11951). +[#11951](https://github.com/kubernetes/kubernetes/issues/11951). Then, it will compare the current CPU or memory consumption with the Target, and adjust the count of the Scale if needed to match the target (preserving condition: MinCount <= Count <= MaxCount). @@ -265,9 +265,9 @@ Our design is in general compatible with them. and then turned-on when there is a demand for them. When a request to service with no pods arrives, kube-proxy will generate an event for autoscaler to create a new pod. - Discussed in [#3247](https://github.com/GoogleCloudPlatform/kubernetes/issues/3247). + Discussed in [#3247](https://github.com/kubernetes/kubernetes/issues/3247). * When scaling down, make more educated decision which pods to kill (e.g.: if two or more pods are on the same node, kill one of them). - Discussed in [#4301](https://github.com/GoogleCloudPlatform/kubernetes/issues/4301). + Discussed in [#4301](https://github.com/kubernetes/kubernetes/issues/4301). * Allow rule based autoscaling: instead of specifying the target value for metric, specify a rule, e.g.: “if average CPU consumption of pod is higher than 80% add two more replicas”. This approach was initially suggested in diff --git a/job.md b/job.md index 57717ea5..198a1437 100644 --- a/job.md +++ b/job.md @@ -40,8 +40,8 @@ for managing pod(s) that require running once to completion even if the machine the pod is running on fails, in contrast to what ReplicationController currently offers. 
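The autoscaler control-loop arithmetic described in this series — target count is the ceiling of summed consumption over the per-pod target, clamped so that MinCount <= Count <= MaxCount, with the 10% tolerance band from the later HPA update — can be sketched as below. This is an illustrative simplification with hypothetical names, not the actual autoscaler code:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas sketches the decision rule: no action while the
// ratio of average consumption to target stays within the 10%
// tolerance band; otherwise return ceil(sum(consumption)/target),
// clamped to [minCount, maxCount].
func desiredReplicas(consumption []float64, target float64, cur, minCount, maxCount int) int {
	sum := 0.0
	for _, c := range consumption {
		sum += c
	}
	avg := sum / float64(len(consumption))
	if ratio := avg / target; ratio >= 0.9 && ratio <= 1.1 {
		return cur // within tolerance: avoid thrashing
	}
	n := int(math.Ceil(sum / target))
	if n < minCount {
		n = minCount
	}
	if n > maxCount {
		n = maxCount
	}
	return n
}

func main() {
	// Three pods at 0.8 core each against a 0.5 core target:
	// ceil(2.4 / 0.5) = 5 replicas.
	fmt.Println(desiredReplicas([]float64{0.8, 0.8, 0.8}, 0.5, 3, 1, 10)) // 5
}
```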
Several existing issues and PRs were already created regarding that particular subject: -* Job Controller [#1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) -* New Job resource [#7380](https://github.com/GoogleCloudPlatform/kubernetes/pull/7380) +* Job Controller [#1624](https://github.com/kubernetes/kubernetes/issues/1624) +* New Job resource [#7380](https://github.com/kubernetes/kubernetes/pull/7380) ## Use Cases @@ -181,7 +181,7 @@ Below are the possible future extensions to the Job controller: * Be able to limit the execution time for a job, similarly to ActiveDeadlineSeconds for Pods. * Be able to create a chain of jobs dependent one on another. * Be able to specify the work each of the workers should execute (see type 1 from - [this comment](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624#issuecomment-97622142)) + [this comment](https://github.com/kubernetes/kubernetes/issues/1624#issuecomment-97622142)) * Be able to inspect Pods running a Job, especially after a Job has finished, e.g. by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)). -- cgit v1.2.3 From 4ad8a68e14a047e5cf7be93b222b00198315882c Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 3 Sep 2015 10:10:11 -0400 Subject: s|github.com/GoogleCloudPlatform/kubernetes|github.com/kubernetes/kubernetes| --- architecture.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/architecture.md b/architecture.md index b17345ef..2a761dea 100644 --- a/architecture.md +++ b/architecture.md @@ -51,7 +51,7 @@ The `kubelet` manages [pods](../user-guide/pods.md) and their containers, their ### `kube-proxy` -Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). 
This reflects `services` (see [the services doc](../user-guide/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. +Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/kubernetes/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../user-guide/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends. Service endpoints are currently found via [DNS](../admin/dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes `{FOO}_SERVICE_HOST` and `{FOO}_SERVICE_PORT` variables are supported). These variables resolve to ports managed by the service proxy. -- cgit v1.2.3 From e695f6052c68d6d7ed131c4833fb20cfe91254b0 Mon Sep 17 00:00:00 2001 From: Marcin Wielgus Date: Mon, 7 Sep 2015 12:25:04 +0200 Subject: Update for scaling rules in HorizontalPodAutoscaler --- horizontal-pod-autoscaler.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 6ae84532..924988d2 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -200,16 +200,20 @@ and adjust the count of the Scale if needed to match the target The target number of pods will be calculated from the following formula: ``` -TargetNumOfPods = sum(CurrentPodsConsumption) / Target +TargetNumOfPods =ceil(sum(CurrentPodsConsumption) / Target) ``` -To make scaling more stable, scale-up will happen only when the floor of ```TargetNumOfPods``` is higher than -the current number, while scale-down will happen only when the ceiling of ```TargetNumOfPods``` is lower than -the current number. 
+Starting and stopping pods may introduce noise to the metrics (for instance starting may temporarily increase +CPU and decrease average memory consumption) so, after each action, the autoscaler should wait some time for reliable data. -The decision to scale-up will be executed instantly. -However, we will execute scale-down only if the sufficient time has passed from the last scale-up (e.g.: 10 minutes). -Such approach has two benefits: +Scale-up will happen if there was no rescaling within the last 3 minutes. +Scale-down will wait for 10 minutes from the last rescaling. Moreover any scaling will only be made if + +``` +avg(CurrentPodsConsumption) / Target +``` + +drops below 0.9 or increases above 1.1 (10% tolerance). Such approach has two benefits: * Autoscaler works in a conservative way. If new user load appears, it is important for us to rapidly increase the number of pods, @@ -218,10 +222,6 @@ Such approach has two benefits: * Autoscaler avoids thrashing, i.e.: prevents rapid execution of conflicting decision if the load is not stable. - -As the CPU consumption of a pod immediately after start may be highly variable due to initialization/startup, -autoscaler will skip metrics from the first minute of pod lifecycle. - ## Relative vs. absolute metrics The question arises whether the values of the target metrics should be absolute (e.g.: 0.6 core, 100MB of RAM) -- cgit v1.2.3 From cf287959ee7840799aefe2e49fd3909b45669dd9 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Tue, 8 Sep 2015 13:37:12 -0400 Subject: Update api change docs --- api_changes.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/api_changes.md b/api_changes.md index d26fdda9..45f0dd4c 100644 --- a/api_changes.md +++ b/api_changes.md @@ -399,6 +399,10 @@ The conversion code resides with each versioned API. 
There are two files: functions - `pkg/api//conversion_generated.go` containing auto-generated conversion functions + - `pkg/expapi//conversion.go` containing manually written conversion + functions + - `pkg/expapi//conversion_generated.go` containing auto-generated + conversion functions Since auto-generated conversion functions are using manually written ones, those manually written should be named with a defined convention, i.e. a function @@ -433,6 +437,7 @@ of your versioned api objects. The deep copy code resides with each versioned API: - `pkg/api//deep_copy_generated.go` containing auto-generated copy functions + - `pkg/expapi//deep_copy_generated.go` containing auto-generated copy functions To regenerate them: - run -- cgit v1.2.3 From 99bb877ce3f282ac5cc0899ae0e0645801e963f3 Mon Sep 17 00:00:00 2001 From: goltermann Date: Wed, 2 Sep 2015 14:51:19 -0700 Subject: Replace IRC with Slack in docs. --- writing-a-getting-started-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index 7441474a..c9d4e2ca 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -76,7 +76,7 @@ These guidelines say *what* to do. See the Rationale section for *why*. If you have a cluster partially working, but doing all the above steps seems like too much work, we still want to hear from you. We suggest you write a blog post or a Gist, and we will link to it on our wiki page. -Just file an issue or chat us on IRC and one of the committers will link to it from the wiki. +Just file an issue or chat us on [Slack](../troubleshooting.md#slack) and one of the committers will link to it from the wiki. 
## Development Distro Guidelines -- cgit v1.2.3 From 1a62ae0c98bf9f280d27f4ab59d88c03f8d3f3dc Mon Sep 17 00:00:00 2001 From: dinghaiyang Date: Fri, 4 Sep 2015 18:44:56 +0800 Subject: Replace limits with request where appropriate --- resources.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/resources.md b/resources.md index fe6f0ec7..f9bbc8db 100644 --- a/resources.md +++ b/resources.md @@ -33,8 +33,8 @@ Documentation for other releases can be found at **Note: this is a design doc, which describes features that have not been completely implemented. User documentation of the current state is [here](../user-guide/compute-resources.md). The tracking issue for implementation of this model is -[#168](http://issue.k8s.io/168). Currently, only memory and -cpu limits on containers (not pods) are supported. "memory" is in bytes and "cpu" is in +[#168](http://issue.k8s.io/168). Currently, both limits and requests of memory and +cpu on containers (not pods) are supported. "memory" is in bytes and "cpu" is in milli-cores.** # The Kubernetes resource model @@ -123,7 +123,6 @@ Where: * Internally, the Kubernetes master can decide the defaulting behavior and the kubelet implementation may expected an absolute specification. For example, if the master decided that "the default is unbounded" it would pass 2^64 to the kubelet. - ## Kubernetes-defined resource types The following resource types are predefined ("reserved") by Kubernetes in the `kubernetes.io` namespace, and so cannot be used for user-defined resources. Note that the syntax of all resource types in the resource spec is deliberately similar, but some resource types (e.g., CPU) may receive significantly more support than simply tracking quantities in the schedulers and/or the Kubelet. 
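As a rough illustration of the units convention stated in the resources note above — "cpu" is expressed in milli-cores — a hypothetical parser might look like the following. This is a sketch only, not the real Kubernetes quantity parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts a CPU quantity string to milli-cores under the
// convention above: "250m" means 250 milli-cores (0.25 of a core),
// and a bare "2" means 2 whole cores, i.e. 2000 milli-cores.
func parseCPU(s string) (int64, error) {
	if strings.HasSuffix(s, "m") {
		return strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
	}
	cores, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return cores * 1000, nil
}

func main() {
	m, _ := parseCPU("250m")
	fmt.Println(m) // 250
	c, _ := parseCPU("2")
	fmt.Println(c) // 2000
}
```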
-- cgit v1.2.3 From aeebd9492c5b1fda8925f36dbe2e134b2a2026b6 Mon Sep 17 00:00:00 2001 From: dinghaiyang Date: Fri, 4 Sep 2015 18:44:56 +0800 Subject: Replace limits with request where appropriate --- resource-qos.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/resource-qos.md b/resource-qos.md index 6d7ddcce..f76a0306 100644 --- a/resource-qos.md +++ b/resource-qos.md @@ -101,7 +101,8 @@ API changes for request - Add validation code that checks request <= limit, and validation test cases (api/validation/validation.go) Scheduler Changes -- Use requests instead of limits in CheckPodsExceedingCapacity and PodFitsResources (scheduler/algorithm/predicates.go) +- Predicates: Use requests instead of limits in CheckPodsExceedingCapacity and PodFitsResources (scheduler/algorithm/predicates/predicates.go) +- Priorities: Use requests instead of limits in LeastRequestedPriority and BalancedResourceAllocation(scheduler/algorithm/priorities/priorities.go)(PR #12718) Container Manager Changes - Use requests to assign CPU shares for Docker (kubelet/dockertools/container_manager.go) -- cgit v1.2.3 From 7c58a4a72923659e2a6036eec4eef7cd86b88d28 Mon Sep 17 00:00:00 2001 From: Kevin Date: Thu, 10 Sep 2015 00:22:43 +0800 Subject: fix a typo in development.md and update git_workflow.png --- development.md | 2 +- git_workflow.png | Bin 90004 -> 114745 bytes 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index fc14333b..75cb2365 100644 --- a/development.md +++ b/development.md @@ -96,7 +96,7 @@ git push -f origin myfeature ### Creating a pull request -1. Visit http://github.com/$YOUR_GITHUB_USERNAME/kubernetes +1. Visit https://github.com/$YOUR_GITHUB_USERNAME/kubernetes 2. Click the "Compare and pull request" button next to your "myfeature" branch. 3. 
Check out the pull request [process](pull-requests.md) for more details diff --git a/git_workflow.png b/git_workflow.png index e3bd70da..80a66248 100644 Binary files a/git_workflow.png and b/git_workflow.png differ -- cgit v1.2.3 From 94c5155a987711b6cff4a4bdf13d463a6eddb42a Mon Sep 17 00:00:00 2001 From: Clayton Coleman Date: Wed, 9 Sep 2015 18:03:54 -0400 Subject: Define lock coding convention --- coding-conventions.md | 1 + 1 file changed, 1 insertion(+) diff --git a/coding-conventions.md b/coding-conventions.md index 1569d1aa..8ddf000e 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -50,6 +50,7 @@ Code conventions - so pkg/controllers/autoscaler/foo.go should say `package autoscaler` not `package autoscalercontroller`. - Unless there's a good reason, the `package foo` line should match the name of the directory in which the .go file exists. - Importers can use a different name if they need to disambiguate. + - Locks should be called `lock` and should never be embedded (always `lock sync.Mutex`). When multiple locks are present, give each lock a distinct name following Go conventions - `stateLock`, `mapLock` etc. - API conventions - [API changes](api_changes.md) - [API conventions](api-conventions.md) -- cgit v1.2.3 From bcd35da0c878b7821ddd01e67cfcfc092cf87a6e Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 19 Aug 2015 16:12:44 -0700 Subject: adding a proposal for api groups --- api-group.md | 152 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 152 insertions(+) create mode 100644 api-group.md diff --git a/api-group.md b/api-group.md new file mode 100644 index 00000000..53531d43 --- /dev/null +++ b/api-group.md @@ -0,0 +1,152 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/api-group.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Supporting multiple API groups
+
+## Goal
+
+1. Breaking the monolithic v1 API into modular groups and allowing groups to be enabled/disabled individually. This allows us to break the monolithic API server into smaller components in the future.
+
+2. Supporting different versions in different groups. This allows different groups to evolve at different speeds.
+
+3. Supporting identically named kinds to exist in different groups. This is useful when we experiment with new features of an API in the experimental group while supporting the stable API in the original group at the same time.
+
+4. Exposing the API groups and versions supported by the server. This is required to develop a dynamic client.
+
+5. Laying the basis for [API Plugin](../../docs/design/extending-api.md).
+
+6. Keeping the user interaction easy. For example, we should allow users to omit the group name when using kubectl if there is no ambiguity.
+
+
+## Bookkeeping for groups
+
+1. No changes to TypeMeta:
+
+   Currently many internal structures, such as RESTMapper and Scheme, are indexed and retrieved by APIVersion. For a fast implementation targeting the v1.1 deadline, we will concatenate group with version, in the form of "group/version", and use it where a version string is expected, so that much code can be reused. This implies we will not add a new field to TypeMeta; we will use TypeMeta.APIVersion to hold "group/version".
+
+   For backward compatibility, v1 objects belong to the group with an empty name, so existing v1 config files will remain valid.
+
+2. 
/pkg/conversion#Scheme: + + The key of /pkg/conversion#Scheme.versionMap for versioned types will be "group/version". For now, the internal version types of all groups will be registered to versionMap[""], as we don't have any identically named kinds in different groups yet. In the near future, internal version types will be registered to versionMap["group/"], and pkg/conversion#Scheme.InternalVersion will have type []string. + + We will need a mechanism to express if two kinds in different groups (e.g., compute/pods and experimental/pods) are convertible, and auto-generate the conversions if they are. + +3. meta.RESTMapper: + + Each group will have its own RESTMapper (of type DefaultRESTMapper), and these mappers will be registered to pkg/api#RESTMapper (of type MultiRESTMapper). + + To support identically named kinds in different groups, We need to expand the input of RESTMapper.VersionAndKindForResource from (resource string) to (group, resource string). If group is not specified and there is ambiguity (i.e., the resource exists in multiple groups), an error should be returned to force the user to specify the group. + +## Server-side implementation + +1. resource handlers' URL: + + We will force the URL to be in the form of prefix/group/version/... + + Prefix is used to differentiate API paths from other paths like /healthz. All groups will use the same prefix="apis", except when backward compatibility requires otherwise. No "/" is allowed in prefix, group, or version. Specifically, + + * for /api/v1, we set the prefix="api" (which is populated from cmd/kube-apiserver/app#APIServer.APIPrefix), group="", version="v1", so the URL remains to be /api/v1. + + * for new kube API groups, we will set the prefix="apis" (we will add a field in type APIServer to hold this prefix), group=GROUP_NAME, version=VERSION. For example, the URL of the experimental resources will be /apis/experimental/v1alpha1. 
+ + * for OpenShift v1 API, because it's currently registered at /oapi/v1, to be backward compatible, OpenShift may set prefix="oapi", group="". + + * for other new third-party API, they should also use the prefix="apis" and choose the group and version. This can be done through the thirdparty API plugin mechanism in [13000](http://pr.k8s.io/13000). + +2. supporting API discovery: + + * At /prefix (e.g., /apis), API server will return the supported groups and their versions using pkg/api/unversioned#APIVersions type, setting the Versions field to "group/version". This is backward compatible, because currently API server does return "v1" encoded in pkg/api/unversioned#APIVersions at /api. (We will also rename the JSON field name from `versions` to `apiVersions`, to be consistent with pkg/api#TypeMeta.APIVersion field) + + * At /prefix/group, API server will return all supported versions of the group. We will create a new type VersionList (name is open to discussion) in pkg/api/unversioned as the API. + + * At /prefix/group/version, API server will return all supported resources in this group, and whether each resource is namespaced. We will create a new type APIResourceList (name is open to discussion) in pkg/api/unversioned as the API. + + We will design how to handle deeper path in other proposals. + + * At /swaggerapi/swagger-version/prefix/group/version, API server will return the Swagger spec of that group/version in `swagger-version` (e.g. we may support both Swagger v1.2 and v2.0). + +3. handling common API objects: + + * top-level common API objects: + + To handle the top-level API objects that are used by all groups, we either have to register them to all schemes, or we can choose not to encode them to a version. 
We plan to take the latter approach and place such types in a new package called `unversioned`, because many of the common top-level objects, such as APIVersions, VersionList, and APIResourceList, which are used in API discovery, and pkg/api#Status, are part of the protocol between client and server, and do not belong to the domain-specific parts of the API, which will evolve independently over time.
+
+   Types in the unversioned package will not have the APIVersion field, but may retain the Kind field.
+
+   For backward compatibility, when handling the Status, the server will encode it to v1 if the client expects the Status to be encoded in v1; otherwise the server will send the unversioned#Status. If an error occurs before the version can be determined, the server will send the unversioned#Status.
+
+   * non-top-level common API objects:
+
+   Assuming object o belonging to group X is used as a field in an object belonging to group Y, currently genconversion will generate the conversion functions for o in package Y. Hence, we don't need any special treatment for non-top-level common API objects.
+
+   TypeMeta is an exception, because it is a common object that is used by objects in all groups but does not logically belong to any group. We plan to move it to the package `unversioned`.
+
+## Client-side implementation
+
+1. clients:
+
+   Currently we have structured (pkg/client/unversioned#ExperimentalClient, pkg/client/unversioned#Client) and unstructured (pkg/kubectl/resource#Helper) clients. The structured clients are not scalable because each of them implements a specific interface, e.g., [here](../../pkg/client/unversioned/client.go#L32). Only the unstructured clients are scalable. We should either auto-generate the code for structured clients or migrate to use the unstructured clients as much as possible.
+
+   We should also move the unstructured client to pkg/client/.
+
+2. Spelling the URL:
+
+   The URL is in the form of prefix/group/version/. 
The prefix is hard-coded in the client/unversioned.Config (see [here](../../pkg/client/unversioned/experimental.go#L101)). The client should be able to figure out `group` and `version` using the RESTMapper. For a third-party client which does not have access to the RESTMapper, it should discover the mapping of `group`, `version` and `kind` by querying the server as described in point 2 of #server-side-implementation. + +3. kubectl: + + kubectl should accept arguments like `group/resource`, `group/resource/name`. Nevertheless, the user can omit the `group`, then kubectl shall rely on RESTMapper.VersionAndKindForResource() to figure out the default group/version of the resource. For example, for resources (like `node`) that exist in both k8s v1 API and k8s modularized API (like `infra/v2`), we should set kubectl default to use one of them. If there is no default group, kubectl should return an error for the ambiguity. + + When kubectl is used with a single resource type, the --api-version and --output-version flag of kubectl should accept values in the form of `group/version`, and they should work as they do today. For multi-resource operations, we will disable these two flags initially. + + Currently, by setting pkg/client/unversioned/clientcmd/api/v1#Config.NamedCluster[x].Cluster.APIVersion ([here](../../pkg/client/unversioned/clientcmd/api/v1/types.go#L58)), user can configure the default apiVersion used by kubectl to talk to server. It does not make sense to set a global version used by kubectl when there are multiple groups, so we plan to deprecate this field. We may extend the version negotiation function to negotiate the preferred version of each group. Details will be in another proposal. + +## OpenShift integration + +OpenShift can take a similar approach to break monolithic v1 API: keeping the v1 where they are, and gradually adding groups. 
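Throughout this proposal, group and version travel together as a single "group/version" string, with the legacy v1 API keeping an empty group name and its historical /api/v1 path. A minimal stdlib sketch of that convention (helper names are illustrative, not the actual Kubernetes implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// splitGroupVersion splits an APIVersion string of the form
// "group/version". A bare version such as "v1" is the legacy form
// and maps to the empty group for backward compatibility.
func splitGroupVersion(apiVersion string) (group, version string) {
	if i := strings.Index(apiVersion, "/"); i >= 0 {
		return apiVersion[:i], apiVersion[i+1:]
	}
	return "", apiVersion
}

// apiPath builds the prefix/group/version URL layout: new groups live
// under the "apis" prefix, while the empty (legacy) group keeps /api.
func apiPath(group, version string) string {
	if group == "" {
		return "/api/" + version
	}
	return "/apis/" + group + "/" + version
}

func main() {
	g, v := splitGroupVersion("experimental/v1alpha1")
	fmt.Println(g, v)          // experimental v1alpha1
	fmt.Println(apiPath(g, v)) // /apis/experimental/v1alpha1
	fmt.Println(apiPath(splitGroupVersion("v1"))) // /api/v1
}
```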
For the v1 objects in OpenShift, they should keep doing what they do now: they should remain registered to the Scheme.versionMap["v1"] scheme, and they should keep being added to originMapper.
+
+For new OpenShift groups, they should do the same as native Kubernetes groups would do: each group should register to Scheme.versionMap["group/version"], and each should have a separate RESTMapper and register it with the MultiRESTMapper.
+
+To expose a list of the supported OpenShift groups to clients, OpenShift just has to call pkg/cmd/server/origin#initAPIVersionRoute() as it does now, passing in the supported "group/versions" instead of "versions".
+
+
+## Future work
+
+1. Dependencies between groups: we need an interface to register the dependencies between groups. It is not our priority now as the use cases are not clear yet.
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/api-group.md?pixel)]()
+
--
cgit v1.2.3


From a5b0fdf237d90f6c2212811be8573896ea535f72 Mon Sep 17 00:00:00 2001
From: gmarek
Date: Tue, 11 Aug 2015 16:58:24 +0200
Subject: Initial kubemark proposal

---
 Kubemark_architecture.png | Bin 0 -> 30417 bytes
 kubemark.md | 190 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 190 insertions(+)
 create mode 100644 Kubemark_architecture.png
 create mode 100644 kubemark.md

diff --git a/Kubemark_architecture.png b/Kubemark_architecture.png
new file mode 100644
index 00000000..479ad8b1
Binary files /dev/null and b/Kubemark_architecture.png differ
diff --git a/kubemark.md b/kubemark.md
new file mode 100644
index 00000000..51ea4375
--- /dev/null
+++ b/kubemark.md
@@ -0,0 +1,190 @@
+
+
+
+
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+
+

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/kubemark.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Kubemark proposal
+
+## Goal of this document
+
+This document describes a design of Kubemark - a system that allows performance testing of a Kubernetes cluster. It describes the
+assumptions and high-level design, and discusses possible solutions for lower-level problems. It is supposed to be a starting point for more
+detailed discussion.
+
+## Current state and objective
+
+Currently performance testing happens on ‘live’ clusters of up to 100 Nodes. It takes quite a while to start such a cluster or to push
+updates to all Nodes, and it uses quite a lot of resources. At this scale the amount of wasted time and used resources is still acceptable.
+In the next quarter or two we’re targeting a 1000-Node cluster, which will push it way beyond the ‘acceptable’ level. Additionally we want to
+enable people without many resources to run scalability tests on bigger clusters than they can afford at a given time. Having the ability to
+cheaply run scalability tests will enable us to run some set of them on "normal" test clusters, which in turn would mean the ability to run
+them on every PR.
+
+This means that we need a system that will allow for realistic performance testing on a (much) smaller number of “real” machines. The first
+assumption we make is that Nodes are independent, i.e. the number of existing Nodes does not impact the performance of a single Node. This is not
+entirely true, as the number of Nodes can increase the latency of various components on the Master machine, which in turn may increase the latency of Node
+operations, but we’re not interested in measuring this effect here. 
Instead we want to measure how the number of Nodes and the load imposed
+by Node daemons affect the performance of Master components.
+
+## Kubemark architecture overview
+
+The high-level idea behind Kubemark is to write a library that allows running artificial "Hollow" Nodes able to simulate the behavior of a
+real Kubelet and KubeProxy in a single, lightweight binary. Hollow components will need to correctly respond to Controllers (via the API
+server), and preferably, in the fullness of time, be able to ‘replay’ previously recorded real traffic (this is out of scope for the
+initial version). To teach Hollow components to replay recorded traffic, they will need to store data specifying when a given
+Pod/Container should die (e.g. its observed lifetime). Such data can be extracted e.g. from etcd Raft logs, or it can be reconstructed
+from Events. In the initial version we only want them to be able to fool Master components and put some configurable (in what way TBD)
+load on them.
+
+When we have a Hollow Node ready, we’ll be able to test the performance of Master components by creating a real Master Node, with API
+server, Controllers, etcd and whatnot, and creating a number of Hollow Nodes that will register with the running Master.
+
+To make Kubemark easier to maintain as the system evolves, Hollow components will reuse the real "production" code for Kubelet and
+KubeProxy, but will mock all the backends with no-op or very simple mocks. We believe that this approach is better in the long run than
+writing a special, separate "performance-test-aimed" version of them. This may take more time to create an initial version, but we think
+the maintenance cost will be noticeably smaller.
+
+### Option 1
+
+For the initial version we will teach Master components to use the port number to identify a Kubelet/KubeProxy. This will allow running
+those components on non-default ports, and at the same time will allow running multiple Hollow Nodes on a single machine. 
During setup we will
+generate credentials for cluster communication and pass them to HollowKubelet/HollowProxy to use. The Master will treat all HollowNodes
+as normal ones.
+
+![Kubemark architecture diagram for option 1](Kubemark_architecture.png?raw=true "Kubemark architecture overview")
+*Kubemark architecture diagram for option 1*
+
+### Option 2
+
+As a second (equivalent) option we will run Kubemark on top of a 'real' Kubernetes cluster, where both the Master and the Hollow Nodes
+will be Pods. In this option we'll be able to use Kubernetes mechanisms to streamline setup, e.g. by using Kubernetes networking to ensure
+unique IPs for Hollow Nodes, or using Secrets to distribute Kubelet credentials. The downside of this configuration is that some noise is
+likely to appear in Kubemark results, from either CPU/Memory pressure from other things running on the Nodes (e.g. FluentD, or Kubelet)
+or from running the cluster over an overlay network. We believe that it'll be possible to turn off cluster monitoring for Kubemark runs,
+so that the impact of real Node daemons will be minimized, but we don't know what the impact of using a higher-level networking stack
+will be. Running a comparison will be an interesting test in itself.
+
+### Discussion
+
+Before taking a closer look at the steps necessary to set up a minimal Hollow cluster, it's hard to tell which approach will be simpler.
+It's quite possible that the initial version will end up as a hybrid between running the Hollow cluster directly on top of VMs and running
+the Hollow cluster on top of a Kubernetes cluster that is running on top of VMs, e.g. running the Nodes as Pods in a Kubernetes cluster
+and the Master directly on top of a VM.
+
+## Things to simulate
+
+In real Kubernetes, on a single Node we run two daemons that communicate with the Master in some way: the Kubelet and the KubeProxy.
+
+### KubeProxy
+
+As a replacement for KubeProxy we'll use HollowProxy, which will be a real KubeProxy with no-op mocks injected everywhere it makes sense. 
+
+### Kubelet
+
+As a replacement for Kubelet we'll use HollowKubelet, which will be a real Kubelet with no-op or simple mocks injected everywhere it makes
+sense.
+
+The Kubelet also exposes a cAdvisor endpoint, which is scraped by Heapster, and a healthz endpoint read by supervisord, and we have
+FluentD running as a Pod on each Node that exports logs to Elasticsearch (or Google Cloud Logging). Both Heapster and Elasticsearch run
+in Pods in the cluster, so they do not add any load on the Master components by themselves. There can be other systems that scrape
+Heapster through the proxy running on the Master, which adds additional load, but they're not part of the default setup, so in the first
+version we won't simulate this behavior.
+
+In the first version we’ll assume that all started Pods will run indefinitely if not explicitly deleted. In the future we can add a model
+of short-running batch jobs, but in the initial version we’ll assume only serving-like Pods.
+
+### Heapster
+
+In addition to system components we run Heapster as a part of the cluster monitoring setup. Heapster currently watches Events, Pods and
+Nodes through the API server. In the test setup we can use the real Heapster for watching the API server, with the piece that scrapes
+cAdvisor data from Kubelets mocked out.
+
+### Elasticsearch and Fluentd
+
+Similarly to Heapster, Elasticsearch runs outside the Master machine but generates some traffic on it. The Fluentd “daemon” running on the
+Master periodically sends the Docker logs it has gathered to the Elasticsearch instance running on one of the Nodes. In the initial
+version we omit Elasticsearch, as it produces only a constant small load on the Master Node that does not change with the size of the
+cluster.
+
+## Necessary work
+
+There are three more or less independent things that need to be worked on:
+- HollowNode implementation, creating a library/binary that will be able to listen to Watches and respond in a correct fashion with Status
+updates. 
This also involves creating a CloudProvider that can produce such Hollow Nodes, or making sure that HollowNodes can correctly
+self-register with a Master that has no cloud provider.
+- Kubemark setup, including figuring out the networking model and the number of Hollow Nodes that will be allowed to run on a single
+“machine”, writing setup/run/teardown scripts (in [option 1](#option-1)), or figuring out how to run the Master and Hollow Nodes on top
+of Kubernetes (in [option 2](#option-2)).
+- Creating a Player component that will send requests to the API server, putting load on the cluster. This involves creating a way to
+specify the desired workload. This task is very well isolated from the rest, as it is about sending requests to the real API server.
+Because of that we can discuss its requirements separately.
+
+## Concerns
+
+Network performance most likely won't be a problem for the initial version if running directly on VMs rather than on top of a Kubernetes
+cluster, as Kubemark will be running on the standard networking stack (no cloud-provider software routes or overlay network are needed,
+as we don't need custom routing between Pods). Similarly, we don't think that running Kubemark on Kubernetes' virtualized cluster
+networking will cause a noticeable performance impact, but it requires testing.
+
+On the other hand, when adding additional features it may turn out that we need to simulate the Kubernetes Pod network. In such a case,
+when running 'pure' Kubemark we may try one of the following:
+ - running an overlay network like Flannel or OVS instead of using cloud provider routes,
+ - writing a simple network multiplexer to multiplex communications from the Hollow Kubelets/KubeProxies on the machine.
+
+In the case of Kubemark on Kubernetes it may turn out that we run into a problem with adding yet another layer of network virtualization,
+but we don't need to solve this problem now. 
+
+## Work plan
+
+- Teach/make sure that the Master can talk to multiple Kubelets on the same machine [option 1](#option-1):
+ - make sure that the Master can talk to a Kubelet on a non-default port,
+ - make sure that the Master can talk to all Kubelets on different ports,
+- Write the HollowNode library:
+ - new HollowProxy,
+ - new HollowKubelet,
+ - new HollowNode combining the two,
+ - make sure that the Master can talk to two HollowKubelets running on the same machine,
+- Make sure that we can run a Hollow cluster on top of Kubernetes [option 2](#option-2),
+- Write a Player that will automatically put some predefined load on the Master <- this is the moment when it’s possible to play with it,
+and it is useful by itself for scalability tests. Alternatively we can just use the current density/load tests,
+- Benchmark our machines - see how many Watch clients we can have before everything explodes,
+- See how many HollowNodes we can run on a single machine by attaching them to the real Master <- this is the moment it starts to be
+useful,
+- Update the kube-up/kube-down scripts to enable creating “HollowClusters” (or write new scripts), and integrate the HollowCluster with
+Elasticsearch/Heapster equivalents,
+- Allow passing custom configuration to the Player.
+
+## Future work
+
+In the future we want to add the following capabilities to the Kubemark system:
+- replaying real traffic reconstructed from the recorded Events stream,
+- simulating scraping of things running on Nodes through the Master proxy.
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/kubemark.md?pixel)]()
+
-- cgit v1.2.3


From b1fef7374e6dc465845feb0459b703827d0e4081 Mon Sep 17 00:00:00 2001
From: Daniel Smith
Date: Thu, 3 Sep 2015 14:50:12 -0700
Subject: Manually fixing docs, since gendocs messes up the links.
--- event_compression.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/event_compression.md b/event_compression.md index e8f9775b..3d900e07 100644 --- a/event_compression.md +++ b/event_compression.md @@ -60,7 +60,7 @@ Instead of a single Timestamp, each event object [contains](http://releases.k8s. Each binary that generates events: * Maintains a historical record of previously generated events: - * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/unversioned/record/events_cache.go`](../../pkg/client/unversioned/record/events_cache.go). + * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/record/events_cache.go`](../../pkg/client/record/events_cache.go). * The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following events fields are used to construct a unique key for an event: * `event.Source.Component` * `event.Source.Host` -- cgit v1.2.3 From d6f413931d2206f8a4788df03552fe1e99c508b0 Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Thu, 3 Sep 2015 14:50:12 -0700 Subject: Manually fixing docs, since gendocs messes up the links. --- apiserver-watch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/apiserver-watch.md b/apiserver-watch.md index b8069030..2917eec4 100644 --- a/apiserver-watch.md +++ b/apiserver-watch.md @@ -166,7 +166,7 @@ the same time, we can introduce an additional etcd event type: Thus, we need to create the EtcdResync event, extend watch.Interface and its implementations to support it and handle those events appropriately in places like - [Reflector](../../pkg/client/unversioned/cache/reflector.go) + [Reflector](../../pkg/client/cache/reflector.go) However, this might turn out to be unnecessary optimization if apiserver will always keep up (which is possible in the new design). 
We will work -- cgit v1.2.3 From 15a05b820df9675188b8dc83341014ad3ec3319a Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Tue, 8 Sep 2015 11:03:08 -0400 Subject: Move resource quota doc from user-guide to admin --- admission_control_resource_quota.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 4b417ead..a9de7a9c 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -201,9 +201,9 @@ kubectl is modified to support the **ResourceQuota** resource. For example, ```console -$ kubectl create -f docs/user-guide/resourcequota/namespace.yaml +$ kubectl create -f docs/admin/resourcequota/namespace.yaml namespace "quota-example" created -$ kubectl create -f docs/user-guide/resourcequota/quota.yaml --namespace=quota-example +$ kubectl create -f docs/admin/resourcequota/quota.yaml --namespace=quota-example resourcequota "quota" created $ kubectl describe quota quota --namespace=quota-example Name: quota @@ -222,8 +222,7 @@ services 0 5 ## More information -See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota/) for more information. - +See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../admin/resourcequota/) for more information. 
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/admission_control_resource_quota.md?pixel)]() -- cgit v1.2.3 From acb2ce01b3f5000553d4cc407efcd046cb5c46de Mon Sep 17 00:00:00 2001 From: Daniel Smith Date: Wed, 9 Sep 2015 16:01:08 -0700 Subject: Fix tooling for apis/experimental's new home * fix package name * add a script to auto-gofmt everything, useful after grep/sed incantations * update conversion/deep copy generation * doc update --- api_changes.md | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/api_changes.md b/api_changes.md index 45f0dd4c..e0a65fe0 100644 --- a/api_changes.md +++ b/api_changes.md @@ -38,7 +38,7 @@ with a number of existing API types and with the [API conventions](api-conventions.md). If creating a new API type/resource, we also recommend that you first send a PR containing just a proposal for the new API types, and that you initially target -the experimental API (pkg/expapi). +the experimental API (pkg/apis/experimental). The Kubernetes API has two major components - the internal structures and the versioned APIs. The versioned APIs are intended to be stable, while the @@ -399,10 +399,10 @@ The conversion code resides with each versioned API. There are two files: functions - `pkg/api//conversion_generated.go` containing auto-generated conversion functions - - `pkg/expapi//conversion.go` containing manually written conversion - functions - - `pkg/expapi//conversion_generated.go` containing auto-generated + - `pkg/apis/experimental//conversion.go` containing manually written conversion functions + - `pkg/apis/experimental//conversion_generated.go` containing + auto-generated conversion functions Since auto-generated conversion functions are using manually written ones, those manually written should be named with a defined convention, i.e. a function @@ -437,7 +437,7 @@ of your versioned api objects. 
The deep copy code resides with each versioned API: - `pkg/api//deep_copy_generated.go` containing auto-generated copy functions - - `pkg/expapi//deep_copy_generated.go` containing auto-generated copy functions + - `pkg/apis/experimental//deep_copy_generated.go` containing auto-generated copy functions To regenerate them: - run @@ -446,6 +446,23 @@ To regenerate them: hack/update-generated-deep-copies.sh ``` +## Making a new API Group + +This section is under construction, as we make the tooling completely generic. + +At the moment, you'll have to make a new directory under pkg/apis/; copy the +directory structure from pkg/apis/experimental. Add the new group/version to all +of the hack/{verify,update}-generated-{deep-copy,conversions,swagger}.sh files +in the appropriate places--it should just require adding your new group/version +to a bash array. You will also need to make sure your new types are imported by +the generation commands (cmd/gendeepcopy/ & cmd/genconversion). These +instructions may not be complete and will be updated as we gain experience. + +Adding API groups outside of the pkg/apis/ directory is not currently supported, +but is clearly desirable. The deep copy & conversion generators need to work by +parsing go files instead of by reflection; then they will be easy to point at +arbitrary directories: see issue [#13775](http://issue.k8s.io/13775). 
+ ## Update the fuzzer Part of our testing regimen for APIs is to "fuzz" (fill with random values) API -- cgit v1.2.3 From f1c3e8db656da678e73bf5f6ef0173e2d943fb57 Mon Sep 17 00:00:00 2001 From: eulerzgy Date: Mon, 14 Sep 2015 17:46:59 +0800 Subject: fix document --- access.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/access.md b/access.md index 92840f73..123516f9 100644 --- a/access.md +++ b/access.md @@ -66,18 +66,18 @@ This document is primarily concerned with K8s API paths, and secondarily with In ### Assets to protect External User assets: - - Personal information like private messages, or images uploaded by External Users - - web server logs + - Personal information like private messages, or images uploaded by External Users. + - web server logs. K8s User assets: - - External User assets of each K8s User + - External User assets of each K8s User. - things private to the K8s app, like: - credentials for accessing other services (docker private repos, storage services, facebook, etc) - SSL certificates for web servers - proprietary data and code K8s Cluster assets: - - Assets of each K8s User + - Assets of each K8s User. - Machine Certificates or secrets. - The value of K8s cluster computing resources (cpu, memory, etc). @@ -104,7 +104,7 @@ Org-run cluster: - Nodes may be on-premises VMs or physical machines; Cloud VMs; or a mix. Hosted cluster: - - Offering K8s API as a service, or offering a Paas or Saas built on K8s + - Offering K8s API as a service, or offering a Paas or Saas built on K8s. - May already offer web services, and need to integrate with existing customer account concept, and existing authentication, accounting, auditing, and security policy infrastructure. - May want to leverage K8s User accounts and accounting to manage their User accounts (not a priority to support this use case.) - Precise and accurate accounting of resources needed. 
Resource controls needed for hard limits (Users given limited slice of data) and soft limits (Users can grow up to some limit and then be expanded). @@ -137,7 +137,7 @@ K8s will have a `userAccount` API object. - `userAccount` has a UID which is immutable. This is used to associate users with objects and to record actions in audit logs. - `userAccount` has a name which is a string and human readable and unique among userAccounts. It is used to refer to users in Policies, to ensure that the Policies are human readable. It can be changed only when there are no Policy objects or other objects which refer to that name. An email address is a suggested format for this field. - `userAccount` is not related to the unix username of processes in Pods created by that userAccount. -- `userAccount` API objects can have labels +- `userAccount` API objects can have labels. The system may associate one or more Authentication Methods with a `userAccount` (but they are not formally part of the userAccount object.) -- cgit v1.2.3 From 04666c6e834df249cf6d56cd4831477be9f512b1 Mon Sep 17 00:00:00 2001 From: derekwaynecarr Date: Mon, 14 Sep 2015 13:03:11 -0400 Subject: Fix broken link to submit queue --- pull-requests.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pull-requests.md b/pull-requests.md index a81c01c5..1050cd0d 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -52,7 +52,7 @@ Life of a Pull Request Unless in the last few weeks of a milestone when we need to reduce churn and stabilize, we aim to be always accepting pull requests. -Either the [on call](https://github.com/kubernetes/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](https://github.com/contrib/tree/master/submit-queue) automatically will manage merging PRs. 
+Either the [on call](https://github.com/kubernetes/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](https://github.com/kubernetes/contrib/tree/master/submit-queue) automatically will manage merging PRs.
 
 There are several requirements for the submit queue to work:
 * Author must have signed CLA ("cla: yes" label added to PR)
-- cgit v1.2.3


From 15de2cf23060b648291401e201359d32794885c0 Mon Sep 17 00:00:00 2001
From: Brendan Burns
Date: Tue, 8 Sep 2015 11:16:14 -0700
Subject: Add some documentation describing our developer/repository
 automation.

---
 README.md        |   2 +
 automation.md    | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pull-requests.md |   6 +++
 3 files changed, 146 insertions(+)
 create mode 100644 automation.md

diff --git a/README.md b/README.md
index 267bca23..756846ce 100644
--- a/README.md
+++ b/README.md
@@ -51,6 +51,8 @@ Guide](../admin/README.md).
 * **Getting Recent Builds** ([getting-builds.md](getting-builds.md)): How to get recent builds including the latest builds that pass CI.
 
+* **Automated Tools** ([automation.md](automation.md)): Descriptions of the automation that is running on our github repository.
+
 
diff --git a/automation.md b/automation.md
new file mode 100644
index 00000000..eb36cc63
--- /dev/null
+++ b/automation.md
@@ -0,0 +1,138 @@
+
+
+
+
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/automation.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Kubernetes Development Automation
+
+## Overview
+
+Kubernetes uses a variety of automated tools in an attempt to relieve developers of repetitive, low
+brain power work. This document attempts to describe these processes.
+
+
+## Submit Queue
+
+In an effort to
+ * reduce load on core developers
+ * maintain e2e stability
+ * load test github's label feature
+
+we have added an automated [submit-queue](https://github.com/kubernetes/contrib/tree/master/submit-queue)
+for Kubernetes.
+
+The submit-queue does the following:
+
+```go
+for _, pr := range readyToMergePRs() {
+    if testsAreStable() {
+        mergePR(pr)
+    }
+}
+```
+
+The status of the submit-queue is available [online](http://submit-queue.k8s.io/).
+
+### Ready to merge status
+
+A PR is considered "ready for merging" if it matches the following:
+ * it has the `lgtm` label, and that `lgtm` is newer than the latest commit
+ * it has passed the cla pre-submit and has the `cla:yes` label
+ * it has passed the travis and shippable pre-submit tests
+ * one (or all) of:
+   * its author is in kubernetes/contrib/submit-queue/whitelist.txt
+   * its author is in contributors.txt via the github API
+   * the PR has the `ok-to-merge` label
+ * one (or both) of:
+   * it has passed the Jenkins e2e test
+   * it has the `e2e-not-required` label
+
+Note that the combined whitelist/committer list is available at [submit-queue.k8s.io](http://submit-queue.k8s.io).
+
+### Merge process
+
+Merges _only_ occur when the `critical builds` (Jenkins e2e for gce, gke, scalability, upgrade) are passing.
+We're open to including more builds here, let us know...
+
+Merges are serialized, so only a single PR is merged at a time, to ensure against races.
+
+If the PR has the `e2e-not-required` label, it is simply merged.
+If the PR does not have this label, e2e tests are re-run; if these new tests pass, the PR is merged.
+
+If e2e flakes or is currently buggy, the PR will not be merged, but it will be re-run on the following
+pass.
+
+## Github Munger
+
+We also run a [github "munger"](https://github.com/kubernetes/contrib/tree/master/mungegithub).
+
+This runs repeatedly over github pulls and issues and runs modular "mungers", similar to "mungedocs".
+
+Currently this runs:
+ * blunderbuss - Tries to automatically find an owner for a PR without an owner; uses the mapping file here:
+   https://github.com/kubernetes/contrib/blob/master/mungegithub/blunderbuss.yml
+ * needs-rebase - Adds `needs-rebase` to PRs that aren't currently mergeable, and removes it from those that are.
+ * size - Adds `size/xs` - `size/xxl` labels to PRs.
+ * ok-to-test - Adds the `ok-to-test` message to PRs that have an `lgtm` but that the e2e-builder would otherwise not test due to the whitelist.
+ * ping-ci - Attempts to ping the ci systems (Travis/Shippable) if they are missing from a PR.
+ * lgtm-after-commit - Removes the `lgtm` label from PRs where there are commits that are newer than the `lgtm` label.
+
+In the works:
+ * issue-detector - machine learning for determining if an issue that has been filed is a `support` issue, `bug` or `feature`
+
+Please feel free to unleash your creativity on this tool; send us new mungers that you think will help support the Kubernetes development process.
+
+## PR builder
+
+We also run a robotic PR builder that attempts to run e2e tests for each PR.
+
+Before a PR from an unknown user is run, the PR builder bot (`k8s-bot`) asks for a message from a
+contributor confirming that the PR is "ok to test"; the contributor replies with that message.
Contributors can also
+add users to the whitelist by replying with the message "add to whitelist" ("please" is optional, but
+remember to treat your robots with kindness...)
+
+If a PR is approved for testing, and tests either haven't run or need to be re-run, you can ask the
+PR builder to re-run the tests. To do this, reply to the PR with a message that begins with `@k8s-bot test this`; this should trigger a re-build/re-test.
+
+
+## FAQ:
+
+#### How can I ask my PR to be tested again for Jenkins failures?
+
+Right now you have to ask a contributor (this may be you!) to re-run the test with "@k8s-bot test this".
+
+#### How can I kick Shippable to re-test on a failure?
+
+Right now the easiest way is to close and then immediately re-open the PR.
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/automation.md?pixel)]()
+
diff --git a/pull-requests.md b/pull-requests.md
index a81c01c5..a187920e 100644
--- a/pull-requests.md
+++ b/pull-requests.md
@@ -61,6 +61,12 @@ There are several requirements for the submit queue to work:
 Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge"
 label manually. This is gated by the [whitelist](https://github.com/kubernetes/contrib/tree/master/submit-queue/whitelist.txt).
 
+Automation
+----------
+
+We use a variety of automation to manage pull requests. 
This automation is described in detail +[elsewhere.](automation.md) + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/pull-requests.md?pixel)]() -- cgit v1.2.3 From a277212041843009b67f824bbf15ee57fea82b7e Mon Sep 17 00:00:00 2001 From: Zach Loafman Date: Mon, 14 Sep 2015 17:05:05 -0700 Subject: Fix the checkout instructions --- releasing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/releasing.md b/releasing.md index 9950e6e4..9a73405f 100644 --- a/releasing.md +++ b/releasing.md @@ -139,7 +139,7 @@ manage cherry picks prior to cutting the release. 1. `export VER=x.y` (e.g. `0.20` for v0.20) 1. `export PATCH=Z` where `Z` is the patch level of `vX.Y.Z` 1. cd to the base of the repo -1. `git fetch upstream && git checkout -b upstream/release-${VER}` +1. `git fetch upstream && git checkout -b upstream/release-${VER} release-${VER}` 1. Make sure you don't have any files you care about littering your repo (they better be checked in or outside the repo, or the next step will delete them). 1. `make clean && git reset --hard HEAD && git clean -xdf` -- cgit v1.2.3 From 19ba8e37c486422cafcfcfa8647c22d08e20a981 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Tue, 15 Sep 2015 18:24:02 +0000 Subject: A couple more naming conventions. --- api-conventions.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 746d56cb..e7b8b4e9 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -679,7 +679,7 @@ Accumulate repeated events in the client, especially for frequent events, to red ## Naming conventions * Go field names must be CamelCase. JSON field names must be camelCase. Other than capitalization of the initial letter, the two should almost always match. No underscores nor dashes in either. -* Field and resource names should be declarative, not imperative (DoSomething, SomethingDoer). 
+* Field and resource names should be declarative, not imperative (DoSomething, SomethingDoer, DoneBy, DoneAt). * `Minion` has been deprecated in favor of `Node`. Use `Node` where referring to the node resource in the context of the cluster. Use `Host` where referring to properties of the individual physical/virtual system, such as `hostname`, `hostPath`, `hostNetwork`, etc. * `FooController` is a deprecated kind naming convention. Name the kind after the thing being controlled instead (e.g., `Job` rather than `JobController`). * The name of a field that specifies the time at which `something` occurs should be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). @@ -690,6 +690,7 @@ Accumulate repeated events in the client, especially for frequent events, to red * Do not use abbreviations in the API, except where they are extremely commonly used, such as "id", "args", or "stdin". * Acronyms should similarly only be used when extremely commonly known. All letters in the acronym should have the same case, using the appropriate case for the situation. For example, at the beginning of a field name, the acronym should be all lowercase, such as "httpGet". Where used as a constant, all letters should be uppercase, such as "TCP" or "UDP". * The name of a field referring to another resource of kind `Foo` by name should be called `fooName`. The name of a field referring to another resource of kind `Foo` by ObjectReference (or subset thereof) should be called `fooRef`. +* More generally, include the units and/or type in the field name if they could be ambiguous and they are not specified by the value or value type. 
## Label, selector, and annotation conventions -- cgit v1.2.3 From 26b055b78d12a6bf5f59ea717181ead15fa93f74 Mon Sep 17 00:00:00 2001 From: eulerzgy Date: Wed, 16 Sep 2015 02:30:42 +0800 Subject: fix the change of minions to nodes --- developer-guides/vagrant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index f451d755..d6a902b2 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -245,7 +245,7 @@ my-nginx-kqdjk 1/1 Waiting 0 33s my-nginx-nyj3x 1/1 Waiting 0 33s ``` -You need to wait for the provisioning to complete, you can monitor the minions by doing: +You need to wait for the provisioning to complete, you can monitor the nodes by doing: ```console $ sudo salt '*minion-1' cmd.run 'docker images' -- cgit v1.2.3 From 744c48405562541de55f5ebb8f54bd94d5129a28 Mon Sep 17 00:00:00 2001 From: eulerzgy Date: Wed, 16 Sep 2015 02:30:42 +0800 Subject: fix the change of minions to nodes --- event_compression.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/event_compression.md b/event_compression.md index 3d900e07..424f9ac2 100644 --- a/event_compression.md +++ b/event_compression.md @@ -96,11 +96,11 @@ Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet. Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-3.c.saad-dev-vms.internal} Starting kubelet. Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-2.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-2.c.saad-dev-vms.internal} Starting kubelet. 
-Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-influx-grafana-controller-0133o Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods -Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 elasticsearch-logging-controller-fplln Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods -Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 kibana-logging-controller-gziey Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods -Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 skydns-ls6k1 Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods -Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no minions available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-influx-grafana-controller-0133o Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 elasticsearch-logging-controller-fplln Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 kibana-logging-controller-gziey Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 skydns-ls6k1 Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods +Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey BoundPod 
implicitly required container POD pulled {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest" Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-minion-4.c.saad-dev-vms.internal ``` -- cgit v1.2.3 From 50edc9fbc1608317e92eb70eccbc093a108e093e Mon Sep 17 00:00:00 2001 From: Sam Ghods Date: Tue, 15 Sep 2015 14:46:41 -0700 Subject: Rename Deployment API structs --- deployment.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/deployment.md b/deployment.md index 6819acee..ab23e69d 100644 --- a/deployment.md +++ b/deployment.md @@ -69,7 +69,9 @@ type DeploymentSpec struct { Replicas *int // Label selector for pods. Existing ReplicationControllers whose pods are - // selected by this will be scaled down. + // selected by this will be scaled down. New ReplicationControllers will be + // created with this selector, with a unique label as defined by UniqueLabelKey. + // If Selector is empty, it is defaulted to the labels present on the Pod template. Selector map[string]string // Describes the pods that will be created. @@ -90,27 +92,27 @@ type DeploymentSpec struct { type DeploymentStrategy struct { // Type of deployment. Can be "Recreate" or "RollingUpdate". - Type DeploymentType + Type DeploymentStrategyType // TODO: Update this to follow our convention for oneOf, whatever we decide it // to be. - // Rolling update config params. Present only if DeploymentType = + // Rolling update config params. Present only if DeploymentStrategyType = // RollingUpdate. - RollingUpdate *RollingUpdateDeploymentSpec + RollingUpdate *RollingUpdateDeploymentStrategy } -type DeploymentType string +type DeploymentStrategyType string const ( // Kill all existing pods before creating new ones. 
- DeploymentRecreate DeploymentType = "Recreate" + RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate" // Replace the old RCs by new one using rolling update i.e gradually scale down the old RCs and scale up the new one. - DeploymentRollingUpdate DeploymentType = "RollingUpdate" + RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate" ) // Spec to control the desired behavior of rolling update. -type RollingUpdateDeploymentSpec struct { +type RollingUpdateDeploymentStrategy struct { // The maximum number of pods that can be unavailable during the update. // Value can be an absolute number (ex: 5) or a percentage of total pods at the start of update (ex: 10%). // Absolute number is calculated from percentage by rounding up. @@ -246,7 +248,7 @@ To begin with, we will support 2 types of deployment: This results in a slower deployment, but there is no downtime. At all times during the deployment, there are a few pods available (old or new). The number of available pods and when is a pod considered "available" can be configured - using RollingUpdateDeploymentSpec. + using RollingUpdateDeploymentStrategy. In future, we want to support more deployment types. @@ -254,7 +256,7 @@ In future, we want to support more deployment types. Apart from the above, we want to add support for the following: * Running the deployment process in a pod: In future, we can run the deployment process in a pod. Then users can define their own custom deployments and we can run it using the image name. -* More DeploymentTypes: https://github.com/openshift/origin/blob/master/examples/deployment/README.md#deployment-types lists most commonly used ones. +* More DeploymentStrategyTypes: https://github.com/openshift/origin/blob/master/examples/deployment/README.md#deployment-types lists most commonly used ones. * Triggers: Deployment will have a trigger field to identify what triggered the deployment. Options are: Manual/UserTriggered, Autoscaler, NewImage. 
* Automatic rollback on error: We want to support automatic rollback on error or timeout. -- cgit v1.2.3 From 09de43d161f9b9545255b9c84ef1e82e5867d67f Mon Sep 17 00:00:00 2001 From: zhengguoyong Date: Wed, 16 Sep 2015 09:10:47 +0800 Subject: Update vagrant.md --- developer-guides/vagrant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index d6a902b2..f451d755 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -245,7 +245,7 @@ my-nginx-kqdjk 1/1 Waiting 0 33s my-nginx-nyj3x 1/1 Waiting 0 33s ``` -You need to wait for the provisioning to complete, you can monitor the nodes by doing: +You need to wait for the provisioning to complete, you can monitor the minions by doing: ```console $ sudo salt '*minion-1' cmd.run 'docker images' -- cgit v1.2.3 From 69ed92839d0156431c2054f30837bbf39c29e7bc Mon Sep 17 00:00:00 2001 From: Jerzy Szczepkowski Date: Thu, 17 Sep 2015 14:08:39 +0200 Subject: Cleanups in HorizontalPodAutoscaler API. Cleanups in HorizontalPodAutoscaler API: renamed Min/MaxCount to Min/MaxReplicas as Replicas is the proper name used in other objects. --- horizontal-pod-autoscaler.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 924988d2..9762d154 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -134,11 +134,11 @@ type HorizontalPodAutoscalerSpec struct { // ScaleRef is a reference to Scale subresource. HorizontalPodAutoscaler will learn the current // resource consumption from its status, and will set the desired number of pods by modifying its spec. ScaleRef *SubresourceReference - // MinCount is the lower limit for the number of pods that can be set by the autoscaler. - MinCount int - // MaxCount is the upper limit for the number of pods that can be set by the autoscaler. - // It cannot be smaller than MinCount. 
- MaxCount int + // MinReplicas is the lower limit for the number of pods that can be set by the autoscaler. + MinReplicas int + // MaxReplicas is the upper limit for the number of pods that can be set by the autoscaler. + // It cannot be smaller than MinReplicas. + MaxReplicas int // Target is the target average consumption of the given resource that the autoscaler will try // to maintain by adjusting the desired number of pods. // Currently this can be either "cpu" or "memory". @@ -173,7 +173,7 @@ type ResourceConsumption struct { ``` ```Scale``` will be a reference to the Scale subresource. -```MinCount```, ```MaxCount``` and ```Target``` will define autoscaler configuration. +```MinReplicas```, ```MaxReplicas``` and ```Target``` will define autoscaler configuration. We will also introduce HorizontalPodAutoscalerList object to enable listing all autoscalers in the cluster: ```go @@ -194,8 +194,8 @@ and check their average CPU or memory usage from the last 1 minute (there will be API on master for this purpose, see [#11951](https://github.com/kubernetes/kubernetes/issues/11951). Then, it will compare the current CPU or memory consumption with the Target, -and adjust the count of the Scale if needed to match the target -(preserving condition: MinCount <= Count <= MaxCount). +and adjust the replicas of the Scale if needed to match the target +(preserving condition: MinReplicas <= Replicas <= MaxReplicas). The target number of pods will be calculated from the following formula: -- cgit v1.2.3 From 3692d4871fc7c34291fe1cfca9b24b0e8a5ecfd6 Mon Sep 17 00:00:00 2001 From: "Timothy St. Clair" Date: Fri, 11 Sep 2015 16:16:56 -0500 Subject: Add developer documentation on e2e testing. 
--- e2e-tests.md | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 e2e-tests.md diff --git a/e2e-tests.md b/e2e-tests.md new file mode 100644 index 00000000..ca55b901 --- /dev/null +++ b/e2e-tests.md @@ -0,0 +1,145 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/devel/e2e-tests.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+# End-2-End Testing in Kubernetes
+
+## Overview
+
+The end-2-end tests for Kubernetes provide a mechanism to test behavior of the system, and to ensure end-user operations match developer specifications. In distributed systems it is not uncommon for a minor change to pass all unit tests but cause unforeseen changes at the system level. Thus, the primary objectives of the end-2-end tests are to ensure consistent and reliable behavior of the Kubernetes code base, and to catch bugs early.
+
+The end-2-end tests in Kubernetes are built atop [ginkgo](http://onsi.github.io/ginkgo/) and [gomega](http://onsi.github.io/gomega/). This BDD testing framework provides a host of features, and it is recommended that developers read its documentation prior to diving into the tests.
+
+The purpose of *this* document is to serve as a primer for developers who are looking to execute or add tests using a local development environment.
+
+## Building and Running the Tests
+
+**NOTE:** The tests have an array of options. For simplicity, the examples will focus on running the tests against a local cluster started with `sudo ./hack/local-up-cluster.sh`.
+
+### Building the Tests
+
+The tests are built into a single binary which can be run against any deployed Kubernetes system. To build the tests, navigate to your source directory and execute:
+
+`$ make all`
+
+The output for the end-2-end tests will be a single binary called `e2e.test` under the default output directory, which is typically `_output/local/bin/linux/amd64/`.
+Within the repository there are scripts provided under the `./hack` directory that are helpful for automation, but they may not apply for local development purposes. Instead, we recommend familiarizing yourself with the executable's options. To obtain the full list of options, run the following:
+
+`$ ./e2e.test --help`
+
+### Running the Tests
+
+For the purposes of brevity, we will look at a subset of the options, which are listed below:
+
+```
+-ginkgo.dryRun=false: If set, ginkgo will walk the test hierarchy without actually running anything. Best paired with -v.
+-ginkgo.failFast=false: If set, ginkgo will stop running a test suite after a failure occurs.
+-ginkgo.failOnPending=false: If set, ginkgo will mark the test suite as failed if any specs are pending.
+-ginkgo.focus="": If set, ginkgo will only run specs that match this regular expression.
+-ginkgo.skip="": If set, ginkgo will only run specs that do not match this regular expression.
+-ginkgo.trace=false: If set, default reporter prints out the full stack trace when a failure occurs
+-ginkgo.v=false: If set, default reporter print out all specs as they begin.
+-host="": The host, or api-server, to connect to
+-kubeconfig="": Path to kubeconfig containing embedded authinfo.
+-prom-push-gateway="": The URL to prometheus gateway, so that metrics can be pushed during e2es and scraped by prometheus. Typically something like 127.0.0.1:9091.
+-provider="": The name of the Kubernetes provider (gce, gke, local, vagrant, etc.)
+-repo-root="../../": Root directory of kubernetes repository, for finding test files.
+```
+
+Prior to running the tests, it is recommended that you first create a simple auth file in your home directory, e.g. `$HOME/.kubernetes_auth`, with the following:
+
+```
+{
+  "User": "root",
+  "Password": ""
+}
+```
+
+Next, you will need a cluster that you can test against. As mentioned earlier, you will want to execute `sudo ./hack/local-up-cluster.sh`.
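The `-ginkgo.focus` and `-ginkgo.skip` options listed above are ordinary regular expressions matched against full spec descriptions. Their combined selection behavior can be sketched in a few lines of Go (the spec names below are invented for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

// selected reports whether a spec would run under the given focus/skip
// expressions: it must match focus (if set) and must not match skip (if set).
func selected(focus, skip, spec string) bool {
	if focus != "" && !regexp.MustCompile(focus).MatchString(spec) {
		return false
	}
	if skip != "" && regexp.MustCompile(skip).MatchString(spec) {
		return false
	}
	return true
}

func main() {
	focus := `DNS|(?i)nodeport(?-i)|kubectl guestbook`
	skip := `Density|Scale`
	for _, spec := range []string{
		"DNS should resolve cluster services",       // matches focus -> runs
		"Services should work via NodePort",         // case-insensitive match -> runs
		"Density should handle 30 pods per node",    // fails focus -> skipped
	} {
		fmt.Printf("%-45s selected=%v\n", spec, selected(focus, skip, spec))
	}
}
```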
To get a sense of what tests exist, you may want to run:
+
+`e2e.test --host="127.0.0.1:8080" --provider="local" --ginkgo.v=true -ginkgo.dryRun=true --kubeconfig="$HOME/.kubernetes_auth" --repo-root="$KUBERNETES_SRC_PATH"`
+
+If you wish to execute a specific set of tests, you can use the `-ginkgo.focus=` regex, e.g.:
+
+`e2e.test ... --ginkgo.focus="DNS|(?i)nodeport(?-i)|kubectl guestbook"`
+
+Conversely, if you wish to exclude a set of tests, you can run:
+
+`e2e.test ... --ginkgo.skip="Density|Scale"`
+
+As mentioned earlier, there is a host of other options available; they are left for the developer to explore.
+
+**NOTE:** If you are running tests on a local cluster repeatedly, you may need to periodically perform some manual cleanup:
+- `rm -rf /var/run/kubernetes`, to clear kube-generated credentials; sometimes stale permissions can cause problems.
+- `sudo iptables -F`, to clear iptables rules left by the kube-proxy.
+
+## Adding a New Test
+
+As mentioned above, prior to adding a new test, it is a good idea to perform a `-ginkgo.dryRun=true` run on the system, in order to see if a behavior is already being tested, or to determine whether it may be possible to augment an existing set of tests for a specific use case.
+
+If a behavior does not currently have coverage and a developer wishes to add a new e2e test, navigate to the `./test/e2e` directory and create a new test using the existing suite as a guide.
+
+**TODO:** Create a self-documented example which has been disabled, but can be copied to create new tests and outlines the capabilities and libraries used.
+
+## Performance Evaluation
+
+Another benefit of the end-2-end tests is the ability to create reproducible loads on the system, which can then be used to determine responsiveness, or to analyze other characteristics of the system. For example, the density tests load the system to 30, 50, and 100 pods per node and measure different characteristics of the system, such as throughput and api-latency.
+For a good overview of how we analyze performance data, please read the following [post](http://blog.kubernetes.io/2015/09/kubernetes-performance-measurements-and.html).
+
+For developers who are interested in doing their own performance analysis, we recommend setting up [prometheus](http://prometheus.io/) for data collection, and using [promdash](http://prometheus.io/docs/visualization/promdash/) to visualize the data. There also exists the option of pushing your own metrics in from the tests using a [prom-push-gateway](http://prometheus.io/docs/instrumenting/pushing/). Containers for all of these components can be found [here](https://hub.docker.com/u/prom/).
+
+For more accurate measurements, you may wish to set up prometheus external to kubernetes in an environment where it can access the major system components (api-server, controller-manager, scheduler). This is especially useful when attempting to gather metrics in a load-balanced api-server environment, because all api-servers can be analyzed independently as well as collectively. On startup, a configuration file is passed to prometheus that specifies the endpoints that prometheus will scrape, as well as the sampling interval.
+
+```
+#prometheus.conf
+job: {
+  name: "kubernetes"
+  scrape_interval: "1s"
+  target_group: {
+    # apiserver(s)
+    target: "http://localhost:8080/metrics"
+    # scheduler
+    target: "http://localhost:10251/metrics"
+    # controller-manager
+    target: "http://localhost:10252/metrics"
+  }
+}
+```
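The `/metrics` endpoints listed in the configuration above serve the Prometheus text exposition format. As a minimal, illustrative sketch (not production scraping code, and the sample payload is invented), pulling counter values out of such a payload looks like:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMetrics extracts simple name/value pairs from a Prometheus text
// exposition payload, skipping blank lines and # HELP / # TYPE comments.
// Real component metrics carry labels and many more series; this only
// handles the bare "name value" form.
func parseMetrics(payload string) map[string]float64 {
	out := map[string]float64{}
	for _, line := range strings.Split(payload, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			out[fields[0]] = v
		}
	}
	return out
}

func main() {
	sample := `
# HELP apiserver_request_count Counter of apiserver requests.
apiserver_request_count 42
scheduler_binding_latency_sum 0.93
`
	for name, value := range parseMetrics(sample) {
		fmt.Printf("%s = %g\n", name, value)
	}
}
```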
+ +**HAPPY TESTING!** + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/e2e-tests.md?pixel)]() + -- cgit v1.2.3 From e5e84013a15a7d54c15525da389c3bec14f7f691 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20Mart=C3=AD?= Date: Thu, 17 Sep 2015 15:21:55 -0700 Subject: Move pkg/util.Time to pkg/api/unversioned.Time Along with our time.Duration wrapper, as suggested by @lavalamp. --- compute-resource-metrics-api.md | 2 +- horizontal-pod-autoscaler.md | 2 +- job.md | 10 +++++----- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md index 472e6a37..c9f3d9af 100644 --- a/compute-resource-metrics-api.md +++ b/compute-resource-metrics-api.md @@ -161,7 +161,7 @@ type MetricsWindows map[time.Duration]DerivedMetrics type DerivedMetrics struct { // End time of all the time windows in Metrics - EndTime util.Time `json:"endtime"` + EndTime unversioned.Time `json:"endtime"` Mean ResourceUsage `json:"mean"` Max ResourceUsage `json:"max"` diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 924988d2..b5604d21 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -162,7 +162,7 @@ type HorizontalPodAutoscalerStatus struct { // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods. // This is used by the autoscaler to control how often the number of pods is changed. - LastScaleTimestamp *util.Time + LastScaleTimestamp *unversioned.Time } // ResourceConsumption is an object for specifying average resource consumption of a particular resource. 
diff --git a/job.md b/job.md index 198a1437..d3247b1a 100644 --- a/job.md +++ b/job.md @@ -131,13 +131,13 @@ type JobStatus struct { Conditions []JobCondition // CreationTime represents time when the job was created - CreationTime util.Time + CreationTime unversioned.Time // StartTime represents time when the job was started - StartTime util.Time + StartTime unversioned.Time // CompletionTime represents time when the job was completed - CompletionTime util.Time + CompletionTime unversioned.Time // Active is the number of actively running pods. Active int @@ -162,8 +162,8 @@ const ( type JobCondition struct { Type JobConditionType Status ConditionStatus - LastHeartbeatTime util.Time - LastTransitionTime util.Time + LastHeartbeatTime unversioned.Time + LastTransitionTime unversioned.Time Reason string Message string } -- cgit v1.2.3 From c0e44162bc75fe062e183b27fccc578e837c19b2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20Mart=C3=AD?= Date: Thu, 17 Sep 2015 15:21:55 -0700 Subject: Move pkg/util.Time to pkg/api/unversioned.Time Along with our time.Duration wrapper, as suggested by @lavalamp. 
--- api-conventions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index e7b8b4e9..31225e18 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -189,8 +189,8 @@ The `FooCondition` type for some resource type `Foo` may include a subset of the ```golang Type FooConditionType `json:"type" description:"type of Foo condition"` Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` - LastHeartbeatTime util.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` - LastTransitionTime util.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"` + LastHeartbeatTime unversioned.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` + LastTransitionTime unversioned.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"` Reason string `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"` Message string `json:"message,omitempty" description:"human-readable message indicating details about last transition"` ``` -- cgit v1.2.3 From 6583718cd4689f89955e6a86827c7d891b5a694a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20Mart=C3=AD?= Date: Thu, 17 Sep 2015 15:21:55 -0700 Subject: Move pkg/util.Time to pkg/api/unversioned.Time Along with our time.Duration wrapper, as suggested by @lavalamp. --- event_compression.md | 4 ++-- expansion.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/event_compression.md b/event_compression.md index 424f9ac2..b9861717 100644 --- a/event_compression.md +++ b/event_compression.md @@ -49,9 +49,9 @@ Event compression should be best effort (not guaranteed). 
Meaning, in the worst ## Design Instead of a single Timestamp, each event object [contains](http://releases.k8s.io/HEAD/pkg/api/types.go#L1111) the following fields: - * `FirstTimestamp util.Time` + * `FirstTimestamp unversioned.Time` * The date/time of the first occurrence of the event. - * `LastTimestamp util.Time` + * `LastTimestamp unversioned.Time` * The date/time of the most recent occurrence of the event. * On first occurrence, this is equal to the FirstTimestamp. * `Count int` diff --git a/expansion.md b/expansion.md index 24a07f0d..b19731b9 100644 --- a/expansion.md +++ b/expansion.md @@ -265,7 +265,7 @@ type ObjectEventRecorder interface { Eventf(reason, messageFmt string, args ...interface{}) // PastEventf is just like Eventf, but with an option to specify the event's 'timestamp' field. - PastEventf(timestamp util.Time, reason, messageFmt string, args ...interface{}) + PastEventf(timestamp unversioned.Time, reason, messageFmt string, args ...interface{}) } ``` -- cgit v1.2.3 From 292225b77b525a0e55c9f2c5ad6904288e751c7c Mon Sep 17 00:00:00 2001 From: Matt McNaughton Date: Fri, 18 Sep 2015 00:34:25 -0400 Subject: Fix indendation on devel/coding-conventions.md Fixing the indendation means the markdown will now render correcly on Github. Signed-off-by: Matt McNaughton --- coding-conventions.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/coding-conventions.md b/coding-conventions.md index 8ddf000e..3e3abaf7 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -65,19 +65,19 @@ Testing conventions - Unit tests must pass on OS X and Windows platforms - if you use Linux specific features, your test case must either be skipped on windows or compiled out (skipped is better when running Linux specific commands, compiled out is required when your code does not compile on Windows). Directory and file conventions - - Avoid package sprawl. Find an appropriate subdirectory for new packages. 
(See [#4851](http://issues.k8s.io/4851) for discussion.) - - Libraries with no more appropriate home belong in new package subdirectories of pkg/util - - Avoid general utility packages. Packages called "util" are suspect. Instead, derive a name that describes your desired function. For example, the utility functions dealing with waiting for operations are in the "wait" package and include functionality like Poll. So the full name is wait.Poll - - Go source files and directories use underscores, not dashes - - Package directories should generally avoid using separators as much as possible (when packages are multiple words, they usually should be in nested subdirectories). - - Document directories and filenames should use dashes rather than underscores - - Contrived examples that illustrate system features belong in /docs/user-guide or /docs/admin, depending on whether it is a feature primarily intended for users that deploy applications or cluster administrators, respectively. Actual application examples belong in /examples. - - Examples should also illustrate [best practices for using the system](../user-guide/config-best-practices.md) - - Third-party code - - Third-party Go code is managed using Godeps - - Other third-party code belongs in /third_party - - Third-party code must include licenses - - This includes modified third-party code and excerpts, as well + - Avoid package sprawl. Find an appropriate subdirectory for new packages. (See [#4851](http://issues.k8s.io/4851) for discussion.) + - Libraries with no more appropriate home belong in new package subdirectories of pkg/util + - Avoid general utility packages. Packages called "util" are suspect. Instead, derive a name that describes your desired function. For example, the utility functions dealing with waiting for operations are in the "wait" package and include functionality like Poll. 
So the full name is wait.Poll + - Go source files and directories use underscores, not dashes + - Package directories should generally avoid using separators as much as possible (when packages are multiple words, they usually should be in nested subdirectories). + - Document directories and filenames should use dashes rather than underscores + - Contrived examples that illustrate system features belong in /docs/user-guide or /docs/admin, depending on whether it is a feature primarily intended for users that deploy applications or cluster administrators, respectively. Actual application examples belong in /examples. + - Examples should also illustrate [best practices for using the system](../user-guide/config-best-practices.md) + - Third-party code + - Third-party Go code is managed using Godeps + - Other third-party code belongs in /third_party + - Third-party code must include licenses + - This includes modified third-party code and excerpts, as well Coding advice - Go -- cgit v1.2.3 From 6d04d610747b27e3fd27ee0c240648024eafe2da Mon Sep 17 00:00:00 2001 From: qiaolei Date: Sat, 19 Sep 2015 09:32:17 +0800 Subject: Change 'params' to 'extraParams' to keep align with naming conventions Go field names must be CamelCase. JSON field names must be camelCase. Other than capitalization of the initial letter, the two should almost always match. 
No underscores nor dashes in either Please refer 'https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api-conventions.md#naming-conventions' --- api_changes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api_changes.md b/api_changes.md index e0a65fe0..a7c4c3c8 100644 --- a/api_changes.md +++ b/api_changes.md @@ -157,7 +157,7 @@ type Frobber struct { Height int `json:"height"` Width int `json:"width"` Param string `json:"param"` // the first param - ExtraParams []string `json:"params"` // additional params + ExtraParams []string `json:"extraParams"` // additional params } ``` -- cgit v1.2.3 From efee6727cd73f876454bbcb7d7f2737f0ea3a0b5 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sun, 20 Sep 2015 21:00:41 -0700 Subject: Clarify experimental annotation format --- api-conventions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index 31225e18..fb7cbe10 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at API Conventions =============== -Updated: 8/24/2015 +Updated: 9/20/2015 *This document is oriented at users who want a deeper understanding of the Kubernetes API structure, and developers wanting to extend the Kubernetes API. An introduction to @@ -712,7 +712,7 @@ Therefore, resources supporting auto-generation of unique labels should have a ` Annotations have very different intended usage from labels. We expect them to be primarily generated and consumed by tooling and system extensions. I'm inclined to generalize annotations to permit them to directly store arbitrary json. Rigid names and name prefixes make sense, since they are analogous to API fields. -In fact, experimental API fields, including to represent fields of newer alpha/beta API versions in the older, stable storage version, may be represented as annotations with the prefix `experimental.kubernetes.io/`. 
+In fact, experimental API fields, including those used to represent fields of newer alpha/beta API versions in the older stable storage version, may be represented as annotations with the form `something.experimental.kubernetes.io/name`. For example `net.experimental.kubernetes.io/policy` might represent an experimental network policy field.
 Other advice regarding use of labels, annotations, and other generic map keys by Kubernetes components and tools:
 - Key names should be all lowercase, with words separated by dashes, such as `desired-replicas`
-- cgit v1.2.3


From ac4ed3a76c22eefbb043d0917bbcbedfa9012b57 Mon Sep 17 00:00:00 2001
From: eulerzgy
Date: Mon, 21 Sep 2015 15:21:11 +0800
Subject: change etcdIndec to etcdIndex

---
 apiserver-watch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/apiserver-watch.md b/apiserver-watch.md
index 2917eec4..3e92d1e0 100644
--- a/apiserver-watch.md
+++ b/apiserver-watch.md
@@ -116,7 +116,7 @@ we will store two things:
 This should be as simple as having an array and treating it as a cyclic buffer.
 Obviously resourceVersion of objects watched from etcd will be increasing, but
 they are necessary for registering a new watcher that is interested in all the
- changes since a given etcdIndec.
+ changes since a given etcdIndex.
 Additionally, we should support LIST operation, otherwise clients can never
 start watching at now.
We may consider passing lists through etcd, however
-- cgit v1.2.3


From 05697f05b991d67f13f9063a60fb17a75d487004 Mon Sep 17 00:00:00 2001
From: AnanyaKumar
Date: Mon, 31 Aug 2015 00:01:13 -0400
Subject: Add daemon design doc

---
 daemon.md | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 daemon.md

diff --git a/daemon.md b/daemon.md
new file mode 100644
index 00000000..a948b78a
--- /dev/null
+++ b/daemon.md
@@ -0,0 +1,128 @@
+# Daemons in Kubernetes
+
+**Author**: Ananya Kumar (@AnanyaKumar)
+
+**Status**: Draft proposal; prototype in progress.
+
+This document presents the design of a daemon controller for Kubernetes, outlines relevant Kubernetes concepts, describes use cases, and lays out milestones for its development.
+
+## Motivation
+
+In Kubernetes, a Replication Controller ensures that a specified number of replicas of a specified pod are running in the cluster at all times (pods are restarted if they are killed). With the Replication Controller, users cannot control which nodes their pods run on - Kubernetes decides how to schedule the pods onto nodes. However, many users want control over how certain pods are scheduled. In particular, many users have requested a way to run a daemon on every node in the cluster, or on a certain set of nodes in the cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the daemon controller, a way to conveniently create and manage daemon-like workloads in Kubernetes.
+
+## Use Cases
+
+The daemon controller can be used for user-specified system services, cluster-level applications with strong node ties, and Kubernetes node services. Below are example use cases in each category.
+
+### User-Specified System Services
+
+Logging: Some users want a way to collect statistics about nodes in a cluster and send those logs to an external database.
For example, system administrators might want to know if their machines are performing as expected, if they need to add more machines to the cluster, or if they should switch cloud providers. The daemon controller can be used to run a data collection service (for example fluentd) and send the data to a service like ElasticSearch for analysis. + +### Cluster-Level Applications +Datastore: Users might want to implement a sharded datastore in their cluster. A few nodes in the cluster, labeled ‘datastore’, might be responsible for storing data shards, and pods running on these nodes might serve data. This architecture requires a way to bind pods to specific nodes, so it cannot be achieved using a Replication Controller. A daemon controller is a convenient way to implement such a datastore. + +For other uses, see the related [feature request](https://github.com/GoogleCloudPlatform/kubernetes/issues/1518) + +## Functionality + +The Daemon Controller will support standard API features: +- create + - The spec for daemon controllers will have a pod template field. + - Using the pod’s node selector field, Daemon controllers can be restricted to operate over nodes that have a certain label. For example, suppose that in a cluster some nodes are labeled ‘database’. You can use a daemon controller to launch a datastore pod on exactly those nodes labeled ‘database’. + - Using the pod's node name field, Daemon controllers can be restricted to operate on a specified node. + - The spec for pod templates that run with the Daemon Controller is the same as the spec for pod templates that run with the Replication Controller, except there will not be a ‘replicas’ field (exactly 1 daemon pod will be launched per node). + - We will not guarantee that daemon pods show up on nodes before regular pods - run ordering is out of scope for this controller. 
+ - The Daemon Controller will not guarantee that Daemon pods show up on nodes (for example because of resource limitations of the node), but will make a best effort to launch Daemon pods (like Replication Controllers do with pods) + - A daemon controller named “foo” will add a “controller: foo” annotation to all the pods that it creates + - YAML example: +```YAML + apiVersion: v1 + kind: Daemon + metadata: + labels: + name: datastore + name: datastore + spec: + template: + metadata: + labels: + name: datastore-shard + spec: + node-selector: + name: datastore-node + containers: + name: datastore-shard + image: kubernetes/sharded + ports: + - containerPort: 9042 + name: main +``` + - commands that get info + - get (e.g. kubectl get dc) + - describe + - Modifiers + - delete + - stop: first we turn down the Daemon Controller foo, and then we turn down all pods matching the query “controller: foo” + - label + - update + - Daemon controllers will have labels, so you could, for example, list all daemon controllers with a certain label (the same way you would for a Replication Controller). + - In general, for all the supported features like get, describe, update, etc, the Daemon Controller will work in a similar way to the Replication Controller. However, note that the Daemon Controller and the Replication Controller are different constructs. + +### Health checks + - Ordinary health checks specified in the pod template will of course work to keep pods created by a Daemon Controller running. + +### Cluster Mutations + - When a new node is added to the cluster the daemon controller should start the daemon on the node (if the node’s labels match the user-specified selectors). This is a big advantage of the Daemon Controller compared to alternative ways of launching daemons and configuring clusters. + - Suppose the user launches a daemon controller that runs a logging daemon on all nodes labeled “tolog”. 
If the user then adds the “tolog” label to a node (that did not initially have the “tolog” label), the logging daemon should be launched on the node. Additionally, if a user removes the “tolog” label from a node, the logging daemon on that node should be killed. + +## Alternatives Considered + +An alternative way to launch daemons is to avoid going through the API server, and instead provide ways to package the daemon into the node. For example, users could: + +1. Include the daemon in the machine image +2. Use config files to launch daemons +3. Use static pod manifests to launch daemon pods when the node initializes + +These alternatives don’t work as well because the daemons won’t be well integrated into Kubernetes. In particular, + +1. In alternatives (1) and (2), health checking for the daemons would need to be re-implemented, or would not exist at all (because the daemons are not run inside pods). In the current proposal, the Kubelet will health-check daemon pods and restart them if necessary. +2. In alternatives (1) and (2), binding services to a group of daemons is difficult (which is needed in use cases such as the sharded data store use case described above), because the daemons are not run inside pods +3. A big disadvantage of these methods is that adding new daemons in existing nodes is difficult (for example, if a cluster manager wants to add a logging daemon after a cluster has been deployed). +4. The above alternatives are less user-friendly. Users need to learn two ways of launching pods: using the API when launching pods associated with Replication Controllers, and using manifests when launching daemons. So in the alternatives, deployment is more difficult. +5. It’s difficult to upgrade binaries launched in any of those three ways. + +Another alternative is for the user to explicitly assign pods to specific nodes (using the Pod spec) when creating pods. 
A big disadvantage of this alternative is that the user would need to manually check whether new nodes satisfy the desired labels, and if so add the daemon to the node. This makes deployment painful, and could lead to costly mistakes (if a certain daemon is not launched on a new node which it is supposed to run on). In essence, every user will be re-implementing the Daemon Controller for themselves. + +A third alternative is to generalize the Replication Controller. We could add a field for the user to specify that she wishes to bind pods to certain nodes in the cluster. Or we could add a field to the pod-spec allowing the user to specify that each node can have exactly one instance of a pod (so the user would create a Replication Controller with a very large number of replicas, and set the anti-affinity field to true preventing more than one pod with that label from being scheduled onto a single node). The disadvantage of these methods is that the Daemon Controller and the Replication Controller are very different concepts. The Daemon Controller operates on a per-node basis, while the Replication Controller operates on a per-job basis (in particular, the Daemon Controller will take action when a node is changed or added). So presenting them as different concepts makes for a better user interface. Having small and directed controllers for distinct purposes makes Kubernetes easier to understand and use, compared to having one controller to rule them all. + +## Design + +#### Client +- Add support for daemon controller commands to kubectl and the client. Client code was added to client/unversioned. The main files in Kubectl that were modified are kubectl/describe.go and kubectl/stop.go, since for other calls like Get, Create, and Update, the client simply forwards the request to the backend via the REST API. 
+ +#### Apiserver +- Accept, parse, validate client commands +- REST API calls will be handled in registry/daemon + - In particular, the api server will add the object to etcd + - DaemonManager listens for updates to etcd (using Framework.informer) +- API object for Daemon Controller will be created in expapi/v1/types.go and expapi/v1/register.go +- Validation code is in expapi/validation + +#### Daemon Manager +- Creates new daemon controllers when requested. Launches the corresponding daemon pod on all nodes with labels matching the new daemon controller’s selector. +- Listens for addition of new nodes to the cluster, by setting up a framework.NewInformer that watches for the creation of Node API objects. When a new node is added, the daemon manager will loop through each daemon controller. If the label of the node matches the selector of the daemon controller, then the daemon manager will create the corresponding daemon pod in the new node. +- The daemon manager will create a pod on a node by sending a command to the API server, requesting for a pod to be bound to the node (the node will be specified via its hostname) + +#### Kubelet +- Does not need to be modified, but health checking for the daemon pods and revive the pods if they are killed (we will set the pod restartPolicy to Always). We reject Daemon Controller objects with pod templates that don’t have restartPolicy set to Always. + +## Testing + +Unit Tests: +Each component was unit tested, fakes were implemented when necessary. For example, when testing the client, a fake API server was used. + +End to End Tests: +One end-to-end test was implemented. The end-to-end test verified that the daemon manager runs the daemon on every node, that when a daemon pod is stopped it restarts, that the daemon controller can be reaped (stopped), and that the daemon adds/removes daemon pods appropriately from nodes when their labels change. 
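The per-node decision described in the Daemon Manager section above (launch a daemon pod on each node whose labels match the controller's selector) can be sketched as follows. Types and names here are hypothetical and the selector semantics are simplified to exact key/value matching; the real label selector machinery is richer.

```go
package main

import "fmt"

// nodesNeedingDaemon returns the nodes whose labels satisfy the daemon
// controller's selector; the manager would launch one daemon pod on each.
// Matching is simplified: every selector key/value must match exactly.
func nodesNeedingDaemon(selector map[string]string, nodeLabels map[string]map[string]string) []string {
	var out []string
	for node, labels := range nodeLabels {
		matches := true
		for k, v := range selector {
			if labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			out = append(out, node)
		}
	}
	return out
}

func main() {
	nodes := map[string]map[string]string{
		"node-1": {"name": "datastore-node"},
		"node-2": {"name": "web-node"},
	}
	// Only node-1 carries the datastore label, so only it gets a daemon pod.
	fmt.Println(nodesNeedingDaemon(map[string]string{"name": "datastore-node"}, nodes))
}
```

The same check runs in the node-added event handler: a newly created Node object is passed through each controller's selector, and a daemon pod is bound to the node on every match.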
+ +## Open Issues +- Rolling updates across nodes should be performed according to the [anti-affinity policy in scheduler](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/plugin/pkg/scheduler/api/v1/types.go). We need to figure out how to share that configuration. +- See how this can work with [Deployment design](https://github.com/GoogleCloudPlatform/kubernetes/issues/1743). -- cgit v1.2.3 From 17b1ec3333214e52782126da286e3952c5012859 Mon Sep 17 00:00:00 2001 From: Ananya Kumar Date: Tue, 1 Sep 2015 22:03:22 -0400 Subject: Update daemon.md --- daemon.md | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/daemon.md b/daemon.md index a948b78a..9cadfc4d 100644 --- a/daemon.md +++ b/daemon.md @@ -1,14 +1,14 @@ -# Daemons in Kubernetes +# Daemon Controller in Kubernetes **Author**: Ananya Kumar (@AnanyaKumar) **Status**: Draft proposal; prototype in progress. -This document presents the design of a daemon controller for Kubernetes, outlines relevant Kubernetes concepts, describes use cases, and lays out milestones for its development. +This document presents the design of the Kubernetes daemon controller, describes use cases, and gives an overview of the code. ## Motivation -In Kubernetes, a Replication Controller ensures that the specified number of a specified pod are running in the cluster at all times (pods are restarted if they are killed). With the Replication Controller, users cannot control which nodes their pods run on - Kubernetes decides how to schedule the pods onto nodes. However, many users want control over how certain pods are scheduled. In particular, many users have requested for a way to run a daemon on every node in the cluster, or on a certain set of nodes in the cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. 
In comes the daemon controller, a way to conveniently create and manage daemon-like workloads in Kubernetes. +Many users have requested for a way to run a daemon on every node in a Kubernetes cluster, or on a certain set of nodes in a cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the daemon controller, a way to conveniently create and manage daemon-like workloads in Kubernetes. ## Use Cases @@ -24,15 +24,15 @@ For other uses, see the related [feature request](https://github.com/GoogleCloud ## Functionality -The Daemon Controller will support standard API features: +The Daemon Controller supports standard API features: - create - - The spec for daemon controllers will have a pod template field. + - The spec for daemon controllers has a pod template field. - Using the pod’s node selector field, Daemon controllers can be restricted to operate over nodes that have a certain label. For example, suppose that in a cluster some nodes are labeled ‘database’. You can use a daemon controller to launch a datastore pod on exactly those nodes labeled ‘database’. - Using the pod's node name field, Daemon controllers can be restricted to operate on a specified node. - The spec for pod templates that run with the Daemon Controller is the same as the spec for pod templates that run with the Replication Controller, except there will not be a ‘replicas’ field (exactly 1 daemon pod will be launched per node). - - We will not guarantee that daemon pods show up on nodes before regular pods - run ordering is out of scope for this controller. 
- - The Daemon Controller will not guarantee that Daemon pods show up on nodes (for example because of resource limitations of the node), but will make a best effort to launch Daemon pods (like Replication Controllers do with pods) - - A daemon controller named “foo” will add a “controller: foo” annotation to all the pods that it creates + - We will not guarantee that daemon pods show up on nodes before regular pods - run ordering is out of scope for this controller. + - The initial implementation of Daemon Controller does not guarantee that Daemon pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch Daemon pods (like Replication Controllers do with pods). Subsequent revisions might ensure that Daemon pods show up on nodes, pushing out other pods if necessary. + - A daemon controller named “foo” adds a “controller: foo” annotation to all the pods that it creates - YAML example: ```YAML apiVersion: v1 @@ -61,18 +61,19 @@ The Daemon Controller will support standard API features: - describe - Modifiers - delete - - stop: first we turn down the Daemon Controller foo, and then we turn down all pods matching the query “controller: foo” + - stop: first we turn down all the pods controller by the daemon (by setting the nodeName to a non-existed name). Then we turn down the daemon controller. - label - update - - Daemon controllers will have labels, so you could, for example, list all daemon controllers with a certain label (the same way you would for a Replication Controller). - - In general, for all the supported features like get, describe, update, etc, the Daemon Controller will work in a similar way to the Replication Controller. However, note that the Daemon Controller and the Replication Controller are different constructs. + - Daemon controllers have labels, so you could, for example, list all daemon controllers with a certain label (the same way you would for a Replication Controller). 
+ - In general, for all the supported features like get, describe, update, etc, the Daemon Controller works in a similar way to the Replication Controller. However, note that the Daemon Controller and the Replication Controller are different constructs. -### Health checks - - Ordinary health checks specified in the pod template will of course work to keep pods created by a Daemon Controller running. +### Persisting Pods + - Ordinary health checks specified in the pod template work to keep pods created by a Daemon Controller running. + - If a daemon pod is killed or stopped, the daemon controller will create a new replica of the daemon pod on the node. ### Cluster Mutations - - When a new node is added to the cluster the daemon controller should start the daemon on the node (if the node’s labels match the user-specified selectors). This is a big advantage of the Daemon Controller compared to alternative ways of launching daemons and configuring clusters. - - Suppose the user launches a daemon controller that runs a logging daemon on all nodes labeled “tolog”. If the user then adds the “tolog” label to a node (that did not initially have the “tolog” label), the logging daemon should be launched on the node. Additionally, if a user removes the “tolog” label from a node, the logging daemon on that node should be killed. + - When a new node is added to the cluster the daemon controller starts the daemon on the node (if the node’s labels match the user-specified selectors). This is a big advantage of the Daemon Controller compared to alternative ways of launching daemons and configuring clusters. + - Suppose the user launches a daemon controller that runs a logging daemon on all nodes labeled “tolog”. If the user then adds the “tolog” label to a node (that did not initially have the “tolog” label), the logging daemon will launch on the node. Additionally, if a user removes the “tolog” label from a node, the logging daemon on that node will be killed. 
## Alternatives Considered @@ -101,19 +102,19 @@ A third alternative is to generalize the Replication Controller. We could add a #### Apiserver - Accept, parse, validate client commands -- REST API calls will be handled in registry/daemon +- REST API calls are handled in registry/daemon - In particular, the api server will add the object to etcd - DaemonManager listens for updates to etcd (using Framework.informer) -- API object for Daemon Controller will be created in expapi/v1/types.go and expapi/v1/register.go +- API objects for Daemon Controller were created in expapi/v1/types.go and expapi/v1/register.go - Validation code is in expapi/validation #### Daemon Manager - Creates new daemon controllers when requested. Launches the corresponding daemon pod on all nodes with labels matching the new daemon controller’s selector. - Listens for addition of new nodes to the cluster, by setting up a framework.NewInformer that watches for the creation of Node API objects. When a new node is added, the daemon manager will loop through each daemon controller. If the label of the node matches the selector of the daemon controller, then the daemon manager will create the corresponding daemon pod in the new node. -- The daemon manager will create a pod on a node by sending a command to the API server, requesting for a pod to be bound to the node (the node will be specified via its hostname) +- The daemon manager creates a pod on a node by sending a command to the API server, requesting for a pod to be bound to the node (the node will be specified via its hostname) #### Kubelet -- Does not need to be modified, but health checking for the daemon pods and revive the pods if they are killed (we will set the pod restartPolicy to Always). We reject Daemon Controller objects with pod templates that don’t have restartPolicy set to Always. 
+- Does not need to be modified, but health checking will occur for the daemon pods and revive the pods if they are killed (we set the pod restartPolicy to Always). We reject Daemon Controller objects with pod templates that don’t have restartPolicy set to Always. ## Testing @@ -124,5 +125,4 @@ End to End Tests: One end-to-end test was implemented. The end-to-end test verified that the daemon manager runs the daemon on every node, that when a daemon pod is stopped it restarts, that the daemon controller can be reaped (stopped), and that the daemon adds/removes daemon pods appropriately from nodes when their labels change. ## Open Issues -- Rolling updates across nodes should be performed according to the [anti-affinity policy in scheduler](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/plugin/pkg/scheduler/api/v1/types.go). We need to figure out how to share that configuration. - See how this can work with [Deployment design](https://github.com/GoogleCloudPlatform/kubernetes/issues/1743). -- cgit v1.2.3 From dc60757674f4b9c663261cf8caf123f9ae8d4ac6 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Mon, 21 Sep 2015 17:15:44 -0700 Subject: Design doc for daemon controller. Originally started as PR #13368. --- daemon.md | 86 ++++++++++++++++++++++++++++----------------------------------- 1 file changed, 38 insertions(+), 48 deletions(-) diff --git a/daemon.md b/daemon.md index 9cadfc4d..c4187c7b 100644 --- a/daemon.md +++ b/daemon.md @@ -1,54 +1,54 @@ -# Daemon Controller in Kubernetes +# DaemonSet in Kubernetes **Author**: Ananya Kumar (@AnanyaKumar) -**Status**: Draft proposal; prototype in progress. +**Status**: Implemented. -This document presents the design of the Kubernetes daemon controller, describes use cases, and gives an overview of the code. +This document presents the design of the Kubernetes DaemonSet, describes use cases, and gives an overview of the code. 
## Motivation -Many users have requested for a way to run a daemon on every node in a Kubernetes cluster, or on a certain set of nodes in a cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the daemon controller, a way to conveniently create and manage daemon-like workloads in Kubernetes. +Many users have requested a way to run a daemon on every node in a Kubernetes cluster, or on a certain set of nodes in a cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the DaemonSet, a way to conveniently create and manage daemon-like workloads in Kubernetes. ## Use Cases -The daemon controller can be used for user-specified system services, cluster level applications with strong node ties, and Kubernetes node services. Below are example use cases in each category. +The DaemonSet can be used for user-specified system services, cluster-level applications with strong node ties, and Kubernetes node services. Below are example use cases in each category. ### User-Specified System Services: -Logging: Some users want a way to collect statistics about nodes in a cluster and send those logs to an external database. For example, system administrators might want to know if their machines are performing as expected, if they need to add more machines to the cluster, or if they should switch cloud providers. The daemon controller can be used to run a data collection service (for example fluentd) and send the data to a service like ElasticSearch for analysis. +Logging: Some users want a way to collect statistics about nodes in a cluster and send those logs to an external database. For example, system administrators might want to know if their machines are performing as expected, if they need to add more machines to the cluster, or if they should switch cloud providers.
The DaemonSet can be used to run a data collection service (for example fluentd) on every node and send the data to a service like ElasticSearch for analysis. ### Cluster-Level Applications -Datastore: Users might want to implement a sharded datastore in their cluster. A few nodes in the cluster, labeled ‘datastore’, might be responsible for storing data shards, and pods running on these nodes might serve data. This architecture requires a way to bind pods to specific nodes, so it cannot be achieved using a Replication Controller. A daemon controller is a convenient way to implement such a datastore. +Datastore: Users might want to implement a sharded datastore in their cluster. A few nodes in the cluster, labeled ‘app=datastore’, might be responsible for storing data shards, and pods running on these nodes might serve data. This architecture requires a way to bind pods to specific nodes, so it cannot be achieved using a Replication Controller. A DaemonSet is a convenient way to implement such a datastore. -For other uses, see the related [feature request](https://github.com/GoogleCloudPlatform/kubernetes/issues/1518) +For other uses, see the related [feature request](https://issues.k8s.io/1518) ## Functionality -The Daemon Controller supports standard API features: +The DaemonSet supports standard API features: - create - - The spec for daemon controllers has a pod template field. - - Using the pod’s node selector field, Daemon controllers can be restricted to operate over nodes that have a certain label. For example, suppose that in a cluster some nodes are labeled ‘database’. You can use a daemon controller to launch a datastore pod on exactly those nodes labeled ‘database’. - - Using the pod's node name field, Daemon controllers can be restricted to operate on a specified node. 
- - The spec for pod templates that run with the Daemon Controller is the same as the spec for pod templates that run with the Replication Controller, except there will not be a ‘replicas’ field (exactly 1 daemon pod will be launched per node). - - We will not guarantee that daemon pods show up on nodes before regular pods - run ordering is out of scope for this controller. - - The initial implementation of Daemon Controller does not guarantee that Daemon pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch Daemon pods (like Replication Controllers do with pods). Subsequent revisions might ensure that Daemon pods show up on nodes, pushing out other pods if necessary. - - A daemon controller named “foo” adds a “controller: foo” annotation to all the pods that it creates + - The spec for DaemonSets has a pod template field. + - Using the pod’s nodeSelector field, DaemonSets can be restricted to operate over nodes that have a certain label. For example, suppose that in a cluster some nodes are labeled ‘app=database’. You can use a DaemonSet to launch a datastore pod on exactly those nodes labeled ‘app=database’. + - Using the pod's node name field, DaemonSets can be restricted to operate on a specified nodeName. + - The PodTemplateSpec used by the DaemonSet is the same as the PodTemplateSpec used by the Replication Controller. + - We will not guarantee that daemon pods show up on nodes before regular pods - run ordering is out of scope for this abstraction in the initial implementation. + - The initial implementation of DaemonSet does not guarantee that Daemon pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch Daemon pods (like Replication Controllers do with pods). Subsequent revisions might ensure that Daemon pods show up on nodes, preempting other pods if necessary.
+ - The DaemonSet controller adds an annotation "kubernetes.io/created-by: \" - YAML example: ```YAML apiVersion: v1 kind: Daemon metadata: labels: - name: datastore + app: datastore name: datastore spec: template: metadata: labels: - name: datastore-shard + app: datastore-shard spec: - node-selector: - name: datastore-node + nodeSelector: + app: datastore-node containers: name: datastore-shard image: kubernetes/sharded @@ -57,31 +57,29 @@ The Daemon Controller supports standard API features: name: main ``` - commands that get info - - get (e.g. kubectl get dc) + - get (e.g. kubectl get daemonsets) - describe - Modifiers - - delete - - stop: first we turn down all the pods controller by the daemon (by setting the nodeName to a non-existed name). Then we turn down the daemon controller. + - delete (if --cascade=true, then first the client turns down all the pods controlled by the DaemonSet (by setting the nodeName to a non-existent name); then it deletes the DaemonSet; then it deletes the pods) - label - - update - - Daemon controllers have labels, so you could, for example, list all daemon controllers with a certain label (the same way you would for a Replication Controller). - - In general, for all the supported features like get, describe, update, etc, the Daemon Controller works in a similar way to the Replication Controller. However, note that the Daemon Controller and the Replication Controller are different constructs. + - update (only allowed to selector and to nodeSelector and nodeName of pod template) + - DaemonSets have labels, so you could, for example, list all DaemonSets with a certain label (the same way you would for a Replication Controller). + - In general, for all the supported features like get, describe, update, etc, the DaemonSet works in a similar way to the Replication Controller. However, note that the DaemonSet and the Replication Controller are different constructs.
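The cascading delete described in the hunk above can be sketched with a toy in-memory stand-in for the API server. All types and names here are hypothetical, for illustration only; the real client goes through the REST API.

```go
package main

import "fmt"

type pod struct {
	name     string
	nodeName string
	owner    string // name of the DaemonSet that created this pod
}

type cluster struct {
	daemonSets map[string]bool
	pods       map[string]*pod
}

// cascadeDelete mirrors the client-side sequence: first re-point the pods
// at a non-existent node so the controller stops replacing them, then
// delete the DaemonSet itself, then delete the orphaned pods.
func (c *cluster) cascadeDelete(ds string) {
	for _, p := range c.pods {
		if p.owner == ds {
			p.nodeName = "no-such-node" // step 1: park pods on a bogus node
		}
	}
	delete(c.daemonSets, ds) // step 2: remove the DaemonSet object
	for name, p := range c.pods {
		if p.owner == ds {
			delete(c.pods, name) // step 3: remove the parked pods
		}
	}
}

func main() {
	c := &cluster{
		daemonSets: map[string]bool{"datastore": true},
		pods: map[string]*pod{
			"datastore-a": {name: "datastore-a", nodeName: "node-1", owner: "datastore"},
			"web-1":       {name: "web-1", nodeName: "node-1", owner: "web"},
		},
	}
	c.cascadeDelete("datastore")
	// The DaemonSet and its pod are gone; the unrelated pod survives.
	fmt.Println(len(c.daemonSets), len(c.pods)) // 0 1
}
```

The ordering matters: deleting the pods before the DaemonSet would race with the controller, which would recreate them on their nodes.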
### Persisting Pods - - Ordinary health checks specified in the pod template work to keep pods created by a Daemon Controller running. - - If a daemon pod is killed or stopped, the daemon controller will create a new replica of the daemon pod on the node. + - Ordinary liveness probes specified in the pod template work to keep pods created by a DaemonSet running. + - If a daemon pod is killed or stopped, the DaemonSet will create a new replica of the daemon pod on the node. ### Cluster Mutations - - When a new node is added to the cluster the daemon controller starts the daemon on the node (if the node’s labels match the user-specified selectors). This is a big advantage of the Daemon Controller compared to alternative ways of launching daemons and configuring clusters. - - Suppose the user launches a daemon controller that runs a logging daemon on all nodes labeled “tolog”. If the user then adds the “tolog” label to a node (that did not initially have the “tolog” label), the logging daemon will launch on the node. Additionally, if a user removes the “tolog” label from a node, the logging daemon on that node will be killed. + - When a new node is added to the cluster the DaemonSet starts the daemon on the node (if the node’s labels match the user-specified selectors). + - Suppose the user launches a DaemonSet that runs a logging daemon on all nodes labeled “logger=fluentd”. If the user then adds the “logger=fluentd” label to a node (that did not initially have the label), the logging daemon will launch on the node. Additionally, if a user removes the label from a node, the logging daemon on that node will be killed. ## Alternatives Considered An alternative way to launch daemons is to avoid going through the API server, and instead provide ways to package the daemon into the node. For example, users could: 1. Include the daemon in the machine image -2. Use config files to launch daemons -3. Use static pod manifests to launch daemon pods when the node initializes +2.
Use static pod manifests to launch daemon pods when the node initializes These alternatives don’t work as well because the daemons won’t be well integrated into Kubernetes. In particular, @@ -91,38 +89,30 @@ These alternatives don’t work as well because the daemons won’t be well inte 4. The above alternatives are less user-friendly. Users need to learn two ways of launching pods: using the API when launching pods associated with Replication Controllers, and using manifests when launching daemons. So in the alternatives, deployment is more difficult. 5. It’s difficult to upgrade binaries launched in any of those three ways. -Another alternative is for the user to explicitly assign pods to specific nodes (using the Pod spec) when creating pods. A big disadvantage of this alternative is that the user would need to manually check whether new nodes satisfy the desired labels, and if so add the daemon to the node. This makes deployment painful, and could lead to costly mistakes (if a certain daemon is not launched on a new node which it is supposed to run on). In essence, every user will be re-implementing the Daemon Controller for themselves. +Another alternative is for the user to explicitly assign pods to specific nodes (using the Pod spec) when creating pods. A big disadvantage of this alternative is that the user would need to manually check whether new nodes satisfy the desired labels, and if so add the daemon to the node. This makes deployment painful, and could lead to costly mistakes (if a certain daemon is not launched on a new node which it is supposed to run on). In essence, every user will be re-implementing the DaemonSet for themselves. -A third alternative is to generalize the Replication Controller. We could add a field for the user to specify that she wishes to bind pods to certain nodes in the cluster. 
Or we could add a field to the pod-spec allowing the user to specify that each node can have exactly one instance of a pod (so the user would create a Replication Controller with a very large number of replicas, and set the anti-affinity field to true preventing more than one pod with that label from being scheduled onto a single node). The disadvantage of these methods is that the Daemon Controller and the Replication Controller are very different concepts. The Daemon Controller operates on a per-node basis, while the Replication Controller operates on a per-job basis (in particular, the Daemon Controller will take action when a node is changed or added). So presenting them as different concepts makes for a better user interface. Having small and directed controllers for distinct purposes makes Kubernetes easier to understand and use, compared to having one controller to rule them all. +A third alternative is to generalize the Replication Controller. We could add a field for the user to specify that she wishes to bind pods to certain nodes in the cluster. Or we could add a field to the pod-spec allowing the user to specify that each node can have exactly one instance of a pod (so the user would create a Replication Controller with a very large number of replicas, and set the anti-affinity field to true preventing more than one pod with that label from being scheduled onto a single node). The disadvantage of these methods is that the DaemonSet and the Replication Controller are very different concepts. The DaemonSet operates on a per-node basis, while the Replication Controller operates on a per-job basis (in particular, the DaemonSet will take action when a node is changed or added). So presenting them as different concepts makes for a better user interface. 
Having small and directed controllers for distinct purposes makes Kubernetes easier to understand and use, compared to having one controller to rule them all (see ["Convert ReplicationController to a plugin"](http://issues.k8s.io/3058)). ## Design #### Client -- Add support for daemon controller commands to kubectl and the client. Client code was added to client/unversioned. The main files in Kubectl that were modified are kubectl/describe.go and kubectl/stop.go, since for other calls like Get, Create, and Update, the client simply forwards the request to the backend via the REST API. +- Add support for DaemonSet commands to kubectl and the client. Client code was added to client/unversioned. The main files in Kubectl that were modified are kubectl/describe.go and kubectl/stop.go, since for other calls like Get, Create, and Update, the client simply forwards the request to the backend via the REST API. #### Apiserver - Accept, parse, validate client commands - REST API calls are handled in registry/daemon - In particular, the api server will add the object to etcd - DaemonManager listens for updates to etcd (using Framework.informer) -- API objects for Daemon Controller were created in expapi/v1/types.go and expapi/v1/register.go +- API objects for DaemonSet were created in expapi/v1/types.go and expapi/v1/register.go - Validation code is in expapi/validation #### Daemon Manager -- Creates new daemon controllers when requested. Launches the corresponding daemon pod on all nodes with labels matching the new daemon controller’s selector. -- Listens for addition of new nodes to the cluster, by setting up a framework.NewInformer that watches for the creation of Node API objects. When a new node is added, the daemon manager will loop through each daemon controller. If the label of the node matches the selector of the daemon controller, then the daemon manager will create the corresponding daemon pod in the new node. +- Creates new DaemonSets when requested. 
Launches the corresponding daemon pod on all nodes with labels matching the new DaemonSet’s selector. +- Listens for addition of new nodes to the cluster, by setting up a framework.NewInformer that watches for the creation of Node API objects. When a new node is added, the daemon manager will loop through each DaemonSet. If the label of the node matches the selector of the DaemonSet, then the daemon manager will create the corresponding daemon pod in the new node. - The daemon manager creates a pod on a node by sending a command to the API server, requesting for a pod to be bound to the node (the node will be specified via its hostname) #### Kubelet -- Does not need to be modified, but health checking will occur for the daemon pods and revive the pods if they are killed (we set the pod restartPolicy to Always). We reject Daemon Controller objects with pod templates that don’t have restartPolicy set to Always. - -## Testing - -Unit Tests: -Each component was unit tested, fakes were implemented when necessary. For example, when testing the client, a fake API server was used. - -End to End Tests: -One end-to-end test was implemented. The end-to-end test verified that the daemon manager runs the daemon on every node, that when a daemon pod is stopped it restarts, that the daemon controller can be reaped (stopped), and that the daemon adds/removes daemon pods appropriately from nodes when their labels change. +- Does not need to be modified, but health checking will occur for the daemon pods and revive the pods if they are killed (we set the pod restartPolicy to Always). We reject DaemonSet objects with pod templates that don’t have restartPolicy set to Always. ## Open Issues -- See how this can work with [Deployment design](https://github.com/GoogleCloudPlatform/kubernetes/issues/1743). +- See how this can work with [Deployment design](http://issues.k8s.io/1743). 
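The Daemon Manager section above watches for new Node objects and launches daemon pods on nodes whose labels match a DaemonSet's selector. The matching step can be sketched as follows; this is a hypothetical, self-contained reduction (the `DaemonSet`, `selectorMatches`, and `onNodeAdded` names are illustrative, not the actual controller code, which uses framework.NewInformer and the real API types):

```go
package main

import "fmt"

// DaemonSet is a minimal stand-in holding only the node selector.
type DaemonSet struct {
	Name     string
	Selector map[string]string // node labels the daemon pods require
}

// selectorMatches reports whether every key/value pair in the selector
// is present on the node's labels (an empty selector matches all nodes).
func selectorMatches(selector, nodeLabels map[string]string) bool {
	for k, v := range selector {
		if nodeLabels[k] != v {
			return false
		}
	}
	return true
}

// onNodeAdded models the manager's reaction to a new node: loop through
// each DaemonSet and collect those that should get a daemon pod here.
func onNodeAdded(nodeLabels map[string]string, sets []DaemonSet) []string {
	var toLaunch []string
	for _, ds := range sets {
		if selectorMatches(ds.Selector, nodeLabels) {
			toLaunch = append(toLaunch, ds.Name)
		}
	}
	return toLaunch
}

func main() {
	sets := []DaemonSet{
		{Name: "fluentd-logger", Selector: map[string]string{"logger": "fluentd"}},
		{Name: "datastore", Selector: map[string]string{"app": "database"}},
	}
	node := map[string]string{"logger": "fluentd", "zone": "us-central1-a"}
	fmt.Println(onNodeAdded(node, sets)) // prints [fluentd-logger]
}
```

In the real manager the "launch" step is a request to the API server binding the pod to the node by hostname, as described above.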
-- cgit v1.2.3 From 8ad8f8cff031cb1072a13aaba55c495b4d6976ec Mon Sep 17 00:00:00 2001 From: Zichang Lin Date: Wed, 23 Sep 2015 14:58:16 +0800 Subject: Change a describe in docs/design/secrets.md --- secrets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/secrets.md b/secrets.md index 895d9448..e8a5e42f 100644 --- a/secrets.md +++ b/secrets.md @@ -73,7 +73,7 @@ Goals of this design: 2. As a cluster operator, I want to allow a pod to access a Docker registry using credentials from a `.dockercfg` file, so that containers can push images 3. As a cluster operator, I want to allow a pod to access a git repository using SSH keys, - so that I can push and fetch to and from the repository + so that I can push to and fetch from the repository 2. As a user, I want to allow containers to consume supplemental information about services such as username and password which should be kept secret, so that I can share secrets about a service amongst the containers in my application securely -- cgit v1.2.3 From 7b4fa0ae9049528038d68cfdd941b9b40f702334 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Wed, 23 Sep 2015 14:45:00 -0400 Subject: Add link to dev e2e docs from api_changes doc --- api_changes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/api_changes.md b/api_changes.md index e0a65fe0..b9fcd392 100644 --- a/api_changes.md +++ b/api_changes.md @@ -508,8 +508,8 @@ doing! ## Write end-to-end tests -This is, sadly, still sort of painful. Talk to us and we'll try to help you -figure out the best way to make sure your cool feature keeps working forever. +Check out the [E2E docs](e2e-tests.md) for detailed information about how to write end-to-end +tests for your feature. 
## Examples and docs -- cgit v1.2.3 From 28aa2acb97f52a83e26514b73ee2c201b1a39660 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 16 Sep 2015 22:15:05 -0700 Subject: move experimental/v1 to experimental/v1alpha1; use "group/version" in many places where used to expect "version" only. --- extending-api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extending-api.md b/extending-api.md index bbd02a54..628b5a16 100644 --- a/extending-api.md +++ b/extending-api.md @@ -114,7 +114,7 @@ For example, if a user creates: ```yaml metadata: name: cron-tab.example.com -apiVersion: experimental/v1 +apiVersion: experimental/v1alpha1 kind: ThirdPartyResource description: "A specification of a Pod to run on a cron style schedule" versions: -- cgit v1.2.3 From 635b078fd44bb12e409674542882bb556e9d5855 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Thu, 24 Sep 2015 16:22:10 -0700 Subject: Respond to reviewer comments. --- daemon.md | 38 +++++++++++++++----------------------- 1 file changed, 15 insertions(+), 23 deletions(-) diff --git a/daemon.md b/daemon.md index c4187c7b..43c49465 100644 --- a/daemon.md +++ b/daemon.md @@ -28,15 +28,15 @@ The DaemonSet supports standard API features: - create - The spec for DaemonSets has a pod template field. - Using the pod’s nodeSelector field, DaemonSets can be restricted to operate over nodes that have a certain label. For example, suppose that in a cluster some nodes are labeled ‘app=database’. You can use a DaemonSet to launch a datastore pod on exactly those nodes labeled ‘app=database’. - - Using the pod's node name field, DaemonSets can be restricted to operate on a specified nodeName. - - The PodTemplateSpec used by the DaemonSet is the same as the PodTemplateSpec usedby the Replication Controller. - - We will not guarantee that daemon pods show up on nodes before regular pods - run ordering is out of scope for this abstraction in the initial implementation. 
- - The initial implementation of DaemonSet does not guarantee that Daemon pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch Daemon pods (like Replication Controllers do with pods). Subsequent revisions might ensure that Daemon pods show up on nodes, preempting other pods if necessary. + - Using the pod's nodeName field, DaemonSets can be restricted to operate on a specified node. + - The PodTemplateSpec used by the DaemonSet is the same as the PodTemplateSpec used by the Replication Controller. + - The initial implementation will not guarantee that DaemonSet pods are created on nodes before other pods. + - The initial implementation of DaemonSet does not guarantee that DaemonSet pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch DaemonSet pods (like Replication Controllers do with pods). Subsequent revisions might ensure that DaemonSet pods show up on nodes, preempting other pods if necessary. - The DaemonSet controller adds an annotation "kubernetes.io/created-by: \" - YAML example: ```YAML apiVersion: v1 - kind: Daemon + kind: DaemonSet metadata: labels: app: datastore @@ -62,36 +62,28 @@ The DaemonSet supports standard API features: - Modifiers - delete (if --cascade=true, then first the client turns down all the pods controlled by the DaemonSet (by setting the nodeName to a non-existent name); then it deletes the DaemonSet; then it deletes the pods) - label - - update (only allowed to selector and to nodeSelector and nodeName of pod template) - - DaemonSets have labels, so you could, for example, list all DaemonSets with a certain label (the same way you would for a Replication Controller). 
+ - annotate + - update operations like patch and replace (only allowed to selector and to nodeSelector and nodeName of pod template) + - DaemonSets have labels, so you could, for example, list all DaemonSets with certain labels (the same way you would for a Replication Controller). - In general, for all the supported features like get, describe, update, etc, the DaemonSet works in a similar way to the Replication Controller. However, note that the DaemonSet and the Replication Controller are different constructs. ### Persisting Pods - - Ordinary livenes probes specified in the pod template work to keep pods created by a DaemonSet running. + - Ordinary liveness probes specified in the pod template work to keep pods created by a DaemonSet running. - If a daemon pod is killed or stopped, the DaemonSet will create a new replica of the daemon pod on the node. ### Cluster Mutations - - When a new node is added to the cluster the DaemonSet starts the daemon on the node (if the node’s labels match the user-specified selectors). + - When a new node is added to the cluster, the DaemonSet controller starts daemon pods on the node for DaemonSets whose pod template nodeSelectors match the node’s labels. - Suppose the user launches a DaemonSet that runs a logging daemon on all nodes labeled “logger=fluentd”. If the user then adds the “logger=fluentd” label to a node (that did not initially have the label), the logging daemon will launch on the node. Additionally, if a user removes the label from a node, the logging daemon on that node will be killed. ## Alternatives Considered -An alternative way to launch daemons is to avoid going through the API server, and instead provide ways to package the daemon into the node. For example, users could: +We considered several alternatives, that were deemed inferior to the approach of creating a new DaemonSet abstraction. -1. Include the daemon in the machine image -2. 
Use static pod manifests to launch daemon pods when the node initializes +One alternative is to include the daemon in the machine image. In this case it would run outside of Kubernetes proper, and thus not be monitored, health checked, usable as a service endpoint, easily upgradable, etc. -These alternatives don’t work as well because the daemons won’t be well integrated into Kubernetes. In particular, +A related alternative is to package daemons as static pods. This would address most of the problems described above, but they would still not be easily upgradable, and more generally could not be managed through the API server interface. -1. In alternatives (1) and (2), health checking for the daemons would need to be re-implemented, or would not exist at all (because the daemons are not run inside pods). In the current proposal, the Kubelet will health-check daemon pods and restart them if necessary. -2. In alternatives (1) and (2), binding services to a group of daemons is difficult (which is needed in use cases such as the sharded data store use case described above), because the daemons are not run inside pods -3. A big disadvantage of these methods is that adding new daemons in existing nodes is difficult (for example, if a cluster manager wants to add a logging daemon after a cluster has been deployed). -4. The above alternatives are less user-friendly. Users need to learn two ways of launching pods: using the API when launching pods associated with Replication Controllers, and using manifests when launching daemons. So in the alternatives, deployment is more difficult. -5. It’s difficult to upgrade binaries launched in any of those three ways. - -Another alternative is for the user to explicitly assign pods to specific nodes (using the Pod spec) when creating pods. A big disadvantage of this alternative is that the user would need to manually check whether new nodes satisfy the desired labels, and if so add the daemon to the node. 
This makes deployment painful, and could lead to costly mistakes (if a certain daemon is not launched on a new node which it is supposed to run on). In essence, every user will be re-implementing the DaemonSet for themselves. - -A third alternative is to generalize the Replication Controller. We could add a field for the user to specify that she wishes to bind pods to certain nodes in the cluster. Or we could add a field to the pod-spec allowing the user to specify that each node can have exactly one instance of a pod (so the user would create a Replication Controller with a very large number of replicas, and set the anti-affinity field to true preventing more than one pod with that label from being scheduled onto a single node). The disadvantage of these methods is that the DaemonSet and the Replication Controller are very different concepts. The DaemonSet operates on a per-node basis, while the Replication Controller operates on a per-job basis (in particular, the DaemonSet will take action when a node is changed or added). So presenting them as different concepts makes for a better user interface. Having small and directed controllers for distinct purposes makes Kubernetes easier to understand and use, compared to having one controller to rule them all (see ["Convert ReplicationController to a plugin"](http://issues.k8s.io/3058)). +A third alternative is to generalize the Replication Controller. We would do something like: if you set the `replicas` field of the ReplicationControllerSpec to -1, then it means "run exactly one replica on every node matching the nodeSelector in the pod template." The ReplicationController would pretend `replicas` had been set to some large number -- larger than the largest number of nodes ever expected in the cluster -- and would use some anti-affinity mechanism to ensure that no more than one Pod from the ReplicationController runs on any given node. There are two downsides to this approach. 
First, there would always be a large number of Pending pods in the scheduler (these will be scheduled onto new machines when they are added to the cluster). The second downside is more philosophical: DaemonSet and the Replication Controller are very different concepts. We believe that having small, targeted controllers for distinct purposes makes Kubernetes easier to understand and use, compared to having larger multi-functional controllers (see ["Convert ReplicationController to a plugin"](http://issues.k8s.io/3058) for some discussion of this topic). ## Design @@ -115,4 +107,4 @@ A third alternative is to generalize the Replication Controller. We could add a - Does not need to be modified, but health checking will occur for the daemon pods and revive the pods if they are killed (we set the pod restartPolicy to Always). We reject DaemonSet objects with pod templates that don’t have restartPolicy set to Always. ## Open Issues -- See how this can work with [Deployment design](http://issues.k8s.io/1743). +- Should work similarly to [Deployment](http://issues.k8s.io/1743). -- cgit v1.2.3 From f2ae9d3ebcb40e07c308210b93e1bce2992e3ff0 Mon Sep 17 00:00:00 2001 From: David Oppenheimer Date: Thu, 24 Sep 2015 17:17:39 -0700 Subject: Ran update-generated-docs.sh --- daemon.md | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/daemon.md b/daemon.md index 43c49465..c88fcec7 100644 --- a/daemon.md +++ b/daemon.md @@ -1,3 +1,36 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/daemon.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + # DaemonSet in Kubernetes **Author**: Ananya Kumar (@AnanyaKumar) @@ -8,16 +41,18 @@ This document presents the design of the Kubernetes DaemonSet, describes use cas ## Motivation -Many users have requested for a way to run a daemon on every node in a Kubernetes cluster, or on a certain set of nodes in a cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the DaemonSet, a way to conveniently create and manage daemon-like workloads in Kubernetes. +Many users have requested for a way to run a daemon on every node in a Kubernetes cluster, or on a certain set of nodes in a cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the DaemonSet, a way to conveniently create and manage daemon-like workloads in Kubernetes. ## Use Cases The DaemonSet can be used for user-specified system services, cluster-level applications with strong node ties, and Kubernetes node services. Below are example use cases in each category. ### User-Specified System Services: + Logging: Some users want a way to collect statistics about nodes in a cluster and send those logs to an external database. For example, system administrators might want to know if their machines are performing as expected, if they need to add more machines to the cluster, or if they should switch cloud providers. The DaemonSet can be used to run a data collection service (for example fluentd) on every node and send the data to a service like ElasticSearch for analysis. 
### Cluster-Level Applications + Datastore: Users might want to implement a sharded datastore in their cluster. A few nodes in the cluster, labeled ‘app=datastore’, might be responsible for storing data shards, and pods running on these nodes might serve data. This architecture requires a way to bind pods to specific nodes, so it cannot be achieved using a Replication Controller. A DaemonSet is a convenient way to implement such a datastore. For other uses, see the related [feature request](https://issues.k8s.io/1518) @@ -34,6 +69,7 @@ The DaemonSet supports standard API features: - The initial implementation of DaemonSet does not guarantee that DaemonSet pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch DaemonSet pods (like Replication Controllers do with pods). Subsequent revisions might ensure that DaemonSet pods show up on nodes, preempting other pods if necessary. - The DaemonSet controller adds an annotation "kubernetes.io/created-by: \" - YAML example: + ```YAML apiVersion: v1 kind: DaemonSet @@ -56,6 +92,7 @@ The DaemonSet supports standard API features: - containerPort: 9042 name: main ``` + - commands that get info - get (e.g. kubectl get daemonsets) - describe @@ -68,10 +105,12 @@ The DaemonSet supports standard API features: - In general, for all the supported features like get, describe, update, etc, the DaemonSet works in a similar way to the Replication Controller. However, note that the DaemonSet and the Replication Controller are different constructs. ### Persisting Pods + - Ordinary liveness probes specified in the pod template work to keep pods created by a DaemonSet running. - If a daemon pod is killed or stopped, the DaemonSet will create a new replica of the daemon pod on the node. ### Cluster Mutations + - When a new node is added to the cluster, the DaemonSet controller starts daemon pods on the node for DaemonSets whose pod template nodeSelectors match the node’s labels. 
- Suppose the user launches a DaemonSet that runs a logging daemon on all nodes labeled “logger=fluentd”. If the user then adds the “logger=fluentd” label to a node (that did not initially have the label), the logging daemon will launch on the node. Additionally, if a user removes the label from a node, the logging daemon on that node will be killed. @@ -88,9 +127,11 @@ A third alternative is to generalize the Replication Controller. We would do som ## Design #### Client + - Add support for DaemonSet commands to kubectl and the client. Client code was added to client/unversioned. The main files in Kubectl that were modified are kubectl/describe.go and kubectl/stop.go, since for other calls like Get, Create, and Update, the client simply forwards the request to the backend via the REST API. #### Apiserver + - Accept, parse, validate client commands - REST API calls are handled in registry/daemon - In particular, the api server will add the object to etcd @@ -99,12 +140,20 @@ A third alternative is to generalize the Replication Controller. We would do som - Validation code is in expapi/validation #### Daemon Manager + - Creates new DaemonSets when requested. Launches the corresponding daemon pod on all nodes with labels matching the new DaemonSet’s selector. - Listens for addition of new nodes to the cluster, by setting up a framework.NewInformer that watches for the creation of Node API objects. When a new node is added, the daemon manager will loop through each DaemonSet. If the label of the node matches the selector of the DaemonSet, then the daemon manager will create the corresponding daemon pod in the new node. - The daemon manager creates a pod on a node by sending a command to the API server, requesting for a pod to be bound to the node (the node will be specified via its hostname) #### Kubelet + - Does not need to be modified, but health checking will occur for the daemon pods and revive the pods if they are killed (we set the pod restartPolicy to Always). 
We reject DaemonSet objects with pod templates that don’t have restartPolicy set to Always. ## Open Issues + - Should work similarly to [Deployment](http://issues.k8s.io/1743). + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/daemon.md?pixel)]() + -- cgit v1.2.3 From 1c186d172e362fbfe4fff339f1f7d038ab40dd47 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Thu, 24 Sep 2015 02:06:52 -0400 Subject: Proposal for pod level security context and backward compatibility --- pod-security-context.md | 407 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 407 insertions(+) create mode 100644 pod-security-context.md diff --git a/pod-security-context.md b/pod-security-context.md new file mode 100644 index 00000000..95e60856 --- /dev/null +++ b/pod-security-context.md @@ -0,0 +1,407 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/proposals/pod-security-context.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +## Abstract + +A proposal for refactoring `SecurityContext` to have pod-level and container-level attributes in +order to correctly model pod- and container-level security concerns. + +## Motivation + +Currently, containers have a `SecurityContext` attribute which contains information about the +security settings the container uses. In practice many of these attributes are uniform across all +containers in a pod. Simultaneously, there is also a need to apply the security context pattern +at the pod level to correctly model security attributes that apply only at a pod level. + +Users should be able to: + +1. Express security settings that are applicable to the entire pod +2. Express base security settings that apply to all containers +3. Override only the settings that need to be differentiated from the base in individual + containers + +This proposal is a dependency for other changes related to security context: + +1. [Volume ownership management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/12944) +2. [Generic SELinux label management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/14192) + +Goals of this design: + +1. Describe the use cases for which a pod-level security context is necessary +2. Thoroughly describe the API backward compatibility issues that arise from the introduction of + a pod-level security context +3. Describe all implementation changes necessary for the feature + +## Constraints and assumptions + +1. We will not design for intra-pod security; we are not currently concerned about isolating + containers in the same pod from one another +1. 
We will design for backward compatibility with the current V1 API + +## Use Cases + +1. As a developer, I want to correctly model security attributes which belong to an entire pod +2. As a user, I want to be able to specify container attributes that apply to all containers + without repeating myself +3. As an existing user, I want to be able to use the existing container-level security API + +### Use Case: Pod level security attributes + +Some security attributes make sense only to model at the pod level. For example, it is a +fundamental property of pods that all containers in a pod share the same network namespace. +Therefore, using the host namespace makes sense to model at the pod level only, and indeed, today +it is part of the `PodSpec`. Other host namespace support is currently being added and these will +also be pod-level settings; it makes sense to model them as a pod-level collection of security +attributes. + +### Use Case: Override pod security context for container + +Some use cases require the containers in a pod to run with different security settings. As an +example, a user may want to have a pod with two containers, one of which runs as root with the +privileged setting, and one that runs as a non-root UID. To support use cases like this, it should +be possible to override appropriate (i.e., not intrinsically pod-level) security settings for +individual containers. 
+ +## Proposed Design + +### SecurityContext + +For posterity and ease of reading, note the current state of `SecurityContext`: + +```go +package api + +type Container struct { + // Other fields omitted + + // Optional: SecurityContext defines the security options the pod should be run with + SecurityContext *SecurityContext `json:"securityContext,omitempty"` +} + +type SecurityContext struct { + // Capabilities are the capabilities to add/drop when running the container + Capabilities *Capabilities `json:"capabilities,omitempty"` + + // Run the container in privileged mode + Privileged *bool `json:"privileged,omitempty"` + + // SELinuxOptions are the labels to be applied to the container + // and volumes + SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"` + + // RunAsUser is the UID to run the entrypoint of the container process. + RunAsUser *int64 `json:"runAsUser,omitempty"` + + // RunAsNonRoot indicates that the container should be run as a non-root user. If the RunAsUser + // field is not explicitly set then the kubelet may check the image for a specified user or + // perform defaulting to specify a user. + RunAsNonRoot bool `json:"runAsNonRoot,omitempty"` +} + +// SELinuxOptions contains the fields that make up the SELinux context of a container. +type SELinuxOptions struct { + // SELinux user label + User string `json:"user,omitempty"` + + // SELinux role label + Role string `json:"role,omitempty"` + + // SELinux type label + Type string `json:"type,omitempty"` + + // SELinux level label. + Level string `json:"level,omitempty"` +} +``` + +### PodSecurityContext + +`PodSecurityContext` specifies two types of security attributes: + +1. Attributes that apply to the pod itself +2. 
Attributes that apply to the containers of the pod + +In the internal API, fields of the `PodSpec` controlling the use of the host PID, IPC, and network +namespaces are relocated to this type: + +```go +package api + +type PodSpec struct { + // Other fields omitted + + // Optional: SecurityContext specifies pod-level attributes and container security attributes + // that apply to all containers. + SecurityContext *PodSecurityContext `json:"securityContext,omitempty"` +} + +// PodSecurityContext specifies security attributes of the pod and container attributes that apply +// to all containers of the pod. +type PodSecurityContext struct { + // Use the host's network namespace. If this option is set, the ports that will be + // used must be specified. + // Optional: Default to false. + HostNetwork bool + // Use the host's IPC namespace + HostIPC bool + + // Use the host's PID namespace + HostPID bool + + // Capabilities are the capabilities to add/drop when running containers + Capabilities *Capabilities `json:"capabilities,omitempty"` + + // Run the container in privileged mode + Privileged *bool `json:"privileged,omitempty"` + + // SELinuxOptions are the labels to be applied to the container + // and volumes + SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"` + + // RunAsUser is the UID to run the entrypoint of the container process. + RunAsUser *int64 `json:"runAsUser,omitempty"` + + // RunAsNonRoot indicates that the container should be run as a non-root user. If the RunAsUser + // field is not explicitly set then the kubelet may check the image for a specified user or + // perform defaulting to specify a user. + RunAsNonRoot bool +} + +// Comments and generated docs will change for the container.SecurityContext field to indicate +// the precedence of these fields over the pod-level ones. + +type Container struct { + // Other fields omitted + + // Optional: SecurityContext defines the security options the pod should be run with. 
+ // Settings specified in this field take precedence over the settings defined in + // pod.Spec.SecurityContext. + SecurityContext *SecurityContext `json:"securityContext,omitempty"` +} +``` + +In the V1 API, the pod-level security attributes which are currently fields of the `PodSpec` are +retained on the `PodSpec` for backward compatibility purposes: + +```go +package v1 + +type PodSpec struct { + // Other fields omitted + + // Use the host's network namespace. If this option is set, the ports that will be + // used must be specified. + // Optional: Default to false. + HostNetwork bool `json:"hostNetwork,omitempty"` + // Use the host's pid namespace. + // Optional: Default to false. + HostPID bool `json:"hostPID,omitempty"` + // Use the host's ipc namespace. + // Optional: Default to false. + HostIPC bool `json:"hostIPC,omitempty"` + + // Optional: SecurityContext specifies pod-level attributes and container security attributes + // that apply to all containers. + SecurityContext *PodSecurityContext `json:"securityContext,omitempty"` +} +``` + +The `pod.Spec.SecurityContext` specifies the security context of all containers in the pod. +The containers' `securityContext` field is overlaid on the base security context to determine the +effective security context for the container. + +The new V1 API should be backward compatible with the existing API. Backward compatibility is +defined as: + +> 1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before your change must +> work the same after your change. +> 2. Any API call that uses your change must not cause problems (e.g. crash or degrade behavior) when +> issued against servers that do not include your change. +> 3. It must be possible to round-trip your change (convert to different API versions and back) with +> no loss of information. 
+
+Previous versions of this proposal attempted to deal with backward compatibility by defining
+the effect of setting the pod-level fields on the container-level fields. While trying to find
+consensus on this design, it became apparent that this approach was going to be extremely complex
+to implement, explain, and support. Instead, we will approach backward compatibility as follows:
+
+1. Pod-level and container-level settings will not affect one another
+2. Old clients will be able to use container-level settings in the exact same way
+3. Container-level settings always override pod-level settings if they are set
+
+#### Examples
+
+1. Old client using `pod.Spec.Containers[x].SecurityContext`
+
+   An old client creates a pod:
+
+   ```yaml
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: test-pod
+   spec:
+     containers:
+     - name: a
+       securityContext:
+         runAsUser: 1001
+     - name: b
+       securityContext:
+         runAsUser: 1002
+   ```
+
+   looks to old clients like:
+
+   ```yaml
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: test-pod
+   spec:
+     containers:
+     - name: a
+       securityContext:
+         runAsUser: 1001
+     - name: b
+       securityContext:
+         runAsUser: 1002
+   ```
+
+   looks to new clients like:
+
+   ```yaml
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: test-pod
+   spec:
+     containers:
+     - name: a
+       securityContext:
+         runAsUser: 1001
+     - name: b
+       securityContext:
+         runAsUser: 1002
+   ```
+
+2. 
New client using `pod.Spec.SecurityContext` + + A new client creates a pod using a field of `pod.Spec.SecurityContext`: + + ```yaml + apiVersion: v1 + kind: Pod + metadata: + name: test-pod + spec: + securityContext: + runAsUser: 1001 + containers: + - name: a + - name: b + ``` + + appears to new clients as: + + ```yaml + apiVersion: v1 + kind: Pod + metadata: + name: test-pod + spec: + securityContext: + runAsUser: 1001 + containers: + - name: a + - name: b + ``` + + old clients will see: + + ```yaml + apiVersion: v1 + kind: Pod + metadata: + name: test-pod + spec: + containers: + - name: a + - name: b + ``` + +3. Pods created using `pod.Spec.SecurityContext` and `pod.Spec.Containers[x].SecurityContext` + + If a field is set in both `pod.Spec.SecurityContext` and + `pod.Spec.Containers[x].SecurityContext`, the value in `pod.Spec.Containers[x].SecurityContext` + wins. In the following pod: + + ```yaml + apiVersion: v1 + kind: Pod + metadata: + name: test-pod + spec: + securityContext: + runAsUser: 1001 + containers: + - name: a + securityContext: + runAsUser: 1002 + - name: b + ``` + + The effective setting for `runAsUser` for container A is `1002`. + +#### Testing + +A backward compatibility test suite will be established for the v1 API. The test suite will +verify compatibility by converting objects into the internal API and back to the version API and +examining the results. + +All of the examples here will be used as test-cases. As more test cases are added, the proposal will +be updated. + +An example of a test like this can be found in the +[OpenShift API package](https://github.com/openshift/origin/blob/master/pkg/api/compatibility_test.go) + +E2E test cases will be added to test the correct determination of the security context for containers. + +### Kubelet changes + +1. The Kubelet will use the new fields on the `PodSecurityContext` for host namespace control +2. 
The Kubelet will be modified to correctly implement the backward compatibility and effective + security context determination defined here + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-security-context.md?pixel)]() + -- cgit v1.2.3 From 1f39f5da141250fbc46caa7f701b36ec82ab64c1 Mon Sep 17 00:00:00 2001 From: liguangbo Date: Mon, 28 Sep 2015 16:00:43 +0800 Subject: Change Oom to OOM --- resource-qos.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/resource-qos.md b/resource-qos.md index f76a0306..52671821 100644 --- a/resource-qos.md +++ b/resource-qos.md @@ -149,7 +149,7 @@ Container OOM score configuration - Hack, because these critical tasks might die if they conflict with guaranteed containers. in the future, we should place all user-pods into a separate cgroup, and set a limit on the memory they can consume. Setting OOM_SCORE_ADJ for a container -- Refactor existing ApplyOomScoreAdj to util/oom.go +- Refactor existing ApplyOOMScoreAdj to util/oom.go - To set OOM_SCORE_ADJ of a container, we loop through all processes in the container, and set OOM_SCORE_ADJ - We keep looping until the list of processes in the container stabilizes. This is sufficient because child processes inherit OOM_SCORE_ADJ. 
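The loop-until-stable strategy described in the hunk above can be sketched as follows. The `listPids` and `setOOMScoreAdj` callbacks are hypothetical stand-ins for reading the container's process list and writing `/proc/<pid>/oom_score_adj`; the real refactored helper lives in `util/oom.go`:

```go
package main

import "fmt"

// applyOOMScoreAdj keeps applying the OOM score adjustment until no new
// processes appear in the container. This is sufficient because child
// processes inherit oom_score_adj from their parent.
func applyOOMScoreAdj(listPids func() []int, setOOMScoreAdj func(pid int) error) error {
	adjusted := map[int]bool{}
	for {
		changed := false
		for _, pid := range listPids() {
			if adjusted[pid] {
				continue
			}
			if err := setOOMScoreAdj(pid); err != nil {
				return err
			}
			adjusted[pid] = true
			changed = true
		}
		if !changed {
			return nil // the process list has stabilized
		}
	}
}

func main() {
	// Fake container: a new child process appears after the first sweep.
	pids := []int{100, 101}
	pass := 0
	listPids := func() []int {
		pass++
		if pass == 2 {
			pids = append(pids, 102) // simulate a fork after the first sweep
		}
		return pids
	}
	var set []int
	setAdj := func(pid int) error {
		set = append(set, pid)
		return nil
	}
	_ = applyOOMScoreAdj(listPids, setAdj)
	fmt.Println(set) // prints "[100 101 102]"
}
```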
-- cgit v1.2.3 From d3d7bf18668c2ea71583ca8a86a2470a5aa46b8f Mon Sep 17 00:00:00 2001 From: feihujiang Date: Wed, 30 Sep 2015 09:49:29 +0800 Subject: Fix wrong URL in cli-roadmap doc --- cli-roadmap.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cli-roadmap.md b/cli-roadmap.md index 42784dbc..2b713260 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -35,8 +35,8 @@ Documentation for other releases can be found at See github issues with the following labels: * [area/app-config-deployment](https://github.com/kubernetes/kubernetes/labels/area/app-config-deployment) -* [component/CLI](https://github.com/kubernetes/kubernetes/labels/component/CLI) -* [component/client](https://github.com/kubernetes/kubernetes/labels/component/client) +* [component/kubectl](https://github.com/kubernetes/kubernetes/labels/component/kubectl) +* [component/clientlib](https://github.com/kubernetes/kubernetes/labels/component/clientlib) -- cgit v1.2.3 From d29c41354ec8aec136df95d3a9a03b6807b7bcd6 Mon Sep 17 00:00:00 2001 From: HaiyangDING Date: Tue, 29 Sep 2015 17:44:26 +0800 Subject: Replace PodFitsPorts with PodFitsHostPorts --- scheduler_algorithm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index 7964ab33..d6a8b6c5 100755 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -41,7 +41,7 @@ The purpose of filtering the nodes is to filter out the nodes that do not meet c - `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. - `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../proposals/resource-qos.md). -- `PodFitsPorts`: Check if any HostPort required by the Pod is already occupied on the node. 
+- `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node. - `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. - `PodSelectorMatches`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field ([Here](../user-guide/node-selection/) is an example of how to use `nodeSelector` field). - `CheckNodeLabelPresence`: Check if all the specified labels exist on a node or not, regardless of the value. -- cgit v1.2.3 From 279e87daf527ab15ab7812ded65e7d59d9785164 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Fri, 18 Sep 2015 14:45:48 -0400 Subject: Proposal for pod-level supplemental group and volume ownership mangement --- volumes.md | 515 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 515 insertions(+) create mode 100644 volumes.md diff --git a/volumes.md b/volumes.md new file mode 100644 index 00000000..b340acc0 --- /dev/null +++ b/volumes.md @@ -0,0 +1,515 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/volumes.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+## Abstract
+
+A proposal for sharing volumes between containers in a pod using a special supplemental group.
+
+## Motivation
+
+Kubernetes volumes should be usable regardless of the UID a container runs as. This concern cuts
+across all volume types, so the system should be able to handle it in a generalized way to provide
+uniform functionality across all volume types and lower the barrier to new plugins.
+
+Goals of this design:
+
+1. Enumerate the different use-cases for volume usage in pods
+2. Define the desired goal state for ownership and permission management in Kubernetes
+3. Describe the changes necessary to achieve the desired state
+
+## Constraints and Assumptions
+
+1. When writing permissions in this proposal, `D` represents a don't-care value; example: `07D0`
+   represents permissions where the owner has `7` permissions, the group has a don't-care value,
+   and all others have `0` permissions
+2. Read-write usability of a volume from a container is defined as one of:
+   1. The volume is owned by the container's effective UID and has permissions `07D0`
+   2. The volume is owned by the container's effective GID or one of its supplemental groups and
+      has permissions `0D70`
+3. Volume plugins should not have to handle setting permissions on volumes
+4. Preventing two containers within a pod from reading and writing to the same volume (by choosing
+   different container UIDs) is not something we intend to support today
+5. 
We will not design to support multiple processes running in a single container as different + UIDs; use cases that require work by different UIDs should be divided into different pods for + each UID + +## Current State Overview + +### Kubernetes + +Kubernetes volumes can be divided into two broad categories: + +1. Unshared storage: + 1. Volumes created by the kubelet on the host directory: empty directory, git repo, secret, + downward api. All volumes in this category delegate to `EmptyDir` for their underlying + storage. These volumes are created with ownership `root:root`. + 2. Volumes based on network block devices: AWS EBS, iSCSI, RBD, etc, *when used exclusively + by a single pod*. +2. Shared storage: + 1. `hostPath` is shared storage because it is necessarily used by a container and the host + 2. Network file systems such as NFS, Glusterfs, Cephfs, etc. For these volumes, the ownership + is determined by the configuration of the shared storage system. + 3. Block device based volumes in `ReadOnlyMany` or `ReadWriteMany` modes are shared because + they may be used simultaneously by multiple pods. + +The `EmptyDir` volume was recently modified to create the volume directory with `0777` permissions +from `0750` to support basic usability of that volume as a non-root UID. + +### Docker + +Docker recently added supplemental group support. This adds the ability to specify additional +groups that a container should be part of, and will be released with Docker 1.8. + +There is a [proposal](https://github.com/docker/docker/pull/14632) to add a bind-mount flag to tell +Docker to change the ownership of a volume to the effective UID and GID of a container, but this has +not yet been accepted. + +### Rocket + +Rocket +[image manifests](https://github.com/appc/spec/blob/master/spec/aci.md#image-manifest-schema) can +specify users and groups, similarly to how a Docker image can. 
A Rocket
+[pod manifest](https://github.com/appc/spec/blob/master/spec/pods.md#pod-manifest-schema) can also
+override the default user and group specified by the image manifest.
+
+Rocket does not currently support supplemental groups or changing the owning UID or
+group of a volume, but it has been [requested](https://github.com/coreos/rkt/issues/1309).
+
+## Use Cases
+
+1. As a user, I want the system to set ownership and permissions on volumes correctly to enable
+   reads and writes with the following scenarios:
+   1. All containers running as root
+   2. All containers running as the same non-root user
+   3. Multiple containers running as a mix of root and non-root users
+
+### All containers running as root
+
+For volumes that only need to be used by root, no action needs to be taken to change ownership or
+permissions, but setting the ownership based on the supplemental group shared by all containers in a
+pod will also work. For situations where read-only access to a shared volume is required from one
+or more containers, the `VolumeMount`s in those containers should have the `readOnly` field set.
+
+### All containers running as a single non-root user
+
+In use cases where a volume is used by a single non-root UID, the volume ownership and permissions
+should be set to enable read/write access.
+
+Currently, a non-root UID will not have permissions to write to any but an `EmptyDir` volume.
+Today, users that need this case to work can:
+
+1. Grant the container the necessary capabilities to `chown` and `chmod` the volume:
+   - `CAP_FOWNER`
+   - `CAP_CHOWN`
+   - `CAP_DAC_OVERRIDE`
+2. Run a wrapper script that runs `chown` and `chmod` commands to set the desired ownership and
+   permissions on the volume before starting their main process
+
+This workaround has significant drawbacks:
+
+1. It grants powerful kernel capabilities to the code in the image and is thus insecure,
+   defeating the purpose of running containers as non-root users
+2. 
The user experience is poor; it requires changing the Dockerfile, adding a layer, or modifying the
+   container's command
+
+Some cluster operators manage the ownership of shared storage volumes on the server side.
+In this scenario, the UID of the container using the volume is known in advance. The ownership of
+the volume is set to match the container's UID on the server side.
+
+### Containers running as a mix of root and non-root users
+
+If the list of UIDs that need to use a volume includes both root and non-root users, supplemental
+groups can be applied to enable sharing volumes between containers. Setting the volume's ownership
+to `root:<supplemental group>` and its permissions to `2770` will make it usable both from
+containers running as root and from containers running as a non-root UID with that supplemental
+group. The setgid bit is used to ensure that files created in the volume will inherit the owning
+GID of the volume.
+
+## Community Design Discussion
+
+- [kubernetes/2630](https://github.com/GoogleCloudPlatform/kubernetes/issues/2630)
+- [kubernetes/11319](https://github.com/GoogleCloudPlatform/kubernetes/issues/11319)
+- [kubernetes/9384](https://github.com/GoogleCloudPlatform/kubernetes/pull/9384)
+
+## Analysis
+
+The system needs to be able to:
+
+1. Model correctly which volumes require ownership management
+1. Determine the correct ownership of each volume in a pod if required
+1. Set the ownership and permissions on volumes when required
+
+### Modeling whether a volume requires ownership management
+
+#### Unshared storage: volumes derived from `EmptyDir`
+
+Since Kubernetes creates `EmptyDir` volumes, it should ensure the ownership is set to enable the
+volumes to be usable for all of the above scenarios.
+
+#### Unshared storage: network block devices
+
+Volume plugins based on network block devices such as AWS EBS and RBD can be treated the same way
+as local volumes. 
Since inodes are written to these block devices in the same way as `EmptyDir`
+volumes, permissions and ownership can be managed on the client side by the Kubelet when used
+exclusively by one pod. When the volumes are used outside of a persistent volume, or with the
+`ReadWriteOnce` mode, they are effectively unshared storage.
+
+When used by multiple pods, there are many additional use-cases to analyze before we can be
+confident that we can support ownership management robustly with these file systems. The right
+design is one that makes it easy to experiment and develop support for ownership management with
+volume plugins to enable developers and cluster operators to continue exploring these issues.
+
+#### Shared storage: hostPath
+
+The `hostPath` volume should only be used by effective-root users, and the permissions of paths
+exposed into containers via hostPath volumes should always be managed by the cluster operator. If
+the Kubelet managed the ownership for `hostPath` volumes, a user who could create a `hostPath`
+volume could effect changes to the state of arbitrary paths within the host's filesystem. This
+would be a severe security risk, so we will consider hostPath a corner case that the kubelet should
+never perform ownership management for.
+
+#### Shared storage
+
+Ownership management of shared storage is a complex topic. Ownership for existing shared storage
+will be managed externally from Kubernetes. For this case, our API should make it simple to express
+whether a particular volume should have these concerns managed by Kubernetes.
+
+We will not attempt to address the ownership and permissions concerns of new shared storage
+in this proposal.
+
+When a network block device is used as a persistent volume in `ReadWriteMany` or `ReadOnlyMany`
+modes, it is shared storage, and thus outside the scope of this proposal. 
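The per-volume decision this section arrives at can be summarized in a short sketch. The function and its inputs (`volumeType`, `isBlockDevice`) are hypothetical; in the actual design the decision is delegated to each volume plugin rather than centralized like this:

```go
package main

import "fmt"

// AccessMode mirrors the persistent volume access modes discussed above.
type AccessMode string

const (
	ReadWriteOnce AccessMode = "ReadWriteOnce"
	ReadOnlyMany  AccessMode = "ReadOnlyMany"
	ReadWriteMany AccessMode = "ReadWriteMany"
)

// manageOwnership sketches the decision described in this section: the Kubelet
// manages ownership only for volumes it creates itself and for block devices
// used exclusively by one pod.
func manageOwnership(volumeType string, isBlockDevice bool, mode AccessMode) bool {
	switch {
	case volumeType == "hostPath":
		return false // operator-managed; the kubelet must never touch it
	case volumeType == "emptyDir":
		return true // kubelet-created, so the kubelet owns setup
	case isBlockDevice && mode == ReadWriteOnce:
		return true // effectively unshared storage
	default:
		return false // shared storage is managed externally
	}
}

func main() {
	fmt.Println(manageOwnership("emptyDir", false, ReadWriteOnce)) // true
	fmt.Println(manageOwnership("hostPath", false, ReadWriteOnce)) // false
	fmt.Println(manageOwnership("nfs", false, ReadWriteMany))      // false
}
```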
+ +#### Plugin API requirements + +From the above, we know that some volume plugins will 'want' ownership management from the Kubelet +and others will not. Plugins should be able to opt in to ownership management from the Kubelet. To +facilitate this, there should be a method added to the `volume.Plugin` interface that the Kubelet +uses to determine whether to perform ownership management for a volume. + +### Determining correct ownership of a volume + +Using the approach of a pod-level supplemental group to own volumes solves the problem in any of the +cases of UID/GID combinations within a pod. Since this is the simplest approach that handles all +use-cases, our solution will be made in terms of it. + +Eventually, Kubernetes should allocate a unique group for each pod so that a pod's volumes are +usable by that pod's containers, but not by containers of another pod. The supplemental group used +to share volumes must be unique in a multitenant cluster. If uniqueness is enforced at the host +level, pods from one host may be able to use shared filesystems meant for pods on another host. + +Eventually, Kubernetes should integrate with external identity management systems to populate pod +specs with the right supplemental groups necessary to use shared volumes. In the interim until the +identity management story is far enough along to implement this type of integration, we will rely +on being able to set arbitrary groups. (Note: as of this writing, a PR is being prepared for +setting arbitrary supplemental groups). + +An admission controller could handle allocating groups for each pod and setting the group in the +pod's security context. + +#### A note on the root group + +Today, by default, all docker containers are run in the root group (GID 0). 
This is relied on by
+image authors that make images to run with a range of UIDs: they set the group ownership for
+important paths to be the root group, so that containers running as GID 0 *and* an arbitrary UID
+can read and write to those paths normally.
+
+It is important to note that the changes proposed here will not affect the primary GID of
+containers in pods. Setting the `pod.Spec.SecurityContext.FSGroup` field will not
+override the primary GID and should be safe to use in images that expect GID 0.
+
+### Setting ownership and permissions on volumes
+
+For `EmptyDir`-based volumes and unshared storage, `chown` and `chmod` on the node are sufficient to
+set ownership and permissions. Shared storage is different because:
+
+1. Shared storage may not live on the node where the pod that uses it runs
+2. Shared storage may be externally managed
+
+## Proposed design
+
+Our design should minimize the code required in the Kubelet and volume plugins for handling
+ownership.
+
+### API changes
+
+We should not interfere with images that need to run as a particular UID or primary GID. A
+pod-level supplemental group allows us to express a group that all containers in a pod run as in a
+way that is orthogonal to the primary UID and GID of each container process.
+
+```go
+package api
+
+type PodSecurityContext struct {
+	// FSGroup is a supplemental group that all containers in a pod run under. This group will own
+	// volumes that the Kubelet manages ownership for. If this is not specified, the Kubelet will
+	// not set the group ownership of any volumes.
+	FSGroup *int64 `json:"supplementalGroup"`
+}
+```
+
+The V1 API will be extended with the same field:
+
+```go
+package v1
+
+type PodSecurityContext struct {
+	// FSGroup is a supplemental group that all containers in a pod run under. This group will own
+	// volumes that the Kubelet manages ownership for. If this is not specified, the Kubelet will
+	// not set the group ownership of any volumes. 
+ FSGroup *int64 `json:"supplementalGroup"` +} +``` + +The values that can be specified for the `pod.Spec.SecurityContext.FSGroup` field are governed by +[pod security policy](https://github.com/kubernetes/kubernetes/pull/7893). + +#### API backward compatibility + +Pods created by old clients will have the `pod.Spec.SecurityContext.FSGroup` field unset; +these pods will not have their volumes managed by the Kubelet. Old clients will not be able to set +or read the `pod.Spec.SecurityContext.FSGroup` field. + +### Volume changes + +The `volume.Builder` interface should have a new method added that indicates whether the plugin +supports ownership management: + +```go +package volume + +type Builder interface { + // other methods omitted + + // SupportsOwnershipManagement indicates that this volume supports having ownership + // and permissions managed by the Kubelet; if true, the caller may manipulate UID + // or GID of this volume. + SupportsOwnershipManagement() bool +} +``` + +In the first round of work, only `hostPath` and `emptyDir` and its derivations will be tested with +ownership management support: + +| Plugin Name | SupportsOwnershipManagement | +|-------------------------|-------------------------------| +| `hostPath` | false | +| `emptyDir` | true | +| `gitRepo` | true | +| `secret` | true | +| `downwardAPI` | true | +| `gcePersistentDisk` | false | +| `awsElasticBlockStore` | false | +| `nfs` | false | +| `iscsi` | false | +| `glusterfs` | false | +| `persistentVolumeClaim` | depends on underlying volume and PV mode | +| `rbd` | false | +| `cinder` | false | +| `cephfs` | false | + +Ultimately, the matrix will theoretically look like: + +| Plugin Name | SupportsOwnershipManagement | +|-------------------------|-------------------------------| +| `hostPath` | false | +| `emptyDir` | true | +| `gitRepo` | true | +| `secret` | true | +| `downwardAPI` | true | +| `gcePersistentDisk` | true | +| `awsElasticBlockStore` | true | +| `nfs` | false | +| `iscsi` | 
true                          |
+| `glusterfs`             | false                         |
+| `persistentVolumeClaim` | depends on underlying volume and PV mode |
+| `rbd`                   | true                          |
+| `cinder`                | false                         |
+| `cephfs`                | false                         |
+
+### Kubelet changes
+
+The Kubelet should be modified to perform ownership and label management when required for a volume.
+
+For ownership management the criteria are:
+
+1. The `pod.Spec.SecurityContext.FSGroup` field is populated
+2. The volume builder returns `true` from `SupportsOwnershipManagement`
+
+Logic should be added to the `mountExternalVolumes` method that runs a local `chgrp` and `chmod` if
+the pod-level supplemental group is set and the volume supports ownership management:
+
+```go
+package kubelet
+
+type ChgrpRunner interface {
+	Chgrp(path string, gid int) error
+}
+
+type ChmodRunner interface {
+	Chmod(path string, mode os.FileMode) error
+}
+
+type Kubelet struct {
+	chgrpRunner ChgrpRunner
+	chmodRunner ChmodRunner
+}
+
+func (kl *Kubelet) mountExternalVolumes(pod *api.Pod) (kubecontainer.VolumeMap, error) {
+	// Guard against a nil security context and dereference the *int64 FSGroup.
+	podFSGroup := 0
+	podFSGroupSet := false
+	if pod.Spec.SecurityContext != nil && pod.Spec.SecurityContext.FSGroup != nil {
+		podFSGroup = int(*pod.Spec.SecurityContext.FSGroup)
+		podFSGroupSet = true
+	}
+
+	podVolumes := make(kubecontainer.VolumeMap)
+
+	for i := range pod.Spec.Volumes {
+		volSpec := &pod.Spec.Volumes[i]
+
+		rootContext, err := kl.getRootDirContext()
+		if err != nil {
+			return nil, err
+		}
+
+		// Try to use a plugin for this volume. 
+		internal := volume.NewSpecFromVolume(volSpec)
+		builder, err := kl.newVolumeBuilderFromPlugins(internal, pod, volume.VolumeOptions{RootContext: rootContext}, kl.mounter)
+		if err != nil {
+			glog.Errorf("Could not create volume builder for pod %s: %v", pod.UID, err)
+			return nil, err
+		}
+		if builder == nil {
+			return nil, errUnsupportedVolumeType
+		}
+		err = builder.SetUp()
+		if err != nil {
+			return nil, err
+		}
+
+		if builder.SupportsOwnershipManagement() && podFSGroupSet {
+			err = kl.chgrpRunner.Chgrp(builder.GetPath(), podFSGroup)
+			if err != nil {
+				return nil, err
+			}
+
+			err = kl.chmodRunner.Chmod(builder.GetPath(), os.FileMode(0770))
+			if err != nil {
+				return nil, err
+			}
+		}
+
+		podVolumes[volSpec.Name] = builder
+	}
+
+	return podVolumes, nil
+}
+```
+
+This allows the volume plugins to determine when they do and don't want this type of support from
+the Kubelet, and allows the criteria each plugin uses to evolve without changing the Kubelet.
+
+The Docker runtime will be modified to set the supplemental group of each container based on the
+`pod.Spec.SecurityContext.FSGroup` field. Theoretically, the `rkt` runtime could support this
+feature in a similar way.
+
+### Examples
+
+#### EmptyDir
+
+For a pod that has two containers sharing an `EmptyDir` volume:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: test-pod
+spec:
+  securityContext:
+    supplementalGroup: 1001
+  containers:
+  - name: a
+    securityContext:
+      runAsUser: 1009
+    volumeMounts:
+    - mountPath: "/example/hostpath/a"
+      name: empty-vol
+  - name: b
+    securityContext:
+      runAsUser: 1010
+    volumeMounts:
+    - mountPath: "/example/hostpath/b"
+      name: empty-vol
+  volumes:
+  - name: empty-vol
+```
+
+When the Kubelet runs this pod, the `empty-vol` volume will have ownership root:1001 and permissions
+`0770`. It will be usable from both containers a and b. 
+ +#### HostPath + +For a volume that uses a `hostPath` volume with containers running as different UIDs: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: test-pod +spec: + securityContext: + supplementalGroup: 1001 + containers: + - name: a + securityContext: + runAsUser: 1009 + volumeMounts: + - mountPath: "/example/hostpath/a" + name: host-vol + - name: b + securityContext: + runAsUser: 1010 + volumeMounts: + - mountPath: "/example/hostpath/b" + name: host-vol + volumes: + - name: host-vol + hostPath: + path: "/tmp/example-pod" +``` + +The cluster operator would need to manually `chgrp` and `chmod` the `/tmp/example-pod` on the host +in order for the volume to be usable from the pod. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/volumes.md?pixel)]() + -- cgit v1.2.3 From 4da98bc23fee1940a6390399b99efaaaee44b52e Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Fri, 18 Sep 2015 14:49:20 -0400 Subject: Proposal: generic SELinux support for volumes --- selinux.md | 347 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 347 insertions(+) create mode 100644 selinux.md diff --git a/selinux.md b/selinux.md new file mode 100644 index 00000000..c16ab0a5 --- /dev/null +++ b/selinux.md @@ -0,0 +1,347 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest 1.0.x release of this document can be found
+[here](http://releases.k8s.io/release-1.0/docs/proposals/selinux.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+## Abstract
+
+A proposal for enabling containers in a pod to share volumes using a pod-level SELinux context.
+
+## Motivation
+
+Many users have a requirement to run pods on systems that have SELinux enabled. Volume plugin
+authors should not have to explicitly account for SELinux except for volume types that require
+special handling of the SELinux context during setup.
+
+Currently, each container in a pod has an SELinux context. This is not an ideal factoring for
+sharing resources using SELinux.
+
+We propose a pod-level SELinux context and a mechanism to support SELinux labeling of volumes in a
+generic way.
+
+Goals of this design:
+
+1. Describe the problems with a container SELinux context
+2. Articulate a design for generic SELinux support for volumes using a pod-level SELinux context
+   which is backward compatible with the v1.0.0 API
+
+## Constraints and Assumptions
+
+1. We will not support securing containers within a pod from one another
+2. Volume plugins should not have to handle setting SELinux context on volumes
+3. We will not deal with shared storage
+
+## Current State Overview
+
+### Docker
+
+Docker uses a base SELinux context and calculates a unique MCS label per container. The SELinux
+context of a container can be overridden with the `SecurityOpt` API that allows setting the
+different parts of the SELinux context individually.
+
+Docker has functionality to relabel bind-mounts with a usable SELinux context and supports two
+different use-cases:
+
+1. The `:Z` bind-mount flag, which tells Docker to relabel a bind-mount with the container's
+   SELinux context
+2. 
The `:z` bind-mount flag, which tells Docker to relabel a bind-mount with the container's
+   SELinux context, but remove the MCS labels, making the volume shareable between containers
+
+We should avoid using the `:z` flag, because it relaxes the SELinux context so that any container
+(from an SELinux standpoint) can use the volume.
+
+### Rocket
+
+Rocket currently reads the base SELinux context to use from `/etc/selinux/*/contexts/lxc_contexts`
+and allocates a unique MCS label per pod.
+
+### Kubernetes
+
+
+There is a [proposed change](https://github.com/GoogleCloudPlatform/kubernetes/pull/9844) to the
+EmptyDir plugin that adds SELinux relabeling capabilities to that plugin, which is also carried as a
+patch in [OpenShift](https://github.com/openshift/origin). It is preferable to solve the general
+problem of handling SELinux in Kubernetes rather than to merge this PR.
+
+A new `PodSecurityContext` type has been added that carries information about security attributes
+that apply to the entire pod and that apply to all containers in a pod. See:
+
+1. [Skeletal implementation](https://github.com/kubernetes/kubernetes/pull/13939)
+1. [Proposal for inlining container security fields](https://github.com/kubernetes/kubernetes/pull/12823)
+
+## Use Cases
+
+1. As a cluster operator, I want to support securing pods from one another using SELinux when
+   SELinux integration is enabled in the cluster
+2. As a user, I want volume sharing to work correctly amongst containers in pods
+
+#### SELinux context: pod- or container-level?
+
+Currently, SELinux context is specifiable only at the container level. This is an inconvenient
+factoring for sharing volumes and other SELinux-secured resources between containers because there
+is no way in SELinux to share resources between processes with different MCS labels except to
+remove MCS labels from the shared resource. 
This is a big security risk: _any container_ in the
system can work with a resource that has the same SELinux context and no MCS labels. Since
we are also not interested in isolating containers in a pod from one another, the SELinux context
should be shared by all containers in a pod, to facilitate isolation from the containers in other
pods and sharing of resources amongst all the containers of a pod.

#### Volumes

Kubernetes volumes can be divided into two broad categories:

1. Unshared storage:
   1. Volumes created by the kubelet on the host directory: empty directory, git repo, secret,
      downward API. All volumes in this category delegate to `EmptyDir` for their underlying
      storage.
   2. Volumes based on network block devices: AWS EBS, iSCSI, RBD, etc., *when used exclusively
      by a single pod*.
2. Shared storage:
   1. `hostPath` is shared storage because it is necessarily used by both a container and the host
   2. Network file systems such as NFS, Glusterfs, Cephfs, etc.
   3. Block-device-based volumes in `ReadOnlyMany` or `ReadWriteMany` modes are shared because
      they may be used simultaneously by multiple pods.

For unshared storage, SELinux handling for most volumes can be generalized into running a `chcon`
operation on the volume directory after running the volume plugin's `Setup` function. For these
volumes, the Kubelet can perform the `chcon` operation and keep SELinux concerns out of the volume
plugin code. Some volume plugins may need to use the SELinux context during a mount operation in
certain cases. To account for this, our design must have a way for volume plugins to state that
a particular volume should or should not receive generic label management.

For shared storage, the picture is murkier. Labels for existing shared storage will be managed
outside Kubernetes, and administrators will have to set the SELinux context of pods correctly.
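The generic relabeling step described above for unshared storage can be sketched in isolation. This is an illustrative sketch only — `volumeInfo` and `relabelCommand` are hypothetical names, not actual Kubernetes code — showing a recursive `chcon` built after volume setup and gated on whether the plugin opts in to generic label management:

```go
package main

import "fmt"

// volumeInfo is a hypothetical stand-in for what the Kubelet would know
// about a volume after running the plugin's Setup function.
type volumeInfo struct {
	Path            string // host path of the volume directory
	SupportsSELinux bool   // whether the plugin opts in to generic label management
}

// relabelCommand builds the chcon invocation the Kubelet could run to
// recursively relabel an unshared volume with the pod's SELinux context.
// It returns nil when generic label management does not apply.
func relabelCommand(v volumeInfo, podSELinuxContext string) []string {
	if !v.SupportsSELinux || podSELinuxContext == "" {
		return nil
	}
	return []string{"chcon", "-R", podSELinuxContext, v.Path}
}

func main() {
	emptyDir := volumeInfo{Path: "/var/lib/kubelet/pods/uid/volumes/empty-dir", SupportsSELinux: true}
	hostPath := volumeInfo{Path: "/etc", SupportsSELinux: false}

	fmt.Println(relabelCommand(emptyDir, "system_u:object_r:svirt_sandbox_file_t:s0:c1,c2"))
	// hostPath opts out, so no relabel command is produced.
	fmt.Println(relabelCommand(hostPath, "system_u:object_r:svirt_sandbox_file_t:s0:c1,c2"))
}
```

Keeping the decision ("should this volume be relabeled?") separate from the mechanism (`chcon -R`) is what lets SELinux concerns stay out of individual plugin code.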
The problem of solving SELinux label management for new shared storage is outside the scope of
this proposal.

## Analysis

The system needs to be able to:

1. Model correctly which volumes require SELinux label management
1. Relabel volumes with the correct SELinux context when required

### Modeling whether a volume requires label management

#### Unshared storage: volumes derived from `EmptyDir`

Empty dir and the volumes derived from it are created by the system, so Kubernetes must always
ensure that the ownership and SELinux context (when relevant) are set correctly for the volume to
be usable.

#### Unshared storage: network block devices

Volume plugins based on network block devices such as AWS EBS and RBD can be treated the same way
as local volumes. Since inodes are written to these block devices in the same way as `EmptyDir`
volumes, permissions and ownership can be managed on the client side by the Kubelet when the volume
is used exclusively by one pod. When such volumes are used outside of a persistent volume, or with
the `ReadWriteOnce` mode, they are effectively unshared storage.

When used by multiple pods, there are many additional use-cases to analyze before we can be
confident that we can support SELinux label management robustly with these file systems. The right
design is one that makes it easy to experiment and develop support for ownership management with
volume plugins, to enable developers and cluster operators to continue exploring these issues.

#### Shared storage: hostPath

The `hostPath` volume should only be used by effective-root users, and the permissions of paths
exposed into containers via hostPath volumes should always be managed by the cluster operator. If
the Kubelet managed the SELinux labels for `hostPath` volumes, a user who could create a `hostPath`
volume could effect changes to the state of arbitrary paths within the host's filesystem. This
would be a severe security risk, so we will consider hostPath a corner case that the kubelet should
never perform label management for.

#### Shared storage: network

Ownership management of shared storage is a complex topic. SELinux labels for existing shared
storage will be managed externally from Kubernetes. For this case, our API should make it simple to
express whether a particular volume should have these concerns managed by Kubernetes.

We will not attempt to address the concerns of new shared storage in this proposal.

When a network block device is used as a persistent volume in `ReadWriteMany` or `ReadOnlyMany`
modes, it is shared storage, and thus outside the scope of this proposal.

#### API requirements

From the above, we know that label management must be applied:

1. To some volume types always
2. To some volume types never
3. To some volume types *sometimes*

Volumes should be relabeled with the correct SELinux context. Docker has this capability today; it
is desirable for other container runtime implementations to provide similar functionality.

Relabeling should be an optional aspect of a volume plugin to accommodate:

1. volume types for which generalized relabeling support is not sufficient
2. testing for each volume plugin individually

## Proposed Design

Our design should minimize the code for handling SELinux labeling required in the Kubelet and
volume plugins.

### Deferral: MCS label allocation

Our short-term goal is to facilitate volume sharing and isolation with SELinux and to expose the
primitives for higher-level composition; making these automatic is a longer-term goal. Allocating
groups and MCS labels are fairly complex problems in their own right, and so our proposal will not
encompass either of these topics. There are several problems that the solution for allocation
depends on:

1. Users and groups in Kubernetes
2. General auth policy in Kubernetes
3. [security policy](https://github.com/GoogleCloudPlatform/kubernetes/pull/7893)

### API changes

The [inline container security attributes PR (12823)](https://github.com/kubernetes/kubernetes/pull/12823)
adds a `pod.Spec.SecurityContext.SELinuxOptions` field. The change to the API in this proposal is
the addition of these semantics to the field:

* When the `pod.Spec.SecurityContext.SELinuxOptions` field is set, volumes that support SELinux
label management in the Kubelet have their SELinux context set from this field.

```go
package api

type PodSecurityContext struct {
	// SELinuxOptions captures the SELinux context for all containers in a Pod. If a container's
	// SecurityContext.SELinuxOptions field is set, that setting takes precedence for that
	// container.
	//
	// This field will be used to set the SELinux context of volumes that support SELinux label
	// management by the kubelet.
	SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`
}
```

The V1 API is extended with the same semantics:

```go
package v1

type PodSecurityContext struct {
	// SELinuxOptions captures the SELinux context for all containers in a Pod. If a container's
	// SecurityContext.SELinuxOptions field is set, that setting takes precedence for that
	// container.
	//
	// This field will be used to set the SELinux context of volumes that support SELinux label
	// management by the kubelet.
	SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`
}
```

#### API backward compatibility

Old pods that do not have the `pod.Spec.SecurityContext.SELinuxOptions` field set will not receive
SELinux label management for their volumes. This is acceptable since old clients won't know about
this field and won't have any expectation of their volumes being managed this way.

The existing backward compatibility semantics for SELinux do not change at all with this proposal.
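The precedence rule stated in the field comments above — a container-level `SELinuxOptions` setting wins over the pod-level field — can be sketched in isolation. This is a hypothetical helper for illustration, not actual Kubernetes code:

```go
package main

import "fmt"

// SELinuxOptions mirrors the shape of the API field discussed above; only
// the fields needed for this sketch are included.
type SELinuxOptions struct {
	User, Role, Type, Level string
}

// effectiveSELinuxOptions applies the documented precedence: a container-level
// setting wins; otherwise the pod-level setting applies. Either may be nil.
func effectiveSELinuxOptions(pod, container *SELinuxOptions) *SELinuxOptions {
	if container != nil {
		return container
	}
	return pod
}

func main() {
	podLevel := &SELinuxOptions{Type: "svirt_sandbox_file_t", Level: "s0:c1,c2"}
	containerLevel := &SELinuxOptions{Type: "svirt_sandbox_file_t", Level: "s0:c3,c4"}

	// A container with no override inherits the pod-level context.
	fmt.Println(effectiveSELinuxOptions(podLevel, nil).Level) // s0:c1,c2
	// A container-level setting takes precedence.
	fmt.Println(effectiveSELinuxOptions(podLevel, containerLevel).Level) // s0:c3,c4
}
```

Volumes, by contrast, would be labeled from the pod-level field alone, since they are shared by all containers in the pod.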
### Kubelet changes

The Kubelet should be modified to perform SELinux label management when required for a volume. The
criteria to activate kubelet SELinux label management for a volume are:

1. SELinux integration is enabled in the cluster
2. SELinux is enabled on the node
3. The `pod.Spec.SecurityContext.SELinuxOptions` field is set
4. The volume plugin supports SELinux label management

The `volume.Builder` interface should have a new method added that indicates whether the plugin
supports SELinux label management:

```go
package volume

type Builder interface {
	// other methods omitted
	SupportsSELinux() bool
}
```

Individual volume plugins are responsible for correctly reporting whether they support label
management in the kubelet. In the first round of work, only `hostPath`, `emptyDir`, and the
volumes derived from `emptyDir` will be tested for SELinux label management support:

| Plugin Name             | SupportsSELinux |
|-------------------------|-----------------|
| `hostPath`              | false |
| `emptyDir`              | true |
| `gitRepo`               | true |
| `secret`                | true |
| `downwardAPI`           | true |
| `gcePersistentDisk`     | false |
| `awsElasticBlockStore`  | false |
| `nfs`                   | false |
| `iscsi`                 | false |
| `glusterfs`             | false |
| `persistentVolumeClaim` | depends on underlying volume and PV mode |
| `rbd`                   | false |
| `cinder`                | false |
| `cephfs`                | false |

Ultimately, the matrix should theoretically look like:

| Plugin Name             | SupportsSELinux |
|-------------------------|-----------------|
| `hostPath`              | false |
| `emptyDir`              | true |
| `gitRepo`               | true |
| `secret`                | true |
| `downwardAPI`           | true |
| `gcePersistentDisk`     | true |
| `awsElasticBlockStore`  | true |
| `nfs`                   | false |
| `iscsi`                 | true |
| `glusterfs`             | false |
| `persistentVolumeClaim` | depends on underlying volume and PV mode |
| `rbd`                   | true |
| `cinder`                | false |
| `cephfs`                | false |

In order to limit the amount of SELinux label
management code in Kubernetes, we propose that it be a function of the container runtime
implementations. Initially, we will modify the Docker runtime implementation to correctly set the
`:Z` flag on the appropriate bind-mounts in order to accomplish generic label management for
Docker containers.

Volume plugins that require SELinux context information at mount time must be injected with, and
must respect, the setting that enables labeling for their volume type. The proposed `VolumeConfig`
mechanism will be used to carry information about label management enablement to the volume
plugins that have to manage labels individually.

This allows the volume plugins to determine when they do and don't want this type of support from
the Kubelet, and allows the criteria each plugin uses to evolve without changing the Kubelet.

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/selinux.md?pixel)]()

-- cgit v1.2.3 

From 22064736c46dcbd3d1208b502e99cb99e24b2000 Mon Sep 17 00:00:00 2001
From: Vishnu Kannan
Date: Wed, 2 Sep 2015 18:19:16 -0700
Subject: Adding a proposal for handling resource metrics in Kubernetes.

---
 metrics-plumbing.md | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 metrics-plumbing.md

diff --git a/metrics-plumbing.md b/metrics-plumbing.md
new file mode 100644
index 00000000..f4fffaea
--- /dev/null
+++ b/metrics-plumbing.md
@@ -0,0 +1,134 @@

WARNING
WARNING
WARNING
WARNING
WARNING

PLEASE NOTE: This document applies to the HEAD of the source tree
If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/proposals/metrics-plumbing.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).

--

# Resource Usage Metrics plumbing in Kubernetes

**Author**: Vishnu Kannan (@vishh)

**Status**: Draft proposal; some parts are already implemented

*This document presents a design for handling container metrics in Kubernetes clusters.*

## Motivation

Resource usage metrics are critical for various reasons:
* Monitor and maintain the health of the cluster and user applications.
* Improve the efficiency of the cluster by making more optimal scheduling decisions and enabling components like auto-scalers.

There are multiple types of metrics that describe the state of a container.
Numerous strategies exist to aggregate these metrics from containers.
There are a variety of storage backends that can handle metrics.

This document presents a design to abstract out collection and storage backends, and to provide stable Kubernetes APIs that can be consumed by users and other cluster components.

## Introduction

Container metrics can be of two types:

1. `Compute resource metrics` refer to the compute resources being used by a container. Ex.: CPU, memory, network, file-system.
2. `Service metrics` refer to container application-specific metrics. Ex.: QPS, query latency, etc.

Metrics can be collected either for cluster components or for user containers.

[cAdvisor](https://github.com/google/cadvisor) is a node-level container metrics aggregator that is built into the kubelet. cAdvisor can collect both types of metrics, although the support for service metrics is limited at this point. cAdvisor collects metrics for both system components and user containers.
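Metrics aggregators of this kind reduce streams of raw samples into summary statistics over a time window. A minimal, self-contained sketch of that reduction (illustrative only, not cAdvisor's or heapster's actual code) computing the average, nearest-rank 95th percentile, and max of a window of samples:

```go
package main

import (
	"fmt"
	"sort"
)

// summarize reduces a window of raw usage samples (e.g. CPU millicores) to
// the summary statistics commonly exposed by metrics aggregators.
func summarize(samples []float64) (avg, p95, max float64) {
	if len(samples) == 0 {
		return 0, 0, 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	var sum float64
	for _, s := range sorted {
		sum += s
	}
	avg = sum / float64(len(sorted))

	// Nearest-rank percentile: the smallest value such that at least 95%
	// of the samples are <= it. Integer ceiling of len*0.95, then 0-based.
	idx := (len(sorted)*95+99)/100 - 1
	p95 = sorted[idx]
	max = sorted[len(sorted)-1]
	return avg, p95, max
}

func main() {
	// A single spike dominates the p95 and max but barely moves the average.
	window := []float64{10, 12, 11, 95, 13, 12, 10, 14, 11, 12}
	avg, p95, max := summarize(window)
	fmt.Printf("avg=%.1f p95=%.1f max=%.1f\n", avg, p95, max) // avg=20.0 p95=95.0 max=95.0
}
```

This is why the percentile and max views matter: a short usage spike that an average hides is exactly what capacity and auto-scaling decisions need to see.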
[heapster](https://github.com/kubernetes/heapster) is a cluster-level metrics aggregator that is run by default on most Kubernetes clusters. Heapster aggregates all the metrics exposed by cAdvisor on the nodes. Heapster has a pluggable storage backend. It supports the following timeseries storage backends: InfluxDB, Google Cloud Monitoring, and Hawkular.
Heapster builds a model of the cluster and can aggregate metrics across pods, nodes, namespaces, and the entire cluster. It exposes this data via [REST endpoints](https://github.com/kubernetes/heapster/blob/master/docs/model.md#api-documentation).

Metrics data will be consumed by many different clients - the scheduler, horizontal and vertical pod auto-scalers, the initial pod limits controller, kubectl, web consoles, cluster management software, etc.

Storage backends can be shared both for monitoring clusters and for powering advanced cluster features.

## Goals

* Abstract out timeseries storage backends from Kubernetes components.
* Provide stable Kubernetes metrics APIs that other components can consume.

#### Non Goals

* Requiring users to run a specific storage backend.
* Compatibility with other node-level metrics aggregators. cAdvisor should be able to provide all the metrics.
* Support for service metrics at the cluster level is out of scope for this document.
Once the use cases for service metrics, other than monitoring, are clear, we can explore adding support for them.

## Design

The basic idea is to evolve heapster to serve metrics APIs which can then be consumed by other cluster components.
Heapster will be run in all clusters by default. Heapster's memory usage is proportional to the number of containers in the cluster, so it should be possible to run heapster by default even on small development or test clusters.
A cluster administrator will have to either run one of the supported storage backends or write a new storage plugin in heapster to support custom storage backends.
Heapster will manage versioning and storage schema for the various storage backends it supports.
Heapster APIs will be exposed as Kubernetes APIs once the apiserver supports [dynamic API plugins](https://github.com/kubernetes/kubernetes/issues/991).

Heapster stores a day's worth of historical metrics. Heapster will fetch data from storage backends on demand to serve metrics that are older than a day. Setting [initial pod resources](initial-resources.md) requires access to metrics from the past 30 days.

To make heapster APIs compatible with Kubernetes API requirements, heapster will have to incorporate the API server library. Until that is possible, we will run a secondary API server binary that supports the metrics APIs being consumed by other components. The initial plan is to use etcd to store the most recent metrics. Eventually, we would like to get rid of etcd for metrics and make heapster act as a backend to the api-server.

This is the current plan for supporting the node and pod metrics API as described in this [proposal](compute-resource-metrics-api.md).

There will be proposals in the future for adding more heapster metrics APIs to Kubernetes.

## Implementation plan

Heapster has a built-in model of a cluster and can expose the average, 95th percentile, and max of compute resource metrics for containers, pods, nodes, namespaces, and the entire cluster.
However, the existing APIs are not suitable for Kubernetes components.
The metrics are stored in a rolling window. Adding support for other percentiles should be straightforward.
Heapster is currently stateless, so it will lose its history upon restart.
Some of the specific work items include:

1. Improve the existing API schema to be Kubernetes compatible ([Related issue](https://github.com/kubernetes/heapster/issues/476))
2. Add support for fetching historical data from storage backends.
3. Fetch historical metrics from storage backends upon restarts to pre-populate the internal model.
4. Add support for image-based aggregation.
5. Add support for label queries.
6. Expose heapster APIs via a Kubernetes service until the primary API server can handle plugins.

### Known issues

* Running other metrics aggregators

  An example here would be running collectd in place of cAdvisor and storing metrics to a custom database, or running Prometheus. We can let cluster admins run their own aggregation and storage stack as long as the storage backend is supported in heapster and the storage schema is versioned. Compatibility can be guaranteed by explicitly specifying the versions of different components that are supported in a specific Kubernetes release.

* Heapster scalability

  Heapster's resource utilization is proportional to the number of containers running in the cluster. A fair amount of effort has gone into optimizing heapster's memory usage. As our cluster size increases, we can shard heapster. We believe the existing heapster design should scale for fairly large clusters with a reasonable amount of compute resources.

### How can you contribute?

We are tracking heapster work items using [milestones](https://github.com/kubernetes/heapster/milestones) in the heapster repo.

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/metrics-plumbing.md?pixel)]()

-- cgit v1.2.3 

From 8589eb45f56e423954754d2a61c3991d27fd4e5c Mon Sep 17 00:00:00 2001
From: "Madhusudan.C.S"
Date: Fri, 2 Oct 2015 12:26:59 -0700
Subject: Move the hooks section to the commit section.

It doesn't make much sense to have a separate section for hooks right now because we only have a pre-commit hook at the moment and we should have it set up before making the first commit.
We can probably create a separate section for hooks again when we have other types of hooks. --- development.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/development.md b/development.md index 75cb2365..87fb02d5 100644 --- a/development.md +++ b/development.md @@ -89,6 +89,16 @@ git remote set-url --push upstream no_push ### Committing changes to your fork +Before committing any changes, please link/copy these pre-commit hooks into your .git +directory. This will keep you from accidentally committing non-gofmt'd go code. + +```sh +cd kubernetes/.git/hooks/ +ln -s ../../hooks/pre-commit . +``` + +Then you can commit your changes and push them to your fork: + ```sh git commit git push -f origin myfeature @@ -203,16 +213,6 @@ It is sometimes expedient to manually fix the /Godeps/godeps.json file to minimi Please send dependency updates in separate commits within your PR, for easier reviewing. -## Hooks - -Before committing any changes, please link/copy these hooks into your .git -directory. This will keep you from accidentally committing non-gofmt'd go code. - -```sh -cd kubernetes/.git/hooks/ -ln -s ../../hooks/pre-commit . -``` - ## Unit tests ```sh -- cgit v1.2.3 From c436e3e4b00704b077fc9f29b332985703866105 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Wed, 7 Oct 2015 13:21:27 -0400 Subject: Fix name of FSGroup field in volume proposal json --- volumes.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/volumes.md b/volumes.md index b340acc0..34daf005 100644 --- a/volumes.md +++ b/volumes.md @@ -272,7 +272,7 @@ type PodSecurityContext struct { // FSGroup is a supplemental group that all containers in a pod run under. This group will own // volumes that the Kubelet manages ownership for. If this is not specified, the Kubelet will // not set the group ownership of any volumes. 
- FSGroup *int64 `json:"supplementalGroup"` + FSGroup *int64 `json:"fsGroup,omitempty"` } ``` @@ -285,7 +285,7 @@ type PodSecurityContext struct { // FSGroup is a supplemental group that all containers in a pod run under. This group will own // volumes that the Kubelet manages ownership for. If this is not specified, the Kubelet will // not set the group ownership of any volumes. - FSGroup *int64 `json:"supplementalGroup"` + FSGroup *int64 `json:"fsGroup,omitempty"` } ``` @@ -455,7 +455,7 @@ metadata: name: test-pod spec: securityContext: - supplementalGroup: 1001 + fsGroup: 1001 containers: - name: a securityContext: @@ -487,7 +487,7 @@ metadata: name: test-pod spec: securityContext: - supplementalGroup: 1001 + fsGroup: 1001 containers: - name: a securityContext: 

-- cgit v1.2.3 

From 10a5a94e2db162bc55f7924fd02ff0bb50f6e2a9 Mon Sep 17 00:00:00 2001
From: Daniel Smith
Date: Thu, 24 Sep 2015 14:00:27 -0700
Subject: Propose combining domain name & group

Also remove group from versions.

---
 extending-api.md | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/extending-api.md b/extending-api.md
index 628b5a16..beb3d7ac 100644
--- a/extending-api.md
+++ b/extending-api.md
@@ -73,11 +73,11 @@ Kubernetes API server to provide the following features: The `Kind` for an instance of a third-party object (e.g. CronTab) below is expected to be programmatically convertible to the name of the resource using the following conversion. Kinds are expected to be of the form ``, the -`APIVersion` for the object is expected to be `//`. +`APIVersion` for the object is expected to be `/`. To +prevent collisions, it's expected that you'll use a fully qualified domain +name for the API group, e.g. `example.com`. -For example `example.com/stable/v1` - -`domain-name` is expected to be a fully qualified domain name. +For example `stable.example.com/v1` 'CamelCaseKind' is the specific type name.
@@ -113,18 +113,17 @@ For example, if a user creates: ```yaml metadata: - name: cron-tab.example.com + name: cron-tab.stable.example.com apiVersion: experimental/v1alpha1 kind: ThirdPartyResource description: "A specification of a Pod to run on a cron style schedule" versions: - - name: stable/v1 - - name: experimental/v2 +- name: v1 +- name: v2 ``` -Then the API server will program in two new RESTful resource paths: - * `/thirdparty/example.com/stable/v1/namespaces//crontabs/...` - * `/thirdparty/example.com/experimental/v2/namespaces//crontabs/...` +Then the API server will program in the new RESTful resource path: + * `/apis/stable.example.com/v1/namespaces//crontabs/...` Now that this schema has been created, a user can `POST`: @@ -134,19 +133,19 @@ Now that this schema has been created, a user can `POST`: "metadata": { "name": "my-new-cron-object" }, - "apiVersion": "example.com/stable/v1", + "apiVersion": "stable.example.com/v1", "kind": "CronTab", "cronSpec": "* * * * /5", "image": "my-awesome-chron-image" } ``` -to: `/third-party/example.com/stable/v1/namespaces/default/crontabs/my-new-cron-object` +to: `/apis/stable.example.com/v1/namespaces/default/crontabs` and the corresponding data will be stored into etcd by the APIServer, so that when the user issues: ``` -GET /third-party/example.com/stable/v1/namespaces/default/crontabs/my-new-cron-object` +GET /apis/stable.example.com/v1/namespaces/default/crontabs/my-new-cron-object` ``` And when they do that, they will get back the same data, but with additional Kubernetes metadata @@ -155,21 +154,21 @@ And when they do that, they will get back the same data, but with additional Kub Likewise, to list all resources, a user can issue: ``` -GET /third-party/example.com/stable/v1/namespaces/default/crontabs +GET /apis/stable.example.com/v1/namespaces/default/crontabs ``` and get back: ```json { - "apiVersion": "example.com/stable/v1", + "apiVersion": "stable.example.com/v1", "kind": "CronTabList", "items": [ { 
"metadata": { "name": "my-new-cron-object" }, - "apiVersion": "example.com/stable/v1", + "apiVersion": "stable.example.com/v1", "kind": "CronTab", "cronSpec": "* * * * /5", "image": "my-awesome-chron-image" -- cgit v1.2.3 From 2ff148ae5c6534a1dcabd49759cc5ebbb35df180 Mon Sep 17 00:00:00 2001 From: Piotr Szczesniak Date: Thu, 8 Oct 2015 07:19:07 +0200 Subject: Fixed formatting in rescheduler proposal --- rescheduler.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rescheduler.md b/rescheduler.md index 88747d08..512064f2 100644 --- a/rescheduler.md +++ b/rescheduler.md @@ -144,6 +144,7 @@ A key design question for a Rescheduler is how much knowledge it needs about the ## Appendix: Integrating rescheduler with cluster auto-scaler (scale up) For scaling up the cluster, a reasonable workflow might be: + 1. pod horizontal auto-scaler decides to add one or more Pods to a service, based on the metrics it is observing 1. the Pod goes PENDING due to lack of a suitable node with sufficient resources 1. rescheduler notices the PENDING Pod and determines that the Pod cannot schedule just by rearranging existing Pods (while respecting SLOs) -- cgit v1.2.3 From 85b4ff1bed4943036cdfef109b8280d9723ccf42 Mon Sep 17 00:00:00 2001 From: Vishnu kannan Date: Thu, 8 Oct 2015 12:01:57 -0700 Subject: Mark QoS as an experimental feature --- resource-qos.md | 89 ++++++++++++--------------------------------------------- 1 file changed, 18 insertions(+), 71 deletions(-) diff --git a/resource-qos.md b/resource-qos.md index 52671821..c7475586 100644 --- a/resource-qos.md +++ b/resource-qos.md @@ -33,12 +33,14 @@ Documentation for other releases can be found at # Resource Quality of Service in Kubernetes -**Author**: Ananya Kumar (@AnanyaKumar) +**Author**: Ananya Kumar (@AnanyaKumar) Vishnu Kannan (@vishh) -**Status**: Draft proposal; prototype in progress. +**Status**: Design & Implementation in progress. 
*This document presents the design of resource quality of service for containers in Kubernetes, and describes use cases and implementation details.* +**Quality of Service is still under development. Look [here](resource-qos.md#under-development) for more details** + ## Motivation Kubernetes allocates resources to containers in a simple way. Users can specify resource limits for containers. For example, a user can specify a 1gb memory limit for a container. The scheduler uses resource limits to schedule containers (technically, the scheduler schedules pods comprised of containers). For example, the scheduler will not place 5 containers with a 1gb memory limit onto a machine with 4gb memory. Currently, Kubernetes does not have robust mechanisms to ensure that containers run reliably on an overcommitted system. @@ -92,66 +94,20 @@ An alternative is to have user-specified numerical priorities that guide Kubelet 1. Achieved behavior would be emergent based on how users assigned priorities to their containers. No particular SLO could be delivered by the system, and usage would be subject to gaming if not restricted administratively 2. Changes to desired priority bands would require changes to all user container configurations. 
-## Implementation - -### To implement requests (PR #12035): - -API changes for request -- Default request to limit, if limit is specified but request is not (api/v1/defaults.go) -- Add validation code that checks request <= limit, and validation test cases (api/validation/validation.go) - -Scheduler Changes -- Predicates: Use requests instead of limits in CheckPodsExceedingCapacity and PodFitsResources (scheduler/algorithm/predicates/predicates.go) -- Priorities: Use requests instead of limits in LeastRequestedPriority and BalancedResourceAllocation(scheduler/algorithm/priorities/priorities.go)(PR #12718) - -Container Manager Changes -- Use requests to assign CPU shares for Docker (kubelet/dockertools/container_manager.go) -- RKT changes will be implemented in a later iteration - -### QoS Classes (PR #12182): - -For now, we will be implementing QoS classes using OOM scores. However, system OOM kills are expensive, and without kernel modifications we cannot rely on system OOM kills to enforce burstable class guarantees. Eventually, we will need to layer control loops on top of OOM score assignment. - -Add kubelet/qos/policy.go -- Decides which memory QoS class a container is in (based on the policy described above) -- Decides what OOM score all processes in a container should get - -Change memory overcommit mode -- Right now overcommit mode is off on the machines we set up, so if there isn’t enough memory malloc will return null. This prevents QoS, because best-effort containers won’t be killed. Instead, when there isn’t enough memory, and guaranteed containers call malloc, they may not get the memory they want. We want memory guaranteed containers to get the memory they request, and force out memory best-effort containers. -- Change the memory overcommit mode to 1, so that using excess memory starts the OOM killer. The implication is that malloc won't return null, a process will be killed instead. 
- -Container OOM score configuration -- We’re focusing on Docker in this implementation (not RKT) -- OOM scores - - Note that the OOM score of a process is 10 times the % of memory the process consumes, adjusted by OOM_SCORE_ADJ, barring exceptions (e.g. process is launched by root). Processes with higher OOM scores are killed. - - The base OOM score is between 0 and 1000, so if process A’s OOM_SCORE_ADJ - process B’s OOM_SCORE_ADJ is over a 1000, then process A will always be OOM killed before B. - - The final OOM score of a process is also between 0 and 1000 -- Memory best-effort - - Set OOM_SCORE_ADJ: 1000 - - So processes in best-effort containers will have an OOM_SCORE of 1000 -- Memory guaranteed - - Set OOM_SCORE_ADJ: -999 - - So processes in guaranteed containers will have an OOM_SCORE of 0 or 1 -- Memory burstable - - If total memory request > 99.8% of available memory, OOM_SCORE_ADJ: 2 - - Otherwise, set OOM_SCORE_ADJ to 1000 - 10 * (% of memory requested) - - This ensures that the OOM_SCORE of burstable containers is > 1 - - So burstable containers will be killed if they conflict with guaranteed containers - - If a burstable container uses less memory than requested, its OOM_SCORE < 1000 - - So best-effort containers will be killed if they conflict with burstable containers using less than requested memory - - If a process in a burstable container uses more memory than the container requested, its OOM_SCORE will be 1000, if not its OOM_SCORE will be < 1000 - - Assuming that a container typically has a single big process, if a burstable container that uses more memory than requested conflicts with a burstable container using less memory than requested, the former will be killed - - If burstable containers with multiple processes conflict, then the formula for OOM scores is a heuristic, it will not ensure "Request and Limit" guarantees. This is one reason why control loops will be added in subsequent iterations. 
-- Pod infrastructure container - - OOM_SCORE_ADJ: -999 -- Kubelet, Docker, Kube-Proxy - - OOM_SCORE_ADJ: -999 (won’t be OOM killed) - - Hack, because these critical tasks might die if they conflict with guaranteed containers. in the future, we should place all user-pods into a separate cgroup, and set a limit on the memory they can consume. - -Setting OOM_SCORE_ADJ for a container -- Refactor existing ApplyOOMScoreAdj to util/oom.go -- To set OOM_SCORE_ADJ of a container, we loop through all processes in the container, and set OOM_SCORE_ADJ -- We keep looping until the list of processes in the container stabilizes. This is sufficient because child processes inherit OOM_SCORE_ADJ. +## Under Development + +This feature is still under development. +Following are some of the primary issues. + +* Our current design supports QoS per-resource. + Given that unified hierarchy is in the horizon, a per-resource QoS cannot be supported. + [#14943](https://github.com/kubernetes/kubernetes/pull/14943) has more information. + +* Scheduler does not take usage into account. + The scheduler can pile up BestEffort tasks on a node and cause resource pressure. + [#14081](https://github.com/kubernetes/kubernetes/issues/14081) needs to be resolved for the scheduler to start utilizing node's usage. + +The semantics of this feature can change in subsequent releases. ## Implementation Issues and Extensions @@ -177,17 +133,8 @@ Maintaining CPU performance: - **CPU limits**: Enabling CPU limits can be problematic, because processes might be hard capped and might stall for a while. TODO: Enable CPU limits intelligently using CPU quota and core allocation. Documentation: -- **QoS Class Status**: TODO: Add code to ContainerStatus in the API, so that it shows which memory and CPU classes a container is in. - **Documentation**: TODO: add user docs for resource QoS -## Demo and Tests - -Possible demos/E2E tests: -- Launch a couple of memory guaranteed containers on a node. 
Barrage the node with memory best-effort containers. The memory guaranteed containers should survive the onslaught of memory best-effort containers. -- Fill up a node with memory best-effort containers. Barrage the node with memory guaranteed containers. All memory best-effort containers should be evicted. This is a hard test, because the Kubelet, Kube-proxy, etc. need to be well protected. -- Launch a container with 0 CPU request. The container, when run in isolation, should get to use the entire CPU. Then add a container with non-zero request that tries to use up CPU. The 0-request containers should be throttled, and given a small number of CPU shares. - - [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-qos.md?pixel)]() -- cgit v1.2.3 From cb58afd814634c8053405d4a15c3d8a6040d05fe Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 8 Oct 2015 16:57:05 -0700 Subject: Proposed versioning changes --- versioning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/versioning.md b/versioning.md index ede6b450..c764a585 100644 --- a/versioning.md +++ b/versioning.md @@ -44,9 +44,9 @@ Legend: * Kube 1.0.0, 1.0.1 -- DONE! * Kube 1.0.X (X>1): Standard operating procedure. We patch the release-1.0 branch as needed and increment the patch number. -* Kube 1.1.0-alpha.X: Released roughly every two weeks by cutting from HEAD. No cherrypick releases. If there is a critical bugfix, a new release from HEAD can be created ahead of schedule. (This applies to the beta releases as well.) -* Kube 1.1.0-beta.X: When HEAD is feature-complete, we go into code freeze 2 weeks prior to the desired 1.1.0 date and only merge PRs essential to 1.1. Releases continue to be cut from HEAD until we're essentially done. -* Kube 1.1.0: Final release. Should occur between 3 and 4 months after 1.0. +* Kube 1.1.0-alpha.X: Released roughly every two weeks by cutting from HEAD. No cherrypick releases.
If there is a critical bugfix, a new release from HEAD can be created ahead of schedule. +* Kube 1.1.0-beta: When HEAD is feature-complete, we will cut the release-1.1.0 branch 2 weeks prior to the desired 1.1.0 date and only merge PRs essential to 1.1. This cut will be marked as 1.1.0-beta, and HEAD will be revved to 1.2.0-alpha.0. +* Kube 1.1.0: Final release, built from the release-1.1.0 branch cut two weeks prior. Should occur between 3 and 4 months after 1.0. 1.1.1-beta will be tagged at the same time on the same branch. ### Major version timeline -- cgit v1.2.3 From b0884c7373c19d12448ecc54675ac98e78c7bdc9 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Fri, 9 Oct 2015 02:13:28 +0000 Subject: Strengthen wording about status behavior. --- api-conventions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index fb7cbe10..99aa0cf8 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -33,7 +33,7 @@ Documentation for other releases can be found at API Conventions =============== -Updated: 9/20/2015 +Updated: 10/8/2015 *This document is oriented at users who want a deeper understanding of the Kubernetes API structure, and developers wanting to extend the Kubernetes API. An introduction to @@ -172,7 +172,7 @@ When a new version of an object is POSTed or PUT, the "spec" is updated and avai The Kubernetes API also serves as the foundation for the declarative configuration schema for the system. In order to facilitate level-based operation and expression of declarative configuration, fields in the specification should have declarative rather than imperative names and semantics -- they represent the desired state, not actions intended to yield the desired state. -The PUT and POST verbs on objects will ignore the "status" values. A `/status` subresource is provided to enable system components to update statuses of resources they manage.
+The PUT and POST verbs on objects MUST ignore the "status" values, to avoid accidentally overwriting the status in read-modify-write scenarios. A `/status` subresource MUST be provided to enable system components to update statuses of resources they manage. Otherwise, PUT expects the whole object to be specified. Therefore, if a field is omitted it is assumed that the client wants to clear that field's value. The PUT verb does not accept partial updates. Modification of just part of an object may be achieved by GETting the resource, modifying part of the spec, labels, or annotations, and then PUTting it back. See [concurrency control](#concurrency-control-and-consistency), below, regarding read-modify-write consistency when using this pattern. Some objects may expose alternative resource representations that allow mutation of the status, or performing custom actions on the object. -- cgit v1.2.3 From 0942ca8f34a43393c1036ec4a5688fc6078c46d7 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Thu, 8 Oct 2015 15:52:11 -0700 Subject: simplify DaemonReaper by using NodeSelector --- daemon.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/daemon.md b/daemon.md index c88fcec7..a72b8755 100644 --- a/daemon.md +++ b/daemon.md @@ -97,7 +97,7 @@ The DaemonSet supports standard API features: - get (e.g. 
kubectl get daemonsets) - describe - Modifiers - - delete (if --cascade=true, then first the client turns down all the pods controlled by the DaemonSet (by setting the nodeName to a non-existent name); then it deletes the DaemonSet; then it deletes the pods) + - delete (if --cascade=true, then first the client turns down all the pods controlled by the DaemonSet (by setting the nodeSelector to a uuid pair that is unlikely to be set on any node); then it deletes the DaemonSet; then it deletes the pods) - label - annotate - update operations like patch and replace (only allowed to selector and to nodeSelector and nodeName of pod template) -- cgit v1.2.3 From 499f571b4ec667021e17f2abcf31637738b26b18 Mon Sep 17 00:00:00 2001 From: Clayton Coleman Date: Fri, 11 Sep 2015 16:09:51 -0400 Subject: Expose exec and logs via WebSockets Not all clients and systems can support SPDY protocols. This commit adds support for two new websocket protocols, one to handle streaming of pod logs from a pod, and the other to allow exec to be tunneled over websocket. Browser support for chunked encoding is still poor, and web consoles that wish to show pod logs may need to make compromises to display the output. The /pods//log endpoint now supports websocket upgrade to the 'binary.k8s.io' subprotocol, which sends chunks of logs as binary to the client. Messages are written as logs are streamed from the container daemon, so flushing should be unaffected. Browser support for raw communication over SPDY is not possible, and some languages lack libraries for it and HTTP/2. The Kubelet supports upgrade to WebSocket instead of SPDY, and will multiplex STDOUT/IN/ERR over websockets by prepending each binary message with a single byte representing the channel (0 for IN, 1 for OUT, and 2 for ERR). Because framing on WebSockets suffers from head-of-line blocking, clients and other server code should ensure that no particular stream blocks.
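The single-byte channel framing described above can be illustrated with a short sketch. This is illustrative only; the `demux` helper and channel names are hypothetical and not part of the Kubelet code:

```python
# Sketch: demultiplex binary WebSocket messages framed with a one-byte
# channel prefix, as described above (0 = STDIN, 1 = STDOUT, 2 = STDERR).

CHANNELS = {0: "stdin", 1: "stdout", 2: "stderr"}

def demux(messages):
    """Split channel-prefixed binary messages into per-stream byte buffers."""
    streams = {"stdin": b"", "stdout": b"", "stderr": b""}
    for msg in messages:
        if not msg:
            continue  # an empty frame carries no channel byte
        channel, payload = msg[0], msg[1:]
        name = CHANNELS.get(channel)
        if name is None:
            raise ValueError("unknown channel byte: %d" % channel)
        streams[name] += payload
    return streams

# Example: two STDOUT frames followed by one STDERR frame
frames = [b"\x01hello ", b"\x01world", b"\x02oops"]
result = demux(frames)
```

A real client would feed each received binary WebSocket message to such a function as it arrives, rather than buffering them all first.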
An alternative subprotocol 'base64.channel.k8s.io' base64 encodes the body and uses '0'-'9' to represent the channel for ease of use in browsers. --- api-conventions.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/api-conventions.md b/api-conventions.md index 99aa0cf8..a23dc270 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -73,6 +73,7 @@ using resources with kubectl can be found in [Working with resources](../user-gu - [Events](#events) - [Naming conventions](#naming-conventions) - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) + - [WebSockets and SPDY](#websockets-and-spdy) @@ -721,6 +722,22 @@ Other advice regarding use of labels, annotations, and other generic map keys by - Use annotations to store API extensions that the controller responsible for the resource doesn't need to know about, experimental fields that aren't intended to be generally used API fields, etc. Beware that annotations aren't automatically handled by the API conversion machinery. +## WebSockets and SPDY + +Some of the API operations exposed by Kubernetes involve transfer of binary streams between the client and a container, including attach, exec, portforward, and logging. The API therefore exposes certain operations over upgradeable HTTP connections ([described in RFC 2817](https://tools.ietf.org/html/rfc2817)) via the WebSocket and SPDY protocols. These actions are exposed as subresources with their associated verbs (exec, log, attach, and portforward) and are requested via a GET (to support JavaScript in a browser) and POST (semantically accurate). + +There are two primary protocols in use today: + +1. Streamed channels + + When dealing with multiple independent binary streams of data such as the remote execution of a shell command (writing to STDIN, reading from STDOUT and STDERR) or forwarding multiple ports the streams can be multiplexed onto a single TCP connection. 
Kubernetes supports a SPDY-based framing protocol that leverages SPDY channels and a WebSocket framing protocol that multiplexes multiple channels onto the same stream by prefixing each binary chunk with a byte indicating its channel. The WebSocket protocol supports an optional subprotocol that handles base64-encoded bytes from the client and returns base64-encoded bytes from the server, using character-based channel prefixes ('0', '1', '2') for ease of use from JavaScript in a browser. + +2. Streaming response + + The default log output for a channel of streaming data is an HTTP Chunked Transfer-Encoding, which can return an arbitrary stream of binary data from the server. Browser-based JavaScript is limited in its ability to access the raw data from a chunked response, especially when very large amounts of logs are returned, and in future API calls it may be desirable to transfer large files. The streaming API endpoints support an optional WebSocket upgrade that provides a unidirectional channel from the server to the client and chunks data as binary WebSocket frames. An optional WebSocket subprotocol is exposed that base64 encodes the stream before returning it to the client. + +Clients should use the SPDY protocol if they have native support, or WebSockets as a fallback. Note that WebSockets is susceptible to head-of-line blocking and so clients must read and process each message sequentially. In the future, an HTTP/2 implementation will be exposed that deprecates SPDY.
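The base64 subprotocol's framing ('0'/'1'/'2' character prefix followed by a base64-encoded body) can be sketched as follows. The helper names are hypothetical; this is an illustration of the framing described above, not actual Kubernetes client code:

```python
import base64

def encode_base64_frame(channel, payload):
    """Build a text frame for the base64 subprotocol: a character channel
    prefix ('0', '1', '2', ...) followed by the base64-encoded payload."""
    return str(channel) + base64.b64encode(payload).decode("ascii")

def decode_base64_frame(frame):
    """Split a text frame back into (channel number, raw payload bytes)."""
    channel = int(frame[0])
    payload = base64.b64decode(frame[1:])
    return channel, payload
```

Because the frame is plain text, a browser can process it without access to raw binary WebSocket frames, at the cost of the usual ~33% base64 size overhead.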
+ [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api-conventions.md?pixel)]() -- cgit v1.2.3 From e2dd98e6052420a1198bcb632c5353f10d5b2894 Mon Sep 17 00:00:00 2001 From: Mike Danese Date: Mon, 12 Oct 2015 11:35:30 -0700 Subject: fix incorrect merge MIME type in api-conventions doc --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index a23dc270..7ad1dbc6 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -288,7 +288,7 @@ The API supports three different PATCH operations, determined by their correspon * JSON Patch, `Content-Type: application/json-patch+json` * As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is a sequence of operations that are executed on the resource, e.g. `{"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use JSON Patch, see the RFC. -* Merge Patch, `Content-Type: application/merge-json-patch+json` +* Merge Patch, `Content-Type: application/merge-patch+json` * As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch is essentially a partial representation of the resource. The submitted JSON is "merged" with the current resource to create a new one, then the new one is saved. For more details on how to use Merge Patch, see the RFC. * Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json` * Strategic Merge Patch is a custom implementation of Merge Patch. For a detailed explanation of how it works and why it needed to be introduced, see below. 
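The Merge Patch semantics defined in RFC 7386 are simple enough to sketch in a few lines. This is illustrative only — it is not Kubernetes' Strategic Merge Patch, which adds list-merging behavior on top of these rules:

```python
# Sketch of RFC 7386 JSON Merge Patch semantics: a null value deletes a key,
# nested objects merge recursively, and any other value replaces the target.

def merge_patch(target, patch):
    """Apply a JSON Merge Patch to a target document and return the result."""
    if not isinstance(patch, dict):
        return patch  # non-object patch replaces the target outright
    if not isinstance(target, dict):
        target = {}  # patching a non-object starts from an empty object
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null deletes the key if present
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```

For example, patching `{"a": 1, "b": {"c": 2, "d": 3}}` with `{"a": null, "b": {"c": 9}}` deletes `a`, replaces `b.c`, and leaves `b.d` untouched.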
-- cgit v1.2.3 From 2149a990c573ec8eedd0d266c01577433f353a4b Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Mon, 12 Oct 2015 13:56:55 -0700 Subject: remove code refernce --- api-group.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-group.md b/api-group.md index 53531d43..ec79efe0 100644 --- a/api-group.md +++ b/api-group.md @@ -122,7 +122,7 @@ Documentation for other releases can be found at 2. Spelling the URL: - The URL is in the form of prefix/group/version/. The prefix is hard-coded in the client/unversioned.Config (see [here](../../pkg/client/unversioned/experimental.go#L101)). The client should be able to figure out `group` and `version` using the RESTMapper. For a third-party client which does not have access to the RESTMapper, it should discover the mapping of `group`, `version` and `kind` by querying the server as described in point 2 of #server-side-implementation. + The URL is in the form of prefix/group/version/. The prefix is hard-coded in the client/unversioned.Config. The client should be able to figure out `group` and `version` using the RESTMapper. For a third-party client which does not have access to the RESTMapper, it should discover the mapping of `group`, `version` and `kind` by querying the server as described in point 2 of #server-side-implementation. 3. kubectl: -- cgit v1.2.3 From d587e3b9f965b6528fbe9b058056f4e7c70ef6f4 Mon Sep 17 00:00:00 2001 From: Jeff Grafton Date: Thu, 8 Oct 2015 15:57:24 -0700 Subject: Update test helpers and dev doc to use etcd v2.0.12. --- development.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/development.md b/development.md index 87fb02d5..4375d73e 100644 --- a/development.md +++ b/development.md @@ -264,7 +264,7 @@ Coverage results for the project can also be viewed on [Coveralls](https://cover ## Integration tests -You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.0) in your path, please make sure it is installed and in your ``$PATH``. 
+You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.12) in your path; please make sure it is installed and in your ``$PATH``. ```sh cd kubernetes -- cgit v1.2.3 From aee2383f9b350d0ea7b5d14b60b5c77fdef08391 Mon Sep 17 00:00:00 2001 From: Jeff Grafton Date: Thu, 8 Oct 2015 17:57:36 -0700 Subject: Update documentation to describe how to install etcd for testing --- development.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/development.md b/development.md index 4375d73e..0b778dd9 100644 --- a/development.md +++ b/development.md @@ -264,7 +264,9 @@ Coverage results for the project can also be viewed on [Coveralls](https://cover ## Integration tests -You need an [etcd](https://github.com/coreos/etcd/releases/tag/v2.0.12) in your path; please make sure it is installed and in your ``$PATH``. +You need an [etcd](https://github.com/coreos/etcd/releases) in your path. To download a copy of the latest version used by Kubernetes, either + * run `hack/install-etcd.sh`, which will download etcd to `third_party/etcd`, and then set your `PATH` to include `third_party/etcd`. + * inspect `cluster/saltbase/salt/etcd/etcd.manifest` for the correct version, and then manually download and install it to some place in your `PATH`. ```sh cd kubernetes
+## Alpha, Beta, and Stable Versions + +New feature development proceeds through a series of stages of increasing maturity: + +- Development level + - Object Versioning: no convention + - Availability: not committed to main kubernetes repo, and thus not available in official releases + - Audience: other developers closely collaborating on a feature or proof-of-concept + - Upgradeability, Reliability, Completeness, and Support: no requirements or guarantees +- Alpha level + - Object Versioning: API version name contains `alpha` (e.g. `v1alpha1`) + - Availability: committed to main kubernetes repo; appears in an official release; feature is + disabled by default, but may be enabled by flag + - Audience: developers and expert users interested in giving early feedback on features + - Completeness: some API operations, CLI commands, or UI support may not be implemented; the API + need not have had an *API review* (an intensive and targeted review of the API, on top of a normal + code review) + - Upgradeability: the object schema and semantics may change in a later software release, without + any provision for preserving objects in an existing cluster; + removing the upgradability concern allows developers to make rapid progress; in particular, + API versions can increment faster than the minor release cadence and the developer need not + maintain multiple versions; developers should still increment the API version when object schema + or semantics change in an [incompatible way](#on-compatibility) + - Cluster Reliability: because the feature is relatively new, and may lack complete end-to-end + tests, enabling the feature via a flag might expose bugs which destabilize the cluster (e.g. a + bug in a control loop might rapidly create excessive numbers of objects, exhausting API storage).
+ - Support: there is *no commitment* from the project to complete the feature; the feature may be + dropped entirely in a later software release + - Recommended Use Cases: only in short-lived testing clusters, due to complexity of upgradeability + and lack of long-term support. +- Beta level: + - Object Versioning: API version name contains `beta` (e.g. `v2beta3`) + - Availability: in official Kubernetes releases, and enabled by default + - Audience: users interested in providing feedback on features + - Completeness: all API operations, CLI commands, and UI support should be implemented; end-to-end + tests complete; the API has had a thorough API review and is thought to be complete, though use + during beta may frequently turn up API issues not thought of during review + - Upgradeability: the object schema and semantics may change in a later software release; when + this happens, an upgrade path will be documented; in some cases, objects will be automatically + converted to the new version; in other cases, a manual upgrade may be necessary; a manual + upgrade may require downtime for anything relying on the new feature, and may require + manual conversion of objects to the new version; when manual conversion is necessary, the + project will provide documentation on the process (for an example, see [v1 conversion + tips](../api.md)) + - Cluster Reliability: since the feature has e2e tests, enabling the feature via a flag should not + create new bugs in unrelated features; because the feature is new, it may have minor bugs + - Support: the project commits to complete the feature, in some form, in a subsequent Stable + version; typically this will happen within 3 months, but sometimes longer; releases should + simultaneously support two consecutive versions (e.g.
`v1beta1` and `v1beta2`; or `v1beta2` and + `v1`) for at least one minor release cycle (typically 3 months) so that users have enough time + to upgrade and migrate objects + - Recommended Use Cases: in short-lived testing clusters; in production clusters as part of a + short-lived evaluation of the feature in order to provide feedback +- Stable level: + - Object Versioning: API version `vX` where `X` is an integer (e.g. `v1`) + - Availability: in official Kubernetes releases, and enabled by default + - Audience: all users + - Completeness: same as beta + - Upgradeability: only [strictly compatible](#on-compatibility) changes allowed in subsequent + software releases + - Cluster Reliability: high + - Support: API version will continue to be present for many subsequent software releases; + - Recommended Use Cases: any [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api_changes.md?pixel)]() -- cgit v1.2.3 From bb2aa8770ff269515fe5f83ce0620f634eb2cadc Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Mon, 12 Oct 2015 16:11:12 -0700 Subject: Upgrades and upgrade tests take versions of the form release/stable instead of stable_release: - Refactor common and gce/upgrade.sh to use arbitrary published releases - Update hack/get-build to use cluster/common code - Use hack/get-build.sh in cluster upgrade test logic --- getting-builds.md | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/getting-builds.md b/getting-builds.md index bcb981c4..3803c873 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -35,17 +35,27 @@ Documentation for other releases can be found at You can use [hack/get-build.sh](http://releases.k8s.io/HEAD/hack/get-build.sh) to get the most recent builds, or use it as a reference on how to fetch them with curl.
With `get-build.sh` you can grab the most recent stable build, the most recent release candidate, or the most recent build to pass our ci and gce e2e tests (essentially a nightly build). +Run `./hack/get-build.sh -h` for its usage. + +For example, to get a build at a specific version (v1.0.2): + ```console -usage: - ./hack/get-build.sh [stable|release|latest|latest-green] +./hack/get-build.sh v1.0.2 +``` - stable: latest stable version - release: latest release candidate - latest: latest ci build - latest-green: latest ci build to pass gce e2e +Alternatively, to get the latest stable release: + +```console +./hack/get-build.sh release/stable +``` + +Finally, you can just print the latest or stable version: + +```console +./hack/get-build.sh -v ci/latest ``` -You can also use the gsutil tool to explore the Google Cloud Storage release bucket. Here are some examples: +You can also use the gsutil tool to explore the Google Cloud Storage release buckets. Here are some examples: ```sh gsutil cat gs://kubernetes-release/ci/latest.txt # output the latest ci version number -- cgit v1.2.3 From 7707173defcebeb95d061638e0dcfe0ace605d3a Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 9 Oct 2015 16:54:49 -0700 Subject: update docs on experimental annotations --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 7ad1dbc6..2568d952 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -713,7 +713,7 @@ Therefore, resources supporting auto-generation of unique labels should have a ` Annotations have very different intended usage from labels. We expect them to be primarily generated and consumed by tooling and system extensions. I'm inclined to generalize annotations to permit them to directly store arbitrary json. Rigid names and name prefixes make sense, since they are analogous to API fields. 
-In fact, experimental API fields, including those used to represent fields of newer alpha/beta API versions in the older stable storage version, may be represented as annotations with the form `something.experimental.kubernetes.io/name`. For example `net.experimental.kubernetes.io/policy` might represent an experimental network policy field. +In fact, in-development API fields, including those used to represent fields of newer alpha/beta API versions in the older stable storage version, may be represented as annotations with the form `something.alpha.kubernetes.io/name` or `something.beta.kubernetes.io/name` (depending on our confidence in it). For example `net.alpha.kubernetes.io/policy` might represent an experimental network policy field. Other advice regarding use of labels, annotations, and other generic map keys by Kubernetes components and tools: - Key names should be all lowercase, with words separated by dashes, such as `desired-replicas` -- cgit v1.2.3 From 1f6336e0656d5f00e1b25e9e4810fea1f738e875 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Mon, 12 Oct 2015 17:47:16 -0700 Subject: refactor "experimental" to "extensions" in documents --- extending-api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extending-api.md b/extending-api.md index beb3d7ac..077d5530 100644 --- a/extending-api.md +++ b/extending-api.md @@ -114,7 +114,7 @@ For example, if a user creates: ```yaml metadata: name: cron-tab.stable.example.com -apiVersion: experimental/v1alpha1 +apiVersion: extensions/v1beta1 kind: ThirdPartyResource description: "A specification of a Pod to run on a cron style schedule" versions: -- cgit v1.2.3 From 21ea4045ce292f7594efdbe045f1b163ffad90e1 Mon Sep 17 00:00:00 2001 From: Wojciech Tyczynski Date: Mon, 19 Oct 2015 09:29:10 +0200 Subject: api_changes.md changes for json-related code autogeneration. 
--- api_changes.md | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/api_changes.md b/api_changes.md index 24430f26..53dfb014 100644 --- a/api_changes.md +++ b/api_changes.md @@ -38,7 +38,7 @@ with a number of existing API types and with the [API conventions](api-conventions.md). If creating a new API type/resource, we also recommend that you first send a PR containing just a proposal for the new API types, and that you initially target -the experimental API (pkg/apis/experimental). +the extensions API (pkg/apis/extensions). The Kubernetes API has two major components - the internal structures and the versioned APIs. The versioned APIs are intended to be stable, while the @@ -293,13 +293,13 @@ the release notes for the next release by labeling the PR with the "release-note If you found that your change accidentally broke clients, it should be reverted. In short, the expected API evolution is as follows: -* `experimental/v1alpha1` -> +* `extensions/v1alpha1` -> * `newapigroup/v1alpha1` -> ... -> `newapigroup/v1alphaN` -> * `newapigroup/v1beta1` -> ... -> `newapigroup/v1betaN` -> * `newapigroup/v1` -> * `newapigroup/v2alpha1` -> ... -While in experimental we have no obligation to move forward with the API at all and may delete or break it at any time. +While in extensions we have no obligation to move forward with the API at all and may delete or break it at any time. While in alpha we expect to move forward with it, but may break it. @@ -399,9 +399,9 @@ The conversion code resides with each versioned API. 
There are two files: functions - `pkg/api//conversion_generated.go` containing auto-generated conversion functions - - `pkg/apis/experimental//conversion.go` containing manually written + - `pkg/apis/extensions//conversion.go` containing manually written conversion functions - - `pkg/apis/experimental//conversion_generated.go` containing + - `pkg/apis/extensions//conversion_generated.go` containing auto-generated conversion functions Since auto-generated conversion functions are using manually written ones, @@ -437,7 +437,7 @@ of your versioned api objects. The deep copy code resides with each versioned API: - `pkg/api//deep_copy_generated.go` containing auto-generated copy functions - - `pkg/apis/experimental//deep_copy_generated.go` containing auto-generated copy functions + - `pkg/apis/extensions//deep_copy_generated.go` containing auto-generated copy functions To regenerate them: - run @@ -446,12 +446,28 @@ To regenerate them: hack/update-generated-deep-copies.sh ``` +## Edit json (un)marshaling code + +We are auto-generating code for marshaling and unmarshaling json representation +of api objects - this is to improve the overall system performance. + +The auto-generated code resides with each versioned API: + - `pkg/api//types.generated.go` + - `pkg/apis/extensions//types.generated.go` + +To regenerate them: + - run + +```sh +hack/update-codecgen.sh +``` + ## Making a new API Group This section is under construction, as we make the tooling completely generic. At the moment, you'll have to make a new directory under pkg/apis/; copy the -directory structure from pkg/apis/experimental. Add the new group/version to all +directory structure from pkg/apis/extensions. Add the new group/version to all of the hack/{verify,update}-generated-{deep-copy,conversions,swagger}.sh files in the appropriate places--it should just require adding your new group/version to a bash array. 
You will also need to make sure your new types are imported by -- cgit v1.2.3 From ad3c044039b6b08a491de7c7921cdc8948a977f0 Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Tue, 28 Jul 2015 14:18:50 -0400 Subject: AWS "under the hood" document Document how we implement kubernetes on AWS, so that configuration tools other than kube-up can have a reference for what they should do, and generally to help developers get up to speed. --- aws_under_the_hood.md | 271 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 271 insertions(+) create mode 100644 aws_under_the_hood.md diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md new file mode 100644 index 00000000..eece5dfb --- /dev/null +++ b/aws_under_the_hood.md @@ -0,0 +1,271 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/aws_under_the_hood.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +## Peeking under the hood of kubernetes on AWS + +We encourage you to use kube-up (or CloudFormation) to create a cluster. But it is useful to know what is being created: for curiosity, to understand any problems that may arise, or if you have to create things manually because the scripts are unsuitable for any reason. We don't recommend manual configuration (please file an issue and let us know what's missing if there's something you need) but sometimes it is the only option. + +This document describes how kubernetes on AWS maps to AWS objects. Familiarity with AWS is assumed. + +### Top-level + +Kubernetes consists of a single master node, and a collection of minion nodes. Other documents describe the general architecture of Kubernetes (all nodes run Docker; the kubelet agent runs on each node and launches containers; the kube-proxy relays traffic between the nodes, etc.). + +By default on AWS: + +* Instances run Ubuntu 15.04 (the official AMI). It includes a sufficiently modern kernel to give a good experience with Docker, and it doesn't require a reboot. (The default SSH user is `ubuntu` for this and other Ubuntu images.) +* By default we run aufs over ext4 as the filesystem / container storage on the nodes (mostly because this is what GCE uses). + +These defaults can be changed by passing different environment variables to kube-up. + +### Storage + +AWS does support persistent volumes via EBS. These can then be attached to pods that should store persistent data (e.g. if you're running a database). + +Minions do not have persistent volumes otherwise.
In general, kubernetes containers do not have persistent storage unless you attach a persistent volume, and so minions on AWS use instance storage. Instance storage is cheaper, often faster, and historically more reliable. This does mean that you should pick an instance type that has sufficient instance storage, unless you can make do with whatever space is left on your root partition. + +The master _does_ have a persistent volume attached to it. Containers are mostly run against instance storage, just like the minions, except that we repoint some important data onto the persistent volume. + +By default we use aufs over ext4. `DOCKER_STORAGE=btrfs` is also a good choice for a filesystem: it is relatively reliable with Docker; btrfs itself is much more reliable than it used to be with modern kernels. It can easily span multiple volumes, which is particularly useful when we are using an instance type with multiple ephemeral instance disks. + +### AutoScaling + +We run the minions in an AutoScalingGroup. Currently auto-scaling (e.g. based on CPU) is not actually enabled (#11935). Instead, the auto-scaling group means that AWS will relaunch any minions that are terminated. + +We do not currently run the master in an AutoScalingGroup, but we should (#11934). + +### Networking + +Kubernetes uses an IP-per-pod model. This means that a node, which runs many pods, must have many IPs. The way we implement this on AWS is to use VPCs and the advanced routing support that it allows. Each node is assigned a /24 CIDR for its pods; this CIDR is then configured to route to that instance in the VPC routing table. + +It is also possible to use overlay networking on AWS, but the default kube-up configuration does not. + +### NodePort & LoadBalancing + +Kubernetes on AWS integrates with ELB.
When you create a service with
Type=LoadBalancer, kubernetes (the kube-controller-manager) will create an ELB,
create a security group for the ELB which allows access on the service ports,
attach all the minions to the ELB, and modify the security group for the
minions to allow traffic from the ELB to the minions. This traffic reaches
kube-proxy, which then forwards it to the pods.

ELB requires that all minions listen on a single port, and it acts as a layer-7
forwarding proxy (i.e. the source IP is not preserved). It is therefore not
trivial for kube-proxy to recognize the traffic. So, LoadBalancer services are
also exposed as NodePort services. For NodePort services, a cluster-wide port
is assigned by kubernetes to the service, and kube-proxy listens externally on
that port on every minion and forwards traffic to the pods. So for a
load-balanced service, ELB is configured to proxy traffic on the public port
(e.g. port 80) to the NodePort assigned to the service (e.g. 31234); kube-proxy
recognizes the traffic coming to the NodePort by the inbound port number and
sends it to the correct pods for the service.

Note that we do not automatically open NodePort services in the AWS firewall
(although we do open LoadBalancer services). This is because we expect that
NodePort services are more of a building block for things like inter-cluster
services or for LoadBalancer. To consume a NodePort service externally, you
will likely have to open the port in the minion security group
(`kubernetes-minion-`).

### IAM

kube-up sets up two IAM roles, one for the master called
[kubernetes-master](cluster/aws/templates/iam/kubernetes-master-policy.json)
and one for the minions called
[kubernetes-minion](cluster/aws/templates/iam/kubernetes-minion-policy.json).

The master is responsible for creating ELBs and configuring them, as well as
setting up advanced VPC routing.
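The advanced VPC routing in question is the per-pod /24 programming from the Networking section: one CIDR per instance, written into the VPC route table. A minimal sketch using Python's stdlib `ipaddress` module — the 10.244.0.0/16 cluster range and the instance IDs are invented for illustration, not kube-up defaults:

```python
import ipaddress

def assign_pod_cidrs(cluster_range, instance_ids):
    """Carve one /24 per instance out of the cluster range and return the
    VPC route-table entries: destination CIDR -> instance hosting those IPs."""
    subnets = ipaddress.ip_network(cluster_range).subnets(new_prefix=24)
    return {str(cidr): inst for inst, cidr in zip(instance_ids, subnets)}

# Invented values: real ranges and instance IDs come from kube-up.
routes = assign_pod_cidrs("10.244.0.0/16", ["i-node0", "i-node1"])
for cidr, instance in routes.items():
    print(f"{cidr} -> {instance}")
# 10.244.0.0/24 -> i-node0
# 10.244.1.0/24 -> i-node1
```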
Currently it has blanket permissions on EC2,
along with rights to create and destroy ELBs.

The minion does not need a lot of access to the AWS APIs. It needs to download
a distribution file, and then it is responsible for attaching and detaching EBS
volumes to and from itself.

The minion policy is relatively minimal. The master policy is probably overly
permissive. The security conscious may want to lock down the IAM policies
further (#11936).

We should make it easier to extend IAM permissions and also ensure that they
are correctly configured (#???)

### Tagging

All AWS resources are tagged with a tag named "KubernetesCluster". This tag is
used to identify a particular 'instance' of Kubernetes, even if two clusters
are deployed into the same VPC. (The script doesn't do this by default, but it
can be done.)

Within the AWS cloud provider logic, we filter requests to the AWS APIs to
match resources with our cluster tag, so we only see our own AWS objects.

If you choose not to use kube-up, you must tag everything with a
KubernetesCluster tag with a unique per-cluster value.


# AWS Objects

The kube-up script does a number of things in AWS:

* Creates an S3 bucket (`AWS_S3_BUCKET`) and copies the kubernetes distribution
  and the salt scripts into it. They are made world-readable and the HTTP URLs
are passed to instances; this is how kubernetes code gets onto the machines.
* Creates two IAM profiles based on templates in `cluster/aws/templates/iam`.
  `kubernetes-master` is used by the master node; `kubernetes-minion` is used
by minion nodes.
* Creates an AWS SSH key named `kubernetes-`. The fingerprint here is
  the OpenSSH key fingerprint, so that multiple users can run the script with
different keys and their keys will not collide (with near-certainty). It will
use an existing key if one is found at `AWS_SSH_KEY`, otherwise it will create
one there.
(With the default Ubuntu images, if you have to SSH in: the user is
`ubuntu` and that user can `sudo`.)
* Creates a VPC for use with the cluster (with a CIDR of 172.20.0.0/16), and
  enables the `dns-support` and `dns-hostnames` options.
* Creates an internet gateway for the VPC.
* Creates a route table for the VPC, with the internet gateway as the default
  route.
* Creates a subnet (with a CIDR of 172.20.0.0/24) in the AZ `KUBE_AWS_ZONE`
  (defaults to us-west-2a). Currently kubernetes runs in a single AZ; there
are two philosophies on how to achieve HA: cluster-per-AZ and
cross-AZ-clusters. cluster-per-AZ says you should have an independent cluster
for each AZ; they are entirely separate. cross-AZ-clusters allows a single
cluster to span multiple AZs. The debate is open here: cluster-per-AZ is more
robust but cross-AZ-clusters are more convenient. For now though, each AWS
kubernetes cluster lives in one AZ.
* Associates the subnet with the route table.
* Creates security groups for the master node (`kubernetes-master-`)
  and the minion nodes (`kubernetes-minion-`).
* Configures security groups so that masters & minions can intercommunicate,
  opens SSH to the world on master & minions, and opens port 443 to the
world on the master (for the HTTPS API endpoint).
* Creates an EBS volume for the master node of size `MASTER_DISK_SIZE` and type
  `MASTER_DISK_TYPE`.
* Launches a master node with a fixed IP address (172.20.0.9), with the
  security group, IAM credentials, etc. An instance script is used to pass
vital configuration information to Salt. The hope is that over time we can
reduce the amount of configuration information that must be passed in this way.
* Once the instance is up, it attaches the EBS volume & sets up a manual
  routing rule for the internal network range (`MASTER_IP_RANGE`, defaults to
10.246.0.0/24).
* Creates an auto-scaling launch-configuration and group for the minions.
The
  name for both is `-minion-group`, defaults to
`kubernetes-minion-group`. The auto-scaling group has min & max size both set
to `NUM_MINIONS`. You can change the size of the auto-scaling group to add or
remove minions (directly through the AWS API/Console). The minion nodes
self-configure: they come up, run Salt with the stored configuration, connect
to the master, and are assigned an internal CIDR; the master then configures the
route-table with the minion CIDR. The script does health-check the minions,
but this is a self-check; it is not required.

If attempting this configuration manually, I highly recommend following along
with the kube-up script, and being sure to tag everything with a
`KubernetesCluster`=`` tag. Also, passing the right configuration
options to Salt when not using the script is tricky: the plan here is to
simplify this by having Kubernetes take on more node configuration, and even
potentially remove Salt altogether.


## Manual infrastructure creation

While this work is not yet complete, advanced users may choose to create (some)
AWS objects themselves and still make use of the kube-up script (to configure
Salt, for example).

* `AWS_S3_BUCKET` will use an existing S3 bucket
* `VPC_ID` will reuse an existing VPC
* `SUBNET_ID` will reuse an existing subnet
* If your route table is tagged with the correct `KubernetesCluster`, it will
  be reused
* If your security groups are appropriately named, they will be reused.

Currently there is no way to do the following with kube-up.
If these affect +you, please open an issue with a description of what you're trying to do (your +use-case) and we'll see what we can do: + +* Use an existing AWS SSH key with an arbitrary name +* Override the IAM credentials in a sensible way (but this is in-progress) +* Use different security group permissions +* Configure your own auto-scaling groups + +# Instance boot + +The instance boot procedure is currently pretty complicated, primarily because +we must marshal configuration from Bash to Salt via the AWS instance script. +As we move more post-boot configuration out of Salt and into Kubernetes, we +will hopefully be able to simplify this. + +When the kube-up script launches instances, it builds an instance startup +script which includes some configuration options passed to kube-up, and +concatenates some of the scripts found in the cluster/aws/templates directory. +These scripts are responsible for mounting and formatting volumes, downloading +Salt & Kubernetes from the S3 bucket, and then triggering Salt to actually +install Kubernetes. + + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/aws_under_the_hood.md?pixel)]() + -- cgit v1.2.3 From 1d67698c10c7f102f41ebb46ead1304949f04e82 Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Sat, 19 Sep 2015 12:53:19 -0400 Subject: Changes per reviews --- aws_under_the_hood.md | 300 ++++++++++++++++++++++++++++---------------------- 1 file changed, 167 insertions(+), 133 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index eece5dfb..17ac1543 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -31,21 +31,31 @@ Documentation for other releases can be found at -## Peeking under the hood of kubernetes on AWS +# Peeking under the hood of Kubernetes on AWS -We encourage you to use kube-up (or CloudFormation) to create a cluster. 
But -it is useful to know what is being created: for curiosity, to understand any -problems that may arise, or if you have to create things manually because the -scripts are unsuitable for any reason. We don't recommend manual configuration -(please file an issue and let us know what's missing if there's something you -need) but sometimes it is the only option. +This document provides high-level insight into how Kubernetes works on AWS and +maps to AWS objects. We assume that you are familiar with AWS. -This document sets out to document how kubernetes on AWS maps to AWS objects. -Familiarity with AWS is assumed. +We encourage you to use [kube-up](../getting-started-guides/aws.md) (or +[CloudFormation](../getting-started-guides/aws-coreos.md) to create clusters on +AWS. We recommend that you avoid manual configuration but are aware that +sometimes it's the only option. -### Top-level +Tip: You should open an issue and let us know what enhancements can be made to +the scripts to better suit your needs. + +That said, it's also useful to know what's happening under the hood when +Kubernetes clusters are created on AWS. This can be particularly useful if +problems arise or in circumstances where the provided scripts are lacking and +you manually created or configured your cluster. + +### Architecture overview + +Kubernetes is a cluster of several machines that consists of a Kubernetes +master and a set number of nodes (previously known as 'minions') for which the +master which is responsible. See the [Architecture](architecture.md) topic for +more details. -Kubernetes consists of a single master node, and a collection of minion nodes. Other documents describe the general architecture of Kubernetes (all nodes run Docker; the kubelet agent runs on each node and launches containers; the kube-proxy relays traffic between the nodes etc). @@ -53,171 +63,192 @@ kube-proxy relays traffic between the nodes etc). 
By default on AWS: * Instances run Ubuntu 15.04 (the official AMI). It includes a sufficiently - modern kernel to give a good experience with Docker, it doesn't require a - reboot. (The default SSH user is `ubuntu` for this and other ubuntu images) + modern kernel that parise well with Docker and doesn't require a + reboot. (The default SSH user is `ubuntu` for this and other ubuntu images.) * By default we run aufs over ext4 as the filesystem / container storage on the nodes (mostly because this is what GCE uses). -These defaults can be changed by passing different environment variables to +You can override these defaults by passing different environment variables to kube-up. ### Storage -AWS does support persistent volumes via EBS. These can then be attached to -pods that should store persistent data (e.g. if you're running a database). - -Minions do not have persistent volumes otherwise. In general, kubernetes -containers do not have persistent storage unless you attach a persistent -volume, and so minions on AWS use instance storage. Instance storage is -cheaper, often faster, and historically more reliable. This does mean that you -should pick an instance type that has sufficient instance storage, unless you -can make do with whatever space is left on your root partition. - -The master _does_ have a persistent volume attached to it. Containers are -mostly run against instance storage, just like the minions, except that we -repoint some important data onto the peristent volume. - -By default we use aufs over ext4. `DOCKER_STORAGE=btrfs` is also a good choice -for a filesystem: it is relatively reliable with Docker; btrfs itself is much -more reliable than it used to be with modern kernels. It can easily span -multiple volumes, which is particularly useful when we are using an instance -type with multiple ephemeral instance disks. +AWS supports persistent volumes by using [Elastic Block Store +(EBS)](../user-guide/volumes.md#awselasticblockstore). 
These can then be +attached to pods that should store persistent data (e.g. if you're running a +database). + +By default, nodes in AWS use `[instance +storage](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html)' +unless you create pods with persistent volumes +`[(EBS)](../user-guide/volumes.md#awselasticblockstore)`. In general, +Kubernetes containers do not have persistent storage unless you attach a +persistent volume, and so nodes on AWS use instance storage. Instance +storage is cheaper, often faster, and historically more reliable. This does +mean that you should pick an instance type that has sufficient instance +storage, unless you can make do with whatever space is left on your root +partition. + +Note: Master uses a persistent volume ([etcd](architecture.html#etcd)) to track +its state but similar to the nodes, container are mostly run against instance +storage, except that we repoint some important data onto the peristent volume. + +The default storage driver for Docker images is aufs. Passing the environment +variable `DOCKER_STORAGE=btrfs` is also a good choice for a filesystem. btrfs +is relatively reliable with Docker and has improved its reliability with modern +kernels. It can easily span multiple volumes, which is particularly useful +when we are using an instance type with multiple ephemeral instance disks. ### AutoScaling -We run the minions in an AutoScalingGroup. Currently auto-scaling (e.g. based -on CPU) is not actually enabled (#11935). Instead, the auto-scaling group -means that AWS will relaunch any minions that are terminated. +Nodes (except for the master) are run in an +`[AutoScalingGroup](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroup.html) +on AWS. Currently auto-scaling (e.g. based on CPU) is not actually enabled +([#11935](http://issues.k8s.io/11935)). Instead, the auto-scaling group means +that AWS will relaunch any non-master nodes that are terminated. 
We do not currently run the master in an AutoScalingGroup, but we should -(#11934) +([#11934](http://issues.k8s.io/11934)). ### Networking Kubernetes uses an IP-per-pod model. This means that a node, which runs many -pods, must have many IPs. The way we implement this on AWS is to use VPCs and -the advanced routing support that it allows. Each pod is assigned a /24 CIDR; -then this CIDR is configured to route to an instance in the VPC routing table. - -It is also possible to use overlay networking on AWS, but the default kube-up -configuration does not. - -### NodePort & LoadBalancing - -Kubernetes on AWS integrates with ELB. When you create a service with -Type=LoadBalancer, kubernetes (the kube-controller-manager) will create an ELB, -create a security group for the ELB which allows access on the service ports, -attach all the minions to the ELB, and modify the security group for the -minions to allow traffic from the ELB to the minions. This traffic reaches -kube-proxy where it is then forwarded to the pods. - -ELB requires that all minions listen on a single port, and it acts as a layer-7 -forwarding proxy (i.e. the source IP is not preserved). It is not trivial for -kube-proxy to recognize the traffic therefore. So, LoadBalancer services are -also exposed as NodePort services. For NodePort services, a cluster-wide port -is assigned by kubernetes to the service, and kube-proxy listens externally on -that port on every minion, and forwards traffic to the pods. So for a -load-balanced service, ELB is configured to proxy traffic on the public port -(e.g. port 80) to the NodePort assigned to the service (e.g. 31234), kube-proxy -recognizes the traffic coming to the NodePort by the inbound port number, and -send it to the correct pods for the service. +pods, must have many IPs. AWS uses virtual private clouds (VPCs) and advanced +routing support so each pod is assigned a /24 CIDR. 
Each pod is assigned a /24 +CIDR; the assigned CIDR is then configured to route to an instance in the VPC +routing table. + +It is also possible to use overlay networking on AWS, but that is not the +configuration of the kube-up script. + +### NodePort and LoadBalancing + +Kubernetes on AWS integrates with [Elastic Load Balancing +(ELB)](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SetUpASLBApp.html). +When you create a service with `Type=LoadBalancer`, Kubernetes (the +kube-controller-manager) will create an ELB, create a security group for the +ELB which allows access on the service ports, attach all the nodes to the ELB, +and modify the security group for the nodes to allow traffic from the ELB to +the nodes. This traffic reaches kube-proxy where it is then forwarded to the +pods. + +ELB has some restrictions: it requires that all nodes listen on a single port, +and it acts as a forwarding proxy (i.e. the source IP is not preserved). To +work with these restrictions, in Kubernetes, `[LoadBalancer +services](../user-guide/services.html#type-loadbalancer)` are exposed as +`[NodePort services](../user-guide/services.html#type-nodeport)`. Then +kube-proxy listens externally on the cluster-wide port that's assigned to +NodePort services and forwards traffic to the corresponding pods. So ELB is +configured to proxy traffic on the public port (e.g. port 80) to the NodePort +that is assigned to the service (e.g. 31234). Any in-coming traffic sent to +the NodePort (e.g. port 31234) is recognized by kube-proxy and then sent to the +correct pods for that service. Note that we do not automatically open NodePort services in the AWS firewall (although we do open LoadBalancer services). This is because we expect that NodePort services are more of a building block for things like inter-cluster services or for LoadBalancer. 
To consume a NodePort service externally, you -will likely have to open the port in the minion security group +will likely have to open the port in the node security group (`kubernetes-minion-`). -### IAM +### Identity and Access Management (IAM) kube-proxy sets up two IAM roles, one for the master called -(kubernetes-master)[cluster/aws/templates/iam/kubernetes-master-policy.json] -and one for the minions called -(kubernetes-minion)[cluster/aws/templates/iam/kubernetes-minion-policy.json]. +[kubernetes-master](../../cluster/aws/templates/iam/kubernetes-master-policy.json) +and one for the non-master nodes called +[kubernetes-minion](../../cluster/aws/templates/iam/kubernetes-minion-policy.json). The master is responsible for creating ELBs and configuring them, as well as setting up advanced VPC routing. Currently it has blanket permissions on EC2, along with rights to create and destroy ELBs. -The minion does not need a lot of access to the AWS APIs. It needs to download -a distribution file, and then it is responsible for attaching and detaching EBS -volumes to itself. +The (non-master) nodes do not need a lot of access to the AWS APIs. They need to download +a distribution file, and then are responsible for attaching and detaching EBS +volumes from itself. -The minion policy is relatively minimal. The master policy is probably overly +The (non-master) node policy is relatively minimal. The master policy is probably overly permissive. The security concious may want to lock-down the IAM policies -further (#11936) +further ([#11936](http://issues.k8s.io/11936)). We should make it easier to extend IAM permissions and also ensure that they -are correctly configured (#???) +are correctly configured ([#14226](http://issues.k8s.io/14226)). ### Tagging -All AWS resources are tagged with a tag named "KuberentesCluster". This tag is -used to identify a particular 'instance' of Kubernetes, even if two clusters -are deployed into the same VPC. 
(The script doesn't do this by default, but it -can be done.) +All AWS resources are tagged with a tag named "KuberentesCluster", with a value +that is the unique cluster-id. This tag is used to identify a particular +'instance' of Kubernetes, even if two clusters are deployed into the same VPC. +Resources are considered to belong to the same cluster if and only if they have +the same value in the tag named "KubernetesCluster". (The kube-up script is +not configured to create multiple clusters in the same VPC by default, but it +is possible to create another cluster in the same VPC.) Within the AWS cloud provider logic, we filter requests to the AWS APIs to -match resources with our cluster tag. So we only see our own AWS objects. - -If you choose not to use kube-up, you must tag everything with a -KubernetesCluster tag with a unique per-cluster value. +match resources with our cluster tag. By filtering the requests, we ensure +that we see only our own AWS objects. +Important: If you choose not to use kube-up, you must pick a unique cluster-id +value, and ensure that all AWS resources have a tag with +`Name=KubernetesCluster,Value=`. -# AWS Objects +### AWS Objects The kube-up script does a number of things in AWS: -* Creates an S3 bucket (`AWS_S3_BUCKET`) and copy the kubernetes distribution +* Creates an S3 bucket (`AWS_S3_BUCKET`) and then copies the Kubernetes distribution and the salt scripts into it. They are made world-readable and the HTTP URLs -are passed to instances; this is how kubernetes code gets onto the machines. -* Creates two IAM profiles based on templates in `cluster/aws/templates/iam`. - `kubernetes-master` is used by the master node; `kubernetes-minion` is used -by minion nodes. +are passed to instances; this is how Kubernetes code gets onto the machines. +* Creates two IAM profiles based on templates in `cluster/aws/templates/iam`: + * `kubernetes-master` is used by the master node + * `kubernetes-minion` is used by non-master nodes. 
* Creates an AWS SSH key named `kubernetes-`. Fingerprint here is the OpenSSH key fingerprint, so that multiple users can run the script with -different keys and their keys will not collide (with near-certainty) It will +different keys and their keys will not collide (with near-certainty). It will use an existing key if one is found at `AWS_SSH_KEY`, otherwise it will create one there. (With the default ubuntu images, if you have to SSH in: the user is `ubuntu` and that user can `sudo`) -* Creates a VPC for use with the cluster (with a CIDR of 172.20.0.0/16)., and +* Creates a VPC for use with the cluster (with a CIDR of 172.20.0.0/16) and enables the `dns-support` and `dns-hostnames` options. * Creates an internet gateway for the VPC. * Creates a route table for the VPC, with the internet gateway as the default route * Creates a subnet (with a CIDR of 172.20.0.0/24) in the AZ `KUBE_AWS_ZONE` - (defaults to us-west-2a). Currently kubernetes runs in a single AZ; there -are two philosophies on how to achieve HA: cluster-per-AZ and -cross-AZ-clusters. cluster-per-AZ says you should have an independent cluster -for each AZ, they are entirely separate. cross-AZ-clusters allows a single -cluster to span multiple AZs. The debate is open here: cluster-per-AZ is more -robust but cross-AZ-clusters are more convenient. For now though, each AWS -kuberentes cluster lives in one AZ. + (defaults to us-west-2a). Currently, each Kubernetes cluster runs in a +single AZ on AWS. Although, there are two philosophies in discussion on how to +achieve High Availability (HA): + * cluster-per-AZ: An independent cluster for each AZ, where each cluster + is entirely separate. + * cross-AZ-clusters: A single cluster spans multiple AZs. +The debate is open here, where cluster-per-AZ is discussed as more robust but +cross-AZ-clusters are more convenient. 
* Associates the subnet to the route table * Creates security groups for the master node (`kubernetes-master-`) - and the minion nodes (`kubernetes-minion-`) -* Configures security groups so that masters & minions can intercommunicate, - and opens SSH to the world on master & minions, and opens port 443 to the -world on the master (for the HTTPS API endpoint) + and the non-master nodes (`kubernetes-minion-`) +* Configures security groups so that masters and nodes can communicate. This + includes intercommunication between masters and nodes, opening SSH publicly +for both masters and nodes, and opening port 443 on the master for the HTTPS +API endpoints. * Creates an EBS volume for the master node of size `MASTER_DISK_SIZE` and type `MASTER_DISK_TYPE` -* Launches a master node with a fixed IP address (172.20.0.9), with the - security group, IAM credentials etc. An instance script is used to pass -vital configuration information to Salt. The hope is that over time we can -reduce the amount of configuration information that must be passed in this way. -* Once the instance is up, it attaches the EBS volume & sets up a manual +* Launches a master node with a fixed IP address (172.20.0.9) that is also + configured for the security group and all the necessary IAM credentials. An +instance script is used to pass vital configuration information to Salt. Note: +The hope is that over time we can reduce the amount of configuration +information that must be passed in this way. +* Once the instance is up, it attaches the EBS volume and sets up a manual routing rule for the internal network range (`MASTER_IP_RANGE`, defaults to 10.246.0.0/24) -* Creates an auto-scaling launch-configuration and group for the minions. The - name for both is `-minion-group`, defaults to -`kubernetes-minion-group`. The auto-scaling group has size min & max both set -to `NUM_MINIONS`. You can change the size of the auto-scaling group to add or -remove minions (directly though the AWS API/Console). 
The minion nodes -self-configure: they come up, run Salt with the stored configuration; connect -to the master and are assigned an internal CIDR; the master configures the -route-table with the minion CIDR. The script does health-check the minions, -but this is a self-check, it is not required. +* For auto-scaling, on each nodes it creates a launch configuration and group. + The name for both is <*KUBE_AWS_INSTANCE_PREFIX*>-minion-group. The default +name is kubernetes-minion-group. The auto-scaling group has a min and max size +that are both set to NUM_MINIONS. You can change the size of the auto-scaling +group to add or remove the total number of nodes from within the AWS API or +Console. Each nodes self-configures, meaning that they come up; run Salt with +the stored configuration; connect to the master; are assigned an internal CIDR; +and then the master configures the route-table with the assigned CIDR. The +kube-up script performs a health-check on the nodes but it's a self-check that +is not required. + If attempting this configuration manually, I highly recommend following along with the kube-up script, and being sure to tag everything with a @@ -227,29 +258,32 @@ simplify this by having Kubernetes take on more node configuration, and even potentially remove Salt altogether. -## Manual infrastructure creation +### Manual infrastructure creation -While this work is not yet complete, advanced users may choose to create (some) -AWS objects themselves, and still make use of the kube-up script (to configure -Salt, for example). +While this work is not yet complete, advanced users might choose to manually +certain AWS objects while still making use of the kube-up script (to configure +Salt, for example). 
These objects can currently be manually created: -* `AWS_S3_BUCKET` will use an existing S3 bucket -* `VPC_ID` will reuse an existing VPC -* `SUBNET_ID` will reuse an existing subnet -* If your route table is tagged with the correct `KubernetesCluster`, it will - be reused +* Set the `AWS_S3_BUCKET` environment variable to use an existing S3 bucket. +* Set the `VPC_ID` environment variable to reuse an existing VPC. +* Set the `SUBNET_ID` environemnt variable to reuse an existing subnet. +* If your route table has a matching `KubernetesCluster` tag, it will + be reused. * If your security groups are appropriately named, they will be reused. -Currently there is no way to do the following with kube-up. If these affect -you, please open an issue with a description of what you're trying to do (your -use-case) and we'll see what we can do: +Currently there is no way to do the following with kube-up: + +* Use an existing AWS SSH key with an arbitrary name. +* Override the IAM credentials in a sensible way + ([#14226](http://issues.k8s.io/14226)). +* Use different security group permissions. +* Configure your own auto-scaling groups. -* Use an existing AWS SSH key with an arbitrary name -* Override the IAM credentials in a sensible way (but this is in-progress) -* Use different security group permissions -* Configure your own auto-scaling groups +If any of the above items apply to your situation, open an issue to request an +enhancement to the kube-up script. You should provide a complete description of +the use-case, including all the details around what you want to accomplish. -# Instance boot +### Instance boot The instance boot procedure is currently pretty complicated, primarily because we must marshal configuration from Bash to Salt via the AWS instance script. 
@@ -260,7 +294,7 @@ When the kube-up script launches instances, it builds an instance startup script which includes some configuration options passed to kube-up, and concatenates some of the scripts found in the cluster/aws/templates directory. These scripts are responsible for mounting and formatting volumes, downloading -Salt & Kubernetes from the S3 bucket, and then triggering Salt to actually +Salt and Kubernetes from the S3 bucket, and then triggering Salt to actually install Kubernetes. -- cgit v1.2.3 From a8965e59a1235e89ede3ea8c1aaa4f92ec98303f Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Sat, 19 Sep 2015 13:16:52 -0400 Subject: Fix some typos from my read-through --- aws_under_the_hood.md | 85 +++++++++++++++++++++++++-------------------------- 1 file changed, 41 insertions(+), 44 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index 17ac1543..6c54dcc4 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -37,7 +37,7 @@ This document provides high-level insight into how Kubernetes works on AWS and maps to AWS objects. We assume that you are familiar with AWS. We encourage you to use [kube-up](../getting-started-guides/aws.md) (or -[CloudFormation](../getting-started-guides/aws-coreos.md) to create clusters on +[CloudFormation](../getting-started-guides/aws-coreos.md)) to create clusters on AWS. We recommend that you avoid manual configuration but are aware that sometimes it's the only option. @@ -63,7 +63,7 @@ kube-proxy relays traffic between the nodes etc). By default on AWS: * Instances run Ubuntu 15.04 (the official AMI). It includes a sufficiently - modern kernel that parise well with Docker and doesn't require a + modern kernel that pairs well with Docker and doesn't require a reboot. (The default SSH user is `ubuntu` for this and other ubuntu images.) * By default we run aufs over ext4 as the filesystem / container storage on the nodes (mostly because this is what GCE uses). 
@@ -73,39 +73,36 @@ kube-up. ### Storage -AWS supports persistent volumes by using [Elastic Block Store -(EBS)](../user-guide/volumes.md#awselasticblockstore). These can then be +AWS supports persistent volumes by using [Elastic Block Store (EBS)](../user-guide/volumes.md#awselasticblockstore). These can then be attached to pods that should store persistent data (e.g. if you're running a database). -By default, nodes in AWS use `[instance -storage](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html)' +By default, nodes in AWS use [instance storage](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html) unless you create pods with persistent volumes -`[(EBS)](../user-guide/volumes.md#awselasticblockstore)`. In general, -Kubernetes containers do not have persistent storage unless you attach a -persistent volume, and so nodes on AWS use instance storage. Instance -storage is cheaper, often faster, and historically more reliable. This does -mean that you should pick an instance type that has sufficient instance -storage, unless you can make do with whatever space is left on your root -partition. - -Note: Master uses a persistent volume ([etcd](architecture.html#etcd)) to track -its state but similar to the nodes, container are mostly run against instance +[(EBS)](../user-guide/volumes.md#awselasticblockstore). In general, Kubernetes +containers do not have persistent storage unless you attach a persistent +volume, and so nodes on AWS use instance storage. Instance storage is cheaper, +often faster, and historically more reliable. This does mean that you should +pick an instance type that has sufficient instance storage, unless you can make +do with whatever space is left on your root partition. 
+ +Note: The master uses a persistent volume ([etcd](architecture.md#etcd)) to track +its state but similar to the nodes, containers are mostly run against instance storage, except that we repoint some important data onto the persistent volume. -The default storage driver for Docker images is aufs. Passing the environment -variable `DOCKER_STORAGE=btrfs` is also a good choice for a filesystem. btrfs +The default storage driver for Docker images is aufs. Specifying btrfs (by passing the environment +variable `DOCKER_STORAGE=btrfs` to kube-up) is also a good choice for a filesystem. btrfs is relatively reliable with Docker and has improved its reliability with modern kernels. It can easily span multiple volumes, which is particularly useful when we are using an instance type with multiple ephemeral instance disks. ### AutoScaling -Nodes (except for the master) are run in an -`[AutoScalingGroup](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroup.html) +Nodes (but not the master) are run in an +[AutoScalingGroup](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroup.html) on AWS. Currently auto-scaling (e.g. based on CPU) is not actually enabled ([#11935](http://issues.k8s.io/11935)). Instead, the auto-scaling group means -that AWS will relaunch any non-master nodes that are terminated. +that AWS will relaunch any nodes that are terminated. We do not currently run the master in an AutoScalingGroup, but we should ([#11934](http://issues.k8s.io/11934)). @@ -134,9 +131,9 @@ pods. ELB has some restrictions: it requires that all nodes listen on a single port, and it acts as a forwarding proxy (i.e. the source IP is not preserved). To -work with these restrictions, in Kubernetes, `[LoadBalancer -services](../user-guide/services.html#type-loadbalancer)` are exposed as -`[NodePort services](../user-guide/services.html#type-nodeport)`. 
Then +work with these restrictions, in Kubernetes, [LoadBalancer +services](../user-guide/services.html#type-loadbalancer) are exposed as +[NodePort services](../user-guide/services.html#type-nodeport). Then kube-proxy listens externally on the cluster-wide port that's assigned to NodePort services and forwards traffic to the corresponding pods. So ELB is configured to proxy traffic on the public port (e.g. port 80) to the NodePort @@ -155,18 +152,18 @@ will likely have to open the port in the node security group kube-proxy sets up two IAM roles, one for the master called [kubernetes-master](../../cluster/aws/templates/iam/kubernetes-master-policy.json) -and one for the non-master nodes called +and one for the nodes called [kubernetes-minion](../../cluster/aws/templates/iam/kubernetes-minion-policy.json). The master is responsible for creating ELBs and configuring them, as well as setting up advanced VPC routing. Currently it has blanket permissions on EC2, along with rights to create and destroy ELBs. -The (non-master) nodes do not need a lot of access to the AWS APIs. They need to download +The nodes do not need a lot of access to the AWS APIs. They need to download a distribution file, and then are responsible for attaching and detaching EBS volumes from themselves. -The (non-master) node policy is relatively minimal. The master policy is probably overly +The node policy is relatively minimal. The master policy is probably overly permissive. The security conscious may want to lock down the IAM policies further ([#11936](http://issues.k8s.io/11936)). @@ -198,9 +195,9 @@ The kube-up script does a number of things in AWS: * Creates an S3 bucket (`AWS_S3_BUCKET`) and then copies the Kubernetes distribution and the salt scripts into it. They are made world-readable and the HTTP URLs are passed to instances; this is how Kubernetes code gets onto the machines. 
-* Creates two IAM profiles based on templates in `cluster/aws/templates/iam`: - * `kubernetes-master` is used by the master node - * `kubernetes-minion` is used by non-master nodes. +* Creates two IAM profiles based on templates in [cluster/aws/templates/iam](../../cluster/aws/templates/iam): + * `kubernetes-master` is used by the master + * `kubernetes-minion` is used by nodes. * Creates an AWS SSH key named `kubernetes-`. Fingerprint here is the OpenSSH key fingerprint, so that multiple users can run the script with different keys and their keys will not collide (with near-certainty). It will @@ -215,22 +212,22 @@ one there. (With the default ubuntu images, if you have to SSH in: the user is * Creates a subnet (with a CIDR of 172.20.0.0/24) in the AZ `KUBE_AWS_ZONE` (defaults to us-west-2a). Currently, each Kubernetes cluster runs in a single AZ on AWS. Although, there are two philosophies in discussion on how to -achieve High Availability (HA): +achieve High Availability (HA): * cluster-per-AZ: An independent cluster for each AZ, where each cluster - is entirely separate. - * cross-AZ-clusters: A single cluster spans multiple AZs. + is entirely separate. + * cross-AZ-clusters: A single cluster spans multiple AZs. The debate is open here, where cluster-per-AZ is discussed as more robust but -cross-AZ-clusters are more convenient. +cross-AZ-clusters are more convenient. * Associates the subnet to the route table -* Creates security groups for the master node (`kubernetes-master-`) - and the non-master nodes (`kubernetes-minion-`) +* Creates security groups for the master (`kubernetes-master-`) + and the nodes (`kubernetes-minion-`) * Configures security groups so that masters and nodes can communicate. This includes intercommunication between masters and nodes, opening SSH publicly for both masters and nodes, and opening port 443 on the master for the HTTPS API endpoints. 
-* Creates an EBS volume for the master node of size `MASTER_DISK_SIZE` and type +* Creates an EBS volume for the master of size `MASTER_DISK_SIZE` and type `MASTER_DISK_TYPE` -* Launches a master node with a fixed IP address (172.20.0.9) that is also +* Launches a master with a fixed IP address (172.20.0.9) that is also configured for the security group and all the necessary IAM credentials. An instance script is used to pass vital configuration information to Salt. Note: The hope is that over time we can reduce the amount of configuration @@ -251,17 +248,17 @@ is not required. If attempting this configuration manually, I highly recommend following along -with the kube-up script, and being sure to tag everything with a -`KubernetesCluster`=`` tag. Also, passing the right configuration -options to Salt when not using the script is tricky: the plan here is to -simplify this by having Kubernetes take on more node configuration, and even -potentially remove Salt altogether. +with the kube-up script, and being sure to tag everything with a tag with name +`KubernetesCluster` and value set to a unique cluster-id. Also, passing the +right configuration options to Salt when not using the script is tricky: the +plan here is to simplify this by having Kubernetes take on more node +configuration, and even potentially remove Salt altogether. ### Manual infrastructure creation While this work is not yet complete, advanced users might choose to manually -certain AWS objects while still making use of the kube-up script (to configure +create certain AWS objects while still making use of the kube-up script (to configure Salt, for example). These objects can currently be manually created: * Set the `AWS_S3_BUCKET` environment variable to use an existing S3 bucket. 
-- cgit v1.2.3 From 7582e453bcb112311b010b25dd7a038aecbff9bf Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Sat, 19 Sep 2015 15:20:20 -0400 Subject: Two small fixes (to keep doc-gen happy) --- aws_under_the_hood.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index 6c54dcc4..ac9efe55 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -133,7 +133,7 @@ ELB has some restrictions: it requires that all nodes listen on a single port, and it acts as a forwarding proxy (i.e. the source IP is not preserved). To work with these restrictions, in Kubernetes, [LoadBalancer services](../user-guide/services.html#type-loadbalancer) are exposed as -[NodePort services](../user-guide/services.html#type-nodeport). Then +[NodePort services](../user-guide/services.md#type-nodeport). Then kube-proxy listens externally on the cluster-wide port that's assigned to NodePort services and forwards traffic to the corresponding pods. So ELB is configured to proxy traffic on the public port (e.g. port 80) to the NodePort @@ -195,7 +195,7 @@ The kube-up script does a number of things in AWS: * Creates an S3 bucket (`AWS_S3_BUCKET`) and then copies the Kubernetes distribution and the salt scripts into it. They are made world-readable and the HTTP URLs are passed to instances; this is how Kubernetes code gets onto the machines. -* Creates two IAM profiles based on templates in [cluster/aws/templates/iam](../../cluster/aws/templates/iam): +* Creates two IAM profiles based on templates in [cluster/aws/templates/iam](../../cluster/aws/templates/iam/): * `kubernetes-master` is used by the master * `kubernetes-minion` is used by nodes. * Creates an AWS SSH key named `kubernetes-`. 
Fingerprint here is -- cgit v1.2.3 From 9c150b14df3ffdc37582bd0c6b887171da233974 Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Mon, 19 Oct 2015 13:55:43 -0400 Subject: More fixes based on commments --- aws_under_the_hood.md | 119 ++++++++++++++++++++++++++++---------------------- 1 file changed, 66 insertions(+), 53 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index ac9efe55..845964f2 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -49,6 +49,18 @@ Kubernetes clusters are created on AWS. This can be particularly useful if problems arise or in circumstances where the provided scripts are lacking and you manually created or configured your cluster. +**Table of contents:** + * [Architecture overview](#architecture-overview) + * [Storage](#storage) + * [Auto Scaling group](#auto-scaling-group) + * [Networking](#networking) + * [NodePort and LoadBalancing services](#nodeport-and-loadbalancing-services) + * [Identity and access management (IAM)](#identity-and-access-management-iam) + * [Tagging](#tagging) + * [AWS objects](#aws-objects) + * [Manual infrastructure creation](#manual-infrastructure-creation) + * [Instance boot](#instance-boot) + ### Architecture overview Kubernetes is a cluster of several machines that consists of a Kubernetes @@ -56,17 +68,13 @@ master and a set number of nodes (previously known as 'minions') for which the master which is responsible. See the [Architecture](architecture.md) topic for more details. -Other documents describe the general architecture of Kubernetes (all nodes run -Docker; the kubelet agent runs on each node and launches containers; the -kube-proxy relays traffic between the nodes etc). - By default on AWS: * Instances run Ubuntu 15.04 (the official AMI). It includes a sufficiently modern kernel that pairs well with Docker and doesn't require a reboot. (The default SSH user is `ubuntu` for this and other ubuntu images.) 
-* By default we run aufs over ext4 as the filesystem / container storage on the - nodes (mostly because this is what GCE uses). +* Nodes use aufs instead of ext4 as the filesystem / container storage (mostly + because this is what Google Compute Engine uses). You can override these defaults by passing different environment variables to kube-up. @@ -82,12 +90,12 @@ unless you create pods with persistent volumes [(EBS)](../user-guide/volumes.md#awselasticblockstore). In general, Kubernetes containers do not have persistent storage unless you attach a persistent volume, and so nodes on AWS use instance storage. Instance storage is cheaper, -often faster, and historically more reliable. This does mean that you should -pick an instance type that has sufficient instance storage, unless you can make -do with whatever space is left on your root partition. +often faster, and historically more reliable. Unless you can make do with whatever +space is left on your root partition, you must choose an instance type that provides +you with sufficient instance storage for your needs. Note: The master uses a persistent volume ([etcd](architecture.md#etcd)) to track -its state but similar to the nodes, containers are mostly run against instance +its state. Similar to nodes, containers are mostly run against instance storage, except that we repoint some important data onto the persistent volume. The default storage driver for Docker images is aufs. Specifying btrfs (by passing the environment @@ -96,12 +104,12 @@ is relatively reliable with Docker and has improved its reliability with modern kernels. It can easily span multiple volumes, which is particularly useful when we are using an instance type with multiple ephemeral instance disks. 
-### AutoScaling +### Auto Scaling group Nodes (but not the master) are run in an -[AutoScalingGroup](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroup.html) +[Auto Scaling group](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroup.html) on AWS. Currently auto-scaling (e.g. based on CPU) is not actually enabled -([#11935](http://issues.k8s.io/11935)). Instead, the auto-scaling group means +([#11935](http://issues.k8s.io/11935)). Instead, the Auto Scaling group means that AWS will relaunch any nodes that are terminated. We do not currently run the master in an AutoScalingGroup, but we should @@ -111,14 +119,13 @@ We do not currently run the master in an AutoScalingGroup, but we should Kubernetes uses an IP-per-pod model. This means that a node, which runs many pods, must have many IPs. AWS uses virtual private clouds (VPCs) and advanced -routing support so each pod is assigned a /24 CIDR. Each pod is assigned a /24 -CIDR; the assigned CIDR is then configured to route to an instance in the VPC -routing table. +routing support so each pod is assigned a /24 CIDR. The assigned CIDR is then +configured to route to an instance in the VPC routing table. -It is also possible to use overlay networking on AWS, but that is not the +It is also possible to use overlay networking on AWS, but that is not the default configuration of the kube-up script. -### NodePort and LoadBalancing +### NodePort and LoadBalancing services Kubernetes on AWS integrates with [Elastic Load Balancing (ELB)](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SetUpASLBApp.html). @@ -129,17 +136,23 @@ and modify the security group for the nodes to allow traffic from the ELB to the nodes. This traffic reaches kube-proxy where it is then forwarded to the pods. -ELB has some restrictions: it requires that all nodes listen on a single port, -and it acts as a forwarding proxy (i.e. the source IP is not preserved). 
To -work with these restrictions, in Kubernetes, [LoadBalancer -services](../user-guide/services.html#type-loadbalancer) are exposed as +ELB has some restrictions: +* it requires that all nodes listen on a single port, +* it acts as a forwarding proxy (i.e. the source IP is not preserved). + +To work with these restrictions, in Kubernetes, [LoadBalancer +services](../user-guide/services.md#type-loadbalancer) are exposed as [NodePort services](../user-guide/services.md#type-nodeport). Then kube-proxy listens externally on the cluster-wide port that's assigned to -NodePort services and forwards traffic to the corresponding pods. So ELB is -configured to proxy traffic on the public port (e.g. port 80) to the NodePort -that is assigned to the service (e.g. 31234). Any in-coming traffic sent to -the NodePort (e.g. port 31234) is recognized by kube-proxy and then sent to the -correct pods for that service. +NodePort services and forwards traffic to the corresponding pods. + +So for example, if we configure a service of type LoadBalancer with a +public port of 80: +* Kubernetes will assign a NodePort to the service (e.g. 31234) +* ELB is configured to proxy traffic on the public port 80 to the NodePort + that is assigned to the service (31234). +* Then any incoming traffic that ELB forwards to the NodePort (e.g. port 31234) + is recognized by kube-proxy and sent to the correct pods for that service. Note that we do not automatically open NodePort services in the AWS firewall (although we do open LoadBalancer services). This is because we expect that @@ -188,31 +201,31 @@ Important: If you choose not to use kube-up, you must pick a unique cluster-id value, and ensure that all AWS resources have a tag with `Name=KubernetesCluster,Value=`. -### AWS Objects +### AWS objects The kube-up script does a number of things in AWS: * Creates an S3 bucket (`AWS_S3_BUCKET`) and then copies the Kubernetes distribution and the salt scripts into it. 
They are made world-readable and the HTTP URLs -are passed to instances; this is how Kubernetes code gets onto the machines. + are passed to instances; this is how Kubernetes code gets onto the machines. * Creates two IAM profiles based on templates in [cluster/aws/templates/iam](../../cluster/aws/templates/iam/): - * `kubernetes-master` is used by the master + * `kubernetes-master` is used by the master. * `kubernetes-minion` is used by nodes. * Creates an AWS SSH key named `kubernetes-`. Fingerprint here is the OpenSSH key fingerprint, so that multiple users can run the script with -different keys and their keys will not collide (with near-certainty). It will -use an existing key if one is found at `AWS_SSH_KEY`, otherwise it will create -one there. (With the default ubuntu images, if you have to SSH in: the user is -`ubuntu` and that user can `sudo`) + different keys and their keys will not collide (with near-certainty). It will + use an existing key if one is found at `AWS_SSH_KEY`, otherwise it will create + one there. (With the default Ubuntu images, if you have to SSH in: the user is + `ubuntu` and that user can `sudo`). * Creates a VPC for use with the cluster (with a CIDR of 172.20.0.0/16) and enables the `dns-support` and `dns-hostnames` options. * Creates an internet gateway for the VPC. * Creates a route table for the VPC, with the internet gateway as the default - route + route. * Creates a subnet (with a CIDR of 172.20.0.0/24) in the AZ `KUBE_AWS_ZONE` (defaults to us-west-2a). Currently, each Kubernetes cluster runs in a -single AZ on AWS. Although, there are two philosophies in discussion on how to -achieve High Availability (HA): + single AZ on AWS. However, there are two philosophies in discussion on how to + achieve High Availability (HA): * cluster-per-AZ: An independent cluster for each AZ, where each cluster is entirely separate. * cross-AZ-clusters: A single cluster spans multiple AZs. 
@@ -220,31 +233,31 @@ The debate is open here, where cluster-per-AZ is discussed as more robust but cross-AZ-clusters are more convenient. * Associates the subnet to the route table * Creates security groups for the master (`kubernetes-master-`) - and the nodes (`kubernetes-minion-`) + and the nodes (`kubernetes-minion-`). * Configures security groups so that masters and nodes can communicate. This includes intercommunication between masters and nodes, opening SSH publicly -for both masters and nodes, and opening port 443 on the master for the HTTPS -API endpoints. + for both masters and nodes, and opening port 443 on the master for the HTTPS + API endpoints. * Creates an EBS volume for the master of size `MASTER_DISK_SIZE` and type - `MASTER_DISK_TYPE` + `MASTER_DISK_TYPE`. * Launches a master with a fixed IP address (172.20.0.9) that is also configured for the security group and all the necessary IAM credentials. An -instance script is used to pass vital configuration information to Salt. Note: -The hope is that over time we can reduce the amount of configuration -information that must be passed in this way. + instance script is used to pass vital configuration information to Salt. Note: + The hope is that over time we can reduce the amount of configuration + information that must be passed in this way. * Once the instance is up, it attaches the EBS volume and sets up a manual routing rule for the internal network range (`MASTER_IP_RANGE`, defaults to -10.246.0.0/24) + 10.246.0.0/24). * For auto-scaling, on each node it creates a launch configuration and group. The name for both is <*KUBE_AWS_INSTANCE_PREFIX*>-minion-group. The default -name is kubernetes-minion-group. The auto-scaling group has a min and max size -that are both set to NUM_MINIONS. You can change the size of the auto-scaling -group to add or remove the total number of nodes from within the AWS API or -Console. 
Each nodes self-configures, meaning that they come up; run Salt with -the stored configuration; connect to the master; are assigned an internal CIDR; -and then the master configures the route-table with the assigned CIDR. The -kube-up script performs a health-check on the nodes but it's a self-check that -is not required. + name is kubernetes-minion-group. The auto-scaling group has a min and max size + that are both set to NUM_MINIONS. You can change the size of the auto-scaling + group to add or remove the total number of nodes from within the AWS API or + Console. Each node self-configures, meaning that it comes up; runs Salt with + the stored configuration; connects to the master; is assigned an internal CIDR; + and then the master configures the route-table with the assigned CIDR. The + kube-up script performs a health-check on the nodes but it's a self-check that + is not required. If attempting this configuration manually, I highly recommend following along -- cgit v1.2.3 From 5d938fc28fe94a988711f8cdc68c08451e05bab3 Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Mon, 19 Oct 2015 14:06:32 -0400 Subject: Remove broken link to CloudFormation setup --- aws_under_the_hood.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index 845964f2..3eaf20cf 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -36,10 +36,9 @@ Documentation for other releases can be found at This document provides high-level insight into how Kubernetes works on AWS and maps to AWS objects. We assume that you are familiar with AWS. -We encourage you to use [kube-up](../getting-started-guides/aws.md) (or -[CloudFormation](../getting-started-guides/aws-coreos.md)) to create clusters on -AWS. We recommend that you avoid manual configuration but are aware that -sometimes it's the only option. +We encourage you to use [kube-up](../getting-started-guides/aws.md) to create +clusters on AWS. 
We recommend that you avoid manual configuration but are aware +that sometimes it's the only option. Tip: You should open an issue and let us know what enhancements can be made to the scripts to better suit your needs. -- cgit v1.2.3 From 0b10e0b16a0477314a55161fe7422da441c7379d Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Tue, 13 Oct 2015 12:42:49 -0700 Subject: Documented required/optional fields. --- api-conventions.md | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/api-conventions.md b/api-conventions.md index 7ad1dbc6..710fff51 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -59,6 +59,7 @@ using resources with kubectl can be found in [Working with resources](../user-gu - [List Operations](#list-operations) - [Map Operations](#map-operations) - [Idempotency](#idempotency) + - [Optional vs Required](#optional-vs-required) - [Defaulting](#defaulting) - [Late Initialization](#late-initialization) - [Concurrency Control and Consistency](#concurrency-control-and-consistency) @@ -370,6 +371,38 @@ All compatible Kubernetes APIs MUST support "name idempotency" and respond with Names generated by the system may be requested using `metadata.generateName`. GenerateName indicates that the name should be made unique by the server prior to persisting it. A non-empty value for the field indicates the name will be made unique (and the name returned to the client will be different than the name passed). The value of this field will be combined with a unique suffix on the server if the Name field has not been provided. The provided value must be valid within the rules for Name, and may be truncated by the length of the suffix required to make the value unique on the server. 
If this field is specified, and Name is not present, the server will NOT return a 409 if the generated name exists - instead, it will either return 201 Created or 504 with Reason `ServerTimeout` indicating a unique name could not be found in the time allotted, and the client should retry (optionally after the time indicated in the Retry-After header). +## Optional vs Required + +Fields must be either optional or required. + +Optional fields have the following properties: + +- They have the `omitempty` struct tag in Go. +- They are a pointer type in the Go definition (e.g. `awesomeFlag *bool`). +- The API server should allow POSTing and PUTing a resource with this field unset. + +Required fields have the opposite properties, namely: + +- They do not have an `omitempty` struct tag. +- They are not a pointer type in the Go definition (e.g. `otherFlag bool`). +- The API server should not allow POSTing or PUTing a resource with this field unset. + +Using the `omitempty` tag causes swagger documentation to reflect that the field is optional. + +Using a pointer allows distinguishing unset from the zero value for that type. +There are some cases where, in principle, a pointer is not needed for an optional field +since the zero value is forbidden, and thus implies unset. There are examples of this in the +codebase. However: + +- it can be difficult for implementors to anticipate all cases where an empty value might need to be + distinguished from a zero value; +- structs are not omitted from encoder output even where omitempty is specified, which is messy; +- having a pointer consistently imply optional is clearer for users of the Go language client, and any + other clients that use corresponding types. + +Therefore, we ask that pointers always be used with optional fields. 
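The optional-vs-required convention above can be illustrated with a small, self-contained Go sketch. The `Foo` type and its fields are invented for this example (they are not real Kubernetes API types): a required field is a plain value, while an optional field is a pointer with the `omitempty` struct tag, so a nil pointer is distinguishable from the zero value and is dropped from the encoded output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Foo is a hypothetical API object illustrating the convention.
type Foo struct {
	// Required: value type, no omitempty -- always serialized.
	Name string `json:"name"`
	// Optional: pointer type plus omitempty -- nil means "unset", and the
	// field is omitted from the encoded output, so unset is distinguishable
	// from the zero value (false).
	AwesomeFlag *bool `json:"awesomeFlag,omitempty"`
}

// encode marshals a Foo to its JSON wire form.
func encode(f Foo) string {
	b, err := json.Marshal(f)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(Foo{Name: "a"})) // {"name":"a"}
	flag := false
	fmt.Println(encode(Foo{Name: "b", AwesomeFlag: &flag})) // {"name":"b","awesomeFlag":false}
}
```

Note how the second call serializes `"awesomeFlag":false` even though `false` is the zero value: the pointer is set, so the field is present, which is exactly the distinction the convention relies on.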
+ + ## Defaulting Default resource values are API version-specific, and they are applied during -- cgit v1.2.3 From 830e70f0da5a28b1cc93209f511c0be6e7f47aab Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Mon, 19 Oct 2015 15:43:41 -0400 Subject: Rename LoadBalancing -> LoadBalancer To match the Type value --- aws_under_the_hood.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index 3eaf20cf..ec8a31c2 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -53,7 +53,7 @@ you manually created or configured your cluster. * [Storage](#storage) * [Auto Scaling group](#auto-scaling-group) * [Networking](#networking) - * [NodePort and LoadBalancing services](#nodeport-and-loadbalancing-services) + * [NodePort and LoadBalancer services](#nodeport-and-loadbalancer-services) * [Identity and access management (IAM)](#identity-and-access-management-iam) * [Tagging](#tagging) * [AWS objects](#aws-objects) @@ -124,7 +124,7 @@ configured to route to an instance in the VPC routing table. It is also possible to use overlay networking on AWS, but that is not the default configuration of the kube-up script. -### NodePort and LoadBalancing services +### NodePort and LoadBalancer services Kubernetes on AWS integrates with [Elastic Load Balancing (ELB)](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SetUpASLBApp.html). -- cgit v1.2.3 From 6579078d9ddafb933fd765304208a4a67af3d32e Mon Sep 17 00:00:00 2001 From: dingh Date: Fri, 23 Oct 2015 13:46:32 +0800 Subject: fix typo in api-converntions.md --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 75c1cf51..35ae7bb6 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -728,7 +728,7 @@ Accumulate repeated events in the client, especially for frequent events, to red ## Label, selector, and annotation conventions -Labels are the domain of users. 
They are intended to facilitate organization and management of API resources using attributes that are meaningful to users, as opposed to meaningful to the system. Think of them as user-created mp3 or email inbox labels, as opposed to the directory structure used by a program to store its data. The former is enables the user to apply an arbitrary ontology, whereas the latter is implementation-centric and inflexible. Users will use labels to select resources to operate on, display label values in CLI/UI columns, etc. Users should always retain full power and flexibility over the label schemas they apply to labels in their namespaces. +Labels are the domain of users. They are intended to facilitate organization and management of API resources using attributes that are meaningful to users, as opposed to meaningful to the system. Think of them as user-created mp3 or email inbox labels, as opposed to the directory structure used by a program to store its data. The former enables the user to apply an arbitrary ontology, whereas the latter is implementation-centric and inflexible. Users will use labels to select resources to operate on, display label values in CLI/UI columns, etc. Users should always retain full power and flexibility over the label schemas they apply to labels in their namespaces. However, we should support conveniences for common cases by default. For example, what we now do in ReplicationController is automatically set the RC's selector and labels to the labels in the pod template by default, if they are not already set. That ensures that the selector will match the template, and that the RC can be managed using the same labels as the pods it creates. Note that once we generalize selectors, it won't necessarily be possible to unambiguously generate labels that match an arbitrary selector. 
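The label model described above, where user-supplied selectors pick out the resources to operate on, can be sketched as a simple map-subset check. This is an illustrative sketch of equality-based selection only (the `matches` helper is invented for the example); the actual Kubernetes selector implementation also supports set-based operators:

```go
package main

import "fmt"

// matches reports whether every key/value pair in an equality-based
// selector is present in a resource's labels. This mirrors how a
// ReplicationController's selector picks out the pods it manages.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	podLabels := map[string]string{"app": "web", "tier": "frontend"}
	fmt.Println(matches(map[string]string{"app": "web"}, podLabels)) // true
	fmt.Println(matches(map[string]string{"app": "db"}, podLabels))  // false
}
```

This also shows why generalizing selectors complicates label generation: a subset match can always be inverted into a label map, but richer (e.g. set-based or negated) selectors cannot be unambiguously turned back into a single set of labels.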
-- cgit v1.2.3 From 27b87616c75fe7506feff77dbb2de29b327652c6 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 23 Oct 2015 15:08:27 -0700 Subject: syntax is 'go' not 'golang' --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index 75c1cf51..69d1f740 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -188,7 +188,7 @@ Objects that contain both spec and status should not contain additional top-leve The `FooCondition` type for some resource type `Foo` may include a subset of the following fields, but must contain at least `type` and `status` fields: -```golang +```go Type FooConditionType `json:"type" description:"type of Foo condition"` Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` LastHeartbeatTime unversioned.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` -- cgit v1.2.3 From 08d5de5c8da7a41e695b18c48d7755668d4c95e9 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Fri, 23 Oct 2015 15:18:09 -0700 Subject: fix another bad backticks usage --- horizontal-pod-autoscaler.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 9417ee57..efefb800 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -65,9 +65,8 @@ The wider discussion regarding Scale took place in [#1629](https://github.com/ku Scale subresource will be present in API for replication controller or deployment under the following paths: -```api/vX/replicationcontrollers/myrc/scale``` - -```api/vX/deployments/mydeployment/scale``` + * `api/vX/replicationcontrollers/myrc/scale` + * `api/vX/deployments/mydeployment/scale` It will have the following structure: -- cgit v1.2.3 From 2df426d3f2657c059d0dfa99863dd1bacbaba323 Mon Sep 17 00:00:00 2001 From: Eric Tune Date: Fri, 23 Oct 2015 15:41:49 -0700 
Subject: In devel docs, refer to .kube/config not .kubernetes_auth --- e2e-tests.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/e2e-tests.md b/e2e-tests.md index ca55b901..882da396 100644 --- a/e2e-tests.md +++ b/e2e-tests.md @@ -74,7 +74,7 @@ For the purposes of brevity, we will look at a subset of the options, which are -repo-root="../../": Root directory of kubernetes repository, for finding test files. ``` -Prior to running the tests, it is recommended that you first create a simple auth file in your home directory, e.g. `$HOME/.kubernetes_auth` , with the following: +Prior to running the tests, it is recommended that you first create a simple auth file in your home directory, e.g. `$HOME/.kube/config` , with the following: ``` { @@ -85,7 +85,7 @@ Prior to running the tests, it is recommended that you first create a simple aut Next, you will need a cluster that you can test against. As mentioned earlier, you will want to execute `sudo ./hack/local-up-cluster.sh`. To get a sense of what tests exist, you may want to run: -`e2e.test --host="127.0.0.1:8080" --provider="local" --ginkgo.v=true -ginkgo.dryRun=true --kubeconfig="$HOME/.kubernetes_auth" --repo-root="$KUBERNETES_SRC_PATH"` +`e2e.test --host="127.0.0.1:8080" --provider="local" --ginkgo.v=true -ginkgo.dryRun=true --kubeconfig="$HOME/.kube/config" --repo-root="$KUBERNETES_SRC_PATH"` If you wish to execute a specific set of tests you can use the `-ginkgo.focus=` regex, e.g.: -- cgit v1.2.3 From 202e7b6567f7aefd2abd9058ae3cdc7ffb4bfe5d Mon Sep 17 00:00:00 2001 From: Robert Wehner Date: Sat, 24 Oct 2015 20:02:54 -0600 Subject: Fix dead links to submit-queue * https://github.com/kubernetes/contrib/pull/122 merged submit-queue into mungegithub. This fixes links to the old submit-queue location. * Standardized to use "submit-queue" instead of "submit queue". Just picked one since both were used. * Fixes dead link to on-call wiki. 
--- automation.md | 4 ++-- pull-requests.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/automation.md b/automation.md index eb36cc63..f01b6158 100644 --- a/automation.md +++ b/automation.md @@ -46,8 +46,8 @@ In an effort to * maintain e2e stability * load test githubs label feature -We have added an automated [submit-queue](https://github.com/kubernetes/contrib/tree/master/submit-queue) -for kubernetes. +We have added an automated [submit-queue](https://github.com/kubernetes/contrib/blob/master/mungegithub/pulls/submit-queue.go) to the +[github "munger"](https://github.com/kubernetes/contrib/tree/master/mungegithub) for kubernetes. The submit-queue does the following: diff --git a/pull-requests.md b/pull-requests.md index 7b955b3d..15a0f447 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -52,14 +52,14 @@ Life of a Pull Request Unless in the last few weeks of a milestone when we need to reduce churn and stabilize, we aim to be always accepting pull requests. -Either the [on call](https://github.com/kubernetes/kubernetes/wiki/Kubernetes-on-call-rotation) manually or the [submit queue](https://github.com/kubernetes/contrib/tree/master/submit-queue) automatically will manage merging PRs. +Either the [on call](https://github.com/kubernetes/kubernetes/wiki/Kubernetes-on-call-rotations) manually or the [github "munger"](https://github.com/kubernetes/contrib/tree/master/mungegithub) submit-queue plugin automatically will manage merging PRs. -There are several requirements for the submit queue to work: +There are several requirements for the submit-queue to work: * Author must have signed CLA ("cla: yes" label added to PR) * No changes can be made since last lgtm label was applied * k8s-bot must have reported the GCE E2E build and test steps passed (Travis, Shippable and Jenkins build) -Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. 
This is gated by the [whitelist](https://github.com/kubernetes/contrib/tree/master/submit-queue/whitelist.txt). +Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](https://github.com/kubernetes/contrib/blob/master/mungegithub/whitelist.txt). Automation ---------- -- cgit v1.2.3 From 0880787aa4ba01a636a4da90e75dde425360dd39 Mon Sep 17 00:00:00 2001 From: Jerzy Szczepkowski Date: Fri, 16 Oct 2015 12:04:43 +0200 Subject: Proposal for horizontal pod autoscaler updated and moved to design. Proposal for horizontal pod autoscaler updated and moved to design. Related to #15652. --- horizontal-pod-autoscaler.md | 272 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 272 insertions(+) create mode 100644 horizontal-pod-autoscaler.md diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md new file mode 100644 index 00000000..35991847 --- /dev/null +++ b/horizontal-pod-autoscaler.md @@ -0,0 +1,272 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/design/horizontal-pod-autoscaler.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Horizontal Pod Autoscaling + +## Preface + +This document briefly describes the design of the horizontal autoscaler for pods. +The autoscaler (implemented as a Kubernetes API resource and controller) is responsible for dynamically controlling +the number of replicas of some collection (e.g. the pods of a ReplicationController) to meet some objective(s), +for example a target per-pod CPU utilization. + +This design supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md). + +## Overview + +The resource usage of a serving application usually varies over time: sometimes the demand for the application rises, +and sometimes it drops. +In Kubernetes version 1.0, a user can only manually set the number of serving pods. +Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on CPU utilization statistics +(a future version will allow autoscaling based on other resources/metrics). + +## Scale Subresource + +In Kubernetes version 1.1, we are introducing Scale subresource and implementing horizontal autoscaling of pods based on it. +Scale subresource is supported for replication controllers and deployments. +Scale subresource is a Virtual Resource (does not correspond to an object stored in etcd). +It is only present in the API as an interface that a controller (in this case the HorizontalPodAutoscaler) can use to dynamically scale +the number of replicas controlled by some other API object (currently ReplicationController and Deployment) and to learn the current number of replicas. 
+Scale is a subresource of the API object that it serves as the interface for. +The Scale subresource is useful because whenever we introduce another type we want to autoscale, we just need to implement the Scale subresource for it. +The wider discussion regarding Scale took place in [#1629](https://github.com/kubernetes/kubernetes/issues/1629). + +The Scale subresource is present in the API for replication controllers and deployments under the following paths: + +`apis/extensions/v1beta1/replicationcontrollers/myrc/scale` + +`apis/extensions/v1beta1/deployments/mydeployment/scale` + +It has the following structure: + +```go +// represents a scaling request for a resource. +type Scale struct { + unversioned.TypeMeta + api.ObjectMeta + + // defines the behavior of the scale. + Spec ScaleSpec + + // current status of the scale. + Status ScaleStatus +} + +// describes the attributes of a scale subresource +type ScaleSpec struct { + // desired number of instances for the scaled object. + Replicas int `json:"replicas,omitempty"` +} + +// represents the current status of a scale subresource. +type ScaleStatus struct { + // actual number of observed instances of the scaled object. + Replicas int `json:"replicas"` + + // label query over pods that should match the replicas count. + Selector map[string]string `json:"selector,omitempty"` +} +``` + +Writing to `ScaleSpec.Replicas` resizes the replication controller/deployment associated with +the given Scale subresource. +`ScaleStatus.Replicas` reports how many pods are currently running in the replication controller/deployment, +and `ScaleStatus.Selector` returns the selector for the pods. + +## HorizontalPodAutoscaler Object + +In Kubernetes version 1.1, we are introducing the HorizontalPodAutoscaler object. It is accessible under: + +`apis/extensions/v1beta1/horizontalpodautoscalers/myautoscaler` + +It has the following structure: + +```go +// configuration of a horizontal pod autoscaler.
+type HorizontalPodAutoscaler struct { + unversioned.TypeMeta + api.ObjectMeta + + // behavior of autoscaler. + Spec HorizontalPodAutoscalerSpec + + // current information about the autoscaler. + Status HorizontalPodAutoscalerStatus +} + +// specification of a horizontal pod autoscaler. +type HorizontalPodAutoscalerSpec struct { + // reference to Scale subresource; horizontal pod autoscaler will learn the current resource + // consumption from its status, and will set the desired number of pods by modifying its spec. + ScaleRef SubresourceReference + // lower limit for the number of pods that can be set by the autoscaler, default 1. + MinReplicas *int + // upper limit for the number of pods that can be set by the autoscaler. + // It cannot be smaller than MinReplicas. + MaxReplicas int + // target average CPU utilization (represented as a percentage of requested CPU) over all the pods; + // if not specified it defaults to the target CPU utilization at 80% of the requested resources. + CPUUtilization *CPUTargetUtilization +} + +type CPUTargetUtilization struct { + // fraction of the requested CPU that should be utilized/used, + // e.g. 70 means that 70% of the requested CPU should be in use. + TargetPercentage int +} + +// current status of a horizontal pod autoscaler +type HorizontalPodAutoscalerStatus struct { + // most recent generation observed by this autoscaler. + ObservedGeneration *int64 + + // last time the HorizontalPodAutoscaler scaled the number of pods; + // used by the autoscaler to control how often the number of pods is changed. + LastScaleTime *unversioned.Time + + // current number of replicas of pods managed by this autoscaler. + CurrentReplicas int + + // desired number of replicas of pods managed by this autoscaler. + DesiredReplicas int + + // current average CPU utilization over all pods, represented as a percentage of requested CPU, + // e.g. 70 means that an average pod is now using 70% of its requested CPU.
+ CurrentCPUUtilizationPercentage *int +} +``` + +`ScaleRef` is a reference to the Scale subresource. +`MinReplicas`, `MaxReplicas` and `CPUUtilization` define autoscaler configuration. +We are also introducing the HorizontalPodAutoscalerList object to enable listing all autoscalers in a namespace: + +```go +// list of horizontal pod autoscaler objects. +type HorizontalPodAutoscalerList struct { + unversioned.TypeMeta + unversioned.ListMeta + + // list of horizontal pod autoscaler objects. + Items []HorizontalPodAutoscaler +} +``` + +## Autoscaling Algorithm + +The autoscaler is implemented as a control loop. It periodically queries pods described by `Status.Selector` of the Scale subresource, and collects their CPU utilization. +Then, it compares the arithmetic mean of the pods' CPU utilization with the target defined in `Spec.CPUUtilization`, +and adjusts the replicas of the Scale if needed to match the target +(preserving condition: MinReplicas <= Replicas <= MaxReplicas). + +The period of the autoscaler is controlled by the `--horizontal-pod-autoscaler-sync-period` flag of the controller manager. +The default value is 30 seconds. + + +CPU utilization is the recent CPU usage of a pod (average across the last 1 minute) divided by the CPU requested by the pod. +In Kubernetes version 1.1, CPU usage is taken directly from Heapster. +In the future, there will be an API on the master for this purpose +(see [#11951](https://github.com/kubernetes/kubernetes/issues/11951)). + +The target number of pods is calculated from the following formula: + +``` +TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target) +``` + +Starting and stopping pods may introduce noise to the metric (for instance, starting may temporarily increase CPU). +So, after each action, the autoscaler should wait some time for reliable data. +Scale-up can only happen if there was no rescaling within the last 3 minutes. +Scale-down will wait for 5 minutes from the last rescaling.
+Moreover, any scaling will only be made if: `avg(CurrentPodsCPUUtilization) / Target` drops below 0.9 or increases above 1.1 (10% tolerance). +Such an approach has two benefits: + +* Autoscaler works in a conservative way. + If new user load appears, it is important for us to rapidly increase the number of pods, + so that user requests will not be rejected. + Lowering the number of pods is not that urgent. + +* Autoscaler avoids thrashing, i.e.: prevents rapid execution of conflicting decisions if the load is not stable. + +## Relative vs. absolute metrics + +We chose values of the target metric to be relative (e.g. 90% of requested CPU resource) rather than absolute (e.g. 0.6 core) for the following reason. +If we chose an absolute metric, the user would need to guarantee that the target is lower than the request. +Otherwise, overloaded pods may not be able to consume more than the autoscaler's absolute target utilization, +thereby preventing the autoscaler from seeing high enough utilization to trigger it to scale up. +This may be especially troublesome when the user changes requested resources for a pod +because they would need to also change the autoscaler utilization threshold. +Therefore, we decided to choose a relative metric. +For the user, it is enough to set it to a value smaller than 100%, and further changes of requested resources will not invalidate it. + +## Support in kubectl + +To make manipulation of the HorizontalPodAutoscaler object simpler, we added support for +creating/updating/deleting/listing of HorizontalPodAutoscaler to kubectl. +In addition, in the future, we are planning to add kubectl support for the following use-cases: +* When creating a replication controller or deployment with `kubectl create [-f]`, there should be + a possibility to specify an additional autoscaler object. + (This should work out-of-the-box when creation of autoscaler is supported by kubectl as we may include + multiple objects in the same config file).
+* *[future]* When running an image with `kubectl run`, there should be an additional option to create + an autoscaler for it. +* *[future]* We will add a new command `kubectl autoscale` that will allow for easy creation of an autoscaler object + for already existing replication controller/deployment. + +## Next steps + +We list here some features that are not supported in Kubernetes version 1.1. +However, we want to keep them in mind, as they will most probably be needed in the future. +Our design is in general compatible with them. +* *[future]* **Autoscale pods based on metrics other than CPU** (e.g. memory, network traffic, qps). + This includes scaling based on a custom/application metric. +* *[future]* **Autoscale pods based on an aggregate metric.** + Autoscaler, instead of computing the average of a target metric across pods, will use a single, external, metric (e.g. qps metric from load balancer). + The metric will be aggregated while the target will remain per-pod + (e.g. when observing 100 qps on the load balancer while the target is 20 qps per pod, the autoscaler will set the number of replicas to 5). +* *[future]* **Autoscale pods based on multiple metrics.** + If the target numbers of pods for different metrics are different, choose the largest target number of pods. +* *[future]* **Scale the number of pods starting from 0.** + All pods can be turned off, and then turned on when there is a demand for them. + When a request to a service with no pods arrives, kube-proxy will generate an event for the autoscaler + to create a new pod. + Discussed in [#3247](https://github.com/kubernetes/kubernetes/issues/3247). +* *[future]* **When scaling down, make a more educated decision about which pods to kill.** + E.g.: if two or more pods from the same replication controller are on the same node, kill one of them. + Discussed in [#4301](https://github.com/kubernetes/kubernetes/issues/4301).
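The replica computation from the Autoscaling Algorithm section above can be sketched as follows. This is a simplified illustration, not the actual controller code: the function and parameter names are invented here, and the rescaling-delay timers are omitted, but the `ceil(sum/target)` formula, the 10% tolerance band, and the MinReplicas/MaxReplicas clamp follow the rules in this document:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies: TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target),
// clamped to [minReplicas, maxReplicas]. If the average utilization is within
// +/-10% of the target, the current replica count is kept to avoid thrashing.
func desiredReplicas(utilization []float64, target float64, current, minReplicas, maxReplicas int) int {
	if len(utilization) == 0 {
		return current // no metrics yet: keep the current scale
	}
	var sum float64
	for _, u := range utilization {
		sum += u
	}
	avg := sum / float64(len(utilization))
	if ratio := avg / target; ratio > 0.9 && ratio < 1.1 {
		return current // within the 10% tolerance band: no change
	}
	desired := int(math.Ceil(sum / target))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 3 pods at 90% CPU against a 60% target -> ceil(270/60) = 5 replicas.
	fmt.Println(desiredReplicas([]float64{90, 90, 90}, 60, 3, 1, 10)) // prints 5
}
```

Note how the tolerance check uses the average while the sizing formula uses the sum; dividing the sum by the per-pod target yields the pod count directly.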
+ + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/horizontal-pod-autoscaler.md?pixel)]() + -- cgit v1.2.3 From f6c42a0cf8abc4e3839021c8bf2b5f2a0ac84397 Mon Sep 17 00:00:00 2001 From: Jerzy Szczepkowski Date: Fri, 16 Oct 2015 12:04:43 +0200 Subject: Proposal for horizontal pod autoscaler updated and moved to design. Proposal for horizontal pod autoscaler updated and moved to design. Related to #15652. --- autoscaling.md | 8 ++ horizontal-pod-autoscaler.md | 280 ------------------------------------------- 2 files changed, 8 insertions(+), 280 deletions(-) delete mode 100644 horizontal-pod-autoscaler.md diff --git a/autoscaling.md b/autoscaling.md index ea60af74..97fa672d 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -31,6 +31,14 @@ Documentation for other releases can be found at +--- + +# WARNING: + +## This document is outdated. It is superseded by [the horizontal pod autoscaler design doc](../design/horizontal-pod-autoscaler.md). + +--- + ## Abstract Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md deleted file mode 100644 index efefb800..00000000 --- a/horizontal-pod-autoscaler.md +++ /dev/null @@ -1,280 +0,0 @@ - - - - -WARNING -WARNING -WARNING -WARNING -WARNING - -

PLEASE NOTE: This document applies to the HEAD of the source tree

- -If you are using a released version of Kubernetes, you should -refer to the docs that go with that version. - - -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/horizontal-pod-autoscaler.md). - -Documentation for other releases can be found at -[releases.k8s.io](http://releases.k8s.io). - --- - - - - - -# Horizontal Pod Autoscaling - -**Author**: Jerzy Szczepkowski (@jszczepkowski) - -## Preface - -This document briefly describes the design of the horizontal autoscaler for pods. -The autoscaler (implemented as a kubernetes control loop) will be responsible for automatically -choosing and setting the number of pods of a given type that run in a kubernetes cluster. - -This proposal supersedes [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md). - -## Overview - -The usage of a serving application usually vary over time: sometimes the demand for the application rises, -and sometimes it drops. -In Kubernetes version 1.0, a user can only manually set the number of serving pods. -Our aim is to provide a mechanism for the automatic adjustment of the number of pods based on usage statistics. - -## Scale Subresource - -We are going to introduce Scale subresource and implement horizontal autoscaling of pods based on it. -Scale subresource will be supported for replication controllers and deployments. -Scale subresource will be a Virtual Resource (will not be stored in etcd as a separate object). -It will be only present in API as an interface to accessing replication controller or deployment, -and the values of Scale fields will be inferred from the corresponding replication controller/deployment object. -HorizontalPodAutoscaler object will be bound with exactly one Scale subresource and will be -autoscaling associated replication controller/deployment through it. 
-The main advantage of such approach is that whenever we introduce another type we want to auto-scale, -we just need to implement Scale subresource for it (w/o modifying autoscaler code or API). -The wider discussion regarding Scale took place in [#1629](https://github.com/kubernetes/kubernetes/issues/1629). - -Scale subresource will be present in API for replication controller or deployment under the following paths: - - * `api/vX/replicationcontrollers/myrc/scale` - * `api/vX/deployments/mydeployment/scale` - -It will have the following structure: - -```go -// Scale subresource, applicable to ReplicationControllers and (in future) Deployment. -type Scale struct { - api.TypeMeta - api.ObjectMeta - - // Spec defines the behavior of the scale. - Spec ScaleSpec - - // Status represents the current status of the scale. - Status ScaleStatus -} - -// ScaleSpec describes the attributes a Scale subresource -type ScaleSpec struct { - // Replicas is the number of desired replicas. - Replicas int -} - -// ScaleStatus represents the current status of a Scale subresource. -type ScaleStatus struct { - // Replicas is the number of actual replicas. - Replicas int - - // Selector is a label query over pods that should match the replicas count. - Selector map[string]string -} - -``` - -Writing ```ScaleSpec.Replicas``` will resize the replication controller/deployment associated with -the given Scale subresource. -```ScaleStatus.Replicas``` will report how many pods are currently running in the replication controller/deployment, -and ```ScaleStatus.Selector``` will return selector for the pods. - -## HorizontalPodAutoscaler Object - -We will introduce HorizontalPodAutoscaler object, it will be accessible under: - -``` -api/vX/horizontalpodautoscalers/myautoscaler -``` - -It will have the following structure: - -```go -// HorizontalPodAutoscaler represents the configuration of a horizontal pod autoscaler. 
-type HorizontalPodAutoscaler struct { - api.TypeMeta - api.ObjectMeta - - // Spec defines the behaviour of autoscaler. - Spec HorizontalPodAutoscalerSpec - - // Status represents the current information about the autoscaler. - Status HorizontalPodAutoscalerStatus -} - -// HorizontalPodAutoscalerSpec is the specification of a horizontal pod autoscaler. -type HorizontalPodAutoscalerSpec struct { - // ScaleRef is a reference to Scale subresource. HorizontalPodAutoscaler will learn the current - // resource consumption from its status, and will set the desired number of pods by modifying its spec. - ScaleRef *SubresourceReference - // MinReplicas is the lower limit for the number of pods that can be set by the autoscaler. - MinReplicas int - // MaxReplicas is the upper limit for the number of pods that can be set by the autoscaler. - // It cannot be smaller than MinReplicas. - MaxReplicas int - // Target is the target average consumption of the given resource that the autoscaler will try - // to maintain by adjusting the desired number of pods. - // Currently this can be either "cpu" or "memory". - Target ResourceConsumption -} - -// HorizontalPodAutoscalerStatus contains the current status of a horizontal pod autoscaler -type HorizontalPodAutoscalerStatus struct { - // CurrentReplicas is the number of replicas of pods managed by this autoscaler. - CurrentReplicas int - - // DesiredReplicas is the desired number of replicas of pods managed by this autoscaler. - // The number may be different because pod downscaling is sometimes delayed to keep the number - // of pods stable. - DesiredReplicas int - - // CurrentConsumption is the current average consumption of the given resource that the autoscaler will - // try to maintain by adjusting the desired number of pods. - // Two types of resources are supported: "cpu" and "memory". - CurrentConsumption ResourceConsumption - - // LastScaleTimestamp is the last time the HorizontalPodAutoscaler scaled the number of pods. 
- // This is used by the autoscaler to control how often the number of pods is changed. - LastScaleTimestamp *unversioned.Time -} - -// ResourceConsumption is an object for specifying average resource consumption of a particular resource. -type ResourceConsumption struct { - Resource api.ResourceName - Quantity resource.Quantity -} -``` - -```Scale``` will be a reference to the Scale subresource. -```MinReplicas```, ```MaxReplicas``` and ```Target``` will define autoscaler configuration. -We will also introduce HorizontalPodAutoscalerList object to enable listing all autoscalers in the cluster: - -```go -// HorizontalPodAutoscaler is a collection of pod autoscalers. -type HorizontalPodAutoscalerList struct { - api.TypeMeta - api.ListMeta - - Items []HorizontalPodAutoscaler -} -``` - -## Autoscaling Algorithm - -The autoscaler will be implemented as a control loop. -It will periodically (e.g.: every 1 minute) query pods described by ```Status.PodSelector``` of Scale subresource, -and check their average CPU or memory usage from the last 1 minute -(there will be API on master for this purpose, see -[#11951](https://github.com/kubernetes/kubernetes/issues/11951). -Then, it will compare the current CPU or memory consumption with the Target, -and adjust the replicas of the Scale if needed to match the target -(preserving condition: MinReplicas <= Replicas <= MaxReplicas). - -The target number of pods will be calculated from the following formula: - -``` -TargetNumOfPods =ceil(sum(CurrentPodsConsumption) / Target) -``` - -Starting and stopping pods may introduce noise to the metrics (for instance starting may temporarily increase -CPU and decrease average memory consumption) so, after each action, the autoscaler should wait some time for reliable data. - -Scale-up will happen if there was no rescaling within the last 3 minutes. -Scale-down will wait for 10 minutes from the last rescaling. 
Moreover any scaling will only be made if - -``` -avg(CurrentPodsConsumption) / Target -``` - -drops below 0.9 or increases above 1.1 (10% tolerance). Such approach has two benefits: - -* Autoscaler works in a conservative way. - If new user load appears, it is important for us to rapidly increase the number of pods, - so that user requests will not be rejected. - Lowering the number of pods is not that urgent. - -* Autoscaler avoids thrashing, i.e.: prevents rapid execution of conflicting decision if the load is not stable. - -## Relative vs. absolute metrics - -The question arises whether the values of the target metrics should be absolute (e.g.: 0.6 core, 100MB of RAM) -or relative (e.g.: 110% of resource request, 90% of resource limit). -The argument for the relative metrics is that when user changes resources for a pod, -she will not have to change the definition of the autoscaler object, as the relative metric will still be valid. -However, we want to be able to base autoscaling on custom metrics in the future. -Such metrics will rather be absolute (e.g.: the number of queries-per-second). -Therefore, we decided to give absolute values for the target metrics in the initial version. - -Please note that when custom metrics are supported, it will be possible to create additional metrics -in heapster that will divide CPU/memory consumption by resource request/limit. -From autoscaler point of view the metrics will be absolute, -although such metrics will be bring the benefits of relative metrics to the user. - - -## Support in kubectl - -To make manipulation on HorizontalPodAutoscaler object simpler, we will add support for -creating/updating/deletion/listing of HorizontalPodAutoscaler to kubectl. -In addition, we will add kubectl support for the following use-cases: -* When running an image with ```kubectl run```, there should be an additional option to create - an autoscaler for it. 
-* When creating a replication controller or deployment with ```kubectl create [-f]```, there should be - a possibility to specify an additional autoscaler object. - (This should work out-of-the-box when creation of autoscaler is supported by kubectl as we may include - multiple objects in the same config file). -* We will and a new command ```kubectl autoscale``` that will allow for easy creation of an autoscaler object - for already existing replication controller/deployment. - -## Next steps - -We list here some features that will not be supported in the initial version of autoscaler. -However, we want to keep them in mind, as they will most probably be needed in future. -Our design is in general compatible with them. -* Autoscale pods based on metrics different than CPU & memory (e.g.: network traffic, qps). - This includes scaling based on a custom metric. -* Autoscale pods based on multiple metrics. - If the target numbers of pods for different metrics are different, choose the largest target number of pods. -* Scale the number of pods starting from 0: all pods can be turned-off, - and then turned-on when there is a demand for them. - When a request to service with no pods arrives, kube-proxy will generate an event for autoscaler - to create a new pod. - Discussed in [#3247](https://github.com/kubernetes/kubernetes/issues/3247). -* When scaling down, make more educated decision which pods to kill (e.g.: if two or more pods are on the same node, kill one of them). - Discussed in [#4301](https://github.com/kubernetes/kubernetes/issues/4301). -* Allow rule based autoscaling: instead of specifying the target value for metric, - specify a rule, e.g.: “if average CPU consumption of pod is higher than 80% add two more replicas”. - This approach was initially suggested in - [autoscaling.md](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md) proposal. 
- Before doing this, we need to evaluate why the target based scaling described in this proposal is not sufficient. - - - - -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/horizontal-pod-autoscaler.md?pixel)]() - -- cgit v1.2.3 From 706954ab3ef1d8499edc6d2a7888f6330f1a3225 Mon Sep 17 00:00:00 2001 From: "Tim St. Clair" Date: Tue, 27 Oct 2015 15:04:57 -0700 Subject: Add kubelet raw metrics API proposal --- compute-resource-metrics-api.md | 204 +++++++++++++++++++++++++++++----------- 1 file changed, 147 insertions(+), 57 deletions(-) diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md index c9f3d9af..ba7b4e28 100644 --- a/compute-resource-metrics-api.md +++ b/compute-resource-metrics-api.md @@ -35,15 +35,31 @@ Documentation for other releases can be found at ## Goals -Provide resource usage metrics on pods and nodes on the API server to be used -by the scheduler to improve job placement, utilization, etc. and by end users -to understand the resource utilization of their jobs. Horizontal and vertical -auto-scaling are also near-term uses. +Provide resource usage metrics on pods and nodes through the API server to be +used by the scheduler to improve pod placement, utilization, etc. and by end +users to understand the resource utilization of their jobs. Horizontal and +vertical auto-scaling are also near-term uses. Additionally, a subset of the +metrics API should be served directly from the kubelet. + +### API Requirements + +- Provide machine level metrics, all pod metrics (in single request), specific + pod metrics +- Ability to authenticate machine & pod metrics independently from each other +- Support multiple kinds of metrics (e.g. raw & derived types) +- Follow existing API conventions, fully compatible types able to eventually be + served by apiserver library +- Maximum common ground between cluster and Kubelet API. 
## Current state -Right now, the Kubelet exports container metrics via an API endpoint. This -information is not gathered nor served by the Kubernetes API server. +Kubelet currently exposes raw container metrics through the `/stats/` +endpoint. However, this endpoint serves individual +container stats, and applications like heapster, which aggregates metrics across +the cluster, must repeatedly query for each container. The high QPS combined +with the potential data size of raw stats puts unnecessary load on the system +that could be avoided with an aggregate API. This information is not gathered +nor served by the Kubernetes API server. ## Use cases @@ -61,13 +77,60 @@ kube-ui-v1-nd7in 0.07 cores 130 MB A second user will be the scheduler. To assign pods to nodes efficiently, the scheduler needs to know the current free resources on each node. -## Proposed endpoints +The Kubelet API will be used by heapster to provide metrics at the +cluster level. The Kubelet API will also be useful for debugging individual +nodes, and stand-alone kubelets. - /api/v1/namespaces/myns/podMetrics/mypod - /api/v1/nodeMetrics/myNode +## Proposed endpoints -The derived metrics include the mean, max and a few percentiles of the list of -values. +The metrics API will be its own [API group](api-group.md), and is shared by the +kubelet and cluster API. The derived metrics include the mean, max and a few +percentiles of the list of values, and will initially only be available through +the API server. The raw metrics include the stat samples from cAdvisor, and will +only be available through the kubelet. The types of metrics are detailed +[below](#schema).
All endpoints are GET endpoints, rooted at
+`/apis/metrics/v1alpha1/`:
+
+- `/` - discovery endpoint; type resource list
+- `/rawNodes` - raw host metrics; type `[]metrics.RawNode`
+  - `/rawNodes/localhost` - the only node provided is `localhost`; type
+    `metrics.RawNode`
+- `/derivedNodes` - derived host metrics; type `[]metrics.DerivedNode`
+  - `/derivedNodes/{node}` - derived metrics for a specific node; type
+    `metrics.DerivedNode`
+- `/rawPods` - all raw pod metrics across all namespaces; type
+  `[]metrics.RawPod`
+- `/derivedPods` - all derived pod metrics across all namespaces; type
+  `[]metrics.DerivedPod`
+- `/namespaces/{namespace}/rawPods` - all raw pod metrics within a namespace;
+  type `[]metrics.RawPod`
+  - `/namespaces/{namespace}/rawPods/{pod}` - raw metrics for a specific pod
+- `/namespaces/{namespace}/derivedPods` - all derived pod metrics within a
+  namespace; type `[]metrics.DerivedPod`
+  - `/namespaces/{namespace}/derivedPods/{pod}` - derived metrics for a
+    specific pod
+- Unsupported paths return 404 (Not Found)
+  - `/namespaces/`
+  - `/namespaces/{namespace}`
+
+Additionally, all endpoints (except the root discovery endpoint) support the
+following optional query parameters:
+
+- `start` - start time to return metrics from; type JSON-encoded `time.Time`;
+  since samples are retrieved at discrete intervals, the first sample after the
+  start time is the actual beginning
+- `end` - end time to return metrics to; type JSON-encoded `time.Time`
+- `step` - the time step between each stats sample; type int (seconds), default
+  10s, must be a multiple of 10s
+- `count` - maximum number of stats to return in each `ContainerMetrics`
+  instance; type int
+
+They also support the common query parameters:
+
+- `pretty` - pretty-print the response
+- `labelSelector` - restrict the list of returned objects by labels (list
+  endpoints only)
+- `fieldSelector` - restrict the list of returned objects by fields (list
+  endpoints only)
+
+### Rationale

We are not adding new methods to pods and nodes, e.g.
`/api/v1/namespaces/myns/pods/mypod/metrics`, for a number of reasons. For
@@ -80,16 +143,73 @@ namespace or service aggregation, metrics at those levels could also be exposed
 taking advantage of the fact that Heapster already does aggregation and metrics
 for them.

-Initially, this proposal included raw metrics alongside the derived metrics.
-After revising the use cases, it was clear that raw metrics could be left out
-of this proposal. They can be dealt with in a separate proposal, exposing them
-in the Kubelet API via proper versioned endpoints for Heapster to poll
-periodically.
+## Schema
+
+Types are colocated with other API groups in `/pkg/apis/metrics`, and follow
+the API group conventions there.
+
+```go
+// Raw metrics are only available through the kubelet API.
+type RawNode struct {
+	TypeMeta
+	ObjectMeta // Should include node name
+	Machine ContainerMetrics
+	SystemContainers []ContainerMetrics
+}
+type RawPod struct {
+	TypeMeta
+	ObjectMeta // Should include pod name
+	Containers []RawContainer
+}
+type RawContainer struct {
+	TypeMeta
+	ObjectMeta // Should include container name
+	Spec ContainerSpec // Mirrors cadvisorv2.ContainerSpec
+	Stats []ContainerStats // Mirrors cadvisorv2.ContainerStats
+}
+
+// Derived metrics are (initially) only available through the API server.
+type DerivedNode struct {
+	TypeMeta
+	ObjectMeta // Should include node name
+	Machine DerivedWindows
+	SystemContainers []DerivedContainer
+}
+type DerivedPod struct {
+	TypeMeta
+	ObjectMeta // Should include pod name
+	Containers []DerivedContainer
+}
+type DerivedContainer struct {
+	TypeMeta
+	ObjectMeta // Should include container name
+	Metrics DerivedWindows
+}
+
+// Last overlapping 10s, 1m, 1h and 1d as a start
+// Updated every 10s, so the 10s window is sequential and the rest are
+// rolling.
+type DerivedWindows map[time.Duration]DerivedMetrics + +type DerivedMetrics struct { + // End time of all the time windows in Metrics + EndTime unversioned.Time `json:"endtime"` + + Mean ResourceUsage `json:"mean"` + Max ResourceUsage `json:"max"` + NinetyFive ResourceUsage `json:"95th"` +} + +type ResourceUsage map[resource.Type]resource.Quantity +``` + +See +[cadvisor/info/v2](https://github.com/google/cadvisor/blob/master/info/v2/container.go) +for `ContainerSpec` and `ContainerStats` definitions. -This also means that the amount of data pushed by each Kubelet to the API -server will be much smaller. +## Implementation -## Data gathering +### Cluster We will use a push based system. Each kubelet will periodically - every 10s - POST its derived metrics to the API server. Then, any users of the metrics can @@ -131,45 +251,15 @@ the future to improve that situation. More information on kubelet checkpoints can be read on [#489](https://issues.k8s.io/489). -## Data structure - -```Go -type DerivedPodMetrics struct { - TypeMeta - ObjectMeta // should have pod name - // the key is the container name - Containers []struct { - ContainerReference *Container - Metrics MetricsWindows - } -} - -type DerivedNodeMetrics struct { - TypeMeta - ObjectMeta // should have node name - NodeMetrics MetricsWindows - SystemContainers []struct { - ContainerReference *Container - Metrics MetricsWindows - } -} - -// Last overlapping 10s, 1m, 1h and 1d as a start -// Updated every 10s, so the 10s window is sequential and the rest are -// rolling. 
-type MetricsWindows map[time.Duration]DerivedMetrics
+### Kubelet

-type DerivedMetrics struct {
-	// End time of all the time windows in Metrics
-	EndTime unversioned.Time `json:"endtime"`
-
-	Mean ResourceUsage `json:"mean"`
-	Max ResourceUsage `json:"max"`
-	NinetyFive ResourceUsage `json:"95th"`
-}
-
-type ResourceUsage map[resource.Type]resource.Quantity
-```
+The eventual goal is to use the `apiserver` library to serve versioned kubelet
+APIs. Since the apiserver library is not currently reusable in the kubelet and
+we do not want to block on it, we will write a simple one-off solution for this
+API. The one-off code should be an implementation detail, and the exposed API
+should match the expectations of the API server, so that we can throw away the
+initial implementation when the apiserver is ready to serve the kubelet API. We
+should prioritize replacing it before the API becomes too large or complicated.

-- cgit v1.2.3


From 6f050771d96266a78901d8cb9946b159c70fb14f Mon Sep 17 00:00:00 2001
From: hurf
Date: Fri, 30 Oct 2015 14:12:20 +0800
Subject: Remove trace of "kubectl stop"

Remove doc and use of "kubectl stop" since it's deprecated.
---
 flaky-tests.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/flaky-tests.md b/flaky-tests.md
index 3a7af51e..2470a815 100644
--- a/flaky-tests.md
+++ b/flaky-tests.md
@@ -87,10 +87,10 @@ done
 grep "Exited ([^0])" output.txt
 ```

-Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running:
+Eventually you will have sufficient runs for your purposes. At that point you can delete the replication controller by running:

 ```sh
-kubectl stop replicationcontroller flakecontroller
+kubectl delete replicationcontroller flakecontroller
 ```

 If you do a final check for flakes with `docker ps -a`, ignore tasks that exited -1, since that's what happens when you stop the replication controller.
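For that final check, a sketch of filtering `docker ps -a` output down to genuine flakes, ignoring clean exits and the `-1` statuses caused by deleting the controller, might look like the following. The sample status lines are fabricated for illustration:

```shell
#!/bin/sh
# Simulated STATUS lines; in practice, substitute the real `docker ps -a` output.
cat > statuses.txt <<'EOF'
Exited (0) 2 minutes ago
Exited (1) 5 minutes ago
Exited (-1) 1 minute ago
Up 3 minutes
EOF

# Keep only genuine failures: exited, but neither cleanly (0) nor killed (-1).
grep 'Exited (' statuses.txt | grep -v 'Exited (0)' | grep -v 'Exited (-1)'
```

Only the `Exited (1)` line survives the filter, which is the kind of run worth investigating as a flake.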
-- cgit v1.2.3 From 6a9d36de0ba742897ec9239118f5dd32c5daaefa Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 22 Oct 2015 05:24:34 -0700 Subject: Remove out-of-date information about releasing --- releasing.md | 74 ------------------------------------------------------------ 1 file changed, 74 deletions(-) diff --git a/releasing.md b/releasing.md index 9a73405f..6ff8e862 100644 --- a/releasing.md +++ b/releasing.md @@ -249,80 +249,6 @@ can, for instance, tell it to override `gitVersion` and set it to `v0.4-13-g4567bcdef6789-dirty` and set `gitCommit` to `4567bcdef6789...` which is the complete SHA1 of the (dirty) tree used at build time. -## Handling Official Versions - -Handling official versions from git is easy, as long as there is an annotated -git tag pointing to a specific version then `git describe` will return that tag -exactly which will match the idea of an official version (e.g. `v0.5`). - -Handling it on tarballs is a bit harder since the exact version string must be -present in `pkg/version/base.go` for it to get embedded into the binaries. But -simply creating a commit with `v0.5` on its own would mean that the commits -coming after it would also get the `v0.5` version when built from tarball or `go -get` while in fact they do not match `v0.5` (the one that was tagged) exactly. - -To handle that case, creating a new release should involve creating two adjacent -commits where the first of them will set the version to `v0.5` and the second -will set it to `v0.5-dev`. In that case, even in the presence of merges, there -will be a single commit where the exact `v0.5` version will be used and all -others around it will either have `v0.4-dev` or `v0.5-dev`. - -The diagram below illustrates it. - -![Diagram of git commits involved in the release](releasing.png) - -After working on `v0.4-dev` and merging PR 99 we decide it is time to release -`v0.5`. 
So we start a new branch, create one commit to update -`pkg/version/base.go` to include `gitVersion = "v0.5"` and `git commit` it. - -We test it and make sure everything is working as expected. - -Before sending a PR for it, we create a second commit on that same branch, -updating `pkg/version/base.go` to include `gitVersion = "v0.5-dev"`. That will -ensure that further builds (from tarball or `go install`) on that tree will -always include the `-dev` prefix and will not have a `v0.5` version (since they -do not match the official `v0.5` exactly.) - -We then send PR 100 with both commits in it. - -Once the PR is accepted, we can use `git tag -a` to create an annotated tag -*pointing to the one commit* that has `v0.5` in `pkg/version/base.go` and push -it to GitHub. (Unfortunately GitHub tags/releases are not annotated tags, so -this needs to be done from a git client and pushed to GitHub using SSH or -HTTPS.) - -## Parallel Commits - -While we are working on releasing `v0.5`, other development takes place and -other PRs get merged. For instance, in the example above, PRs 101 and 102 get -merged to the master branch before the versioning PR gets merged. - -This is not a problem, it is only slightly inaccurate that checking out the tree -at commit `012abc` or commit `345cde` or at the commit of the merges of PR 101 -or 102 will yield a version of `v0.4-dev` *but* those commits are not present in -`v0.5`. - -In that sense, there is a small window in which commits will get a -`v0.4-dev` or `v0.4-N-gXXX` label and while they're indeed later than `v0.4` -but they are not really before `v0.5` in that `v0.5` does not contain those -commits. - -Unfortunately, there is not much we can do about it. On the other hand, other -projects seem to live with that and it does not really become a large problem. 
- -As an example, Docker commit a327d9b91edf has a `v1.1.1-N-gXXX` label but it is -not present in Docker `v1.2.0`: - -```console -$ git describe a327d9b91edf -v1.1.1-822-ga327d9b91edf - -$ git log --oneline v1.2.0..a327d9b91edf -a327d9b91edf Fix data space reporting from Kb/Mb to KB/MB - -(Non-empty output here means the commit is not present on v1.2.0.) -``` - ## Release Notes No official release should be made final without properly matching release notes. -- cgit v1.2.3 From 87e5266e0aecf72c5f018c363fabecf4c422d824 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 22 Oct 2015 05:25:35 -0700 Subject: TODOs --- releasing.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/releasing.md b/releasing.md index 6ff8e862..7366d999 100644 --- a/releasing.md +++ b/releasing.md @@ -177,6 +177,8 @@ include everything in the release notes. ## Origin of the Sources +TODO(ihmccreery) update this + Kubernetes may be built from either a git tree (using `hack/build-go.sh`) or from a tarball (using either `hack/build-go.sh` or `go install`) or directly by the Go native build system (using `go get`). @@ -193,6 +195,8 @@ between releases (e.g. at some point in development between v0.3 and v0.4). ## Version Number Format +TODO(ihmccreery) update this + In order to account for these use cases, there are some specific formats that may end up representing the Kubernetes version. Here are a few examples: @@ -251,6 +255,8 @@ is the complete SHA1 of the (dirty) tree used at build time. ## Release Notes +TODO(ihmccreery) update this + No official release should be made final without properly matching release notes. 
There should be made available, per release, a small summary, preamble, of the

-- cgit v1.2.3


From 2650762ee43dea96a2a166527f4a3c9c1615f930 Mon Sep 17 00:00:00 2001
From: Isaac Hollander McCreery
Date: Thu, 22 Oct 2015 08:37:26 -0700
Subject: Proposed design for release infra

---
 releasing.md | 255 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 145 insertions(+), 110 deletions(-)

diff --git a/releasing.md b/releasing.md
index 7366d999..acb46a34 100644
--- a/releasing.md
+++ b/releasing.md
@@ -42,31 +42,53 @@ after the first section.
 Regardless of whether you are cutting a major or minor version, cutting a
 release breaks down into four pieces:

-1. Selecting release components.
-1. Tagging and merging the release in Git.
-1. Building and pushing the binaries.
-1. Writing release notes.
+1. selecting release components;
+1. cutting/branching the release;
+1. publishing binaries and release notes.

 You should progress in this strict order.

-### Building a New Major/Minor Version (`vX.Y.0`)
+### Selecting release components
+
+First, figure out what kind of release you're doing, what branch you're cutting
+from, and other prerequisites.
+
+* Alpha releases (`vX.Y.0-alpha.W`) are cut directly from `master`.
+  * Alpha releases don't require anything besides green tests, (see below).
+* Official releases (`vX.Y.Z`) are cut from their respective release branch,
+  `release-X.Y`.
+  * Make sure all necessary cherry picks have been resolved. You should ensure
+    that all outstanding cherry picks have been reviewed and merged and the
+    branch validated on Jenkins. See [Cherry Picks](cherry-picks.md) for more
+    information on how to manage cherry picks prior to cutting the release.
+  * Official releases also require green tests, (see below).
+* New release series are also cut directly from `master`.
+ * **This is a big deal!** If you're reading this doc for the first time, you + probably shouldn't be doing this release, and should talk to someone on the + release team. + * New release series cut a new release branch, `release-X.Y`, off of + `master`, and also release the first beta in the series, `vX.Y.0-beta`. + * Every change in the `vX.Y` series from this point on will have to be + cherry picked, so be sure you want to do this before proceeding. + * You should still look for green tests, (see below). + +No matter what you're cutting, you're going to want to look at +[Jenkins](http://go/k8s-test/). Figure out what branch you're cutting from, +(see above,) and look at the critical jobs building from that branch. First +glance through builds and look for nice solid rows of green builds, and then +check temporally with the other critical builds to make sure they're solid +around then as well. Once you find some greens, you can find the Git hash for a +build by looking at the Full Console Output and searching for `githash=`. You +should see a line: -#### Selecting Release Components +```console +githash=v1.2.0-alpha.2.164+b44c7d79d6c9bb +``` -When cutting a major/minor release, your first job is to find the branch -point. We cut `vX.Y.0` releases directly from `master`, which is also the -branch that we have most continuous validation on. Go first to [the main GCE -Jenkins end-to-end job](http://go/k8s-test/job/kubernetes-e2e-gce) and next to [the -Critical Builds page](http://go/k8s-test/view/Critical%20Builds) and hopefully find a -recent Git hash that looks stable across at least `kubernetes-e2e-gce` and -`kubernetes-e2e-gke-ci`. First glance through builds and look for nice solid -rows of green builds, and then check temporally with the other Critical Builds -to make sure they're solid around then as well. Once you find some greens, you -can find the Git hash for a build by looking at the "Console Log", then look for -`githash=`. 
You should see a line line: +Or, if you're cutting from a release branch (i.e. doing an official release), ```console -+ githash=v0.20.2-322-g974377b +githash=v1.1.0-beta.567+d79d6c9bbb44c7 ``` Because Jenkins builds frequently, if you're looking between jobs @@ -81,99 +103,112 @@ oncall. Before proceeding to the next step: ```sh -export BRANCHPOINT=v0.20.2-322-g974377b +export GITHASH=v1.2.0-alpha.2.164+b44c7d79d6c9bb +``` + +Where `v1.2.0-alpha.2.164+b44c7d79d6c9bb` is the Git hash you decided on. This +will become your release point. + +### Cutting/branching the release + +You'll need the latest version of the releasing tools: + +```console +git clone git@github.com:kubernetes/contrib.git +cd contrib/release +``` + +#### Cutting an alpha release (`vX.Y.0-alpha.W`) + +Figure out what version you're cutting, and + +```console +export VER=vX.Y.0-alpha.W +``` + +then, from `contrib/release`, run + +```console +cut-alpha.sh "${VER}" "${GITHASH}" +``` + +This will: + +1. clone a temporary copy of the [kubernetes repo](https://github.com/kubernetes/kubernetes); +1. mark the `vX.Y.0-alpha.W` tag at the given Git hash; +1. push the tag to GitHub; +1. build the release binaries at the given Git hash; +1. publish the binaries to GCS; +1. prompt you to do the remainder of the work. + +#### Cutting an official release (`vX.Y.Z`) + +Figure out what version you're cutting, and + +```console +export VER=vX.Y.Z +``` + +then, from `contrib/release`, run + +```console +cut-official.sh "${VER}" "${GITHASH}" ``` -Where `v0.20.2-322-g974377b` is the git hash you decided on. This will become -our (retroactive) branch point. - -#### Branching, Tagging and Merging - -Do the following: - -1. `export VER=x.y` (e.g. `0.20` for v0.20) -1. cd to the base of the repo -1. `git fetch upstream && git checkout -b release-${VER} ${BRANCHPOINT}` (you did set `${BRANCHPOINT}`, right?) -1. 
Make sure you don't have any files you care about littering your repo (they - better be checked in or outside the repo, or the next step will delete them). -1. `make clean && git reset --hard HEAD && git clean -xdf` -1. `make` (TBD: you really shouldn't have to do this, but the swagger output step requires it right now) -1. `./build/mark-new-version.sh v${VER}.0` to mark the new release and get further - instructions. This creates a series of commits on the branch you're working - on (`release-${VER}`), including forking our documentation for the release, - the release version commit (which is then tagged), and the post-release - version commit. -1. Follow the instructions given to you by that script. They are canon for the - remainder of the Git process. If you don't understand something in that - process, please ask! - -**TODO**: how to fix tags, etc., if you have to shift the release branchpoint. - -#### Building and Pushing Binaries - -In your git repo (you still have `${VER}` set from above right?): - -1. `git checkout upstream/master && build/build-official-release.sh v${VER}.0` (the `build-official-release.sh` script is version agnostic, so it's best to run it off `master` directly). -1. Follow the instructions given to you by that script. -1. At this point, you've done all the Git bits, you've got all the binary bits pushed, and you've got the template for the release started on GitHub. - -#### Writing Release Notes - -[This helpful guide](making-release-notes.md) describes how to write release -notes for a major/minor release. In the release template on GitHub, leave the -last PR number that the tool finds for the `.0` release, so the next releaser -doesn't have to hunt. - -### Building a New Patch Release (`vX.Y.Z` for `Z > 0`) - -#### Selecting Release Components - -We cut `vX.Y.Z` releases from the `release-vX.Y` branch after all cherry picks -to the branch have been resolved. 
You should ensure all outstanding cherry picks -have been reviewed and merged and the branch validated on Jenkins (validation -TBD). See the [Cherry Picks](cherry-picks.md) for more information on how to -manage cherry picks prior to cutting the release. - -#### Tagging and Merging - -1. `export VER=x.y` (e.g. `0.20` for v0.20) -1. `export PATCH=Z` where `Z` is the patch level of `vX.Y.Z` -1. cd to the base of the repo -1. `git fetch upstream && git checkout -b upstream/release-${VER} release-${VER}` -1. Make sure you don't have any files you care about littering your repo (they - better be checked in or outside the repo, or the next step will delete them). -1. `make clean && git reset --hard HEAD && git clean -xdf` -1. `make` (TBD: you really shouldn't have to do this, but the swagger output step requires it right now) -1. `./build/mark-new-version.sh v${VER}.${PATCH}` to mark the new release and get further - instructions. This creates a series of commits on the branch you're working - on (`release-${VER}`), including forking our documentation for the release, - the release version commit (which is then tagged), and the post-release - version commit. -1. Follow the instructions given to you by that script. They are canon for the - remainder of the Git process. If you don't understand something in that - process, please ask! When proposing PRs, you can pre-fill the body with - `hack/cherry_pick_list.sh upstream/release-${VER}` to inform people of what - is already on the branch. - -**TODO**: how to fix tags, etc., if the release is changed. - -#### Building and Pushing Binaries - -In your git repo (you still have `${VER}` and `${PATCH}` set from above right?): - -1. `git checkout upstream/master && build/build-official-release.sh - v${VER}.${PATCH}` (the `build-official-release.sh` script is version - agnostic, so it's best to run it off `master` directly). -1. Follow the instructions given to you by that script. 
At this point, you've - done all the Git bits, you've got all the binary bits pushed, and you've got - the template for the release started on GitHub. - -#### Writing Release Notes - -Run `hack/cherry_pick_list.sh ${VER}.${PATCH}~1` to get the release notes for -the patch release you just created. Feel free to prune anything internal, like -you would for a major release, but typically for patch releases we tend to -include everything in the release notes. +This will: + +1. clone a temporary copy of the [kubernetes repo](https://github.com/kubernetes/kubernetes); +1. do a series of commits on the branch, including forking the documentation + and doing the release version commit; + * TODO(ihmccreery) it's not yet clear what exactly this is going to look like. +1. mark both the `vX.Y.Z` and `vX.Y.(Z+1)-beta` tags at the given Git hash; +1. push the tags to GitHub; +1. build the release binaries at the given Git hash (on the appropriate + branch); +1. publish the binaries to GCS; +1. prompt you to do the remainder of the work. + +#### Branching a new release series (`vX.Y`) + +Once again, **this is a big deal!** If you're reading this doc for the first +time, you probably shouldn't be doing this release, and should talk to someone +on the release team. + +Figure out what series you're cutting, and + +```console +export VER=vX.Y +``` + +then, from `contrib/release`, run + +```console +branch-series.sh "${VER}" "${GITHASH}" +``` + +This will: + +1. clone a temporary copy of the [kubernetes repo](https://github.com/kubernetes/kubernetes); +1. mark the `vX.(Y+1).0-alpha.0` tag at the given Git hash on `master`; +1. fork a new branch `release-X.Y` off of `master` at the Given Git hash; +1. do a series of commits on the branch, including forking the documentation + and doing the release version commit; + * TODO(ihmccreery) it's not yet clear what exactly this is going to look like. +1. mark the `vX.Y.0-beta` tag at the appropriate commit on the new `release-X.Y` branch; +1. 
push the tags to GitHub; +1. build the release binaries at the appropriate Git hash on the appropriate + branches, (for both the new alpha and beta releases); +1. publish the binaries to GCS; +1. prompt you to do the remainder of the work. + +**TODO(ihmccreery)**: can we fix tags, etc., if you have to shift the release branchpoint? + +### Publishing binaries and release notes + +Whichever script you ran above will prompt you to take any remaining steps, +including publishing binaries and release notes. + +**TODO(ihmccreery)**: deal with the `making-release-notes` doc in `docs/devel`. ## Origin of the Sources @@ -195,7 +230,7 @@ between releases (e.g. at some point in development between v0.3 and v0.4). ## Version Number Format -TODO(ihmccreery) update this +TODO(ihmccreery) update everything below here In order to account for these use cases, there are some specific formats that may end up representing the Kubernetes version. Here are a few examples: -- cgit v1.2.3 From b264864ea601b33b938c1fc2429a2170779356e1 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Wed, 28 Oct 2015 11:08:55 -0700 Subject: Doc fixup to reflect script reality --- releasing.md | 68 +++++++++++++++++++++++++----------------------------------- 1 file changed, 28 insertions(+), 40 deletions(-) diff --git a/releasing.md b/releasing.md index acb46a34..971c2878 100644 --- a/releasing.md +++ b/releasing.md @@ -44,6 +44,7 @@ release breaks down into four pieces: 1. selecting release components; 1. cutting/branching the release; +1. building and pushing the binaries; and 1. publishing binaries and release notes. You should progress in this strict order. @@ -77,7 +78,7 @@ No matter what you're cutting, you're going to want to look at (see above,) and look at the critical jobs building from that branch. First glance through builds and look for nice solid rows of green builds, and then check temporally with the other critical builds to make sure they're solid -around then as well. 
Once you find some greens, you can find the Git hash for a +around then as well. Once you find some greens, you can find the git hash for a build by looking at the Full Console Output and searching for `githash=`. You should see a line: @@ -106,7 +107,7 @@ Before proceeding to the next step: export GITHASH=v1.2.0-alpha.2.164+b44c7d79d6c9bb ``` -Where `v1.2.0-alpha.2.164+b44c7d79d6c9bb` is the Git hash you decided on. This +Where `v1.2.0-alpha.2.164+b44c7d79d6c9bb` is the git hash you decided on. This will become your release point. ### Cutting/branching the release @@ -123,50 +124,46 @@ cd contrib/release Figure out what version you're cutting, and ```console -export VER=vX.Y.0-alpha.W +export VER="vX.Y.0-alpha.W" ``` then, from `contrib/release`, run ```console -cut-alpha.sh "${VER}" "${GITHASH}" +cut.sh "${VER}" "${GITHASH}" ``` This will: -1. clone a temporary copy of the [kubernetes repo](https://github.com/kubernetes/kubernetes); -1. mark the `vX.Y.0-alpha.W` tag at the given Git hash; -1. push the tag to GitHub; -1. build the release binaries at the given Git hash; -1. publish the binaries to GCS; -1. prompt you to do the remainder of the work. +1. mark the `vX.Y.0-alpha.W` tag at the given git hash; +1. prompt you to do the remainder of the work, including building the + appropriate binaries and pushing them to the appropriate places. #### Cutting an official release (`vX.Y.Z`) Figure out what version you're cutting, and ```console -export VER=vX.Y.Z +export VER="vX.Y.Z" ``` then, from `contrib/release`, run ```console -cut-official.sh "${VER}" "${GITHASH}" +cut.sh "${VER}" "${GITHASH}" ``` This will: -1. clone a temporary copy of the [kubernetes repo](https://github.com/kubernetes/kubernetes); -1. do a series of commits on the branch, including forking the documentation - and doing the release version commit; - * TODO(ihmccreery) it's not yet clear what exactly this is going to look like. -1. 
mark both the `vX.Y.Z` and `vX.Y.(Z+1)-beta` tags at the given Git hash; -1. push the tags to GitHub; -1. build the release binaries at the given Git hash (on the appropriate - branch); -1. publish the binaries to GCS; -1. prompt you to do the remainder of the work. +1. do a series of commits on the branch for `vX.Y.Z`, including versionizing + the documentation and doing the release version commit; +1. mark the `vX.Y.Z` tag at the release version commit; +1. do a series of commits on the branch for `vX.Y.(Z+1)-beta` on top of the + previous commits, including versionizing the documentation and doing the + beta version commit; +1. mark the `vX.Y.(Z+1)-beta` tag at the release version commit; +1. prompt you to do the remainder of the work, including building the + appropriate binaries and pushing them to the appropriate places. #### Branching a new release series (`vX.Y`) @@ -177,39 +174,30 @@ on the release team. Figure out what series you're cutting, and ```console -export VER=vX.Y +export VER="vX.Y" ``` then, from `contrib/release`, run ```console -branch-series.sh "${VER}" "${GITHASH}" +cut.sh "${VER}" "${GITHASH}" ``` This will: -1. clone a temporary copy of the [kubernetes repo](https://github.com/kubernetes/kubernetes); -1. mark the `vX.(Y+1).0-alpha.0` tag at the given Git hash on `master`; -1. fork a new branch `release-X.Y` off of `master` at the Given Git hash; -1. do a series of commits on the branch, including forking the documentation - and doing the release version commit; - * TODO(ihmccreery) it's not yet clear what exactly this is going to look like. -1. mark the `vX.Y.0-beta` tag at the appropriate commit on the new `release-X.Y` branch; -1. push the tags to GitHub; -1. build the release binaries at the appropriate Git hash on the appropriate - branches, (for both the new alpha and beta releases); -1. publish the binaries to GCS; -1. prompt you to do the remainder of the work. 
- -**TODO(ihmccreery)**: can we fix tags, etc., if you have to shift the release branchpoint? +1. mark the `vX.(Y+1).0-alpha.0` tag at the given git hash on `master`; +1. fork a new branch `release-X.Y` off of `master` at the given git hash; +1. do a series of commits on the branch for `vX.Y.0-beta`, including versionizing + the documentation and doing the release version commit; +1. mark the `vX.Y.(Z+1)-beta` tag at the beta version commit; +1. prompt you to do the remainder of the work, including building the + appropriate binaries and pushing them to the appropriate places. ### Publishing binaries and release notes Whichever script you ran above will prompt you to take any remaining steps, including publishing binaries and release notes. -**TODO(ihmccreery)**: deal with the `making-release-notes` doc in `docs/devel`. - ## Origin of the Sources TODO(ihmccreery) update this -- cgit v1.2.3 From a17031110e8725d3944979fc52cc2540e75531f9 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 29 Oct 2015 15:10:00 -0700 Subject: Fixups of docs and scripts --- releasing.md | 90 ++++++++++++++++++------------------------------------------ 1 file changed, 27 insertions(+), 63 deletions(-) diff --git a/releasing.md b/releasing.md index 971c2878..2ba88bd3 100644 --- a/releasing.md +++ b/releasing.md @@ -78,9 +78,16 @@ No matter what you're cutting, you're going to want to look at (see above,) and look at the critical jobs building from that branch. First glance through builds and look for nice solid rows of green builds, and then check temporally with the other critical builds to make sure they're solid -around then as well. Once you find some greens, you can find the git hash for a -build by looking at the Full Console Output and searching for `githash=`. You -should see a line: +around then as well. + +If you're doing an alpha release or cutting a new release series, you can +choose an arbitrary build. 
If you are doing an official release, you have to +release from HEAD of the branch, (because you have to do some version-rev +commits,) so choose the latest build on the release branch. (Remember, that +branch should be frozen.) + +Once you find some greens, you can find the git hash for a build by looking at +the Full Console Output and searching for `githash=`. You should see a line: ```console githash=v1.2.0-alpha.2.164+b44c7d79d6c9bb @@ -115,10 +122,12 @@ will become your release point. You'll need the latest version of the releasing tools: ```console -git clone git@github.com:kubernetes/contrib.git -cd contrib/release +git clone git@github.com:kubernetes/kubernetes.git +cd kubernetes ``` +or `git checkout upstream/master` from an existing repo. + #### Cutting an alpha release (`vX.Y.0-alpha.W`) Figure out what version you're cutting, and @@ -127,10 +136,10 @@ Figure out what version you're cutting, and export VER="vX.Y.0-alpha.W" ``` -then, from `contrib/release`, run +then, run ```console -cut.sh "${VER}" "${GITHASH}" +build/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will: @@ -147,10 +156,10 @@ Figure out what version you're cutting, and export VER="vX.Y.Z" ``` -then, from `contrib/release`, run +then, run ```console -cut.sh "${VER}" "${GITHASH}" +build/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will: @@ -161,7 +170,7 @@ This will: 1. do a series of commits on the branch for `vX.Y.(Z+1)-beta` on top of the previous commits, including versionizing the documentation and doing the beta version commit; -1. mark the `vX.Y.(Z+1)-beta` tag at the release version commit; +1. mark the `vX.Y.(Z+1)-beta` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. 
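
The `githash=` line can also be pulled out of a saved console log mechanically. A minimal sketch (the `console.log` file and its contents are synthetic stand-ins for the build's actual Full Console Output):

```console
# Synthetic stand-in for a build's Full Console Output.
printf 'Building...\ngithash=v1.2.0-alpha.2.164+b44c7d79d6c9bb\nDone.\n' > console.log

# Grab the first githash= line, strip the key, and export the value.
GITHASH="$(grep -m1 -o 'githash=[^ ]*' console.log | cut -d= -f2)"
export GITHASH
echo "${GITHASH}"
```

The exported `"${GITHASH}"` is then what you hand to the cut script.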
@@ -177,10 +186,10 @@ Figure out what series you're cutting, and export VER="vX.Y" ``` -then, from `contrib/release`, run +then, run ```console -cut.sh "${VER}" "${GITHASH}" +build/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will: @@ -189,18 +198,19 @@ This will: 1. fork a new branch `release-X.Y` off of `master` at the given git hash; 1. do a series of commits on the branch for `vX.Y.0-beta`, including versionizing the documentation and doing the release version commit; -1. mark the `vX.Y.(Z+1)-beta` tag at the beta version commit; +1. mark the `vX.Y.0-beta` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. ### Publishing binaries and release notes -Whichever script you ran above will prompt you to take any remaining steps, -including publishing binaries and release notes. +The script you ran above will prompt you to take any remaining steps, including +publishing binaries and release notes. -## Origin of the Sources +## Injecting Version into Binaries -TODO(ihmccreery) update this +*Please note that this information may be out of date. The scripts are the +authoritative source on how version injection works.* Kubernetes may be built from either a git tree (using `hack/build-go.sh`) or from a tarball (using either `hack/build-go.sh` or `go install`) or directly by @@ -216,36 +226,6 @@ access to the information about the git tree, but we still want to be able to tell whether this build corresponds to an exact release (e.g. v0.3) or is between releases (e.g. at some point in development between v0.3 and v0.4). -## Version Number Format - -TODO(ihmccreery) update everything below here - -In order to account for these use cases, there are some specific formats that -may end up representing the Kubernetes version. 
Here are a few examples: - -- **v0.5**: This is official version 0.5 and this version will only be used - when building from a clean git tree at the v0.5 git tag, or from a tree - extracted from the tarball corresponding to that specific release. -- **v0.5-15-g0123abcd4567**: This is the `git describe` output and it indicates - that we are 15 commits past the v0.5 release and that the SHA1 of the commit - where the binaries were built was `0123abcd4567`. It is only possible to have - this level of detail in the version information when building from git, not - when building from a tarball. -- **v0.5-15-g0123abcd4567-dirty** or **v0.5-dirty**: The extra `-dirty` prefix - means that the tree had local modifications or untracked files at the time of - the build, so there's no guarantee that the source code matches exactly the - state of the tree at the `0123abcd4567` commit or at the `v0.5` git tag - (resp.) -- **v0.5-dev**: This means we are building from a tarball or using `go get` or, - if we have a git tree, we are using `go install` directly, so it is not - possible to inject the git version into the build information. Additionally, - this is not an official release, so the `-dev` prefix indicates that the - version we are building is after `v0.5` but before `v0.6`. (There is actually - an exception where a commit with `v0.5-dev` is not present on `v0.6`, see - later for details.) - -## Injecting Version into Binaries - In order to cover the different build cases, we start by providing information that can be used when using only Go build tools or when we do not have the git version information available. @@ -276,22 +256,6 @@ can, for instance, tell it to override `gitVersion` and set it to `v0.4-13-g4567bcdef6789-dirty` and set `gitCommit` to `4567bcdef6789...` which is the complete SHA1 of the (dirty) tree used at build time. -## Release Notes - -TODO(ihmccreery) update this - -No official release should be made final without properly matching release notes. 
- -There should be made available, per release, a small summary, preamble, of the -major changes, both in terms of feature improvements/bug fixes and notes about -functional feature changes (if any) regarding the previous released version so -that the BOM regarding updating to it gets as obvious and trouble free as possible. - -After this summary, preamble, all the relevant PRs/issues that got in that -version should be listed and linked together with a small summary understandable -by plain mortals (in a perfect world PR/issue's title would be enough but often -it is just too cryptic/geeky/domain-specific that it isn't). - [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/releasing.md?pixel)]() -- cgit v1.2.3 From 1e19e8e1c8137d11c6f9eab041990fa32299bd14 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 29 Oct 2015 15:14:13 -0700 Subject: Move to release/ --- releasing.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/releasing.md b/releasing.md index 2ba88bd3..fad957b6 100644 --- a/releasing.md +++ b/releasing.md @@ -139,7 +139,7 @@ export VER="vX.Y.0-alpha.W" then, run ```console -build/cut-official-release.sh "${VER}" "${GITHASH}" +release/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will: @@ -159,7 +159,7 @@ export VER="vX.Y.Z" then, run ```console -build/cut-official-release.sh "${VER}" "${GITHASH}" +release/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will: @@ -189,7 +189,7 @@ export VER="vX.Y" then, run ```console -build/cut-official-release.sh "${VER}" "${GITHASH}" +release/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will: -- cgit v1.2.3 From c19a1f90d8c5c47210af5c8e8577c4950999a078 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Fri, 9 Oct 2015 15:29:17 -0700 Subject: Updates to versioning.md --- versioning.md | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/versioning.md b/versioning.md index 
c764a585..75cdffce 100644 --- a/versioning.md +++ b/versioning.md @@ -35,23 +35,26 @@ Documentation for other releases can be found at Legend: -* **Kube <major>.<minor>.<patch>** refers to the version of Kubernetes that is released. This versions all components: apiserver, kubelet, kubectl, etc. +* **Kube X.Y.Z** refers to the version of Kubernetes that is released. This versions all components: apiserver, kubelet, kubectl, etc. (**X** is the major version, **Y** is the minor version, and **Z** is the patch version.) * **API vX[betaY]** refers to the version of the HTTP API. -## Release Timeline +## Release versioning ### Minor version scheme and timeline -* Kube 1.0.0, 1.0.1 -- DONE! -* Kube 1.0.X (X>1): Standard operating procedure. We patch the release-1.0 branch as needed and increment the patch number. -* Kube 1.1.0-alpha.X: Released roughly every two weeks by cutting from HEAD. No cherrypick releases. If there is a critical bugfix, a new release from HEAD can be created ahead of schedule. -* Kube 1.1.0-beta: When HEAD is feature-complete, we will cut the release-1.1.0 branch 2 weeks prior to the desired 1.1.0 date and only merge PRs essential to 1.1. This cut will be marked as 1.1.0-beta, and HEAD will be revved to 1.2.0-alpha.0. -* Kube 1.1.0: Final release, cut from the release-1.1.0 branch cut two weeks prior. Should occur between 3 and 4 months after 1.0. 1.1.1-beta will be tagged at the same time on the same branch. +* Kube X.Y.0-alpha.W, W > 0: Alpha releases are released roughly every two weeks directly from the master branch. No cherrypick releases. If there is a critical bugfix, a new release from master can be created ahead of schedule. +* Kube X.Y.Z-beta: When master is feature-complete for Kube X.Y, we will cut the release-X.Y branch 2 weeks prior to the desired X.Y.0 date and cherrypick only PRs essential to X.Y. This cut will be marked as X.Y.0-beta, and master will be revved to X.Y+1.0-alpha.0. 
+* Kube X.Y.0: Final release, cut from the release-X.Y branch cut two weeks prior. X.Y.1-beta will be tagged at the same commit on the same branch. X.Y.0 occur 3 to 4 months after X.Y-1.0. +* Kube X.Y.Z, Z > 0: [Patch releases](#patches) are released as we cherrypick commits into the release-X.Y branch, (which is at X.Y.Z-beta,) as needed. X.Y.Z is cut straight from the release-X.Y branch, and X.Y.Z+1-beta is tagged on the same commit. ### Major version timeline There is no mandated timeline for major versions. They only occur when we need to start the clock on deprecating features. A given major version should be the latest major version for at least one year from its original release date. +### CI version scheme + +* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.C+bbbb is C commits after X.Y.Z-beta, with an additional +bbbb build suffix added. + ## Release versions as related to API versions Here is an example major release cycle: @@ -64,11 +67,11 @@ Here is an example major release cycle: * Before Kube 2.0 is cut, API v2 must be released in 1.x. This enables two things: (1) users can upgrade to API v2 when running Kube 1.x and then switch over to Kube 2.x transparently, and (2) in the Kube 2.0 release itself we can cleanup and remove all API v2beta\* versions because no one should have v2beta\* objects left in their database. As mentioned above, tooling will exist to make sure there are no calls or references to a given API version anywhere inside someone's kube installation before someone upgrades. * Kube 2.0 must include the v1 API, but Kube 3.0 must include the v2 API only. It *may* include the v1 API as well if the burden is not high - this will be determined on a per-major-version basis. 
-## Rationale for API v2 being complete before v2.0's release +### Rationale for API v2 being complete before v2.0's release It may seem a bit strange to complete the v2 API before v2.0 is released, but *adding* a v2 API is not a breaking change. *Removing* the v2beta\* APIs *is* a breaking change, which is what necessitates the major version bump. There are other ways to do this, but having the major release be the fresh start of that release's API without the baggage of its beta versions seems most intuitive out of the available options. -# Patches +## Patches Patch releases are intended for critical bug fixes to the latest minor version, such as addressing security vulnerabilities, fixes to problems affecting a large number of users, severe problems with no workaround, and blockers for products based on Kubernetes. @@ -76,7 +79,7 @@ They should not contain miscellaneous feature additions or improvements, and esp Dependencies, such as Docker or Etcd, should also not be changed unless absolutely necessary, and also just to fix critical bugs (so, at most patch version changes, not new major nor minor versions). -# Upgrades +## Upgrades * Users can upgrade from any Kube 1.x release to any other Kube 1.x release as a rolling upgrade across their cluster. (Rolling upgrade means being able to upgrade the master first, then one node at a time. See #4855 for details.) * No hard breaking changes over version boundaries. 
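
The precedence this scheme implies within a given X.Y.Z (alpha before beta before final) can be sketched with a tiny helper. This is illustrative only, not part of any Kubernetes tooling, and it ignores the plain numeric ordering of X.Y.Z itself:

```console
# Illustrative only: rank version strings so pre-releases sort first.
# alpha -> 0, beta -> 1, final release -> 2, per the scheme above.
rank() {
  case "$1" in
    *-alpha*) echo "0" ;;
    *-beta*)  echo "1" ;;
    *)        echo "2" ;;
  esac
}

for v in v1.1.0 v1.1.0-beta v1.1.0-alpha.1; do
  echo "$(rank "$v") $v"
done | sort -n | cut -d' ' -f2
```

This prints the versions in release order: the alpha, then the beta, then the final release.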
-- cgit v1.2.3 From bea654021f08ac105301afac36662cf487354cdb Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 29 Oct 2015 15:21:35 -0700 Subject: Remove old releasing illustrations --- releasing.dot | 113 ---------------------------------------------------------- releasing.png | Bin 30693 -> 0 bytes releasing.svg | 113 ---------------------------------------------------------- 3 files changed, 226 deletions(-) delete mode 100644 releasing.dot delete mode 100644 releasing.png delete mode 100644 releasing.svg diff --git a/releasing.dot b/releasing.dot deleted file mode 100644 index fe8124c3..00000000 --- a/releasing.dot +++ /dev/null @@ -1,113 +0,0 @@ -// Build it with: -// $ dot -Tsvg releasing.dot >releasing.svg - -digraph tagged_release { - size = "5,5" - // Arrows go up. - rankdir = BT - subgraph left { - // Group the left nodes together. - ci012abc -> pr101 -> ci345cde -> pr102 - style = invis - } - subgraph right { - // Group the right nodes together. - version_commit -> dev_commit - style = invis - } - { // Align the version commit and the info about it. - rank = same - // Align them with pr101 - pr101 - version_commit - // release_info shows the change in the commit. - release_info - } - { // Align the dev commit and the info about it. - rank = same - // Align them with 345cde - ci345cde - dev_commit - dev_info - } - // Join the nodes from subgraph left. - pr99 -> ci012abc - pr102 -> pr100 - // Do the version node. 
- pr99 -> version_commit - dev_commit -> pr100 - tag -> version_commit - pr99 [ - label = "Merge PR #99" - shape = box - fillcolor = "#ccccff" - style = "filled" - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; - ci012abc [ - label = "012abc" - shape = circle - fillcolor = "#ffffcc" - style = "filled" - fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" - ]; - pr101 [ - label = "Merge PR #101" - shape = box - fillcolor = "#ccccff" - style = "filled" - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; - ci345cde [ - label = "345cde" - shape = circle - fillcolor = "#ffffcc" - style = "filled" - fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" - ]; - pr102 [ - label = "Merge PR #102" - shape = box - fillcolor = "#ccccff" - style = "filled" - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; - version_commit [ - label = "678fed" - shape = circle - fillcolor = "#ccffcc" - style = "filled" - fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" - ]; - dev_commit [ - label = "456dcb" - shape = circle - fillcolor = "#ffffcc" - style = "filled" - fontname = "Consolas, Liberation Mono, Menlo, Courier, monospace" - ]; - pr100 [ - label = "Merge PR #100" - shape = box - fillcolor = "#ccccff" - style = "filled" - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; - release_info [ - label = "pkg/version/base.go:\ngitVersion = \"v0.5\";" - shape = none - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; - dev_info [ - label = "pkg/version/base.go:\ngitVersion = \"v0.5-dev\";" - shape = none - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; - tag [ - label = "$ git tag -a v0.5" - fillcolor = "#ffcccc" - style = "filled" - fontname = "Helvetica Neue, Helvetica, Segoe UI, Arial, freesans, sans-serif" - ]; -} - diff --git a/releasing.png 
b/releasing.png deleted file mode 100644 index 935628de..00000000 Binary files a/releasing.png and /dev/null differ diff --git a/releasing.svg b/releasing.svg deleted file mode 100644 index f703e6e2..00000000 --- a/releasing.svg +++ /dev/null @@ -1,113 +0,0 @@ - - - - - - -tagged_release - - -ci012abc - -012abc - - -pr101 - -Merge PR #101 - - -ci012abc->pr101 - - - - -ci345cde - -345cde - - -pr101->ci345cde - - - - -pr102 - -Merge PR #102 - - -ci345cde->pr102 - - - - -pr100 - -Merge PR #100 - - -pr102->pr100 - - - - -version_commit - -678fed - - -dev_commit - -456dcb - - -version_commit->dev_commit - - - - -dev_commit->pr100 - - - - -release_info -pkg/version/base.go: -gitVersion = "v0.5"; - - -dev_info -pkg/version/base.go: -gitVersion = "v0.5-dev"; - - -pr99 - -Merge PR #99 - - -pr99->ci012abc - - - - -pr99->version_commit - - - - -tag - -$ git tag -a v0.5 - - -tag->version_commit - - - - - -- cgit v1.2.3 From 77d47119045fbaa5128ecddab9150341599c0049 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Thu, 29 Oct 2015 15:10:00 -0700 Subject: Fixups of docs and scripts --- versioning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/versioning.md b/versioning.md index 75cdffce..a189d0cf 100644 --- a/versioning.md +++ b/versioning.md @@ -53,7 +53,7 @@ There is no mandated timeline for major versions. They only occur when we need t ### CI version scheme -* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.C+bbbb is C commits after X.Y.Z-beta, with an additional +bbbb build suffix added. +* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.C+bbbb is C commits after X.Y.Z-beta, with an additional +bbbb build suffix added. 
Furthermore, builds that are built off of a dirty build tree, (with things in the tree that are not checked it,) it will be appended with -dirty. ## Release versions as related to API versions -- cgit v1.2.3 From 37361519a63b34464956e92cd78ed426c68ff325 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Mon, 2 Nov 2015 14:54:11 -0800 Subject: Versioned beta releases --- releasing.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/releasing.md b/releasing.md index fad957b6..f430b3a3 100644 --- a/releasing.md +++ b/releasing.md @@ -56,14 +56,20 @@ from, and other prerequisites. * Alpha releases (`vX.Y.0-alpha.W`) are cut directly from `master`. * Alpha releases don't require anything besides green tests, (see below). -* Official releases (`vX.Y.Z`) are cut from their respective release branch, +* Beta releases (`vX.Y.Z-beta.W`) are cut from their respective release branch, `release-X.Y`. * Make sure all necessary cherry picks have been resolved. You should ensure that all outstanding cherry picks have been reviewed and merged and the branch validated on Jenkins. See [Cherry Picks](cherry-picks.md) for more information on how to manage cherry picks prior to cutting the release. + * Beta releases also require green tests, (see below). +* Official releases (`vX.Y.Z`) are cut from their respective release branch, + `release-X.Y`. + * Official releases should be similar or identical to their respective beta + releases, so have a look at the cherry picks that have been merged since + the beta release and question everything you find. * Official releases also require green tests, (see below). -* New release series are also cut direclty from `master`. +* New release series are also cut directly from `master`. * **This is a big deal!** If you're reading this doc for the first time, you probably shouldn't be doing this release, and should talk to someone on the release team. 
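
One way to "have a look at the cherry picks that have been merged since the beta release" is a `git log` over the tag-to-branch range. A sketch in a throwaway repository (every name here is synthetic; on a real release you would run the final command against the actual beta tag and `release-X.Y` branch):

```console
# Synthetic sandbox demonstrating the tag..branch range query.
repo="$(mktemp -d)" && cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "branch point"
git tag v1.1.0-beta
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "cherry pick: critical fix"

# Everything on the branch since the beta tag; question each of these.
git log --format='%s' v1.1.0-beta..HEAD
```

Here the log shows the single synthetic cherry pick; on a real branch it lists every commit that needs review before the official release is cut.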
-- cgit v1.2.3 From ab925644290c51d9491cb698bc1f4991af5d6039 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Mon, 2 Nov 2015 15:38:57 -0800 Subject: Update docs and prompts for better dry-runs and no more versionizing docs --- releasing.md | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 59 insertions(+), 9 deletions(-) diff --git a/releasing.md b/releasing.md index f430b3a3..671ab8af 100644 --- a/releasing.md +++ b/releasing.md @@ -148,12 +148,49 @@ then, run release/cut-official-release.sh "${VER}" "${GITHASH}" ``` -This will: +This will do a dry run of: 1. mark the `vX.Y.0-alpha.W` tag at the given git hash; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. +If you're satisfied with the result, run + +```console +release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +``` + +and follow the instructions. + +#### Cutting a beta release (`vX.Y.Z-beta.W`) + +Figure out what version you're cutting, and + +```console +export VER="vX.Y.Z-beta.W" +``` + +then, run + +```console +release/cut-official-release.sh "${VER}" "${GITHASH}" +``` + +This will do a dry run of: + +1. do a series of commits on the release branch for `vX.Y.Z-beta`; +1. mark the `vX.Y.Z-beta` tag at the beta version commit; +1. prompt you to do the remainder of the work, including building the + appropriate binaries and pushing them to the appropriate places. + +If you're satisfied with the result, run + +```console +release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +``` + +and follow the instructions. + #### Cutting an official release (`vX.Y.Z`) Figure out what version you're cutting, and @@ -168,18 +205,24 @@ then, run release/cut-official-release.sh "${VER}" "${GITHASH}" ``` -This will: +This will do a dry run of: -1.
do a series of commits on the branch for `vX.Y.Z`, including versionizing - the documentation and doing the release version commit; +1. do a series of commits on the branch for `vX.Y.Z`; 1. mark the `vX.Y.Z` tag at the release version commit; 1. do a series of commits on the branch for `vX.Y.(Z+1)-beta` on top of the - previous commits, including versionizing the documentation and doing the - beta version commit; + previous commits; 1. mark the `vX.Y.(Z+1)-beta` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. +If you're satisfied with the result, run + +```console +release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +``` + +and follow the instructions. + #### Branching a new release series (`vX.Y`) Once again, **this is a big deal!** If you're reading this doc for the first @@ -198,16 +241,23 @@ then, run release/cut-official-release.sh "${VER}" "${GITHASH}" ``` -This will: +This will do a dry run of: 1. mark the `vX.(Y+1).0-alpha.0` tag at the given git hash on `master`; 1. fork a new branch `release-X.Y` off of `master` at the given git hash; -1. do a series of commits on the branch for `vX.Y.0-beta`, including versionizing - the documentation and doing the release version commit; +1. do a series of commits on the branch for `vX.Y.0-beta`; 1. mark the `vX.Y.0-beta` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. +If you're satisfied with the result, run + +```console +release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +``` + +and follow the instructions. 
+ ### Publishing binaries and release notes The script you ran above will prompt you to take any remaining steps, including -- cgit v1.2.3 From de53b2aec614a1f98bea17159be3670ac969b6e0 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Mon, 2 Nov 2015 14:54:11 -0800 Subject: Versioned beta releases --- versioning.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/versioning.md b/versioning.md index a189d0cf..7b63059c 100644 --- a/versioning.md +++ b/versioning.md @@ -43,9 +43,9 @@ Legend: ### Minor version scheme and timeline * Kube X.Y.0-alpha.W, W > 0: Alpha releases are released roughly every two weeks directly from the master branch. No cherrypick releases. If there is a critical bugfix, a new release from master can be created ahead of schedule. -* Kube X.Y.Z-beta: When master is feature-complete for Kube X.Y, we will cut the release-X.Y branch 2 weeks prior to the desired X.Y.0 date and cherrypick only PRs essential to X.Y. This cut will be marked as X.Y.0-beta, and master will be revved to X.Y+1.0-alpha.0. -* Kube X.Y.0: Final release, cut from the release-X.Y branch cut two weeks prior. X.Y.1-beta will be tagged at the same commit on the same branch. X.Y.0 occur 3 to 4 months after X.Y-1.0. -* Kube X.Y.Z, Z > 0: [Patch releases](#patches) are released as we cherrypick commits into the release-X.Y branch, (which is at X.Y.Z-beta,) as needed. X.Y.Z is cut straight from the release-X.Y branch, and X.Y.Z+1-beta is tagged on the same commit. +* Kube X.Y.Z-beta.W: When master is feature-complete for Kube X.Y, we will cut the release-X.Y branch 2 weeks prior to the desired X.Y.0 date and cherrypick only PRs essential to X.Y. This cut will be marked as X.Y.0-beta.0, and master will be revved to X.Y+1.0-alpha.0. If we're not satisfied with X.Y.0-beta.0, we'll release other beta releases, (X.Y.0-beta.W | W > 0) as necessary. +* Kube X.Y.0: Final release, cut from the release-X.Y branch cut two weeks prior. 
X.Y.1-beta.0 will be tagged at the same commit on the same branch. X.Y.0 should occur 3 to 4 months after X.Y-1.0. +* Kube X.Y.Z, Z > 0: [Patch releases](#patches) are released as we cherrypick commits into the release-X.Y branch, (which is at X.Y.Z-beta.W,) as needed. X.Y.Z is cut straight from the release-X.Y branch, and X.Y.Z+1-beta.0 is tagged on the same commit. ### Major version timeline @@ -53,7 +53,7 @@ There is no mandated timeline for major versions. They only occur when we need t ### CI version scheme -* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.C+bbbb is C commits after X.Y.Z-beta, with an additional +bbbb build suffix added. Furthermore, builds that are built off of a dirty build tree, (with things in the tree that are not checked it,) it will be appended with -dirty. +* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.W.C+bbbb is C commits after X.Y.Z-beta.W, with an additional +bbbb build suffix added. Furthermore, builds that are built off of a dirty build tree, (with things in the tree that are not checked it,) it will be appended with -dirty. ## Release versions as related to API versions -- cgit v1.2.3 From b6745eb538ee4515b08a87a7573a9319b3d2460c Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Tue, 3 Nov 2015 09:26:01 -0800 Subject: Fix releasing clause about cutting beta releases --- releasing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/releasing.md b/releasing.md index 671ab8af..238f3791 100644 --- a/releasing.md +++ b/releasing.md @@ -74,7 +74,7 @@ from, and other prerequisites. probably shouldn't be doing this release, and should talk to someone on the release team.
* New release series cut a new release branch, `release-X.Y`, off of - `master`, and also release the first beta in the series, `vX.Y.0-beta`. + `master`, and also release the first beta in the series, `vX.Y.0-beta.0`. * Every change in the `vX.Y` series from this point on will have to be cherry picked, so be sure you want to do this before proceeding. * You should still look for green tests, (see below). -- cgit v1.2.3 From 06e6f72355b8e5bf1700e966a322453186c0fae4 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Tue, 3 Nov 2015 09:42:49 -0800 Subject: Clarify -dirty language, and add --no-dry-run to usage --- versioning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/versioning.md b/versioning.md index 7b63059c..341e3b36 100644 --- a/versioning.md +++ b/versioning.md @@ -51,9 +51,9 @@ Legend: There is no mandated timeline for major versions. They only occur when we need to start the clock on deprecating features. A given major version should be the latest major version for at least one year from its original release date. -### CI version scheme +### CI and dev version scheme -* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.W.C+bbbb is C commits after X.Y.Z-beta.W, with an additional +bbbb build suffix added. Furthermore, builds that are built off of a dirty build tree, (with things in the tree that are not checked it,) it will be appended with -dirty. +* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.W.C+bbbb is C commits after X.Y.Z-beta.W, with an additional +bbbb build suffix added. 
Furthermore, builds that are built off of a dirty build tree, (during development, with things in the tree that are not checked in,) will have -dirty appended. ## Release versions as related to API versions -- cgit v1.2.3 From 33c86129c946fa5cf023494608b305404ea771a9 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 4 Nov 2015 14:07:23 -0800 Subject: add a guide on how to create an API group --- adding-an-APIGroup.md | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 adding-an-APIGroup.md diff --git a/adding-an-APIGroup.md b/adding-an-APIGroup.md new file mode 100644 index 00000000..6db23198 --- /dev/null +++ b/adding-an-APIGroup.md @@ -0,0 +1,81 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/adding-an-APIGroup.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +Adding an API Group +=============== + +This document includes the steps to add an API group. You may also want to take a look at PR [#16621](https://github.com/kubernetes/kubernetes/pull/16621) and PR [#13146](https://github.com/kubernetes/kubernetes/pull/13146), which add API groups. + +Please also read about [API conventions](api-conventions.md) and [API changes](api_changes.md) before adding an API group. + +### Your core group package: + +1. creaet a folder in pkg/apis to hold you group. Create types.go in pkg/apis/\/ and pkg/apis/\/\/ to define API objects in your group. + +2. create pkg/apis/\/{register.go, \/register.go} to register this group's API objects to the scheme; + +3. add a pkg/apis/\/install/install.go, which is responsible for adding the group to the `latest` package, so that other packages can access the group's meta through `latest.Group`. You need to import this `install` package in {pkg/master, pkg/client/unversioned, cmd/kube-version-change}/import_known_versions.go, if you want to make your group accessible to other packages in the kube-apiserver binary, binaries that uses the client package, or the kube-version-change tool. + +### Scripts changes and auto-generated code: + +1. Generate conversions and deep-copies: + + 1. add your "group/" or "group/version" into hack/after-build/{update-generated-conversions.sh, update-generated-deep-copies.sh, verify-generated-conversions.sh, verify-generated-deep-copies.sh}; + 2. run hack/update-generated-conversions.sh, hack/update-generated-deep-copies.sh. + +2. Generate files for Ugorji codec: + + 1. 
touch types.generated.go in pkg/apis/\{/, \}, and run hack/update-codecgen.sh. + +### Client (optional): + +We are overhauling pkg/client, so this section might be outdated. Currently, to add your group to the client package, you need to + +1. create pkg/client/unversioned/\.go, define a group client interface and implement the client. You can take pkg/client/unversioned/extensions.go as a reference. + +2. add the group client interface to the `Interface` in pkg/client/unversioned/client.go and add method to fetch the interface. Again, you can take how we add the Extensions group there as an example. + +3. if you need to support the group in kubectl, you'll also need to modify pkg/kubectl/cmd/util/factory.go. + +### Make the group/version selectable in unit tests (optional): + +1. add your group in pkg/api/testapi/testapi.go, then you can access the group in tests through testapi.\; + +2. add your "group/version" to `KUBE_API_VERSIONS` and `KUBE_TEST_API_VERSIONS` in hack/test-go.sh. + + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/adding-an-APIGroup.md?pixel)]() + -- cgit v1.2.3 From 342265e8c14392f135363c04b2a7bacbbc715068 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Wed, 4 Nov 2015 15:52:18 -0800 Subject: address lavalamp's comment --- adding-an-APIGroup.md | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/adding-an-APIGroup.md b/adding-an-APIGroup.md index 6db23198..f6bf99a2 100644 --- a/adding-an-APIGroup.md +++ b/adding-an-APIGroup.md @@ -40,38 +40,42 @@ Please also read about [API conventions](api-conventions.md) and [API changes](a ### Your core group package: -1. creaet a folder in pkg/apis to hold you group. Create types.go in pkg/apis/\/ and pkg/apis/\/\/ to define API objects in your group. 
+We plan on improving the way the types are factored in the future; see [#16062](https://github.com/kubernetes/kubernetes/pull/16062) for the directions in which this might evolve.
 
-2. create pkg/apis/\/{register.go, \/register.go} to register this group's API objects to the scheme;
+1. Create a folder in pkg/apis to hold your group. Create types.go in pkg/apis/``/ and pkg/apis/``/``/ to define API objects in your group;
 
-3. add a pkg/apis/\/install/install.go, which is responsible for adding the group to the `latest` package, so that other packages can access the group's meta through `latest.Group`. You need to import this `install` package in {pkg/master, pkg/client/unversioned, cmd/kube-version-change}/import_known_versions.go, if you want to make your group accessible to other packages in the kube-apiserver binary, binaries that uses the client package, or the kube-version-change tool.
+2. Create pkg/apis/``/{register.go, ``/register.go} to register this group's API objects to the encoding/decoding scheme;
+
+3. Add a pkg/apis/``/install/install.go, which is responsible for adding the group to the `latest` package, so that other packages can access the group's meta through `latest.Group`. You need to import this `install` package in {pkg/master, pkg/client/unversioned, cmd/kube-version-change}/import_known_versions.go, if you want to make your group accessible to other packages in the kube-apiserver binary, binaries that use the client package, or the kube-version-change tool.
+
+Steps 2 and 3 are mechanical; we plan to autogenerate these using the cmd/libs/go2idl/ tool.
 
 ### Scripts changes and auto-generated code:
 
 1. Generate conversions and deep-copies:
 
-  1. add your "group/" or "group/version" into hack/after-build/{update-generated-conversions.sh, update-generated-deep-copies.sh, verify-generated-conversions.sh, verify-generated-deep-copies.sh};
-  2. run hack/update-generated-conversions.sh, hack/update-generated-deep-copies.sh.
+  1. 
Add your "group/" or "group/version" into hack/after-build/{update-generated-conversions.sh, update-generated-deep-copies.sh, verify-generated-conversions.sh, verify-generated-deep-copies.sh};
+  2. Run hack/update-generated-conversions.sh, hack/update-generated-deep-copies.sh.
 
 2. Generate files for Ugorji codec:
 
-  1. touch types.generated.go in pkg/apis/\{/, \}, and run hack/update-codecgen.sh.
+  1. Touch types.generated.go in pkg/apis/``{/, ``}, and run hack/update-codecgen.sh.
 
 ### Client (optional):
 
-We are overhauling pkg/client, so this section might be outdated. Currently, to add your group to the client package, you need to
+We are overhauling pkg/client, so this section might be outdated; see [#15730](https://github.com/kubernetes/kubernetes/pull/15730) for how the client package might evolve. Currently, to add your group to the client package, you need to
 
-1. create pkg/client/unversioned/\.go, define a group client interface and implement the client. You can take pkg/client/unversioned/extensions.go as a reference.
+1. Create pkg/client/unversioned/``.go, define a group client interface and implement the client. You can take pkg/client/unversioned/extensions.go as a reference.
 
-2. add the group client interface to the `Interface` in pkg/client/unversioned/client.go and add method to fetch the interface. Again, you can take how we add the Extensions group there as an example.
+2. Add the group client interface to the `Interface` in pkg/client/unversioned/client.go and add a method to fetch the interface. Again, you can take how we added the Extensions group there as an example.
 
-3. if you need to support the group in kubectl, you'll also need to modify pkg/kubectl/cmd/util/factory.go.
+3. If you need to support the group in kubectl, you'll also need to modify pkg/kubectl/cmd/util/factory.go.
 
 ### Make the group/version selectable in unit tests (optional):
 
-1. 
add your group in pkg/api/testapi/testapi.go, then you can access the group in tests through testapi.\;
+1. Add your group in pkg/api/testapi/testapi.go, then you can access the group in tests through testapi.``;
 
-2. add your "group/version" to `KUBE_API_VERSIONS` and `KUBE_TEST_API_VERSIONS` in hack/test-go.sh.
+2. Add your "group/version" to `KUBE_API_VERSIONS` and `KUBE_TEST_API_VERSIONS` in hack/test-go.sh.
-- cgit v1.2.3 From 3f40d2080709f696a506daf99af5bccdc2bd0f54 Mon Sep 17 00:00:00 2001
From: Chao Xu
Date: Thu, 5 Nov 2015 15:44:20 -0800
Subject: address timstclair's comments

---
 adding-an-APIGroup.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/adding-an-APIGroup.md b/adding-an-APIGroup.md
index f6bf99a2..e5f08552 100644
--- a/adding-an-APIGroup.md
+++ b/adding-an-APIGroup.md
@@ -44,9 +44,9 @@ We plan on improving the way the types are factored in the future; see [#16062](
 
 1. Create a folder in pkg/apis to hold your group. Create types.go in pkg/apis/``/ and pkg/apis/``/``/ to define API objects in your group;
 
-2. Create pkg/apis/``/{register.go, ``/register.go} to register this group's API objects to the encoding/decoding scheme;
+2. Create pkg/apis/``/{register.go, ``/register.go} to register this group's API objects to the encoding/decoding scheme (e.g., [pkg/apis/extensions/register.go](../../pkg/apis/extensions/register.go) and [pkg/apis/extensions/v1beta1/register.go](../../pkg/apis/extensions/v1beta1/register.go));
 
-3. 
Add a pkg/apis/``/install/install.go, which is responsible for adding the group to the `latest` package, so that other packages can access the group's meta through `latest.Group`. You probably only need to change the name of the group and version in the [example](../../pkg/apis/extensions/install/install.go). You need to import this `install` package in {pkg/master, pkg/client/unversioned, cmd/kube-version-change}/import_known_versions.go, if you want to make your group accessible to other packages in the kube-apiserver binary, binaries that use the client package, or the kube-version-change tool.
 
 Steps 2 and 3 are mechanical; we plan to autogenerate these using the cmd/libs/go2idl/ tool.
 
@@ -59,7 +59,8 @@ Step 2 and 3 are mechanical, we plan on autogenerate these using the cmd/libs/go
 
 2. Generate files for Ugorji codec:
 
-  1. Touch types.generated.go in pkg/apis/``{/, ``}, and run hack/update-codecgen.sh.
+  1. Touch types.generated.go in pkg/apis/``{/, ``};
+  2. Run hack/update-codecgen.sh.
 
 ### Client (optional):
 
-- cgit v1.2.3 From 5978c29f80989ee1e10599cfd1adca012bc89f67 Mon Sep 17 00:00:00 2001
From: Isaac Hollander McCreery
Date: Fri, 6 Nov 2015 10:32:05 -0800
Subject: Use ./ notation

---
 releasing.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/releasing.md b/releasing.md
index 238f3791..60609f0d 100644
--- a/releasing.md
+++ b/releasing.md
@@ -145,7 +145,7 @@ export VER="vX.Y.0-alpha.W"
 then, run
 
 ```console
-release/cut-official-release.sh "${VER}" "${GITHASH}"
+./release/cut-official-release.sh "${VER}" "${GITHASH}"
 ```
 
 This will do a dry run of:
@@ -157,7 +157,7 @@ This will do a dry run of:
 If you're satisfied with the result, run
 
 ```console
-release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run
+./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run
 ```
 
 and follow the instructions. 
@@ -173,7 +173,7 @@ export VER="vX.Y.Z-beta.W" then, run ```console -release/cut-official-release.sh "${VER}" "${GITHASH}" +./release/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will do a dry run of: @@ -186,7 +186,7 @@ This will do a dry run of: If you're satisfied with the result, run ```console -release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run ``` and follow the instructions. @@ -202,7 +202,7 @@ export VER="vX.Y.Z" then, run ```console -release/cut-official-release.sh "${VER}" "${GITHASH}" +./release/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will do a dry run of: @@ -218,7 +218,7 @@ This will do a dry run of: If you're satisfied with the result, run ```console -release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run ``` and follow the instructions. @@ -238,7 +238,7 @@ export VER="vX.Y" then, run ```console -release/cut-official-release.sh "${VER}" "${GITHASH}" +./release/cut-official-release.sh "${VER}" "${GITHASH}" ``` This will do a dry run of: @@ -253,7 +253,7 @@ This will do a dry run of: If you're satisfied with the result, run ```console -release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run ``` and follow the instructions. -- cgit v1.2.3 From 889fa90febe3e133f0bbfda871e3a7ff45e25d02 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Fri, 6 Nov 2015 11:35:16 -0800 Subject: Cleanup for versioning --- releasing.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/releasing.md b/releasing.md index 238f3791..aef0e168 100644 --- a/releasing.md +++ b/releasing.md @@ -178,8 +178,8 @@ release/cut-official-release.sh "${VER}" "${GITHASH}" This will do a dry run of: -1. do a series of commits on the release branch for `vX.Y.Z-beta`; -1. 
mark the `vX.Y.Z-beta` tag at the beta version commit; +1. do a series of commits on the release branch for `vX.Y.Z-beta.W`; +1. mark the `vX.Y.Z-beta.W` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. @@ -209,9 +209,9 @@ This will do a dry run of: 1. do a series of commits on the branch for `vX.Y.Z`; 1. mark the `vX.Y.Z` tag at the release version commit; -1. do a series of commits on the branch for `vX.Y.(Z+1)-beta` on top of the +1. do a series of commits on the branch for `vX.Y.(Z+1)-beta.0` on top of the previous commits; -1. mark the `vX.Y.(Z+1)-beta` tag at the beta version commit; +1. mark the `vX.Y.(Z+1)-beta.0` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. @@ -245,8 +245,8 @@ This will do a dry run of: 1. mark the `vX.(Y+1).0-alpha.0` tag at the given git hash on `master`; 1. fork a new branch `release-X.Y` off of `master` at the given git hash; -1. do a series of commits on the branch for `vX.Y.0-beta`; -1. mark the `vX.Y.0-beta` tag at the beta version commit; +1. do a series of commits on the branch for `vX.Y.0-beta.0`; +1. mark the `vX.Y.0-beta.0` tag at the beta version commit; 1. prompt you to do the remainder of the work, including building the appropriate binaries and pushing them to the appropriate places. 
-- cgit v1.2.3 From 2dce50054d005a77e8007b348f0db065b99217bd Mon Sep 17 00:00:00 2001
From: derekwaynecarr
Date: Tue, 3 Nov 2015 11:38:52 -0500
Subject: Add event correlation to client

---
 event_compression.md | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/event_compression.md b/event_compression.md
index b9861717..e1a95165 100644
--- a/event_compression.md
+++ b/event_compression.md
@@ -35,14 +35,23 @@ Documentation for other releases can be found at
 
 This document captures the design of event compression.
 
-
 ## Background
 
-Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate `image_not_existing` and `container_is_waiting` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](http://issue.k8s.io/3853)).
+Kubernetes components can get into a state where they generate tons of events.
+
+The events can be categorized in one of two ways:
+
+1. same - the event is identical to previous events except it varies only on timestamp
+2. similar - the event is identical to previous events except it varies on timestamp and message
+
+For example, when pulling a non-existing image, Kubelet will repeatedly generate `image_not_existing` and `container_is_waiting` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](http://issue.k8s.io/3853)).
+
+The goal is to introduce event counting to increment same events, and event aggregation to collapse similar events. 
## Proposal -Each binary that generates events (for example, `kubelet`) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. +Each binary that generates events (for example, `kubelet`) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event. In addition, if many similar events are +created, events should be aggregated into a single event to reduce spam. Event compression should be best effort (not guaranteed). Meaning, in the worst case, `n` identical (minus timestamp) events may still result in `n` event entries. @@ -61,6 +70,24 @@ Instead of a single Timestamp, each event object [contains](http://releases.k8s. Each binary that generates events: * Maintains a historical record of previously generated events: * Implemented with ["Least Recently Used Cache"](https://github.com/golang/groupcache/blob/master/lru/lru.go) in [`pkg/client/record/events_cache.go`](../../pkg/client/record/events_cache.go). + * Implemented behind an `EventCorrelator` that manages two subcomponents: `EventAggregator` and `EventLogger` + * The `EventCorrelator` observes all incoming events and lets each subcomponent visit and modify the event in turn. + * The `EventAggregator` runs an aggregation function over each event. This function buckets each event based on an `aggregateKey`, + and identifies the event uniquely with a `localKey` in that bucket. + * The default aggregation function groups similar events that differ only by `event.Message`. 
Its `localKey` is `event.Message` and its aggregate key is produced by joining:
+    * `event.Source.Component`
+    * `event.Source.Host`
+    * `event.InvolvedObject.Kind`
+    * `event.InvolvedObject.Namespace`
+    * `event.InvolvedObject.Name`
+    * `event.InvolvedObject.UID`
+    * `event.InvolvedObject.APIVersion`
+    * `event.Reason`
+  * If the `EventAggregator` observes a similar event produced 10 times in a 10 minute window, it drops the event that was provided as
+    input and creates a new event that differs only on the message. The message denotes that this event is used to group similar events
+    that matched on reason. This aggregated `Event` is then used in the event processing sequence.
+  * The `EventLogger` observes the event out of `EventAggregation` and tracks the number of times it has observed that event previously
+    by incrementing a key in a cache associated with that matching event.
 * The key in the cache is generated from the event object minus timestamps/count/transient fields, specifically the following event fields are used to construct a unique key for an event:
   * `event.Source.Component`
   * `event.Source.Host`
   * `event.InvolvedObject.Kind`
   * `event.InvolvedObject.Namespace`
   * `event.InvolvedObject.Name`
   * `event.InvolvedObject.UID`
   * `event.InvolvedObject.APIVersion`
   * `event.Reason`
   * `event.Message`
-  * The LRU cache is capped at 4096 events. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache.
+  * The LRU cache is capped at 4096 events for both `EventAggregator` and `EventLogger`. That means if a component (e.g. kubelet) runs for a long period of time and generates tons of unique events, the previously generated events cache will not grow unchecked in memory. Instead, after 4096 unique events are generated, the oldest events are evicted from the cache. 
* When an event is generated, the previously generated events cache is checked (see [`pkg/client/unversioned/record/event.go`](http://releases.k8s.io/HEAD/pkg/client/unversioned/record/event.go)). * If the key for the new event matches the key for a previously generated event (meaning all of the above fields match between the new event and some previously generated event), then the event is considered to be a duplicate and the existing event entry is updated in etcd: * The new PUT (update) event API is called to update the existing event entry in etcd with the new last seen timestamp and count. -- cgit v1.2.3 From 4a45a50ed69b428987ae2e946f829cfd3ac09c24 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Tue, 3 Nov 2015 13:27:24 -0800 Subject: Document how to document --- how-to-doc.md | 171 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 171 insertions(+) create mode 100644 how-to-doc.md diff --git a/how-to-doc.md b/how-to-doc.md new file mode 100644 index 00000000..718aa8c0 --- /dev/null +++ b/how-to-doc.md @@ -0,0 +1,171 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest 1.0.x release of this document can be found +[here](http://releases.k8s.io/release-1.0/docs/devel/how-to-doc.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +Document Conventions +==================== + +Updated: 11/3/2015 + +*This document is oriented at users and developers who want to write documents for Kubernetes.* + +**Table of Contents** + + + - [What Are Mungers?](#what-are-mungers) + - [Table of Contents](#table-of-contents) + - [Writing Examples](#writing-examples) + - [Adding Links](#adding-links) + - [Auto-added Mungers](#auto-added-mungers) + - [Unversioned Warning](#unversioned-warning) + - [Is Versioned](#is-versioned) + - [Generate Analytics](#generate-analytics) + + + +## What Are Mungers? + +Mungers are like gofmt for md docs which we use to format documents. To use it, simply place + +``` + + +``` + +in your md files. Note that xxxx is the placeholder for a specific munger. Appropriate content will be generated and inserted between two brackets after you run `hack/update-generated-docs.sh`. See [munger document](../../cmd/mungedocs/) for more details. + + +## Table of Contents + +Instead of writing table of contents by hand, use the TOC munger: + +``` + + +``` + +## Writing Examples + +Sometimes you may want to show the content of certain example files. Use EXAMPLE munger whenever possible: + +``` + + +``` + +This way, you save the time to do the copy-and-paste; what's better, the content won't become out-of-date everytime you update the example file. 
+
+For example, the following munger:
+
+```
+
+
+```
+
+generates
+
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx
+  labels:
+    app: nginx
+spec:
+  containers:
+  - name: nginx
+    image: nginx
+    ports:
+    - containerPort: 80
+```
+
+[Download example](../user-guide/pod.yaml?raw=true)
+
+
+## Adding Links
+
+Use inline link instead of url at all times. When you add internal links from `docs/` to `docs/` or `examples/`, use relative links; otherwise, use `http://releases.k8s.io/HEAD/`. For example, use:
+
+```
+[GCE](../getting-started-guides/gce.md) # note that it's under docs/
+[Kubernetes package](http://releases.k8s.io/HEAD/pkg/) # note that it's under pkg/
+[Kubernetes](http://kubernetes.io/)
+```
+
+and avoid using:
+
+```
+[GCE](https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/gce.md)
+[Kubernetes package](../../pkg/)
+http://kubernetes.io/
+```
+
+## Auto-added Mungers
+
+Some mungers are auto-added. You don't have to add them manually, and `hack/update-generated-docs.sh` does that for you. It's recommended to just read this section as a reference instead of messing up with the following mungers.
+
+### Unversioned Warning
+
+UNVERSIONED_WARNING munger inserts unversioned warning which warns the users when they're reading the document from HEAD and informs them where to find the corresponding document for a specific release.
+
+```
+
+
+
+
+
+```
+
+### Is Versioned
+
+IS_VERSIONED munger inserts `IS_VERSIONED` tag in documents in each release, which stops UNVERSIONED_WARNING munger from inserting warning messages.
+
+```
+
+
+```
+
+### Generate Analytics
+
+ANALYTICS munger inserts a Google Analytics link for this page. 
+ +``` + + +``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/how-to-doc.md?pixel)]() + -- cgit v1.2.3 From 0f458971f63a60aa749ba0133e0e762b58bdca6c Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Fri, 6 Nov 2015 17:19:21 -0800 Subject: address comments --- how-to-doc.md | 103 +++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 74 insertions(+), 29 deletions(-) diff --git a/how-to-doc.md b/how-to-doc.md index 718aa8c0..283cab1f 100644 --- a/how-to-doc.md +++ b/how-to-doc.md @@ -31,8 +31,7 @@ Documentation for other releases can be found at -Document Conventions -==================== +# Document Conventions Updated: 11/3/2015 @@ -41,10 +40,16 @@ Updated: 11/3/2015 **Table of Contents** +- [Document Conventions](#document-conventions) + - [General Concepts](#general-concepts) + - [How to Get a Table of Contents](#how-to-get-a-table-of-contents) + - [How to Write Links](#how-to-write-links) + - [How to Include an Example](#how-to-include-an-example) + - [Misc.](#misc) + - [Code formatting](#code-formatting) + - [Syntax Highlighting](#syntax-highlighting) + - [Headings](#headings) - [What Are Mungers?](#what-are-mungers) - - [Table of Contents](#table-of-contents) - - [Writing Examples](#writing-examples) - - [Adding Links](#adding-links) - [Auto-added Mungers](#auto-added-mungers) - [Unversioned Warning](#unversioned-warning) - [Is Versioned](#is-versioned) @@ -52,46 +57,63 @@ Updated: 11/3/2015 -## What Are Mungers? +## General Concepts -Mungers are like gofmt for md docs which we use to format documents. To use it, simply place +Each document needs to be munged to ensure its format is correct, links are valid, etc. To munge a document, simply run `hack/update-generated-docs.sh`. We verify that all documents have been munged using `hack/verify-generated-docs.sh`. 
The scripts for munging documents are called mungers; see the [mungers section](#what-are-mungers) below if you're curious about how mungers are implemented or if you want to write one.
+
+## How to Get a Table of Contents
+
+Instead of writing table of contents by hand, insert the following code in your md file:
 
 ```
-
-
+
+
 ```
 
-in your md files. Note that xxxx is the placeholder for a specific munger. Appropriate content will be generated and inserted between two brackets after you run `hack/update-generated-docs.sh`. See [munger document](../../cmd/mungedocs/) for more details.
+After running `hack/update-generated-docs.sh`, you'll see a table of contents generated for you, layered based on the headings.
 
+## How to Write Links
 
-## Table of Contents
+It's important to follow the rules when writing links. It helps us correctly version documents for each release.
 
-Instead of writing table of contents by hand, use the TOC munger:
+Use inline links instead of urls at all times. When you add internal links to `docs/` or `examples/`, use relative links; otherwise, use `http://releases.k8s.io/HEAD/`. For example, avoid using:
 
 ```
-
-
+[GCE](https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/gce.md) # note that it's under docs/
+[Kubernetes package](../../pkg/) # note that it's under pkg/
+http://kubernetes.io/ # external link
 ```
 
-## Writing Examples
+Instead, use:
 
-Sometimes you may want to show the content of certain example files. Use EXAMPLE munger whenever possible:
+```
+[GCE](../getting-started-guides/gce.md) # note that it's under docs/
+[Kubernetes package](http://releases.k8s.io/HEAD/pkg/) # note that it's under pkg/
+[Kubernetes](http://kubernetes.io/) # external link
+```
+
+The above example generates the following links: [GCE](../getting-started-guides/gce.md), [Kubernetes package](http://releases.k8s.io/HEAD/pkg/), and [Kubernetes](http://kubernetes.io/). 
+
+## How to Include an Example
+
+While writing examples, you may want to show the content of certain example files (e.g. [pod.yaml](../user-guide/pod.yaml)). In this case, insert the following code in the md file:
 
 ```
 
 ```
 
-This way, you save the time to do the copy-and-paste; what's better, the content won't become out-of-date everytime you update the example file.
+Note that you should replace `path/to/file` with the relative path to the example file. Then `hack/update-generated-docs.sh` will generate a code block with the content of the specified file, and a link to download it. This way, you save the time to do the copy-and-paste; what's better, the content won't become out-of-date every time you update the example file.
 
-For example, the following munger:
+For example, the following:
 
 ```
 
 ```
 
-generates
+generates the following after `hack/update-generated-docs.sh`:
+
 
 ```yaml
@@ -112,27 +134,50 @@ spec:
 
 [Download example](../user-guide/pod.yaml?raw=true)
 
-## Adding Links
+## Misc.
 
-Use inline link instead of url at all times. When you add internal links from `docs/` to `docs/` or `examples/`, use relative links; otherwise, use `http://releases.k8s.io/HEAD/`. For example, use:
+### Code formatting
+
+Wrap a span of code with single backticks (`` ` ``). To format multiple lines of code as its own code block, use triple backticks (```` ``` ````).
+
+### Syntax Highlighting
+
+Adding syntax highlighting to code blocks improves readability. To do so, in your fenced block, add an optional language identifier. Some useful identifiers include `yaml`, `console` (for console output), and `sh` (for shell quote format). Note that in a console output, put `$ ` at the beginning of each command and put nothing at the beginning of the output. 
Here's an example of a console code block:
 
 ```
+```console
+
+$ kubectl create -f docs/user-guide/pod.yaml
+pod "foo" created
+
+``` 
+```
+
+which renders as:
+
+```console
+$ kubectl create -f docs/user-guide/pod.yaml
+pod "foo" created
 ```
 
-and avoid using:
+### Headings
+
+Add a single `#` before the document title to create a title heading, and add `##` to the next level of section title, and so on. Note that the number of `#` will determine the size of the heading.
+
+## What Are Mungers?
+
+Mungers are like gofmt for md docs, which we use to format documents. To use it, simply place
 
 ```
-[GCE](https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/gce.md)
-[Kubernetes package](../../pkg/)
-http://kubernetes.io/
+
+
 ```
 
+in your md files. Note that xxxx is the placeholder for a specific munger. Appropriate content will be generated and inserted between two brackets after you run `hack/update-generated-docs.sh`. See [munger document](http://releases.k8s.io/HEAD/cmd/mungedocs/) for more details.
+
 ## Auto-added Mungers
 
-Some mungers are auto-added. You don't have to add them manually, and `hack/update-generated-docs.sh` does that for you. It's recommended to just read this section as a reference instead of messing up with the following mungers.
+After running `hack/update-generated-docs.sh`, you may see some code / mungers in your md file that are auto-added. You don't have to add them manually. It's recommended to just read this section as a reference instead of messing with the following mungers. 
### Unversioned Warning -- cgit v1.2.3 From 43a4eb1d214bbb7c1a685937beba3653b4c6e8aa Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 3 Nov 2015 10:17:57 -0800 Subject: Run update-gendocs --- api-group.md | 4 ++-- apiserver-watch.md | 4 ++-- autoscaling.md | 4 ++-- compute-resource-metrics-api.md | 4 ++-- deployment.md | 4 ++-- federation.md | 4 ++-- high-availability.md | 4 ++-- initial-resources.md | 4 ++-- job.md | 4 ++-- kubemark.md | 4 ++-- metrics-plumbing.md | 4 ++-- pod-security-context.md | 4 ++-- rescheduler.md | 4 ++-- resource-qos.md | 4 ++-- scalability-testing.md | 4 ++-- selinux.md | 4 ++-- volumes.md | 4 ++-- 17 files changed, 34 insertions(+), 34 deletions(-) diff --git a/api-group.md b/api-group.md index ec79efe0..9b40bb81 100644 --- a/api-group.md +++ b/api-group.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/api-group.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/api-group.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/apiserver-watch.md b/apiserver-watch.md index 3e92d1e0..f2011f13 100644 --- a/apiserver-watch.md +++ b/apiserver-watch.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver-watch.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/apiserver-watch.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/autoscaling.md b/autoscaling.md index 97fa672d..806d1ece 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/autoscaling.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/autoscaling.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md index ba7b4e28..fb4ed908 100644 --- a/compute-resource-metrics-api.md +++ b/compute-resource-metrics-api.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/compute-resource-metrics-api.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/compute-resource-metrics-api.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/deployment.md b/deployment.md index ab23e69d..c4d4cf88 100644 --- a/deployment.md +++ b/deployment.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/deployment.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/deployment.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/federation.md b/federation.md index 371d9c30..0bf6c618 100644 --- a/federation.md +++ b/federation.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/federation.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/federation.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/high-availability.md b/high-availability.md index 6318921e..696c90be 100644 --- a/high-availability.md +++ b/high-availability.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/high-availability.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/high-availability.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/initial-resources.md b/initial-resources.md index 514c03ff..1eace646 100644 --- a/initial-resources.md +++ b/initial-resources.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/initial-resources.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/initial-resources.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/job.md b/job.md index d3247b1a..6f8befa3 100644 --- a/job.md +++ b/job.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/job.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/job.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/kubemark.md b/kubemark.md index 51ea4375..fb7f0e02 100644 --- a/kubemark.md +++ b/kubemark.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/kubemark.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/kubemark.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/metrics-plumbing.md b/metrics-plumbing.md index f4fffaea..41fbed9b 100644 --- a/metrics-plumbing.md +++ b/metrics-plumbing.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/metrics-plumbing.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/metrics-plumbing.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/pod-security-context.md b/pod-security-context.md index 95e60856..0bf4e78c 100644 --- a/pod-security-context.md +++ b/pod-security-context.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/pod-security-context.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/pod-security-context.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/rescheduler.md b/rescheduler.md index 512064f2..550c2270 100644 --- a/rescheduler.md +++ b/rescheduler.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/rescheduler.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/rescheduler.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/resource-qos.md b/resource-qos.md index c7475586..1f8dacca 100644 --- a/resource-qos.md +++ b/resource-qos.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/resource-qos.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/resource-qos.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/scalability-testing.md b/scalability-testing.md index cf87d84d..edcf5172 100644 --- a/scalability-testing.md +++ b/scalability-testing.md @@ -20,8 +20,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/scalability-testing.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/scalability-testing.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/selinux.md b/selinux.md index c16ab0a5..fd9eb73c 100644 --- a/selinux.md +++ b/selinux.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/selinux.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/selinux.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/volumes.md b/volumes.md index 34daf005..5be43cf5 100644 --- a/volumes.md +++ b/volumes.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/proposals/volumes.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/volumes.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
-- cgit v1.2.3 From ff8cdfcde813e5287a833539541a9b5f3206fe5d Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 3 Nov 2015 10:17:57 -0800 Subject: Run update-gendocs --- README.md | 4 ++-- adding-an-APIGroup.md | 4 ++-- api-conventions.md | 4 ++-- api_changes.md | 4 ++-- automation.md | 4 ++-- cherry-picks.md | 4 ++-- cli-roadmap.md | 4 ++-- client-libraries.md | 4 ++-- coding-conventions.md | 4 ++-- collab.md | 4 ++-- developer-guides/vagrant.md | 4 ++-- development.md | 4 ++-- e2e-tests.md | 4 ++-- faster_reviews.md | 4 ++-- flaky-tests.md | 4 ++-- getting-builds.md | 4 ++-- how-to-doc.md | 4 ++-- instrumentation.md | 4 ++-- issues.md | 4 ++-- kubectl-conventions.md | 4 ++-- logging.md | 4 ++-- making-release-notes.md | 4 ++-- profiling.md | 4 ++-- pull-requests.md | 4 ++-- releasing.md | 4 ++-- scheduler.md | 4 ++-- scheduler_algorithm.md | 4 ++-- writing-a-getting-started-guide.md | 4 ++-- 28 files changed, 56 insertions(+), 56 deletions(-) diff --git a/README.md b/README.md index 756846ce..87ede398 100644 --- a/README.md +++ b/README.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/README.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/README.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/adding-an-APIGroup.md b/adding-an-APIGroup.md index e5f08552..afef1456 100644 --- a/adding-an-APIGroup.md +++ b/adding-an-APIGroup.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/adding-an-APIGroup.md). 
+The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/adding-an-APIGroup.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/api-conventions.md b/api-conventions.md index cf389231..e8aaf612 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/api-conventions.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/api-conventions.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/api_changes.md b/api_changes.md index 53dfb014..4bbb5bd4 100644 --- a/api_changes.md +++ b/api_changes.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/api_changes.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/api_changes.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/automation.md b/automation.md index f01b6158..c21f4ed6 100644 --- a/automation.md +++ b/automation.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/automation.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/automation.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/cherry-picks.md b/cherry-picks.md index 7cb60465..f407c949 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/cherry-picks.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/cherry-picks.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/cli-roadmap.md b/cli-roadmap.md index 2b713260..de2f4a43 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/cli-roadmap.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/cli-roadmap.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/client-libraries.md b/client-libraries.md index b63e2d44..22a59d06 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/client-libraries.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/client-libraries.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/coding-conventions.md b/coding-conventions.md index 3e3abaf7..df9f63e7 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/coding-conventions.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/coding-conventions.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/collab.md b/collab.md index 624b3bcb..de2ce10c 100644 --- a/collab.md +++ b/collab.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/collab.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/collab.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index f451d755..61560db7 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/developer-guides/vagrant.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/developer-guides/vagrant.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/development.md b/development.md index 0b778dd9..09abe1e7 100644 --- a/development.md +++ b/development.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/development.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/development.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/e2e-tests.md b/e2e-tests.md index 882da396..d1f909dc 100644 --- a/e2e-tests.md +++ b/e2e-tests.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/e2e-tests.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/e2e-tests.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/faster_reviews.md b/faster_reviews.md index 0c70e435..f0cb159c 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/faster_reviews.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/faster_reviews.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/flaky-tests.md b/flaky-tests.md
index 2470a815..27c788aa 100644
--- a/flaky-tests.md
+++ b/flaky-tests.md
@@ -19,8 +19,8 @@
 If you are using a released version of Kubernetes, you should
 refer to the docs that go with that version.
 
-The latest 1.0.x release of this document can be found
-[here](http://releases.k8s.io/release-1.0/docs/devel/flaky-tests.md).
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/devel/flaky-tests.md).
 
 Documentation for other releases can be found at
 [releases.k8s.io](http://releases.k8s.io).

diff --git a/getting-builds.md b/getting-builds.md
index 3803c873..375a1fac 100644
--- a/getting-builds.md
+++ b/getting-builds.md
@@ -19,8 +19,8 @@
 If you are using a released version of Kubernetes, you should
 refer to the docs that go with that version.
 
-The latest 1.0.x release of this document can be found
-[here](http://releases.k8s.io/release-1.0/docs/devel/getting-builds.md).
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/devel/getting-builds.md).
 
 Documentation for other releases can be found at
 [releases.k8s.io](http://releases.k8s.io).

diff --git a/how-to-doc.md b/how-to-doc.md
index 283cab1f..7f1d30ba 100644
--- a/how-to-doc.md
+++ b/how-to-doc.md
@@ -19,8 +19,8 @@
 If you are using a released version of Kubernetes, you should
 refer to the docs that go with that version.
 
-The latest 1.0.x release of this document can be found
-[here](http://releases.k8s.io/release-1.0/docs/devel/how-to-doc.md).
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/devel/how-to-doc.md).
 
 Documentation for other releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
diff --git a/instrumentation.md b/instrumentation.md index 683f9d93..49f1f077 100644 --- a/instrumentation.md +++ b/instrumentation.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/instrumentation.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/instrumentation.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/issues.md b/issues.md index c7bda07b..f2ce6949 100644 --- a/issues.md +++ b/issues.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/issues.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/issues.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/kubectl-conventions.md b/kubectl-conventions.md index a37e5899..3775c0b3 100644 --- a/kubectl-conventions.md +++ b/kubectl-conventions.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/kubectl-conventions.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/kubectl-conventions.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/logging.md b/logging.md index 3870c4c3..3dc22ca5 100644 --- a/logging.md +++ b/logging.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/logging.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/logging.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/making-release-notes.md b/making-release-notes.md index 871e65b4..7a2d73c0 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/making-release-notes.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/making-release-notes.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/profiling.md b/profiling.md index f563ce0a..f05b9d74 100644 --- a/profiling.md +++ b/profiling.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/profiling.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/profiling.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/pull-requests.md b/pull-requests.md index 15a0f447..b97da36e 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/pull-requests.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/pull-requests.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/releasing.md b/releasing.md index a41568e0..01f185bd 100644 --- a/releasing.md +++ b/releasing.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/releasing.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/releasing.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/scheduler.md b/scheduler.md index c9d32aa4..ffc73ca1 100755 --- a/scheduler.md +++ b/scheduler.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/scheduler.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/scheduler.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index d6a8b6c5..c8790af9 100755 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/scheduler_algorithm.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/scheduler_algorithm.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index c9d4e2ca..a82691a8 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/devel/writing-a-getting-started-guide.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/writing-a-getting-started-guide.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
-- cgit v1.2.3 From f10d80bd8b208da1c5470177e0d843fe1d0de830 Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Tue, 3 Nov 2015 10:17:57 -0800 Subject: Run update-gendocs --- README.md | 4 ++-- access.md | 4 ++-- admission_control.md | 4 ++-- admission_control_limit_range.md | 4 ++-- admission_control_resource_quota.md | 4 ++-- architecture.md | 4 ++-- aws_under_the_hood.md | 4 ++-- clustering.md | 4 ++-- clustering/README.md | 4 ++-- command_execution_port_forwarding.md | 4 ++-- daemon.md | 4 ++-- event_compression.md | 4 ++-- expansion.md | 4 ++-- extending-api.md | 4 ++-- horizontal-pod-autoscaler.md | 4 ++-- identifiers.md | 4 ++-- namespaces.md | 4 ++-- networking.md | 4 ++-- persistent-storage.md | 4 ++-- principles.md | 4 ++-- resources.md | 4 ++-- secrets.md | 4 ++-- security.md | 4 ++-- security_context.md | 4 ++-- service_accounts.md | 4 ++-- simple-rolling-update.md | 4 ++-- versioning.md | 4 ++-- 27 files changed, 54 insertions(+), 54 deletions(-) diff --git a/README.md b/README.md index 72d2c662..ef5a1157 100644 --- a/README.md +++ b/README.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/README.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/README.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/access.md b/access.md index 123516f9..10a0c9fe 100644 --- a/access.md +++ b/access.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/access.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/access.md). 
Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/admission_control.md b/admission_control.md index a2b5700b..e9303728 100644 --- a/admission_control.md +++ b/admission_control.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/admission_control.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/admission_control.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index e7c706ef..d13a98f1 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/admission_control_limit_range.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/admission_control_limit_range.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index a9de7a9c..31d4a147 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/admission_control_resource_quota.md). 
+The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/admission_control_resource_quota.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/architecture.md b/architecture.md index 2a761dea..3bb24e44 100644 --- a/architecture.md +++ b/architecture.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/architecture.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/architecture.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index ec8a31c2..9fe46d6f 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/aws_under_the_hood.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/aws_under_the_hood.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/clustering.md b/clustering.md index 757c1f0b..66bd0784 100644 --- a/clustering.md +++ b/clustering.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/clustering.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/clustering.md). 
Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/clustering/README.md b/clustering/README.md index d02b7d50..073deb05 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/clustering/README.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/clustering/README.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index 852e761e..dbd7b0eb 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/command_execution_port_forwarding.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/command_execution_port_forwarding.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/daemon.md b/daemon.md index a72b8755..6e783d8f 100644 --- a/daemon.md +++ b/daemon.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/daemon.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/daemon.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/event_compression.md b/event_compression.md index e1a95165..c7982712 100644 --- a/event_compression.md +++ b/event_compression.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/event_compression.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/event_compression.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/expansion.md b/expansion.md index b19731b9..770ec054 100644 --- a/expansion.md +++ b/expansion.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/expansion.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/expansion.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/extending-api.md b/extending-api.md index 077d5530..303ebeac 100644 --- a/extending-api.md +++ b/extending-api.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/extending-api.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/extending-api.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 35991847..42cd27bb 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/horizontal-pod-autoscaler.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/horizontal-pod-autoscaler.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/identifiers.md b/identifiers.md index 7deff9e9..04ee4ab1 100644 --- a/identifiers.md +++ b/identifiers.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/identifiers.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/identifiers.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/namespaces.md b/namespaces.md index bb907c67..b5965348 100644 --- a/namespaces.md +++ b/namespaces.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/namespaces.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/namespaces.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/networking.md b/networking.md index dfe0f93e..56009d5b 100644 --- a/networking.md +++ b/networking.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/networking.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/networking.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/persistent-storage.md b/persistent-storage.md index bb200811..a95ba305 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/persistent-storage.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/persistent-storage.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/principles.md b/principles.md index be3dff55..20343ac4 100644 --- a/principles.md +++ b/principles.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/principles.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/principles.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/resources.md b/resources.md index f9bbc8db..9b6ac51b 100644 --- a/resources.md +++ b/resources.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/resources.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/resources.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/secrets.md b/secrets.md index e8a5e42f..763c5567 100644 --- a/secrets.md +++ b/secrets.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/secrets.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/secrets.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/security.md b/security.md index 5c187d69..e845c925 100644 --- a/security.md +++ b/security.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/security.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/security.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/security_context.md b/security_context.md index 434d275e..413e2a2e 100644 --- a/security_context.md +++ b/security_context.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. 
-The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/security_context.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/security_context.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/service_accounts.md b/service_accounts.md index 8e63e045..fb065d1a 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/service_accounts.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/service_accounts.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 720f4cbf..31f31d67 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/simple-rolling-update.md). +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/simple-rolling-update.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/versioning.md b/versioning.md index 341e3b36..def20a03 100644 --- a/versioning.md +++ b/versioning.md @@ -19,8 +19,8 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. -The latest 1.0.x release of this document can be found -[here](http://releases.k8s.io/release-1.0/docs/design/versioning.md). 
+The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/versioning.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). -- cgit v1.2.3 From 87d268af9bf5c17f24ff3dfd52e9391847ce57fe Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Mon, 16 Nov 2015 10:52:26 -0800 Subject: clarify experimental annotations doc --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index e8aaf612..6781fcae 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -746,7 +746,7 @@ Therefore, resources supporting auto-generation of unique labels should have a ` Annotations have very different intended usage from labels. We expect them to be primarily generated and consumed by tooling and system extensions. I'm inclined to generalize annotations to permit them to directly store arbitrary json. Rigid names and name prefixes make sense, since they are analogous to API fields. -In fact, in-development API fields, including those used to represent fields of newer alpha/beta API versions in the older stable storage version, may be represented as annotations with the form `something.alpha.kubernetes.io/name` or `something.beta.kubernetes.io/name` (depending on our confidence in it). For example `net.alpha.kubernetes.io/policy` might represent an experimental network policy field. +In fact, in-development API fields, including those used to represent fields of newer alpha/beta API versions in the older stable storage version, may be represented as annotations with the form `something.alpha.kubernetes.io/name` or `something.beta.kubernetes.io/name` (depending on our confidence in it). For example `net.alpha.kubernetes.io/policy` might represent an experimental network policy field. The "name" portion of the annotation should follow the below conventions for annotations. 
When an annotation gets promoted to a field, the name transformation should then be mechanical: `foo-bar` becomes `fooBar`. Other advice regarding use of labels, annotations, and other generic map keys by Kubernetes components and tools: - Key names should be all lowercase, with words separated by dashes, such as `desired-replicas` -- cgit v1.2.3 From 95d30e62890bb3b2f8aef65cbd96d0042d6d8e19 Mon Sep 17 00:00:00 2001 From: dingh Date: Tue, 10 Nov 2015 15:09:23 +0800 Subject: Create proposal on multiple schedulers update according to many reviewers, 2015.11.17 --- multiple-schedulers.md | 165 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 165 insertions(+) create mode 100644 multiple-schedulers.md diff --git a/multiple-schedulers.md b/multiple-schedulers.md new file mode 100644 index 00000000..51466008 --- /dev/null +++ b/multiple-schedulers.md @@ -0,0 +1,165 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/multiple-schedulers.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Multi-Scheduler in Kubernetes + +**Status**: Design & Implementation in progress. + +> Contact @HaiyangDING for questions & suggestions. + +## Motivation + +In the current Kubernetes design, there is only one default scheduler in a Kubernetes cluster. +However, it is common for multiple types of workload, such as traditional batch, DAG batch, streaming and user-facing production services, +to run in the same cluster, and they need to be scheduled in different ways. For example, in +[Omega](http://research.google.com/pubs/pub41684.html) batch and service workloads are scheduled by two types of schedulers: +the batch workload is scheduled by a scheduler which looks at the current usage of the cluster to improve resource utilization, +and the service workload is scheduled by another one which considers the reserved resources in the +cluster and many other constraints, since service performance must meet higher SLOs. +[Mesos](http://mesos.apache.org/) has done great work to support multiple schedulers by building a +two-level scheduling structure. This proposal describes how Kubernetes is going to support multiple schedulers, +so that users can run their own scheduler(s) to enable customized scheduling +behavior as they need. As previously discussed in [#11793](https://github.com/kubernetes/kubernetes/issues/11793), +[#9920](https://github.com/kubernetes/kubernetes/issues/9920) and [#11470](https://github.com/kubernetes/kubernetes/issues/11470), +the multi-scheduler design should be generic and include adding a scheduler name annotation to separate the pods.
+It is worth mentioning that the proposal does not address the question of how the scheduler name annotation gets +set, although it is reasonable to anticipate that it would be set by a component like an admission controller/initializer, +as this doc currently assumes. + +Before going into the details of this proposal, here is a list of the methods for extending the scheduler: + +- Write your own scheduler and run it alongside the Kubernetes native scheduler. This is detailed in this proposal +- Use the callout approach, such as the one implemented in [#13580](https://github.com/kubernetes/kubernetes/issues/13580) +- Recompile the scheduler with a new policy +- Restart the scheduler with a new [scheduler policy config file](../../examples/scheduler-policy-config.json) +- Or, maybe in the future, dynamically link a new policy into the running scheduler + +## Challenges in multiple schedulers + +- Separating the pods + + Each pod should be scheduled by only one scheduler. As for implementation, a pod should + have an additional field telling which scheduler it wants to be scheduled by. Besides, + each scheduler, including the default one, should have its own logic for adding unscheduled + pods to its to-be-scheduled pod queue. Details will be explained in later sections. + +- Dealing with conflicts + + Different schedulers are essentially separate processes. When all schedulers try to schedule + their pods onto the nodes, there might be conflicts. + + One example of such a conflict is resource racing: suppose there is a `pod1` scheduled by + `my-scheduler` with a CPU *request* of 1 core, and a `pod2` scheduled by `kube-scheduler` (the k8s native + scheduler, acting as the default scheduler) with a CPU *request* of 2 cores, while `node-a` only has 2.5 + free CPU cores. If both schedulers try to put their pods on `node-a`, one of them will eventually + fail when the Kubelet on `node-a` performs the create action, due to insufficient CPU resources.
+ + This conflict is complex to deal with in the api-server and etcd. Our current solution is to let the Kubelet + do the conflict check; if a conflict happens, affected pods are put back to the scheduler + to be scheduled again. Implementation details are in later sections. + +## Where to start: initial design + +We definitely want the multi-scheduler design to be a generic mechanism. The following lists the changes +we want to make in the first step. + +- Add an annotation in the pod template: `scheduler.alpha.kubernetes.io/name: scheduler-name`; this is used to +separate pods between schedulers. `scheduler-name` should match one of the schedulers' `scheduler-name` +- Add a `scheduler-name` to each scheduler, either hardcoded or as a command-line argument. The +Kubernetes native scheduler (now the `kube-scheduler` process) would have the name `kube-scheduler` +- The `scheduler-name` plays an important part in separating the pods between different schedulers. +Pods are statically dispatched to different schedulers based on the `scheduler.alpha.kubernetes.io/name: scheduler-name` +annotation, and there should not be any conflicts between different schedulers handling their pods, i.e. one pod must +NOT be claimed by more than one scheduler. To be specific, a scheduler can add a pod to its queue if and only if: + 1. The pod has no nodeName, **AND** + 2. The `scheduler-name` specified in the pod's annotation `scheduler.alpha.kubernetes.io/name: scheduler-name` + matches the `scheduler-name` of the scheduler. + + The only exception is the default scheduler. Any pod that has no `scheduler.alpha.kubernetes.io/name: scheduler-name` + annotation is assumed to be handled by the "default scheduler". In the first version of the multi-scheduler feature, + the default scheduler would be the Kubernetes built-in scheduler with `scheduler-name` as `kube-scheduler`.
+ The Kubernetes built-in scheduler will claim any pod which has no `scheduler.alpha.kubernetes.io/name: scheduler-name` + annotation or which has `scheduler.alpha.kubernetes.io/name: kube-scheduler`. In the future, it may be possible to + change which scheduler is the default for a given cluster. + +- Dealing with conflicts. All schedulers must use predicate functions that are at least as strict as +the ones that the Kubelet applies when deciding whether to accept a pod, otherwise the Kubelet and the scheduler +may get into an infinite loop where the Kubelet keeps rejecting a pod and the scheduler keeps re-scheduling +it back to the same node. To make it easier for people who write new schedulers to obey this rule, we will +create a library containing the predicates the Kubelet uses. (See issue [#12744](https://github.com/kubernetes/kubernetes/issues/12744).) + +In summary, in the initial version of this multi-scheduler design, we will achieve the following: + +- If a pod has the annotation `scheduler.alpha.kubernetes.io/name: kube-scheduler` or the user does not explicitly +set this annotation in the template, it will be picked up by the default scheduler +- If the annotation is set and refers to a valid `scheduler-name`, it will be picked up by the scheduler with the +specified `scheduler-name` +- If the annotation is set but refers to an invalid `scheduler-name`, the pod will not be picked up by any scheduler. +The pod will remain Pending. + +### An example + +```yaml + kind: Pod + apiVersion: v1 + metadata: + name: pod-abc + labels: + foo: bar + annotations: + scheduler.alpha.kubernetes.io/name: my-scheduler +``` + +This pod will be scheduled by "my-scheduler" and ignored by "kube-scheduler". If there is no running scheduler +named "my-scheduler", the pod will never be scheduled. + +## Next steps + +1. Use an admission controller to add and verify the annotation, and make modifications if necessary.
For example, the +admission controller might add the scheduler annotation based on the namespace of the pod, and/or identify if +there are conflicting rules, and/or set a default value for the scheduler annotation, and/or reject pods on +which the client has set a scheduler annotation that does not correspond to a running scheduler. +2. Dynamically launch scheduler(s) and register them with the admission controller (as an external call). This also +requires some work on authorization and authentication to control which schedulers can write the `/binding` +subresource of which pods. + +## Other issues/discussions related to scheduler design + +- [#13580](https://github.com/kubernetes/kubernetes/pull/13580): scheduler extension +- [#17097](https://github.com/kubernetes/kubernetes/issues/17097): policy config file in pod template +- [#16845](https://github.com/kubernetes/kubernetes/issues/16845): scheduling groups of pods +- [#17208](https://github.com/kubernetes/kubernetes/issues/17208): guide to writing a new scheduler + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/multiple-schedulers.md?pixel)]() + -- cgit v1.2.3 From 31832344a1c2b32ba9813f032213944543ac1767 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Tue, 17 Nov 2015 15:18:17 +0000 Subject: Add conventions about primitive types.
--- api-conventions.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/api-conventions.md b/api-conventions.md index e8aaf612..18e2ddb9 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -50,6 +50,7 @@ using resources with kubectl can be found in [Working with resources](../user-gu - [Typical status properties](#typical-status-properties) - [References to related objects](#references-to-related-objects) - [Lists of named subobjects preferred over maps](#lists-of-named-subobjects-preferred-over-maps) + - [Primitive types](#primitive-types) - [Constants](#constants) - [Lists and Simple kinds](#lists-and-simple-kinds) - [Differing Representations](#differing-representations) @@ -247,6 +248,14 @@ ports: This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently, labels, selectors, annotations, data), as opposed to sets of subobjects. +#### Primitive types + +* Avoid floating-point values as much as possible, and never use them in spec. Floating-point values cannot be reliably round-tripped (encoded and re-decoded) without changing, and have varying precision and representations across languages and architectures. +* Do not use unsigned integers. Similarly, not all languages (e.g., Javascript) support unsigned integers. +* int64 is converted to float by Javascript and some other languages, so they also need to be accepted as strings. +* Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). +* Look at similar fields in the API (e.g., ports, durations) and follow the conventions of existing fields. + #### Constants Some fields will have a list of allowed values (enumerations). These values will be strings, and they will be in CamelCase, with an initial uppercase letter. Examples: "ClusterFirst", "Pending", "ClientIP". -- cgit v1.2.3 From 400bb52cc4fb10c6b63b8ee82f11d836842eba0d Mon Sep 17 00:00:00 2001 From: "Tim St. 
Clair" Date: Tue, 17 Nov 2015 18:45:16 -0800 Subject: Correct erroneous metric endpoint. --- compute-resource-metrics-api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md index fb4ed908..b716dbd8 100644 --- a/compute-resource-metrics-api.md +++ b/compute-resource-metrics-api.md @@ -96,7 +96,7 @@ only be available through the kubelet. The types of metrics are detailed - `/rawNodes/localhost` - The only node provided is `localhost`; type metrics.Node - `/derivedNodes` - host metrics; type `[]metrics.DerivedNode` - - `/nodes/{node}` - derived metrics for a specific node + - `/derivedNodes/{node}` - derived metrics for a specific node - `/rawPods` - All raw pod metrics across all namespaces; type `[]metrics.RawPod` - `/derivedPods` - All derived pod metrics across all namespaces; type -- cgit v1.2.3 From 3ec5bdafb97feaeb40135b4268548d33095a247c Mon Sep 17 00:00:00 2001 From: gmarek Date: Mon, 12 Oct 2015 11:48:05 +0200 Subject: Add Kubemark User Guide --- kubemark-guide.md | 175 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 175 insertions(+) create mode 100644 kubemark-guide.md diff --git a/kubemark-guide.md b/kubemark-guide.md new file mode 100644 index 00000000..7a68f4e6 --- /dev/null +++ b/kubemark-guide.md @@ -0,0 +1,175 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/kubemark-guide.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Kubemark User Guide + +## Introduction + +Kubemark is a performance testing tool which allows users to run experiments on simulated clusters. The primary use case is scalability testing, as simulated clusters can be +much bigger than the real ones. The objective is to expose problems with the master components (API server, controller manager or scheduler) that appear only on bigger +clusters (e.g. small memory leaks). + +This document serves as a primer to understand what Kubemark is, what it is not, and how to use it. + +## Architecture + +On a very high level, a Kubemark cluster consists of two parts: real master components and a set of “Hollow” Nodes. The prefix “Hollow” means an implementation/instantiation of a +component with all “moving” parts mocked out. The best example is HollowKubelet, which pretends to be an ordinary Kubelet, but does not start anything, nor mount any volumes - +it just pretends that it does. More detailed design and implementation details are at the end of this document. + +Currently master components run on dedicated machine(s), and HollowNodes run on an ‘external’ Kubernetes cluster. This design has a slight advantage over running master +components on the external cluster: it completely isolates master resources from everything else. + +## Requirements + +To run Kubemark you need a Kubernetes cluster for running all your HollowNodes and a dedicated machine for a master. The master machine has to be directly routable from the +HollowNodes. You also need access to a Docker repository.
+ +Currently the scripts are written to be easily usable on GCE, but it should be relatively straightforward to port them to different providers or bare metal. + +## Common use cases and helper scripts + +The common workflow for Kubemark is: +- starting a Kubemark cluster (on GCE) +- running e2e tests on the Kubemark cluster +- monitoring test execution and debugging problems +- turning down the Kubemark cluster + +The descriptions below include comments helpful for anyone who wants to port Kubemark to different providers. + +### Starting a Kubemark cluster + +To start a Kubemark cluster on GCE you need to create an external cluster (it can be GCE, GKE or any other cluster) by yourself, build a Kubernetes release (e.g. by running +`make quick-release`) and run the `test/kubemark/start-kubemark.sh` script. This script will create a VM for master components, Pods for HollowNodes and do all the setup necessary +to let them talk to each other. It will use the configuration stored in `cluster/kubemark/config-default.sh` - you can tweak it however you want, but note that some features +may not be implemented yet, as the implementation of Hollow components/mocks will probably lag behind the ‘real’ ones. For performance tests the interesting variables are +`NUM_MINIONS` and `MASTER_SIZE`. After the start-kubemark script finishes you’ll have a ready Kubemark cluster; a kubeconfig file for talking to the Kubemark +cluster is stored in `test/kubemark/kubeconfig.loc`. + +Currently we're running each HollowNode with a limit of 0.05 of a CPU core and ~60MB of memory, which, taking into account the default cluster addons and FluentD running on the 'external' +cluster, allows running ~17.5 HollowNodes per core.
+ +#### Behind-the-scenes details: + +The start-kubemark script does quite a lot of things: +- Creates a master machine called hollow-cluster-master and a PD for it (*uses gcloud, should be easy to do outside of GCE*) +- Creates a firewall rule which opens port 443\* on the master machine (*uses gcloud, should be easy to do outside of GCE*) +- Builds a Docker image for HollowNode from the current repository and pushes it to the Docker repository (*GCR for us, using scripts from `cluster/gce/util.sh` - it may get +tricky outside of GCE*) +- Generates certificates and kubeconfig files, writes a kubeconfig locally to `test/kubemark/kubeconfig.loc` and creates a Secret which stores the kubeconfig for HollowKubelet/ +HollowProxy use (*uses gcloud to transfer files to the master, should be easy to do outside of GCE*). +- Creates a ReplicationController for HollowNodes and starts them up. (*will work exactly the same everywhere as long as MASTER_IP is populated correctly, but you’ll need +to update the docker image address if you’re not using GCR and the default image name*) +- Waits until all HollowNodes are in the Running phase (*will work exactly the same everywhere*) + +\* Port 443 is a secured port on the master machine which is used for all external communication with the API server. In the last sentence *external* means all traffic +coming from other machines, including all the Nodes, not only from outside of the cluster. Currently local components, i.e. the ControllerManager and Scheduler, talk to the API server using the insecure port 8080. + +### Running e2e tests on Kubemark cluster + +To run a standard e2e test on the Kubemark cluster created in the previous step, execute the `test/kubemark/run-e2e-tests.sh` script. It will configure Ginkgo to +use the Kubemark cluster and start an e2e test. This script should not need any changes to work on other cloud providers. + +By default (if nothing is passed to it) the script will run a Density '30 test.
If you want to run a different e2e test you just need to provide flags you want to be
+passed to `hack/ginkgo-e2e.sh` script, e.g. `--ginkgo.focus="Load"` to run the Load test.
+
+### Monitoring test execution and debugging problems
+
+Run-e2e-tests prints the same output on Kubemark as on ordinary e2e cluster, but if you need to dig deeper you need to learn how to debug HollowNodes and how Master
+machine (currently) differs from the ordinary one.
+
+If you need to debug the master machine you can do similar things as you do on your ordinary master. The difference between the Kubemark setup and an ordinary setup is that in Kubemark
+etcd is run as a plain docker container, and all master components are run as normal processes. There’s no Kubelet overseeing them. Logs are stored in exactly the same place,
+i.e. the `/var/logs/` directory. Because the binaries are not supervised by anything they won't be restarted in the case of a crash.
+
+To help you with debugging from inside the cluster, the startup script puts a `~/configure-kubectl.sh` script on the master. It downloads the `gcloud` and `kubectl` tools and configures
+kubectl to work on the unsecured master port (useful if there are problems with security). After the script is run you can use the kubectl command from the master machine to play with
+the cluster.
+
+Debugging HollowNodes is a bit more tricky, as if you experience a problem on one of them you need to learn which hollow-node pod corresponds to a given HollowNode known by
+the Master. During self-registration HollowNodes provide their cluster IPs as Names, which means that if you need to find a HollowNode named `10.2.4.5` you just need to find a
+Pod in the external cluster with this cluster IP. There’s a helper script `test/kubemark/get-real-pod-for-hollow-node.sh` that does this for you. 
+
+When you have a Pod name you can use `kubectl logs` on the external cluster to get the logs, or use a `kubectl describe pod` call to find the external Node on which this particular
+HollowNode is running so you can ssh to it.
+
+E.g. suppose you want to see the logs of the HollowKubelet on which pod `my-pod` is running. To do so you can execute:
+
+```
+$ kubectl --kubeconfig=kubernetes/test/kubemark/kubeconfig.loc describe pod my-pod
+```
+
+which outputs the pod description, including a line:
+
+```
+Node: 1.2.3.4/1.2.3.4
+```
+
+To find the `hollow-node` pod corresponding to node `1.2.3.4` you use the aforementioned script:
+
+```
+$ kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh 1.2.3.4
+```
+
+which will output the line:
+
+```
+hollow-node-1234
+```
+
+Now you just use the ordinary kubectl command to get the logs:
+
+```
+kubectl --namespace=kubemark logs hollow-node-1234
+```
+
+All those things should work exactly the same on all cloud providers.
+
+### Turning down Kubemark cluster
+
+On GCE you just need to execute the `test/kubemark/stop-kubemark.sh` script, which will delete the HollowNode ReplicationController and all the resources for you. On other providers
+you’ll need to delete all this stuff by yourself.
+
+## Some current implementation details
+
+The Kubemark master uses exactly the same binaries as ordinary Kubernetes does. This means that it will never be out of date. On the other hand HollowNodes use an existing fake for
+the Kubelet (called SimpleKubelet), which mocks its runtime manager with `pkg/kubelet/fake-docker-manager.go`, where most of the logic sits. Because there’s no easy way of mocking other
+managers (e.g. VolumeManager), they are not supported in Kubemark (e.g. we can’t schedule Pods with volumes in them yet).
+
+As time passes more fakes will probably be plugged into HollowNodes, but it’s crucial to keep them as simple as possible to allow running a big number of Hollows on a single
+core. 
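The name-to-pod mapping the helper script performs can be sketched in plain Go: HollowNodes register with their cluster IP as their Name, so finding the backing pod is a lookup of that IP among the external cluster's pods. The `pod` struct and function below are illustrative stand-ins, not part of any Kubernetes API:

```go
package main

import "fmt"

// pod is a minimal stand-in for the pod records returned by the external
// cluster; only the fields needed for the lookup are modeled.
type pod struct {
	Name  string
	PodIP string
}

// realPodForHollowNode mimics what get-real-pod-for-hollow-node.sh does:
// since a HollowNode's name is its cluster IP, the backing pod is the one
// whose PodIP equals that name.
func realPodForHollowNode(pods []pod, nodeName string) (string, bool) {
	for _, p := range pods {
		if p.PodIP == nodeName {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	pods := []pod{{Name: "hollow-node-1234", PodIP: "1.2.3.4"}}
	name, ok := realPodForHollowNode(pods, "1.2.3.4")
	fmt.Println(name, ok) // hollow-node-1234 true
}
```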
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/kubemark-guide.md?pixel)]() + -- cgit v1.2.3 From 1344366406a84910c7a9a85de737410b1a3b9761 Mon Sep 17 00:00:00 2001 From: Brian Grant Date: Wed, 18 Nov 2015 17:30:19 +0000 Subject: Address feedback --- api-conventions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/api-conventions.md b/api-conventions.md index 18e2ddb9..d710cca2 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -251,8 +251,8 @@ This rule maintains the invariant that all JSON/YAML keys are fields in API obje #### Primitive types * Avoid floating-point values as much as possible, and never use them in spec. Floating-point values cannot be reliably round-tripped (encoded and re-decoded) without changing, and have varying precision and representations across languages and architectures. -* Do not use unsigned integers. Similarly, not all languages (e.g., Javascript) support unsigned integers. -* int64 is converted to float by Javascript and some other languages, so they also need to be accepted as strings. +* All numbers (e.g., uint32, int64) are converted to float64 by Javascript and some other languages, so any field which is expected to exceed that either in magnitude or in precision (specifically integer values > 53 bits) should be serialized and accepted as strings. +* Do not use unsigned integers, due to inconsistent support across languages and libraries. Just validate that the integer is non-negative if that's the case. * Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). * Look at similar fields in the API (e.g., ports, durations) and follow the conventions of existing fields. 
-- cgit v1.2.3 From eac73b42443d1930364e5dccd3f375b57772e5d5 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Wed, 18 Nov 2015 09:50:56 -0800 Subject: Defer release notes to announcement of release, and move instructions for release notes back into docs and away from scripts --- releasing.md | 165 +++++++++++++++++++---------------------------------------- 1 file changed, 53 insertions(+), 112 deletions(-) diff --git a/releasing.md b/releasing.md index 01f185bd..4805481f 100644 --- a/releasing.md +++ b/releasing.md @@ -134,123 +134,22 @@ cd kubernetes or `git checkout upstream/master` from an existing repo. -#### Cutting an alpha release (`vX.Y.0-alpha.W`) +Decide what version you're cutting and export it: -Figure out what version you're cutting, and +- alpha release: `export VER="vX.Y.0-alpha.W"`; +- beta release: `export VER="vX.Y.Z-beta.W"`; +- official release: `export VER="vX.Y.Z"`; +- new release series: `export VER="vX.Y"`. -```console -export VER="vX.Y.0-alpha.W" -``` - -then, run - -```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" -``` - -This will do a dry run of: - -1. mark the `vX.Y.0-alpha.W` tag at the given git hash; -1. prompt you to do the remainder of the work, including building the - appropriate binaries and pushing them to the appropriate places. - -If you're satisfied with the result, run - -```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run -``` - -and follow the instructions. - -#### Cutting an beta release (`vX.Y.Z-beta.W`) - -Figure out what version you're cutting, and - -```console -export VER="vX.Y.Z-beta.W" -``` - -then, run +Then, run ```console ./release/cut-official-release.sh "${VER}" "${GITHASH}" ``` -This will do a dry run of: - -1. do a series of commits on the release branch for `vX.Y.Z-beta.W`; -1. mark the `vX.Y.Z-beta.W` tag at the beta version commit; -1. 
prompt you to do the remainder of the work, including building the - appropriate binaries and pushing them to the appropriate places. - -If you're satisfied with the result, run - -```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run -``` - -and follow the instructions. - -#### Cutting an official release (`vX.Y.Z`) - -Figure out what version you're cutting, and - -```console -export VER="vX.Y.Z" -``` - -then, run - -```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" -``` - -This will do a dry run of: - -1. do a series of commits on the branch for `vX.Y.Z`; -1. mark the `vX.Y.Z` tag at the release version commit; -1. do a series of commits on the branch for `vX.Y.(Z+1)-beta.0` on top of the - previous commits; -1. mark the `vX.Y.(Z+1)-beta.0` tag at the beta version commit; -1. prompt you to do the remainder of the work, including building the - appropriate binaries and pushing them to the appropriate places. - -If you're satisfied with the result, run - -```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run -``` - -and follow the instructions. - -#### Branching a new release series (`vX.Y`) - -Once again, **this is a big deal!** If you're reading this doc for the first -time, you probably shouldn't be doing this release, and should talk to someone -on the release team. - -Figure out what series you're cutting, and - -```console -export VER="vX.Y" -``` - -then, run - -```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" -``` - -This will do a dry run of: - -1. mark the `vX.(Y+1).0-alpha.0` tag at the given git hash on `master`; -1. fork a new branch `release-X.Y` off of `master` at the given git hash; -1. do a series of commits on the branch for `vX.Y.0-beta.0`; -1. mark the `vX.Y.0-beta.0` tag at the beta version commit; -1. prompt you to do the remainder of the work, including building the - appropriate binaries and pushing them to the appropriate places. 
- -If you're satisfied with the result, run +This will do a dry run of the release. It will give you instructions at the +end for `pushd`ing into the dry-run directory and having a look around. If +you're satisfied with the result, run ```console ./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run @@ -260,8 +159,50 @@ and follow the instructions. ### Publishing binaries and release notes -The script you ran above will prompt you to take any remaining steps, including -publishing binaries and release notes. +The script you ran above will prompt you to take any remaining steps to push +tars, and will also give you a template for the release notes. Compose an +email to the team with the template, and use `build/make-release-notes.sh` +and/or `release-notes/release-notes.go` in +[kubernetes/contrib](https://github.com/kubernetes/contrib) to make the release +notes, (see #17444 for more info). + +- Alpha release: + - Figure out what the PR numbers for this release and last release are, and + get an api-token from GitHub (https://github.com/settings/tokens). From a + clone of kubernetes/contrib at upstream/master, + go run release-notes/release-notes.go --last-release-pr= --current-release-pr= --api-token= + Feel free to prune. +- Beta release: + - Only publish a beta release if it's a standalone pre-release. (We create + beta tags after we do official releases to maintain proper semantic + versioning, *we don't publish these beta releases*.) Use + `./hack/cherry_pick_list.sh ${VER}` to get release notes for such a + release. +- Official release: + - From your clone of upstream/master, run `./hack/cherry_pick_list.sh ${VER}` + to get the release notes for the patch release you just created. Feel free + to prune anything internal, but typically for patch releases we tend to + include everything in the release notes. 
+ - If this is a first official release (vX.Y.0), look through the release + notes for all of the alpha releases since the last cycle, and include + anything important in release notes. + +Send the email out, letting people know these are the draft release notes. If +they want to change anything, they should update the appropriate PRs with the +`release-note` label. + +When we're ready to announce the release, [create a GitHub +release](https://github.com/kubernetes/kubernetes/releases/new): + +1. pick the appropriate tag; +1. check "This is a pre-release" if it's an alpha or beta release; +1. fill in the release title from the draft; +1. re-run the appropriate release notes tool(s) to pick up any changes people + have made; +1. find the appropriate `kubernetes.tar.gz` in GCS, download it, double check + the hash (compare to what you had in the release notes draft), and attach it + to the release; and +1. publish! ## Injecting Version into Binaries -- cgit v1.2.3 From ce41098cb3fe5dfd2a1aeade718ed37307257293 Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Fri, 13 Nov 2015 16:54:45 -0800 Subject: Add a description of the proposed owners file system for the repo --- owners.md | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 131 insertions(+) create mode 100644 owners.md diff --git a/owners.md b/owners.md new file mode 100644 index 00000000..22bb2fef --- /dev/null +++ b/owners.md @@ -0,0 +1,131 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/devel/owners.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Owners files
+
+_Note_: This is a design for a feature that is not yet implemented.
+
+## Overview
+
+We want to establish owners for different parts of the code in the Kubernetes codebase. These owners
+will serve as the approvers for code to be submitted to these parts of the repository. Notably, owners
+are not necessarily expected to do the first code review for all commits to these areas, but they are
+required to approve changes before they can be merged.
+
+## High Level flow
+
+### Step One: A PR is submitted
+
+After a PR is submitted, the automated kubernetes PR robot will append a message to the PR indicating the owners
+that are required for the PR to be submitted.
+
+Subsequently, a user can also request the approval message from the robot by writing:
+
+```
+@k8s-bot approvers
+```
+
+into a comment.
+
+In either case, the automation replies with an annotation that indicates
+the owners required to approve. The annotation is a comment that is applied to the PR.
+This comment will say:
+
+```
+Approval is required from <owner-A> OR <owner-B>, AND <owner-C> OR <owner-D>, AND ...
+```
+
+The set of required owners is drawn from the OWNERS files in the repository (see below). For each file
+there should be multiple different OWNERS; these owners are listed in the `OR` clause(s). Because
+it is possible that a PR may cover different directories, with disjoint sets of OWNERS, a PR may require
+approval from more than one person; this is where the `AND` clauses come from.
+
+`<owner-A>` should be the github user id of the owner _without_ a leading `@` symbol to prevent the owner
+from being cc'd into the PR by email. 
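As a rough sketch of how such an annotation could be assembled (the data structures and formatting here are illustrative; the real munger is not specified in this document):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// approvalMessage renders the annotation described above from the owner sets
// of the files a PR touches: owners of one file form an OR clause, and the
// clauses for distinct owner sets are joined with AND.
func approvalMessage(ownersPerFile map[string][]string) string {
	seen := map[string]bool{}
	var clauses []string
	for _, owners := range ownersPerFile {
		clause := strings.Join(owners, " OR ")
		if !seen[clause] { // files sharing an owner set need only one clause
			seen[clause] = true
			clauses = append(clauses, clause)
		}
	}
	sort.Strings(clauses) // deterministic output regardless of map order
	return "Approval is required from " + strings.Join(clauses, ", AND ")
}

func main() {
	msg := approvalMessage(map[string][]string{
		"pkg/api/types.go":    {"alice", "bob"},
		"pkg/kubelet/main.go": {"carol", "dave"},
	})
	fmt.Println(msg) // Approval is required from alice OR bob, AND carol OR dave
}
```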
+
+### Step Two: A PR is LGTM'd
+
+Once a PR is reviewed and LGTM'd it is eligible for submission. However, for it to be submitted,
+an owner for each of the files changed in the PR has to 'approve' the PR. A user is an owner for a
+file if they are included in the OWNERS hierarchy (see below) for that file.
+
+Owner approval comes in two forms:
+
+ * An owner adds a comment to the PR saying "I approve" or "approved"
+ * An owner is the original author of the PR
+
+In the case of a comment based approval, the same rules as for the 'lgtm' label apply. If the PR is
+changed by pushing new commits to the PR, the previous approval is invalidated, and the owner(s) must
+approve again. Because of this, it is recommended that PR authors squash their PRs prior to getting approval
+from owners.
+
+### Step Three: A PR is merged
+
+Once a PR is LGTM'd and all required owners have approved, it is eligible for merge. The merge bot takes care of
+the actual merging.
+
+## Design details
+
+We need to build new features into the existing github munger in order to accomplish this. Additionally
+we need to add owners files to the repository.
+
+### Approval Munger
+
+We need to add a munger that adds comments to PRs indicating whose approval they require. This munger will
+look for PRs that do not have approvers already present in the comments, or where approvers have been
+requested, and add an appropriate comment to the PR.
+
+
+### Status Munger
+
+GitHub has a [status api](https://developer.github.com/v3/repos/statuses/); we will add a status munger that pushes an approval status onto each PR. This status will only be approved if the relevant
+approvers have approved the PR. 
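The hierarchical OWNERS lookup described in Step Two ("A user is an owner for a file if they are included in the OWNERS hierarchy for that file") can be sketched as a walk up the directory tree. The map below stands in for the repository's OWNERS files, and all names are illustrative:

```go
package main

import (
	"fmt"
	"path"
)

// ownersForFile walks up from the file's directory until it finds an OWNERS
// entry, mirroring the hierarchical lookup described in this design. A
// top-level entry under "." backstops the search for relative paths.
func ownersForFile(ownersFiles map[string][]string, file string) []string {
	for dir := path.Dir(file); ; dir = path.Dir(dir) {
		if owners, ok := ownersFiles[dir]; ok {
			return owners
		}
		if dir == "." || dir == "/" {
			return nil // no OWNERS file anywhere up the tree
		}
	}
}

func main() {
	repo := map[string][]string{
		".":           {"root-owner"}, // top-level back-stop
		"pkg/kubelet": {"alice", "bob"},
	}
	fmt.Println(ownersForFile(repo, "pkg/kubelet/server/server.go")) // [alice bob]
	fmt.Println(ownersForFile(repo, "docs/devel/owners.md"))         // [root-owner]
}
```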
+
+### Requiring approval status
+
+Github has the ability to [require status checks prior to merging](https://help.github.com/articles/enabling-required-status-checks/).
+
+Once we have the status check munger described above implemented, we will add this required status check
+to our main branch as well as any release branches.
+
+### Adding owners files
+
+In each directory in the repository we may add an OWNERS file. This file will contain the github OWNERS
+for that directory. OWNERSHIP is hierarchical, so if a directory does not contain an OWNERS file, its
+parent's OWNERS file is used instead. There will be a top-level OWNERS file to back-stop the system.
+
+Obviously changing the OWNERS file requires OWNERS permission.
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/owners.md?pixel)]()
+
-- cgit v1.2.3
From 8f8914f3ded0953fda8d532f1865bcc342b8e477 Mon Sep 17 00:00:00 2001
From: Isaac Hollander McCreery
Date: Wed, 18 Nov 2015 16:19:01 -0800
Subject: Add sanity checks for release

---
 releasing.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/releasing.md b/releasing.md
index 4805481f..d4347ce4 100644
--- a/releasing.md
+++ b/releasing.md
@@ -148,8 +148,17 @@ Then, run
 ```
 
 This will do a dry run of the release. It will give you instructions at the
-end for `pushd`ing into the dry-run directory and having a look around. If
-you're satisfied with the result, run
+end for `pushd`ing into the dry-run directory and having a look around.
+`pushd` into the directory and make sure everything looks as you expect:
+
+```console
+git log "${VER}" # do you see the commit you expect? 
+make release +./cluster/kubectl.sh version -c +``` + +If you're satisfied with the result of the script, go back to `upstream/master` +run ```console ./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run -- cgit v1.2.3 From ac54c5cc72068ecf1275c4bbea6d7950863d9ed9 Mon Sep 17 00:00:00 2001 From: Tamer Tas Date: Sat, 4 Apr 2015 21:25:14 +0300 Subject: ConfigData resource proposal --- config_data.md | 247 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 247 insertions(+) create mode 100644 config_data.md diff --git a/config_data.md b/config_data.md new file mode 100644 index 00000000..d06d3b43 --- /dev/null +++ b/config_data.md @@ -0,0 +1,247 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/config_data.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Generic Configuration Object + +## Abstract + +This proposal proposes a new API resource, `ConfigData`, that stores data used for the configuration +of applications deployed on `Kubernetes`. + +The main focus points of this proposal are: + +* Dynamic distribution of configuration data to deployed applications. +* Encapsulate configuration information and simplify `Kubernetes` deployments. +* Create a flexible configuration model for `Kubernetes`. + +## Motivation + +A `Secret`-like API resource is needed to store configuration data that pods can consume. + +Goals of this design: + +1. Describe a `ConfigData` API resource +2. Describe the semantics of consuming `ConfigData` as environment variables +3. Describe the semantics of consuming `ConfigData` as files in a volume + +## Use Cases + +1. As a user, I want to be able to consume configuration data as environment variables +2. As a user, I want to be able to consume configuration data as files in a volume +3. As a user, I want my view of configuration data in files to be eventually consistent with changes + to the data + +### Consuming `ConfigData` as Environment Variables + +Many programs read their configuration from environment variables. `ConfigData` should be possible +to consume in environment variables. The rough series of events for consuming `ConfigData` this way +is: + +1. A `ConfigData` object is created +2. A pod that consumes the configuration data via environment variables is created +3. The pod is scheduled onto a node +4. 
The kubelet retrieves the `ConfigData` resource(s) referenced by the pod and starts the container + processes with the appropriate data in environment variables + +### Consuming `ConfigData` in Volumes + +Many programs read their configuration from configuration files. `ConfigData` should be possible +to consume in a volume. The rough series of events for consuming `ConfigData` this way +is: + +1. A `ConfigData` object is created +2. A new pod using the `ConfigData` via the volume plugin is created +3. The pod is scheduled onto a node +4. The Kubelet creates an instance of the volume plugin and calls its `Setup()` method +5. The volume plugin retrieves the `ConfigData` resource(s) referenced by the pod and projects + the appropriate data into the volume + +### Consuming `ConfigData` Updates + +Any long-running system has configuration that is mutated over time. Changes made to configuration +data must be made visible to pods consuming data in volumes so that they can respond to those +changes. + +The `resourceVersion` of the `ConfigData` object will be updated by the API server every time the +object is modified. After an update, modifications will be made visible to the consumer container: + +1. A `ConfigData` object is created +2. A new pod using the `ConfigData` via the volume plugin is created +3. The pod is scheduled onto a node +4. During the sync loop, the Kubelet creates an instance of the volume plugin and calls its + `Setup()` method +5. The volume plugin retrieves the `ConfigData` resource(s) referenced by the pod and projects + the appropriate data into the volume +6. The `ConfigData` referenced by the pod is updated +7. During the next iteration of the `syncLoop`, the Kubelet creates an instance of the volume plugin + and calls its `Setup()` method +8. The volume plugin projects the updated data into the volume atomically + +It is the consuming pod's responsibility to make use of the updated data once it is made visible. 
+
+Because environment variables cannot be updated without restarting a container, configuration data
+consumed in environment variables will not be updated.
+
+### Advantages
+
+* Easy to consume in pods; consumer-agnostic
+* Configuration data is persistent and versioned
+* Consumers of configuration data in volumes can respond to changes in the data
+
+## Proposed Design
+
+### API Resource
+
+The `ConfigData` resource will be added to the `extensions` API Group:
+
+```go
+package api
+
+// ConfigData holds configuration data for pods to consume.
+type ConfigData struct {
+  TypeMeta   `json:",inline"`
+  ObjectMeta `json:"metadata,omitempty"`
+
+  // Data contains the configuration data. Each key must be a valid DNS_SUBDOMAIN or leading
+  // dot followed by valid DNS_SUBDOMAIN.
+  Data map[string]string `json:"data,omitempty"`
+}
+
+type ConfigDataList struct {
+  TypeMeta `json:",inline"`
+  ListMeta `json:"metadata,omitempty"`
+
+  Items []ConfigData `json:"items"`
+}
+```
+
+A `Registry` implementation for `ConfigData` will be added to `pkg/registry/configdata`.
+
+### Environment Variables
+
+The `EnvVarSource` will be extended with a new selector for config data:
+
+```go
+package api
+
+// EnvVarSource represents a source for the value of an EnvVar.
+type EnvVarSource struct {
+  // other fields omitted
+
+  // Specifies a ConfigData key
+  ConfigData *ConfigDataSelector `json:"configData,omitempty"`
+}
+
+// ConfigDataSelector selects a key of a ConfigData.
+type ConfigDataSelector struct {
+  // The name of the ConfigData to select a key from.
+  ConfigDataName string `json:"configDataName"`
+  // The key of the ConfigData to select.
+  Key string `json:"key"`
+}
+```
+
+### Volume Source
+
+The volume source will be addressed in a follow-up PR. 
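As an illustration of how a consumer could resolve a `ConfigDataSelector` into an environment-variable value, here is a simplified sketch. The trimmed-down types and the `resolveEnvValue` helper are illustrative only, not kubelet code:

```go
package main

import "fmt"

// Simplified stand-ins for the proposal's types, trimmed to what the lookup
// needs; the real definitions are the API types sketched above.
type ConfigData struct {
	Name string
	Data map[string]string
}

type ConfigDataSelector struct {
	ConfigDataName string
	Key            string
}

// resolveEnvValue sketches what a kubelet-like consumer would do for an
// EnvVarSource that points at a ConfigData: fetch the named object from some
// store and pluck out the requested key.
func resolveEnvValue(store map[string]ConfigData, sel ConfigDataSelector) (string, error) {
	cd, ok := store[sel.ConfigDataName]
	if !ok {
		return "", fmt.Errorf("configdata %q not found", sel.ConfigDataName)
	}
	v, ok := cd.Data[sel.Key]
	if !ok {
		return "", fmt.Errorf("key %q not found in configdata %q", sel.Key, sel.ConfigDataName)
	}
	return v, nil
}

func main() {
	store := map[string]ConfigData{
		"etcd-env-config": {
			Name: "etcd-env-config",
			Data: map[string]string{"number_of_members": "1"},
		},
	}
	sel := ConfigDataSelector{ConfigDataName: "etcd-env-config", Key: "number_of_members"}
	v, err := resolveEnvValue(store, sel)
	fmt.Println(v, err) // 1 <nil>
}
```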
+ +## Examples + +#### Consuming `ConfigData` as Environment Variables + +```yaml +apiVersion: extensions/v1beta1 +kind: ConfigData +metadata: + name: etcd-env-config +data: + number_of_members: 1 + initial_cluster_state: new + initial_cluster_token: DUMMY_ETCD_INITIAL_CLUSTER_TOKEN + discovery_token: DUMMY_ETCD_DISCOVERY_TOKEN + discovery_url: http://etcd-discovery:2379 + etcdctl_peers: http://etcd:2379 +``` + +This pod consumes the `ConfigData` as environment variables: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: config-env-example +spec: + containers: + - name: etcd + image: openshift/etcd-20-centos7 + ports: + - containerPort: 2379 + protocol: TCP + - containerPort: 2380 + protocol: TCP + env: + - name: ETCD_NUM_MEMBERS + valueFrom: + configData: + configDataName: etcd-env-config + key: number_of_members + - name: ETCD_INITIAL_CLUSTER_STATE + valueFrom: + configData: + configDataName: etcd-env-config + key: initial_cluster_state + - name: ETCD_DISCOVERY_TOKEN + valueFrom: + configData: + configDataName: etcd-env-config + key: discovery_token + - name: ETCD_DISCOVERY_URL + valueFrom: + configData: + configDataName: etcd-env-config + key: discovery_url + - name: ETCDCTL_PEERS + valueFrom: + configData: + configDataName: etcd-env-config + key: etcdctl_peers +``` + +### Future Improvements + +In the future, we may add the ability to specify an init-container that can watch the volume +contents for updates and respond to changes when they occur. 
+ + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/config_data.md?pixel)]() + -- cgit v1.2.3 From 442bba37e2d93ae560adffe6c20b626e5916751c Mon Sep 17 00:00:00 2001 From: Brendan Burns Date: Thu, 19 Nov 2015 13:35:24 -0800 Subject: Update daemon.md --- daemon.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/daemon.md b/daemon.md index 6e783d8f..29f7e913 100644 --- a/daemon.md +++ b/daemon.md @@ -71,7 +71,7 @@ The DaemonSet supports standard API features: - YAML example: ```YAML - apiVersion: v1 + apiVersion: extensions/v1beta1 kind: DaemonSet metadata: labels: -- cgit v1.2.3 From 9a95bb33a27abc8fc1bb25fd7ba4bd9e38bd32f3 Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Thu, 19 Nov 2015 23:19:46 -0500 Subject: Proposal: config data volume source --- config_data.md | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 69 insertions(+), 1 deletion(-) diff --git a/config_data.md b/config_data.md index d06d3b43..253f961e 100644 --- a/config_data.md +++ b/config_data.md @@ -173,7 +173,36 @@ type ConfigDataSelector struct { ### Volume Source -The volume source will be addressed in a follow-up PR. +A new `ConfigDataVolumeSource` type of volume source containing the `ConfigData` object will be +added to the `VolumeSource` struct in the API: + +```go +package api + +type VolumeSource struct { + // other fields omitted + ConfigData *ConfigDataVolumeSource `json:"configData,omitempty"` +} + +// ConfigDataVolumeSource represents a volume that holds configuration data +type ConfigDataVolumeSource struct { + // A list of config data keys to project into the volume in files + Files []ConfigDataVolumeFile `json:"files"` +} + +// ConfigDataVolumeFile represents a single file containing config data +type ConfigDataVolumeFile struct { + ConfigDataSelector `json:",inline"` + + // The relative path name of the file to be created. + // Must not be absolute or contain the '..' path. Must be utf-8 encoded. 
+ // The first item of the relative path must not start with '..'
+ Path string `json:"path"`
+}
+```
+
+**Note:** The update logic used in the downward API volume plug-in will be extracted and re-used in
+the volume plug-in for `ConfigData`.
 
 ## Examples
 
@@ -237,6 +266,45 @@ spec:
       key: etcdctl_peers
 ```
 
+### Consuming `ConfigData` as Volumes
+
+`redis-volume-config` is intended to be used as a volume containing a config file:
+
+```yaml
+apiVersion: extensions/v1beta1
+kind: ConfigData
+metadata:
+  name: redis-volume-config
+data:
+  redis.conf: "pidfile /var/run/redis.pid\nport 6379\ntcp-backlog 511\ndatabases 1\ntimeout 0\n"
+```
+
+The following pod consumes the `redis-volume-config` in a volume:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: config-volume-example
+spec:
+  containers:
+    - name: redis
+      image: kubernetes/redis
+      command: ["redis-server", "/mnt/config-data/etc/redis.conf"]
+      ports:
+        - containerPort: 6379
+      volumeMounts:
+        - name: config-data-volume
+          mountPath: /mnt/config-data
+  volumes:
+    - name: config-data-volume
+      configData:
+        files:
+          - path: "etc/redis.conf"
+            configDataName: redis-volume-config
+            key: redis.conf
+```
+
 ### Future Improvements
 
 In the future, we may add the ability to specify an init-container that can watch the volume
-- cgit v1.2.3
From ccd0d84dc966dfe49553fcc88efd5c4c7c0fbac6 Mon Sep 17 00:00:00 2001
From: Hongchao Deng
Date: Fri, 20 Nov 2015 10:30:50 -0800
Subject: Kubemark guide: add paragraph to describe '--delete-namespace=false'

---
 kubemark-guide.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kubemark-guide.md b/kubemark-guide.md
index 7a68f4e6..758963de 100644
--- a/kubemark-guide.md
+++ b/kubemark-guide.md
@@ -100,9 +100,15 @@ comming from other machines, including all the Nodes, not only from outside of t
 To run standard e2e test on your Kubemark cluster created in the previous step you execute `test/kubemark/run-e2e-tests.sh` script. 
It will configure ginkgo to
 use Kubemark cluster instead of something else and start an e2e test. This script should not need any changes to work on other cloud providers.
 
-By default (if nothig will be passed to it) the script will run a Density '30 test. If you want to run a different e2e test you just need to provide flags you want to be
+By default (if nothing will be passed to it) the script will run a Density '30 test. If you want to run a different e2e test you just need to provide flags you want to be
 passed to `hack/ginkgo-e2e.sh` script, e.g. `--ginkgo.focus="Load"` to run the Load test.
 
+By default, at the end of each test, it will delete namespaces and everything under them (e.g. events, replication controllers) on the Kubemark master, which takes a lot of time.
+Such work isn't needed in most cases: for example, if you delete your Kubemark cluster after running `run-e2e-tests.sh`,
+or if you don't care about namespace deletion performance (specifically related to etcd).
+There is a flag that enables you to avoid namespace deletion: `--delete-namespace=false`.
+Adding the flag should let you see in logs: `Found DeleteNamespace=false, skipping namespace deletion!`
+
 ### Monitoring test execution and debugging problems
 
 Run-e2e-tests prints the same output on Kubemark as on ordinary e2e cluster, but if you need to dig deeper you need to learn how to debug HollowNodes and how Master
-- cgit v1.2.3
From e1ded93ff37ab654682ff38c0e77e47c6a7681e6 Mon Sep 17 00:00:00 2001
From: "Tim St. Clair"
Date: Mon, 23 Nov 2015 18:06:23 -0800
Subject: Clarify when pointers are used for optional types

---
 api-conventions.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/api-conventions.md b/api-conventions.md
index 6628e998..43550903 100644
--- a/api-conventions.md
+++ b/api-conventions.md
@@ -387,7 +387,8 @@ Fields must be either optional or required.
 Optional fields have the following properties:
 
 - They have `omitempty` struct tag in Go. 
-- They are a pointer type in the Go definition (e.g. `bool *awesomeFlag`). +- They are a pointer type in the Go definition (e.g. `bool *awesomeFlag`) or have a built-in `nil` + value (e.g. maps and slices). - The API server should allow POSTing and PUTing a resource with this field unset. Required fields have the opposite properties, namely: @@ -409,7 +410,8 @@ codebase. However: - having a pointer consistently imply optional is clearer for users of the Go language client, and any other clients that use corresponding types -Therefore, we ask that pointers always be used with optional fields. +Therefore, we ask that pointers always be used with optional fields that do not have a built-in +`nil` value. ## Defaulting -- cgit v1.2.3 From 1ca8a8d8ff7e8dd27c091dc96c7c2ff3d9f8eb55 Mon Sep 17 00:00:00 2001 From: Prashanth Balasubramanian Date: Sat, 21 Nov 2015 17:37:32 -0800 Subject: Flannel server in static pod with private etcd. --- flannel-integration.md | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 flannel-integration.md diff --git a/flannel-integration.md b/flannel-integration.md new file mode 100644 index 00000000..5f33ec30 --- /dev/null +++ b/flannel-integration.md @@ -0,0 +1,35 @@ +# Flannel integration with Kubernetes + +## Why? + +* Networking works out of the box. +* Cloud gateway configuration is regulated. +* Consistent bare metal and cloud experience. +* Lays foundation for integrating with networking backends and vendors. + +# How? 
+ +``` +Master Node1 +---------------------|-------------------------------- +database | + | | +{10.250.0.0/16} | docker + | here's podcidr |restart with podcidr +apiserver <------------------- kubelet + | | |here's podcidr +flannel-server:10253 <------- flannel-daemon + --/16---> + <--watch-- [config iptables] + subscribe to new node subnets + --------> [config VXLan] + | +``` + +There is a tiny lie in the above diagram, as of now, the flannel server on the master maintains a private etcd. This will not be necessary once we have a generalized network resource, and a Kubernetes x flannel backend. + +# Limitations + +* Integration is experimental + +# Wishlist -- cgit v1.2.3 From 3c8100c2e22eaee7dca47b9417fb2288f9b5d10a Mon Sep 17 00:00:00 2001 From: Prashanth Balasubramanian Date: Sun, 22 Nov 2015 16:06:04 -0800 Subject: Docs etc --- flannel-integration.md | 164 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 147 insertions(+), 17 deletions(-) diff --git a/flannel-integration.md b/flannel-integration.md index 5f33ec30..417cab1d 100644 --- a/flannel-integration.md +++ b/flannel-integration.md @@ -1,35 +1,165 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/flannel-integration.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + # Flannel integration with Kubernetes ## Why? * Networking works out of the box. -* Cloud gateway configuration is regulated. +* Cloud gateway configuration is regulated by quota. * Consistent bare metal and cloud experience. * Lays foundation for integrating with networking backends and vendors. -# How? +## How? + +Thus: + +``` +Master | Node1 +---------------------------------------------------------------------- +{192.168.0.0/16, 256 /24} | docker + | | | restart with podcidr +apiserver <------------------ kubelet (sends podcidr) + | | | here's podcidr, mtu +flannel-server:10253 <------------------ flannel-daemon +Allocates a /24 ------------------> [config iptables, VXLan] + <------------------ [watch subnet leases] +I just allocated ------------------> [config VXLan] +another /24 | +``` + +## Proposal + +Explaining vxlan is out of the scope of this document, however it does take some basic understanding to grok the proposal. Assume some pod wants to communicate across nodes with the above setup. 
Check the flannel vxlan devices: + +```console +node1 $ ip -d link show flannel.1 +4: flannel.1: mtu 1410 qdisc noqueue state UNKNOWN mode DEFAULT + link/ether a2:53:86:b5:5f:c1 brd ff:ff:ff:ff:ff:ff + vxlan +node1 $ ip -d link show eth0 +2: eth0: mtu 1460 qdisc mq state UP mode DEFAULT qlen 1000 + link/ether 42:01:0a:f0:00:04 brd ff:ff:ff:ff:ff:ff + +node2 $ ip -d link show flannel.1 +4: flannel.1: mtu 1410 qdisc noqueue state UNKNOWN mode DEFAULT + link/ether 56:71:35:66:4a:d8 brd ff:ff:ff:ff:ff:ff + vxlan +node2 $ ip -d link show eth0 +2: eth0: mtu 1460 qdisc mq state UP mode DEFAULT qlen 1000 + link/ether 42:01:0a:f0:00:03 brd ff:ff:ff:ff:ff:ff +``` + +Note that we're ignoring cbr0 for the sake of simplicity. Spin-up a container on each node. We're using raw docker for this example only because we want control over where the container lands: ``` -Master Node1 ----------------------|-------------------------------- -database | - | | -{10.250.0.0/16} | docker - | here's podcidr |restart with podcidr -apiserver <------------------- kubelet - | | |here's podcidr -flannel-server:10253 <------- flannel-daemon - --/16---> - <--watch-- [config iptables] - subscribe to new node subnets - --------> [config VXLan] - | +node1 $ docker run -it radial/busyboxplus:curl /bin/sh +[ root@5ca3c154cde3:/ ]$ ip addr show +1: lo: mtu 65536 qdisc noqueue +8: eth0: mtu 1410 qdisc noqueue + link/ether 02:42:12:10:20:03 brd ff:ff:ff:ff:ff:ff + inet 192.168.32.3/24 scope global eth0 + valid_lft forever preferred_lft forever + +node2 $ docker run -it radial/busyboxplus:curl /bin/sh +[ root@d8a879a29f5d:/ ]$ ip addr show +1: lo: mtu 65536 qdisc noqueue +16: eth0: mtu 1410 qdisc noqueue + link/ether 02:42:12:10:0e:07 brd ff:ff:ff:ff:ff:ff + inet 192.168.14.7/24 scope global eth0 + valid_lft forever preferred_lft forever +[ root@d8a879a29f5d:/ ]$ ping 192.168.32.3 +PING 192.168.32.3 (192.168.32.3): 56 data bytes +64 bytes from 192.168.32.3: seq=0 ttl=62 time=1.190 ms ``` -There is a tiny lie 
in the above diagram, as of now, the flannel server on the master maintains a private etcd. This will not be necessary once we have a generalized network resource, and a Kubernetes x flannel backend.
+__What happened?__:
+
+From 1000 feet:
+* vxlan device driver starts up on node1 and creates a udp tunnel endpoint on 8472
+* container 192.168.32.3 pings 192.168.14.7
+ - what's the MAC of 192.168.14.0?
+ - L2 miss, flannel looks up MAC of subnet
+ - Stores `192.168.14.0 <-> 56:71:35:66:4a:d8` in neighbor table
+ - what's tunnel endpoint of this MAC?
+ - L3 miss, flannel looks up destination VM ip
+ - Stores `10.240.0.3 <-> 56:71:35:66:4a:d8` in bridge database
+* Sends `[56:71:35:66:4a:d8, 10.240.0.3][vxlan: port, vni][02:42:12:10:20:03, 192.168.14.7][icmp]`
+
+__But will it blend?__
+
+Kubernetes integration is fairly straightforward once we understand the pieces involved, and can be prioritized as follows:
+* Kubelet understands flannel daemon in client mode, flannel server manages independent etcd store on master, node controller backs off cidr allocation
+* Flannel server consults the Kubernetes master for everything network related
+* Flannel daemon works through network plugins in a generic way without bothering the kubelet: needs CNI x Kubernetes standardization
+
+The first is accomplished in this PR, while a timeline for 2 and 3 is TBD.
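The subnet bookkeeping sketched in the diagram earlier (the master carving per-node /24s out of a cluster /16) can be illustrated in Go. This is a rough sketch with a hypothetical `nextPodCIDR` helper, not the actual allocator used by the flannel server or node controller:

```go
package main

import (
	"fmt"
	"net"
)

// nextPodCIDR returns the i-th /24 carved out of a cluster /16.
// Hypothetical helper for illustration only.
func nextPodCIDR(clusterCIDR string, i int) (string, error) {
	_, ipnet, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return "", err
	}
	ip := ipnet.IP.To4()
	if ip == nil {
		return "", fmt.Errorf("not an IPv4 CIDR: %s", clusterCIDR)
	}
	if i < 0 || i > 255 {
		return "", fmt.Errorf("only 256 /24s fit in a /16, got index %d", i)
	}
	subnet := net.IPv4(ip[0], ip[1], byte(i), 0)
	return fmt.Sprintf("%s/24", subnet), nil
}

func main() {
	// Carve the first few node subnets out of the cluster range.
	for i := 0; i < 3; i++ {
		cidr, _ := nextPodCIDR("192.168.0.0/16", i)
		fmt.Println(cidr)
	}
}
```

With `192.168.0.0/16` this yields `192.168.0.0/24`, `192.168.1.0/24`, and so on, matching the `{192.168.0.0/16, 256 /24}` bookkeeping in the diagram.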
To implement the flannel API we can either run a proxy per node and get rid of the flannel server, or service all requests in the flannel server with something like a goroutine per node:
+* `/network/config`: read network configuration and return
+* `/network/leases`:
+ - Post: Return a lease as understood by flannel
+ - Look up node by IP
+ - Store node metadata from [flannel request](https://github.com/coreos/flannel/blob/master/subnet/subnet.go#L34) in annotations
+ - Return [Lease object](https://github.com/coreos/flannel/blob/master/subnet/subnet.go#L40) reflecting node cidr
+ - Get: Handle a watch on leases
+* `/network/leases/subnet`:
+ - Put: This is a request for a lease. If the node controller is allocating CIDRs we can probably just no-op.
+* `/network/reservations`: TBD, we can probably use this to accommodate the node controller allocating CIDRs instead of flannel requesting them
+
+The ickiest part of this implementation is going to be the `GET /network/leases`, i.e. the watch proxy. We can side-step this by waiting for a more generic Kubernetes resource. However, we can also implement it as follows:
+* Watch all nodes, ignore heartbeats
+* On each change, figure out the lease for the node, construct a [lease watch result](https://github.com/coreos/flannel/blob/0bf263826eab1707be5262703a8092c7d15e0be4/subnet/subnet.go#L72), and send it down the watch with the RV from the node
+* Implement a lease list that does a similar translation
+
+I say this is gross without an API object because for each node->lease translation one has to store and retrieve the node metadata sent by flannel (e.g. VTEP) from node annotations. [Reference implementation](https://github.com/bprashanth/kubernetes/blob/network_vxlan/pkg/kubelet/flannel_server.go) and [watch proxy](https://github.com/bprashanth/kubernetes/blob/network_vxlan/pkg/kubelet/watch_proxy.go).
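The node-to-lease translation described above can be sketched with simplified stand-in types. The real lease type lives in flannel's `subnet` package and the real node type in the Kubernetes API; the shapes below are illustrative only:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a simplified stand-in for the Kubernetes node object; flannel
// metadata (e.g. the VTEP MAC) is assumed to be stashed in annotations.
type Node struct {
	Name        string
	PodCIDR     string
	Annotations map[string]string
}

// Lease is a simplified stand-in for flannel's lease type.
type Lease struct {
	Subnet string            `json:"subnet"`
	Attrs  map[string]string `json:"attrs"`
}

// nodeToLease is the per-event translation the watch proxy would perform
// (heartbeat-only node updates are assumed to be filtered out upstream).
func nodeToLease(n Node) Lease {
	return Lease{Subnet: n.PodCIDR, Attrs: n.Annotations}
}

func main() {
	n := Node{
		Name:        "node1",
		PodCIDR:     "192.168.32.0/24",
		Annotations: map[string]string{"vtep-mac": "a2:53:86:b5:5f:c1"},
	}
	b, _ := json.Marshal(nodeToLease(n))
	fmt.Println(string(b))
}
```

A lease list would map this function over all nodes; a watch would apply it to each node event and forward the node's resource version.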
# Limitations

* Integration is experimental
+* Flannel etcd not stored in persistent disk
+* CIDR allocation does *not* flow from Kubernetes down to nodes anymore

# Wishlist
+
+This proposal is really just a call for community help in writing a Kubernetes x flannel backend.
+
+* CNI plugin integration
+* Flannel daemon in privileged pod
+* Flannel server talks to apiserver, described in proposal above
+* HTTPS between flannel daemon/server
+* Investigate flannel server running on every node (as done in the reference implementation mentioned above)
+* Use flannel reservation mode to support node controller podcidr allocation
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/flannel-integration.md?pixel)]()
+
-- cgit v1.2.3


From ceef4793e3e31ba883c02ea88d9891acd01a80d2 Mon Sep 17 00:00:00 2001
From: Brad Erickson 
Date: Mon, 23 Nov 2015 19:01:03 -0800
Subject: Minion->Node rename: KUBERNETES_NODE_MEMORY, VAGRANT_NODE_NAMES, etc

ENABLE_NODE_PUBLIC_IP
NODE_ADDRESS
NODE_BLOCK_DEVICE_MAPPINGS
NODE_CONTAINER_ADDRS
NODE_CONTAINER_NETMASKS
NODE_CONTAINER_SUBNET_BASE
NODE_CONTAINER_SUBNETS
NODE_CPU
---
 developer-guides/vagrant.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md
index 61560db7..291b85bc 100644
--- a/developer-guides/vagrant.md
+++ b/developer-guides/vagrant.md
@@ -369,7 +369,7 @@ If you need more granular control, you can set the amount of memory for the mast

```sh
export KUBERNETES_MASTER_MEMORY=1536
-export KUBERNETES_MINION_MEMORY=2048
+export KUBERNETES_NODE_MEMORY=2048
```

#### I ran vagrant suspend and nothing works!
-- cgit v1.2.3 From 53e1173488dc198aad3424fc7526452dd71f8644 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Mon, 23 Nov 2015 19:03:44 -0800 Subject: Minion->Node rename: NODE_IP_BASE, NODE_IP_RANGES, NODE_IP_RANGE, etc NODE_IPS NODE_IP NODE_MEMORY_MB --- networking.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/networking.md b/networking.md index 56009d5b..3259a83a 100644 --- a/networking.md +++ b/networking.md @@ -134,7 +134,7 @@ Example of GCE's advanced routing rules: ```sh gcloud compute routes add "${MINION_NAMES[$i]}" \ --project "${PROJECT}" \ - --destination-range "${MINION_IP_RANGES[$i]}" \ + --destination-range "${NODE_IP_RANGES[$i]}" \ --network "${NETWORK}" \ --next-hop-instance "${MINION_NAMES[$i]}" \ --next-hop-instance-zone "${ZONE}" & -- cgit v1.2.3 From 718787711fc99207d148873711743279af124215 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Mon, 23 Nov 2015 19:04:40 -0800 Subject: Minion->Node rename: NODE_NAMES, NODE_NAME, NODE_PORT --- networking.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/networking.md b/networking.md index 3259a83a..b110ca75 100644 --- a/networking.md +++ b/networking.md @@ -132,11 +132,11 @@ differentiate it from `docker0`) is set up outside of Docker proper. 
Example of GCE's advanced routing rules: ```sh -gcloud compute routes add "${MINION_NAMES[$i]}" \ +gcloud compute routes add "${NODE_NAMES[$i]}" \ --project "${PROJECT}" \ --destination-range "${NODE_IP_RANGES[$i]}" \ --network "${NETWORK}" \ - --next-hop-instance "${MINION_NAMES[$i]}" \ + --next-hop-instance "${NODE_NAMES[$i]}" \ --next-hop-instance-zone "${ZONE}" & ``` -- cgit v1.2.3 From 2a9c9d4c4984dde0acebbef17383e26a20be1312 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Mon, 23 Nov 2015 19:06:36 -0800 Subject: Minion->Node rename: NUM_NODES --- aws_under_the_hood.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index 9fe46d6f..a55c09e3 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -250,7 +250,7 @@ cross-AZ-clusters are more convenient. * For auto-scaling, on each nodes it creates a launch configuration and group. The name for both is <*KUBE_AWS_INSTANCE_PREFIX*>-minion-group. The default name is kubernetes-minion-group. The auto-scaling group has a min and max size - that are both set to NUM_MINIONS. You can change the size of the auto-scaling + that are both set to NUM_NODES. You can change the size of the auto-scaling group to add or remove the total number of nodes from within the AWS API or Console. Each nodes self-configures, meaning that they come up; run Salt with the stored configuration; connect to the master; are assigned an internal CIDR; -- cgit v1.2.3 From bc465c1d0f7b0f1c7d405cf1c287f255172ce151 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Mon, 23 Nov 2015 19:06:36 -0800 Subject: Minion->Node rename: NUM_NODES --- developer-guides/vagrant.md | 6 +++--- kubemark-guide.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 291b85bc..2d628abb 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -301,7 +301,7 @@ Congratulations! 
The following will run all of the end-to-end testing scenarios assuming you set your environment in `cluster/kube-env.sh`:

```sh
-NUM_MINIONS=3 hack/e2e-test.sh
+NUM_NODES=3 hack/e2e-test.sh
```

### Troubleshooting
@@ -350,10 +350,10 @@ Are you sure you built a release first? Did you install `net-tools`? For more cl

#### I want to change the number of nodes!

-You can control the number of nodes that are instantiated via the environment variable `NUM_MINIONS` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough nodes to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single node. You do this, by setting `NUM_MINIONS` to 1 like so:
+You can control the number of nodes that are instantiated via the environment variable `NUM_NODES` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough nodes to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single node. You do this by setting `NUM_NODES` to 1 like so:

```sh
-export NUM_MINIONS=1
+export NUM_NODES=1
```

#### I want my VMs to have more memory!
diff --git a/kubemark-guide.md b/kubemark-guide.md
index 758963de..df0ecb96 100644
--- a/kubemark-guide.md
+++ b/kubemark-guide.md
@@ -73,7 +73,7 @@ To start a Kubemark cluster on GCE you need to create an external cluster (it ca

`make quick-release`) and run `test/kubemark/start-kubemark.sh` script. This script will create a VM for master components, Pods for HollowNodes and do all the setup necessary to let them talk to each other. It will use the configuration stored in `cluster/kubemark/config-default.sh` - you can tweak it however you want, but note that some features may not be implemented yet, as implementation of Hollow components/mocks will probably be lagging behind ‘real’ one.
For performance tests interesting variables are -`NUM_MINIONS` and `MASTER_SIZE`. After start-kubemark script is finished you’ll have a ready Kubemark cluster, a kubeconfig file for talking to the Kubemark +`NUM_NODES` and `MASTER_SIZE`. After start-kubemark script is finished you’ll have a ready Kubemark cluster, a kubeconfig file for talking to the Kubemark cluster is stored in `test/kubemark/kubeconfig.loc`. Currently we're running HollowNode with limit of 0.05 a CPU core and ~60MB or memory, which taking into account default cluster addons and fluentD running on an 'external' -- cgit v1.2.3 From 7c869b4d00b438bccece82d38ba3f13570ee8877 Mon Sep 17 00:00:00 2001 From: Ravi Gadde Date: Thu, 3 Sep 2015 23:50:14 -0700 Subject: Scheduler extension --- scheduler_extender.md | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 scheduler_extender.md diff --git a/scheduler_extender.md b/scheduler_extender.md new file mode 100644 index 00000000..0c10de59 --- /dev/null +++ b/scheduler_extender.md @@ -0,0 +1,117 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/scheduler_extender.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Scheduler extender + +There are three ways to add new scheduling rules (predicates and priority functions) to Kubernetes: (1) by adding these rules to the scheduler and recompiling (described here: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/scheduler.md), (2) implementing your own scheduler process that runs instead of, or alongside of, the standard Kubernetes scheduler, (3) implementing a "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions. + +This document describes the third approach. This approach is needed for use cases where scheduling decisions need to be made on resources not directly managed by the standard Kubernetes scheduler. The extender helps make scheduling decisions based on such resources. (Note that the three approaches are not mutually exclusive.) + +When scheduling a pod, the extender allows an external process to filter and prioritize nodes. Two separate http/https calls are issued to the extender, one for "filter" and one for "prioritize" actions. To use the extender, you must create a scheduler policy configuration file. The configuration specifies how to reach the extender, whether to use http or https and the timeout. + +```go +// Holds the parameters used to communicate with the extender. If a verb is unspecified/empty, +// it is assumed that the extender chose not to provide that extension. +type ExtenderConfig struct { + // URLPrefix at which the extender is available + URLPrefix string `json:"urlPrefix"` + // Verb for the filter call, empty if not supported. 
This verb is appended to the URLPrefix when issuing the filter call to extender. + FilterVerb string `json:"filterVerb,omitempty"` + // Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender. + PrioritizeVerb string `json:"prioritizeVerb,omitempty"` + // The numeric multiplier for the node scores that the prioritize call generates. + // The weight should be a positive integer + Weight int `json:"weight,omitempty"` + // EnableHttps specifies whether https should be used to communicate with the extender + EnableHttps bool `json:"enableHttps,omitempty"` + // TLSConfig specifies the transport layer security config + TLSConfig *client.TLSClientConfig `json:"tlsConfig,omitempty"` + // HTTPTimeout specifies the timeout duration for a call to the extender. Filter timeout fails the scheduling of the pod. Prioritize + // timeout is ignored, k8s/other extenders priorities are used to select the node. + HTTPTimeout time.Duration `json:"httpTimeout,omitempty"` +} +``` + +A sample scheduler policy file with extender configuration: + +```json +{ + "predicates": [ + { + "name": "HostName" + }, + { + "name": "MatchNodeSelector" + }, + { + "name": "PodFitsResources" + } + ], + "priorities": [ + { + "name": "LeastRequestedPriority", + "weight": 1 + } + ], + "extenders": [ + { + "urlPrefix": "http://127.0.0.1:12345/api/scheduler", + "filterVerb": "filter", + "enableHttps": false + } + ] +} +``` + +Arguments passed to the FilterVerb endpoint on the extender are the set of nodes filtered through the k8s predicates and the pod. Arguments passed to the PrioritizeVerb endpoint on the extender are the set of nodes filtered through the k8s predicates and extender predicates and the pod. + +```go +// ExtenderArgs represents the arguments needed by the extender to filter/prioritize +// nodes for a pod. 
+type ExtenderArgs struct { + // Pod being scheduled + Pod api.Pod `json:"pod"` + // List of candidate nodes where the pod can be scheduled + Nodes api.NodeList `json:"nodes"` +} +``` + +The "filter" call returns a list of nodes (api.NodeList). The "prioritize" call returns priorities for each node (schedulerapi.HostPriorityList). + +The "filter" call may prune the set of nodes based on its predicates. Scores returned by the "prioritize" call are added to the k8s scores (computed through its priority functions) and used for final host selection. + +Multiple extenders can be configured in the scheduler policy. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/scheduler_extender.md?pixel)]() + -- cgit v1.2.3 From b0542299ca51e3cbefd0c36b042a392ca407c098 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Fri, 30 Oct 2015 15:32:44 -0700 Subject: change the "too old resource version" error from InternalError to 410 Gone. --- api-conventions.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/api-conventions.md b/api-conventions.md index cf389231..9a71fe1c 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -547,6 +547,10 @@ The following HTTP status codes may be returned by the API. * * If updating an existing resource: * See `Conflict` from the `status` response section below on how to retrieve more information about the nature of the conflict. * GET and compare the fields in the pre-existing object, merge changes (if still valid according to preconditions), and retry with the updated request (including `ResourceVersion`). +* `410 StatusGone` + * Indicates that the item is no longer available at the server and no forwarding address is known. + * Suggested client recovery behavior + * Do not retry. Fix the request. * `422 StatusUnprocessableEntity` * Indicates that the requested create or update operation cannot be completed due to invalid data provided as part of the request. 
* Suggested client recovery behavior -- cgit v1.2.3 From 0f4b7ce1b071c0eb3e18a63067e953f3f896cd86 Mon Sep 17 00:00:00 2001 From: Isaac Hollander McCreery Date: Wed, 25 Nov 2015 16:22:11 -0800 Subject: Add "supported releases" language to versioning.md --- versioning.md | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/versioning.md b/versioning.md index def20a03..399752d8 100644 --- a/versioning.md +++ b/versioning.md @@ -45,7 +45,7 @@ Legend: * Kube X.Y.0-alpha.W, W > 0: Alpha releases are released roughly every two weeks directly from the master branch. No cherrypick releases. If there is a critical bugfix, a new release from master can be created ahead of schedule. * Kube X.Y.Z-beta.W: When master is feature-complete for Kube X.Y, we will cut the release-X.Y branch 2 weeks prior to the desired X.Y.0 date and cherrypick only PRs essential to X.Y. This cut will be marked as X.Y.0-beta.0, and master will be revved to X.Y+1.0-alpha.0. If we're not satisfied with X.Y.0-beta.0, we'll release other beta releases, (X.Y.0-beta.W | W > 0) as necessary. * Kube X.Y.0: Final release, cut from the release-X.Y branch cut two weeks prior. X.Y.1-beta.0 will be tagged at the same commit on the same branch. X.Y.0 occur 3 to 4 months after X.Y-1.0. -* Kube X.Y.Z, Z > 0: [Patch releases](#patches) are released as we cherrypick commits into the release-X.Y branch, (which is at X.Y.Z-beta.W,) as needed. X.Y.Z is cut straight from the release-X.Y branch, and X.Y.Z+1-beta.0 is tagged on the same commit. +* Kube X.Y.Z, Z > 0: [Patch releases](#patch-releases) are released as we cherrypick commits into the release-X.Y branch, (which is at X.Y.Z-beta.W,) as needed. X.Y.Z is cut straight from the release-X.Y branch, and X.Y.Z+1-beta.0 is tagged on the same commit. ### Major version timeline @@ -55,7 +55,19 @@ There is no mandated timeline for major versions. 
They only occur when we need t
* Continuous integration versions also exist, and are versioned off of alpha and beta releases. X.Y.Z-alpha.W.C+aaaa is C commits after X.Y.Z-alpha.W, with an additional +aaaa build suffix added; X.Y.Z-beta.W.C+bbbb is C commits after X.Y.Z-beta.W, with an additional +bbbb build suffix added. Furthermore, builds that are built off of a dirty build tree, (during development, with things in the tree that are not checked it,) it will be appended with -dirty.

-## Release versions as related to API versions
+### Supported releases
+
+We expect users to stay reasonably up-to-date with the versions of Kubernetes they use in production, but understand that it may take time to upgrade.
+
+We expect users to be running approximately the latest patch release of a given minor release; we often include critical bug fixes in [patch releases](#patch-releases), and so encourage users to upgrade as soon as possible. Furthermore, we expect to "support" three minor releases at a time. With minor releases happening approximately every three months, that means a minor release is supported for approximately nine months. For example, when v1.3 comes out, v1.0 will no longer be considered "fit for use": basically, that means that the reasonable response to the question "my v1.0 cluster isn't working," is, "you should probably upgrade it, (and probably should have some time ago)".
+
+This does *not* mean that we expect to introduce breaking changes between v1.0 and v1.3, but it does mean that we probably won't have reasonable confidence in clusters where some components are running at v1.0 and others running at v1.3.
+
+This policy is in line with [GKE's supported upgrades policy](https://cloud.google.com/container-engine/docs/clusters/upgrade).
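The three-release support window described above reduces to a simple version check. A hedged sketch (the prose above is authoritative; the helper name is made up):

```go
package main

import "fmt"

// supported reports whether minor release 1.minor falls inside the
// three-release support window when 1.latest is the newest minor.
// Illustrative only, not a normative statement of the policy.
func supported(latest, minor int) bool {
	return minor <= latest && latest-minor < 3
}

func main() {
	// With v1.3 as the latest minor, v1.1 through v1.3 are supported
	// and v1.0 is not, matching the example in the text.
	for m := 0; m <= 3; m++ {
		fmt.Printf("v1.%d supported when latest is v1.3: %v\n", m, supported(3, m))
	}
}
```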
+ +## API versioning + +### Release versions as related to API versions Here is an example major release cycle: @@ -67,11 +79,11 @@ Here is an example major release cycle: * Before Kube 2.0 is cut, API v2 must be released in 1.x. This enables two things: (1) users can upgrade to API v2 when running Kube 1.x and then switch over to Kube 2.x transparently, and (2) in the Kube 2.0 release itself we can cleanup and remove all API v2beta\* versions because no one should have v2beta\* objects left in their database. As mentioned above, tooling will exist to make sure there are no calls or references to a given API version anywhere inside someone's kube installation before someone upgrades. * Kube 2.0 must include the v1 API, but Kube 3.0 must include the v2 API only. It *may* include the v1 API as well if the burden is not high - this will be determined on a per-major-version basis. -### Rationale for API v2 being complete before v2.0's release +#### Rationale for API v2 being complete before v2.0's release It may seem a bit strange to complete the v2 API before v2.0 is released, but *adding* a v2 API is not a breaking change. *Removing* the v2beta\* APIs *is* a breaking change, which is what necessitates the major version bump. There are other ways to do this, but having the major release be the fresh start of that release's API without the baggage of its beta versions seems most intuitive out of the available options. -## Patches +## Patch releases Patch releases are intended for critical bug fixes to the latest minor version, such as addressing security vulnerabilities, fixes to problems affecting a large number of users, severe problems with no workaround, and blockers for products based on Kubernetes. @@ -82,8 +94,9 @@ Dependencies, such as Docker or Etcd, should also not be changed unless absolute ## Upgrades * Users can upgrade from any Kube 1.x release to any other Kube 1.x release as a rolling upgrade across their cluster. 
(Rolling upgrade means being able to upgrade the master first, then one node at a time. See #4855 for details.)
+ * However, we do not recommend upgrading more than two minor releases at a time (see [Supported releases](#supported-releases)), and do not recommend running non-latest patch releases of a given minor release.
* No hard breaking changes over version boundaries.
- * For example, if a user is at Kube 1.x, we may require them to upgrade to Kube 1.x+y before upgrading to Kube 2.x. In others words, an upgrade across major versions (e.g. Kube 1.x to Kube 2.x) should effectively be a no-op and as graceful as an upgrade from Kube 1.x to Kube 1.x+1. But you can require someone to go from 1.x to 1.x+y before they go to 2.x.
+ * For example, if a user is at Kube 1.x, we may require them to upgrade to Kube 1.x+y before upgrading to Kube 2.x. In other words, an upgrade across major versions (e.g. Kube 1.x to Kube 2.x) should effectively be a no-op and as graceful as an upgrade from Kube 1.x to Kube 1.x+1. But you can require someone to go from 1.x to 1.x+y before they go to 2.x.

There is a separate question of how to track the capabilities of a kubelet to facilitate rolling upgrades. That is not addressed here.
-- cgit v1.2.3


From 93bf8f287b291a432252d449ce91557389adb791 Mon Sep 17 00:00:00 2001
From: mqliang 
Date: Fri, 27 Nov 2015 23:00:31 +0800
Subject: optimize priority functions

---
 multiple-schedulers.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/multiple-schedulers.md b/multiple-schedulers.md
index 51466008..4fadd601 100644
--- a/multiple-schedulers.md
+++ b/multiple-schedulers.md
@@ -152,6 +152,12 @@ which the client has set a scheduler annotation that does not correspond to a ru

2. Dynamic launching scheduler(s) and registering to admission controller (as an external call). This also requires some work on authorization and authentication to control what schedulers can write the /binding subresource of which pods.
+3.
Optimize the behaviors of priority functions in a multi-scheduler scenario. In the case where multiple schedulers have
+the same predicate and priority functions (for example, when using multiple schedulers for parallelism rather than to
+customize the scheduling policies), all schedulers would tend to pick the same node as "best" when scheduling identical
+pods and therefore would be likely to conflict on the Kubelet. To solve this problem, we can pass
+an optional flag such as `--randomize-node-selection=N` to the scheduler; setting this flag would cause the scheduler to pick
+randomly among the top N nodes instead of the one with the highest score.

## Other issues/discussions related to scheduler design
-- cgit v1.2.3


From 5a8872ff13c3b70dea5458c481b39e2fee2bf489 Mon Sep 17 00:00:00 2001
From: Isaac Hollander McCreery 
Date: Tue, 1 Dec 2015 14:07:23 -0800
Subject: Clarify what is meant by 'support'

---
 versioning.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/versioning.md b/versioning.md
index 399752d8..ab7d7ecb 100644
--- a/versioning.md
+++ b/versioning.md
@@ -59,7 +59,7 @@ There is no mandated timeline for major versions. They only occur when we need t

We expect users to stay reasonably up-to-date with the versions of Kubernetes they use in production, but understand that it may take time to upgrade.

-We expect users to be running approximately the latest patch release of a given minor release; we often include critical bug fixes in [patch releases](#patch-release), and so encourage users to upgrade as soon as possible. Furthermore, we expect to "support" three minor releases at a time. With minor releases happening approximately every three months, that means a minor release is supported for approximately nine months.
For example, when v1.3 comes out, v1.0 will no longer be considered "fit for use": basically, that means that the reasonable response to the question "my v1.0 cluster isn't working," is, "you should probably upgrade it, (and probably should have some time ago)".
+We expect users to be running approximately the latest patch release of a given minor release; we often include critical bug fixes in [patch releases](#patch-releases), and so encourage users to upgrade as soon as possible. Furthermore, we expect to "support" three minor releases at a time. "Support" means we expect users to be running that version in production, though we may not port fixes back before the latest minor version. For example, when v1.3 comes out, v1.0 will no longer be supported: basically, that means that the reasonable response to the question "my v1.0 cluster isn't working," is, "you should probably upgrade it, (and probably should have some time ago)". With minor releases happening approximately every three months, that means a minor release is supported for approximately nine months.

This does *not* mean that we expect to introduce breaking changes between v1.0 and v1.3, but it does mean that we probably won't have reasonable confidence in clusters where some components are running at v1.0 and others running at v1.3.
-- cgit v1.2.3


From c821f1f430b0525b76e27ab346b87fcb323b2455 Mon Sep 17 00:00:00 2001
From: Alex Robinson 
Date: Tue, 1 Dec 2015 22:24:58 -0800
Subject: Typo fixes in docs

---
 extending-api.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/extending-api.md b/extending-api.md
index 303ebeac..1f76235f 100644
--- a/extending-api.md
+++ b/extending-api.md
@@ -41,7 +41,7 @@ This document describes the design for implementing the storage of custom API ty

### The ThirdPartyResource

The `ThirdPartyResource` resource describes the multiple versions of a custom resource that the user wants to add
-to the Kubernetes API.
`ThirdPartyResource` is a non-namespaced resource, attempting to place it in a resource +to the Kubernetes API. `ThirdPartyResource` is a non-namespaced resource; attempting to place it in a namespace will return an error. Each `ThirdPartyResource` resource has the following: @@ -63,18 +63,18 @@ only specifies: Every object that is added to a third-party Kubernetes object store is expected to contain Kubernetes compatible [object metadata](../devel/api-conventions.md#metadata). This requirement enables the Kubernetes API server to provide the following features: - * Filtering lists of objects via LabelQueries + * Filtering lists of objects via label queries * `resourceVersion`-based optimistic concurrency via compare-and-swap * Versioned storage * Event recording - * Integration with basic `kubectl` command line tooling. - * Watch for resource changes. + * Integration with basic `kubectl` command line tooling + * Watch for resource changes The `Kind` for an instance of a third-party object (e.g. CronTab) below is expected to be programmatically convertible to the name of the resource using -the following conversion. Kinds are expected to be of the form ``, the +the following conversion. Kinds are expected to be of the form ``, and the `APIVersion` for the object is expected to be `/`. To -prevent collisions, it's expected that you'll use a fulling qualified domain +prevent collisions, it's expected that you'll use a fully qualified domain name for the API group, e.g. `example.com`. For example `stable.example.com/v1` @@ -106,8 +106,8 @@ This is also the reason why `ThirdPartyResource` is not namespaced. ## Usage When a user creates a new `ThirdPartyResource`, the Kubernetes API Server reacts by creating a new, namespaced -RESTful resource path. For now, non-namespaced objects are not supported. As with existing built-in objects -deleting a namespace, deletes all third party resources in that namespace. +RESTful resource path. 
For now, non-namespaced objects are not supported. As with existing built-in objects, +deleting a namespace deletes all third party resources in that namespace. For example, if a user creates: @@ -136,7 +136,7 @@ Now that this schema has been created, a user can `POST`: "apiVersion": "stable.example.com/v1", "kind": "CronTab", "cronSpec": "* * * * /5", - "image": "my-awesome-chron-image" + "image": "my-awesome-cron-image" } ``` @@ -171,14 +171,14 @@ and get back: "apiVersion": "stable.example.com/v1", "kind": "CronTab", "cronSpec": "* * * * /5", - "image": "my-awesome-chron-image" + "image": "my-awesome-cron-image" } ] } ``` Because all objects are expected to contain standard Kubernetes metadata fields, these -list operations can also use `Label` queries to filter requests down to specific subsets. +list operations can also use label queries to filter requests down to specific subsets. Likewise, clients can use watch endpoints to watch for changes to stored objects. @@ -196,10 +196,10 @@ Each custom object stored by the API server needs a custom key in storage, this #### Definitions - * `resource-namespace` : the namespace of the particular resource that is being stored + * `resource-namespace`: the namespace of the particular resource that is being stored * `resource-name`: the name of the particular resource being stored - * `third-party-resource-namespace`: the namespace of the `ThirdPartyResource` resource that represents the type for the specific instance being stored. - * `third-party-resource-name`: the name of the `ThirdPartyResource` resource that represents the type for the specific instance being stored. 
+ * `third-party-resource-namespace`: the namespace of the `ThirdPartyResource` resource that represents the type for the specific instance being stored + * `third-party-resource-name`: the name of the `ThirdPartyResource` resource that represents the type for the specific instance being stored #### Key -- cgit v1.2.3 From a608d8c1bd88bee419ca4ab64bb174f670ec90d7 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Sun, 8 Nov 2015 23:08:58 -0800 Subject: Minion->Name rename: cluster/vagrant, docs and Vagrantfile --- developer-guides/vagrant.md | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 2d628abb..74e29e3a 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -47,7 +47,7 @@ Running kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/deve ### Setup -By default, the Vagrant setup will create a single master VM (called kubernetes-master) and one node (called kubernetes-minion-1). Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). To start your local cluster, open a shell and run: +By default, the Vagrant setup will create a single master VM (called kubernetes-master) and one node (called kubernetes-node-1). Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). 
To start your local cluster, open a shell and run: ```sh cd kubernetes @@ -74,14 +74,14 @@ To access the master or any node: ```sh vagrant ssh master -vagrant ssh minion-1 +vagrant ssh node-1 ``` If you are running more than one nodes, you can access the others by: ```sh -vagrant ssh minion-2 -vagrant ssh minion-3 +vagrant ssh node-2 +vagrant ssh node-3 ``` To view the service status and/or logs on the kubernetes-master: @@ -101,11 +101,11 @@ $ vagrant ssh master To view the services on any of the nodes: ```console -$ vagrant ssh minion-1 -[vagrant@kubernetes-minion-1] $ sudo systemctl status docker -[vagrant@kubernetes-minion-1] $ sudo journalctl -r -u docker -[vagrant@kubernetes-minion-1] $ sudo systemctl status kubelet -[vagrant@kubernetes-minion-1] $ sudo journalctl -r -u kubelet +$ vagrant ssh node-1 +[vagrant@kubernetes-node-1] $ sudo systemctl status docker +[vagrant@kubernetes-node-1] $ sudo journalctl -r -u docker +[vagrant@kubernetes-node-1] $ sudo systemctl status kubelet +[vagrant@kubernetes-node-1] $ sudo journalctl -r -u kubelet ``` ### Interacting with your Kubernetes cluster with Vagrant. @@ -139,9 +139,9 @@ You may need to build the binaries first, you can do this with `make` $ ./cluster/kubectl.sh get nodes NAME LABELS STATUS -kubernetes-minion-0whl kubernetes.io/hostname=kubernetes-minion-0whl Ready -kubernetes-minion-4jdf kubernetes.io/hostname=kubernetes-minion-4jdf Ready -kubernetes-minion-epbe kubernetes.io/hostname=kubernetes-minion-epbe Ready +kubernetes-node-0whl kubernetes.io/hostname=kubernetes-node-0whl Ready +kubernetes-node-4jdf kubernetes.io/hostname=kubernetes-node-4jdf Ready +kubernetes-node-epbe kubernetes.io/hostname=kubernetes-node-epbe Ready ``` ### Interacting with your Kubernetes cluster with the `kube-*` scripts. 
@@ -206,9 +206,9 @@ Your cluster is running, you can list the nodes in your cluster: $ ./cluster/kubectl.sh get nodes NAME LABELS STATUS -kubernetes-minion-0whl kubernetes.io/hostname=kubernetes-minion-0whl Ready -kubernetes-minion-4jdf kubernetes.io/hostname=kubernetes-minion-4jdf Ready -kubernetes-minion-epbe kubernetes.io/hostname=kubernetes-minion-epbe Ready +kubernetes-node-0whl kubernetes.io/hostname=kubernetes-node-0whl Ready +kubernetes-node-4jdf kubernetes.io/hostname=kubernetes-node-4jdf Ready +kubernetes-node-epbe kubernetes.io/hostname=kubernetes-node-epbe Ready ``` Now start running some containers! @@ -245,11 +245,11 @@ my-nginx-kqdjk 1/1 Waiting 0 33s my-nginx-nyj3x 1/1 Waiting 0 33s ``` -You need to wait for the provisioning to complete, you can monitor the minions by doing: +You need to wait for the provisioning to complete, you can monitor the nodes by doing: ```console -$ sudo salt '*minion-1' cmd.run 'docker images' -kubernetes-minion-1: +$ sudo salt '*node-1' cmd.run 'docker images' +kubernetes-node-1: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE 96864a7d2df3 26 hours ago 204.4 MB kubernetes/pause latest 6c4579af347b 8 weeks ago 239.8 kB @@ -258,8 +258,8 @@ kubernetes-minion-1: Once the docker image for nginx has been downloaded, the container will start and you can list it: ```console -$ sudo salt '*minion-1' cmd.run 'docker ps' -kubernetes-minion-1: +$ sudo salt '*node-1' cmd.run 'docker ps' +kubernetes-node-1: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES dbe79bf6e25b nginx:latest "nginx" 21 seconds ago Up 19 seconds k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b @@ -346,7 +346,7 @@ It's very likely you see a build error due to an error 
in your source files! #### I have brought Vagrant up but the nodes won't validate! -Are you sure you built a release first? Did you install `net-tools`? For more clues, login to one of the nodes (`vagrant ssh minion-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). +Are you sure you built a release first? Did you install `net-tools`? For more clues, login to one of the nodes (`vagrant ssh node-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). #### I want to change the number of nodes! -- cgit v1.2.3 From 89a325502edadb78a52e29d6a765c162836c1354 Mon Sep 17 00:00:00 2001 From: Jerzy Szczepkowski Date: Wed, 18 Nov 2015 15:15:05 +0100 Subject: Design proposal for custom metrics. Design proposal for custom metrics. --- custom-metrics.md | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 134 insertions(+) create mode 100644 custom-metrics.md diff --git a/custom-metrics.md b/custom-metrics.md new file mode 100644 index 00000000..6cdf1624 --- /dev/null +++ b/custom-metrics.md @@ -0,0 +1,134 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/custom-metrics.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Custom Metrics in Kubernetes + + +## Preface + +Our aim is to create a mechanism in Kubernetes that will allow pods to expose custom system metrics, collect them, and make them accessible. +Custom metrics are needed by: +* horizontal pod autoscaler, for autoscaling the number of pods based on them; +* scheduler, for using them in a sophisticated scheduling algorithm. + +High level goals for our solution for version 1.2 are: +* easy to use (it should be easy to export a custom metric from a user's application), +* works for most applications (it should be easy to configure monitoring for third-party applications), +* performance & scalability (the largest supported cluster should be able to handle ~5 custom metrics per pod with reporting latency ~30 seconds). + +For version 1.2, we are not going to address the following issues (non-goals): +* security of access to custom metrics, +* general monitoring of application health by a user +(out of the heapster scope, see [#665](https://github.com/kubernetes/heapster/issues/665)). + +## Design + +For the Kubernetes version 1.2, we plan to implement aggregation of pod custom metrics in Prometheus format by pull. + +Each pod that wants to expose custom metrics will expose a set of Prometheus endpoints. +(For version 1.2, we assume that custom metrics are not private information and they are accessible by everyone. +In the future, we may restrict it by making the endpoints accessible only by kubelet/cAdvisor). +CAdvisor will collect metrics from such endpoints of all pods on each node by pulling, and expose them to Heapster.
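To make the pull model concrete, here is a stdlib-only sketch of the kind of endpoint a pod could expose; a real application would normally use the Prometheus client library instead, and the metric names, port, and path below are invented for the example:

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// renderMetrics formats a snapshot of metric values in the Prometheus text
// exposition format ("<name> <value>" per line), which is what cAdvisor
// would scrape from the pod's endpoint.
func renderMetrics(values map[string]float64) string {
	names := make([]string, 0, len(values))
	for name := range values {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s %g\n", name, values[name])
	}
	return b.String()
}

func main() {
	// Serve the snapshot at an endpoint such as localhost:8080/status.
	http.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, renderMetrics(map[string]float64{
			"qps":               12,
			"activeConnections": 3,
		}))
	})
	http.ListenAndServe(":8080", nil)
}
```

Whatever produces the text, the contract is simply that a GET on the declared port/path returns the current metric values in Prometheus exposition format for cAdvisor to pull.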
+Heapster will: +* collect custom metrics from all CAdvisors in the cluster, together with pulling system metrics +(for version 1.2: we assume a polling period of ~30 seconds), +* store them in a metrics backend (influxDB, Prometheus, Hawkular, GCM, …), +* expose the latest snapshot of custom metrics for queries (by HPA/scheduler/…) using [model API](https://github.com/kubernetes/heapster/blob/master/docs/model.md). + +A user can easily expose Prometheus metrics for her own application by using the Prometheus [client](http://prometheus.io/docs/instrumenting/clientlibs/) library. +To monitor third-party applications, Prometheus [exporters](http://prometheus.io/docs/instrumenting/exporters/) running as side-car containers may be used. + +For version 1.2, to prevent a huge number of metrics from negatively affecting the system performance, +the number of metrics that can be exposed by each pod will be limited to a configurable value (default: 5). +In the future, we will need a way to cap the number of exposed metrics per pod; +one possible solution is using the LimitRanger admission control plugin. + +In future versions (later than 1.2), we want to extend our solution by: +* accepting pod metrics exposed in formats other than Prometheus +(collecting the different formats will need to be supported by cAdvisor), +* supporting push metrics by exposing a push API on heapster (e.g. in StatsD format) or on a local node collector +(if heapster performance is insufficient), +* supporting metrics not associated with an individual pod. + + +## API + +For Kubernetes 1.2, defining pod Prometheus endpoints will be done using annotations. +Later, when we are sure that our API is correct and stable, we will make it a part of `PodSpec`. + +We will add a new optional pod annotation with the following key: `metrics.alpha.kubernetes.io/custom-endpoints`. +It will contain a string value in JSON format.
+The value will be a list of tuples defining the port, path, and API format +(currently we will support only the Prometheus API; it will also be the default value if the format is empty) +of each metrics endpoint exposed by the pod, together with the names of the metrics which should be taken from the endpoint (obligatory, no more than the configurable limit). + +The annotation will be interpreted by kubelet during pod creation. +It will not be possible to add/delete/edit it during the lifetime of a pod: such operations will be rejected. + +For example, the following configuration: + +``` +"metrics.alpha.kubernetes.io/custom-endpoints" = [ + { + "api": "prometheus", + "path": "/status", + "port": "8080", + "names": ["qps", "activeConnections"] + }, + { + "path": "/metrics", + "port": "9090", + "names": ["myMetric"] + } +] +``` + +will expose metrics with names `qps` and `activeConnections` from `localhost:8080/status` and metric `myMetric` from `localhost:9090/metrics`. +Please note that both endpoints are in Prometheus format. + + +## Implementation notes + +1. Kubelet will parse the value of the `metrics.alpha.kubernetes.io/custom-endpoints` annotation for pods. +In case of error, the pod will not be started (it will be marked as failed) and kubelet will generate a `FailedToCreateContainer` event with an appropriate message +(we will not introduce any new event type, as types of events are considered a part of the kubelet API and we do not want to change it). + +1. Kubelet will use application metrics in CAdvisor for the implementation: + * It will create a configuration file for CAdvisor based on the annotation, + * It will mount this file as a part of the docker image to run, + * It will set a docker label for the image to point CAdvisor to this file.
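A sketch of the parsing and validation step the implementation notes describe; the annotation format, the Prometheus default, and the limit of 5 come from the proposal, while the struct, function name, and error messages are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// customEndpoint mirrors one tuple from the
// metrics.alpha.kubernetes.io/custom-endpoints annotation value.
type customEndpoint struct {
	API   string   `json:"api"`
	Path  string   `json:"path"`
	Port  string   `json:"port"`
	Names []string `json:"names"`
}

const maxCustomMetricsPerPod = 5 // the configurable default from the proposal

// parseCustomEndpoints decodes the annotation value, applies the Prometheus
// default for the API field, and enforces the per-pod metric limit;
// kubelet would fail the pod on any error returned here.
func parseCustomEndpoints(value string) ([]customEndpoint, error) {
	var endpoints []customEndpoint
	if err := json.Unmarshal([]byte(value), &endpoints); err != nil {
		return nil, err
	}
	total := 0
	for i := range endpoints {
		if endpoints[i].API == "" {
			endpoints[i].API = "prometheus"
		}
		if len(endpoints[i].Names) == 0 {
			return nil, fmt.Errorf("endpoint %d: metric names are obligatory", i)
		}
		total += len(endpoints[i].Names)
	}
	if total > maxCustomMetricsPerPod {
		return nil, fmt.Errorf("%d metrics requested, limit is %d", total, maxCustomMetricsPerPod)
	}
	return endpoints, nil
}

func main() {
	value := `[{"api": "prometheus", "path": "/status", "port": "8080", "names": ["qps", "activeConnections"]},
	           {"path": "/metrics", "port": "9090", "names": ["myMetric"]}]`
	endpoints, err := parseCustomEndpoints(value)
	fmt.Println(err, len(endpoints), endpoints[1].API) // <nil> 2 prometheus
}
```

Note how the second endpoint omits `"api"` and is defaulted to `prometheus`, matching the proposal's "default value if format is empty" rule.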
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/custom-metrics.md?pixel)]() + -- cgit v1.2.3 From 7aa2f920eaa3eeffee040b117ecf3c28e5820337 Mon Sep 17 00:00:00 2001 From: deads2k Date: Fri, 23 Jan 2015 10:37:11 -0500 Subject: enhance pluggable policy --- enhance-pluggable-policy.md | 379 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 379 insertions(+) create mode 100644 enhance-pluggable-policy.md diff --git a/enhance-pluggable-policy.md b/enhance-pluggable-policy.md new file mode 100644 index 00000000..6a881250 --- /dev/null +++ b/enhance-pluggable-policy.md @@ -0,0 +1,379 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/design/enhance-pluggable-policy.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Enhance Pluggable Policy + +While trying to develop an authorization plugin for Kubernetes, we found a few places where API extensions would ease development and add power. There are a few goals: + 1. Provide an authorization plugin that can evaluate a .Authorize() call based on the full content of the request to RESTStorage. This includes information like the full verb, the content of creates and updates, and the names of resources being acted upon. + 1. Provide a way to ask whether a user is permitted to take an action without running in process with the API Authorizer. For instance, a proxy for exec calls could ask whether a user can run the exec they are requesting. + 1. Provide a way to ask who can perform a given action on a given resource. This is useful for answering questions like, "who can create replication controllers in my namespace". + +This proposal adds to and extends the existing API so that authorizers may provide the functionality described above. It does not attempt to describe how the policies themselves can be expressed; that is up to the authorization plugins themselves. + + +## Enhancements to existing Authorization interfaces + +The existing Authorization interfaces are described here: [docs/admin/authorization.md](../admin/authorization.md). A couple of additions will allow the development of an Authorizer that matches based on different rules than the existing implementation. + +### Request Attributes + +The existing authorizer.Attributes only has 5 attributes (user, groups, isReadOnly, kind, and namespace).
If we add more detailed verbs, content, and resource names, then Authorizer plugins will have the same level of information available to RESTStorage components in order to express more detailed policy. The replacement excerpt is below. + +An API request has the following attributes that can be considered for authorization: + - user - the user-string which a user was authenticated as. This is included in the Context. + - groups - the groups to which the user belongs. This is included in the Context. + - verb - string describing the requesting action. Today we have: get, list, watch, create, update, and delete. The old `readOnly` behavior is equivalent to allowing get, list, watch. + - namespace - the namespace of the object being accessed, or the empty string if the endpoint does not support namespaced objects. This is included in the Context. + - resourceGroup - the API group of the resource being accessed + - resourceVersion - the API version of the resource being accessed + - resource - which resource is being accessed + - applies only to the API endpoints, such as + `/api/v1beta1/pods`. For miscellaneous endpoints, like `/version`, the kind is the empty string. + - resourceName - the name of the resource during a get, update, or delete action. + - subresource - which subresource is being accessed + +A non-API request has 2 attributes: + - verb - the HTTP verb of the request + - path - the path of the URL being requested + + +### Authorizer Interface + +The existing Authorizer interface is very simple, but there isn't a way to provide details about allows, denies, or failures. The extended detail is useful for UIs that want to describe why certain actions are allowed or disallowed. Not all Authorizers will want to provide that information, but for those that do, having that capability is useful.
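To make the request attributes above concrete, here is a toy in-memory Authorizer built on them; the trimmed-down `Attributes` struct and the wildcard rule matching are invented for this sketch (the proposed interface also threads an `api.Context` through `Authorize`):

```go
package main

import "fmt"

// Attributes is a trimmed-down version of the proposal's request attributes.
type Attributes struct {
	User      string
	Verb      string
	Resource  string
	Namespace string
}

// rule allows one (user, verb, resource, namespace) combination;
// "*" matches anything. Purely illustrative policy representation.
type rule struct {
	User, Verb, Resource, Namespace string
}

func (r rule) matches(a Attributes) bool {
	match := func(pattern, value string) bool { return pattern == "*" || pattern == value }
	return match(r.User, a.User) && match(r.Verb, a.Verb) &&
		match(r.Resource, a.Resource) && match(r.Namespace, a.Namespace)
}

// policyAuthorizer implements the proposed (allowed, reason, error) shape,
// minus the Context parameter, against a static rule list.
type policyAuthorizer struct{ rules []rule }

func (p policyAuthorizer) Authorize(a Attributes) (allowed bool, reason string, err error) {
	for _, r := range p.rules {
		if r.matches(a) {
			return true, fmt.Sprintf("matched rule %+v", r), nil
		}
	}
	return false, "no rule matched", nil
}

func main() {
	authz := policyAuthorizer{rules: []rule{
		{User: "Clark", Verb: "*", Resource: "pods", Namespace: "my-ns"},
	}}
	ok, reason, _ := authz.Authorize(Attributes{User: "Clark", Verb: "create", Resource: "pods", Namespace: "my-ns"})
	fmt.Println(ok, reason)
	ok, _, _ = authz.Authorize(Attributes{User: "Lois", Verb: "get", Resource: "pods", Namespace: "my-ns"})
	fmt.Println(ok)
}
```

The `reason` return value is exactly the kind of detail a UI could surface when explaining why an action was allowed or denied.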
In addition, adding a `GetAllowedSubjects` method that returns back the users and groups that can perform a particular action makes it possible to answer questions like, "who can see resources in my namespace" (see [ResourceAccessReview](#ResourceAccessReview) further down). + +```go +// OLD +type Authorizer interface { + Authorize(a Attributes) error +} +``` + +```go +// NEW +// Authorizer provides the ability to determine if a particular user can perform a particular action +type Authorizer interface { + // Authorize takes a Context (for namespace, user, and traceability) and Attributes to make a policy determination. + // reason is an optional return value that can describe why a policy decision was made. Reasons are useful during + // debugging when trying to figure out why a user or group has access to perform a particular action. + Authorize(ctx api.Context, a Attributes) (allowed bool, reason string, evaluationError error) +} + +// AuthorizerIntrospection is an optional interface that provides the ability to determine which users and groups can perform a particular action. +// This is useful for building caches of who can see what: for instance, "which namespaces can this user see". That would allow +// someone to see only the namespaces they are allowed to view instead of having to choose between listing them all or listing none. +type AuthorizerIntrospection interface { + // GetAllowedSubjects takes a Context (for namespace and traceability) and Attributes to determine which users and + // groups are allowed to perform the described action in the namespace. This API enables the ResourceBasedReview requests below + GetAllowedSubjects(ctx api.Context, a Attributes) (users util.StringSet, groups util.StringSet, evaluationError error) +} +``` + +### SubjectAccessReviews + +This set of APIs answers the question: can a user or group (use authenticated user if none is specified) perform a given action. 
Given the Authorizer interface (proposed or existing), this endpoint can be implemented generically against any Authorizer by creating the correct Attributes and making an .Authorize() call. + +There are three different flavors: + +1. `/apis/authorization.kubernetes.io/{version}/subjectAccessReviews` - this checks to see if a specified user or group can perform a given action at the cluster scope or across all namespaces. +This is a highly privileged operation. It allows a cluster-admin to inspect rights of any person across the entire cluster and against cluster level resources. +2. `/apis/authorization.kubernetes.io/{version}/personalSubjectAccessReviews` - this checks to see if the current user (including his groups) can perform a given action at any specified scope. +This is an unprivileged operation. It doesn't expose any information that a user couldn't discover simply by trying an endpoint themselves. +3. `/apis/authorization.kubernetes.io/{version}/ns/{namespace}/localSubjectAccessReviews` - this checks to see if a specified user or group can perform a given action in **this** namespace. +This is a moderately privileged operation. In a multi-tenant environment, having a namespace-scoped resource makes it very easy to reason about powers granted to a namespace admin. +This allows a namespace admin (someone able to manage permissions inside of one namespace, but not all namespaces) the power to inspect whether a given user or group +can manipulate resources in his namespace. + + +SubjectAccessReview is runtime.Object with associated RESTStorage that only accepts creates. The caller POSTs a SubjectAccessReview to this URL and he gets a SubjectAccessReviewResponse back. Here is an example of a call and its corresponding return.
+ +``` +// input +{ + "kind": "SubjectAccessReview", + "apiVersion": "authorization.kubernetes.io/v1", + "authorizationAttributes": { + "verb": "create", + "resource": "pods", + "user": "Clark", + "groups": ["admins", "managers"] + } +} + +// POSTed like this +curl -X POST /apis/authorization.kubernetes.io/{version}/subjectAccessReviews -d @subject-access-review.json +// or +accessReviewResult, err := Client.SubjectAccessReviews().Create(subjectAccessReviewObject) + +// output +{ + "kind": "SubjectAccessReviewResponse", + "apiVersion": "authorization.kubernetes.io/v1", + "allowed": true +} + +PersonalSubjectAccessReview is runtime.Object with associated RESTStorage that only accepts creates. The caller POSTs a PersonalSubjectAccessReview to this URL and he gets a SubjectAccessReviewResponse back. Here is an example of a call and its corresponding return. +``` + +// input +{ + "kind": "PersonalSubjectAccessReview", + "apiVersion": "authorization.kubernetes.io/v1", + "authorizationAttributes": { + "verb": "create", + "resource": "pods", + "namespace": "any-ns", + } +} + +// POSTed like this +curl -X POST /apis/authorization.kubernetes.io/{version}/personalSubjectAccessReviews -d @personal-subject-access-review.json +// or +accessReviewResult, err := Client.PersonalSubjectAccessReviews().Create(subjectAccessReviewObject) + +// output +{ + "kind": "PersonalSubjectAccessReviewResponse", + "apiVersion": "authorization.kubernetes.io/v1", + "allowed": true +} + +LocalSubjectAccessReview is runtime.Object with associated RESTStorage that only accepts creates. The caller POSTs a LocalSubjectAccessReview to this URL and he gets a LocalSubjectAccessReviewResponse back. Here is an example of a call and its corresponding return. 
+ +``` +// input +{ + "kind": "LocalSubjectAccessReview", + "apiVersion": "authorization.kubernetes.io/v1", + "namespace": "my-ns" + "authorizationAttributes": { + "verb": "create", + "resource": "pods", + "user": "Clark", + "groups": ["admins", "managers"] + } +} + +// POSTed like this +curl -X POST /apis/authorization.kubernetes.io/{version}/localSubjectAccessReviews -d @local-subject-access-review.json +// or +accessReviewResult, err := Client.LocalSubjectAccessReviews().Create(localSubjectAccessReviewObject) + +// output +{ + "kind": "LocalSubjectAccessReviewResponse", + "apiVersion": "authorization.kubernetes.io/v1", + "namespace": "my-ns" + "allowed": true +} + + +``` + +The actual Go objects look like this: + +```go +type AuthorizationAttributes struct { + // Namespace is the namespace of the action being requested. Currently, there is no distinction between no namespace and all namespaces + Namespace string `json:"namespace" description:"namespace of the action being requested"` + // Verb is one of: get, list, watch, create, update, delete + Verb string `json:"verb" description:"one of get, list, watch, create, update, delete"` + // Resource is one of the existing resource types + ResourceGroup string `json:"resourceGroup" description:"group of the resource being requested"` + // ResourceVersion is the version of resource + ResourceVersion string `json:"resourceVersion" description:"version of the resource being requested"` + // Resource is one of the existing resource types + Resource string `json:"resource" description:"one of the existing resource types"` + // ResourceName is the name of the resource being requested for a "get" or deleted for a "delete" + ResourceName string `json:"resourceName" description:"name of the resource being requested for a get or delete"` + // Subresource is one of the existing subresources types + Subresource string `json:"subresource" description:"one of the existing subresources"` +} + +// SubjectAccessReview is an object 
for requesting information about whether a user or group can perform an action +type SubjectAccessReview struct { + kapi.TypeMeta `json:",inline"` + + // AuthorizationAttributes describes the action being tested. + AuthorizationAttributes `json:"authorizationAttributes" description:"the action being tested"` + // User is optional, but at least one of User or Groups must be specified + User string `json:"user" description:"optional, user to check"` + // Groups is optional, but at least one of User or Groups must be specified + Groups []string `json:"groups" description:"optional, list of groups to which the user belongs"` +} + +// SubjectAccessReviewResponse describes whether or not a user or group can perform an action +type SubjectAccessReviewResponse struct { + kapi.TypeMeta + + // Allowed is required. True if the action would be allowed, false otherwise. + Allowed bool + // Reason is optional. It indicates why a request was allowed or denied. + Reason string +} + +// PersonalSubjectAccessReview is an object for requesting information about whether a user or group can perform an action +type PersonalSubjectAccessReview struct { + kapi.TypeMeta `json:",inline"` + + // AuthorizationAttributes describes the action being tested. + AuthorizationAttributes `json:"authorizationAttributes" description:"the action being tested"` +} + +// PersonalSubjectAccessReviewResponse describes whether this user can perform an action +type PersonalSubjectAccessReviewResponse struct { + kapi.TypeMeta + + // Namespace is the namespace used for the access review + Namespace string + // Allowed is required. True if the action would be allowed, false otherwise. + Allowed bool + // Reason is optional. It indicates why a request was allowed or denied. 
+ Reason string +} + +// LocalSubjectAccessReview is an object for requesting information about whether a user or group can perform an action +type LocalSubjectAccessReview struct { + kapi.TypeMeta `json:",inline"` + + // AuthorizationAttributes describes the action being tested. + AuthorizationAttributes `json:"authorizationAttributes" description:"the action being tested"` + // User is optional, but at least one of User or Groups must be specified + User string `json:"user" description:"optional, user to check"` + // Groups is optional, but at least one of User or Groups must be specified + Groups []string `json:"groups" description:"optional, list of groups to which the user belongs"` +} + +// LocalSubjectAccessReviewResponse describes whether or not a user or group can perform an action +type LocalSubjectAccessReviewResponse struct { + kapi.TypeMeta + + // Namespace is the namespace used for the access review + Namespace string + // Allowed is required. True if the action would be allowed, false otherwise. + Allowed bool + // Reason is optional. It indicates why a request was allowed or denied. + Reason string +} +``` + + +### ResourceAccessReview + +This set of APIs answers the question: which users and groups can perform the specified verb on the specified resourceKind. Given the Authorizer interface described above, this endpoint can be implemented generically against any Authorizer by calling the .GetAllowedSubjects() function. + +There are two different flavors: + +1. `/apis/authorization.kubernetes.io/{version}/resourceAccessReview` - this checks to see which users and groups can perform a given action at the cluster scope or across all namespaces. +This is a highly privileged operation. It allows a cluster-admin to inspect rights of all subjects across the entire cluster and against cluster level resources. +2.
`/apis/authorization.kubernetes.io/{version}/ns/{namespace}/localResourceAccessReviews` - this checks to see which users and groups can perform a given action in **this** namespace. +This is a moderately privileged operation. In a multi-tenant environment, having a namespace-scoped resource makes it very easy to reason about powers granted to a namespace admin. +This allows a namespace admin (someone able to manage permissions inside of one namespace, but not all namespaces) the power to inspect which users and groups +can manipulate resources in his namespace. + +ResourceAccessReview is a runtime.Object with associated RESTStorage that only accepts creates. The caller POSTs a ResourceAccessReview to this URL and he gets a ResourceAccessReviewResponse back. Here is an example of a call and its corresponding return. + +``` +// input +{ + "kind": "ResourceAccessReview", + "apiVersion": "authorization.kubernetes.io/v1", + "authorizationAttributes": { + "verb": "list", + "resource": "replicationcontrollers" + } +} + +// POSTed like this +curl -X POST /apis/authorization.kubernetes.io/{version}/resourceAccessReviews -d @resource-access-review.json +// or +accessReviewResult, err := Client.ResourceAccessReviews().Create(resourceAccessReviewObject) + +// output +{ + "kind": "ResourceAccessReviewResponse", + "apiVersion": "authorization.kubernetes.io/v1", + "namespace": "default", + "users": ["Clark", "Hubert"], + "groups": ["cluster-admins"] +} +``` + +The actual Go objects look like this: + +```go +// ResourceAccessReview is a means to request a list of which users and groups are authorized to perform the +// action specified by spec +type ResourceAccessReview struct { + kapi.TypeMeta `json:",inline"` + + // AuthorizationAttributes describes the action being tested.
+ AuthorizationAttributes `json:"authorizationAttributes" description:"the action being tested"` +} + +// ResourceAccessReviewResponse describes who can perform the action +type ResourceAccessReviewResponse struct { + kapi.TypeMeta + + // Users is the list of users who can perform the action + Users []string + // Groups is the list of groups who can perform the action + Groups []string +} + +// LocalResourceAccessReview is a means to request a list of which users and groups are authorized to perform the +// action specified in a specific namespace +type LocalResourceAccessReview struct { + kapi.TypeMeta `json:",inline"` + + // AuthorizationAttributes describes the action being tested. + AuthorizationAttributes `json:"authorizationAttributes" description:"the action being tested"` +} + +// LocalResourceAccessReviewResponse describes who can perform the action +type LocalResourceAccessReviewResponse struct { + kapi.TypeMeta + + // Namespace is the namespace used for the access review + Namespace string + // Users is the list of users who can perform the action + Users []string + // Groups is the list of groups who can perform the action + Groups []string +} + +``` + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/enhance-pluggable-policy.md?pixel)]() + -- cgit v1.2.3 From 72006b384dd3e13ecabced57cd27c59d7f61c247 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Thu, 3 Dec 2015 15:42:10 -0800 Subject: Minion->Node rename: docs/ machine names only, except gce/aws --- compute-resource-metrics-api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md index b716dbd8..25ec76cd 100644 --- a/compute-resource-metrics-api.md +++ b/compute-resource-metrics-api.md @@ -68,7 +68,7 @@ user via a periodically refreshing interface similar to `top` on Unix-like systems. This info could let users assign resource limits more efficiently. 
``` -$ kubectl top kubernetes-minion-abcd +$ kubectl top kubernetes-node-abcd POD CPU MEM monitoring-heapster-abcde 0.12 cores 302 MB kube-ui-v1-nd7in 0.07 cores 130 MB -- cgit v1.2.3 From 77f62c05d2577f3eae2c07fa513d7334e8241e98 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Thu, 3 Dec 2015 15:42:10 -0800 Subject: Minion->Node rename: docs/ machine names only, except gce/aws --- developer-guides/vagrant.md | 12 ++++++------ flaky-tests.md | 6 +++--- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 74e29e3a..14ccfe6b 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -139,9 +139,9 @@ You may need to build the binaries first, you can do this with `make` $ ./cluster/kubectl.sh get nodes NAME LABELS STATUS -kubernetes-node-0whl kubernetes.io/hostname=kubernetes-node-0whl Ready -kubernetes-node-4jdf kubernetes.io/hostname=kubernetes-node-4jdf Ready -kubernetes-node-epbe kubernetes.io/hostname=kubernetes-node-epbe Ready +kubernetes-node-0whl kubernetes.io/hostname=kubernetes-node-0whl Ready +kubernetes-node-4jdf kubernetes.io/hostname=kubernetes-node-4jdf Ready +kubernetes-node-epbe kubernetes.io/hostname=kubernetes-node-epbe Ready ``` ### Interacting with your Kubernetes cluster with the `kube-*` scripts. @@ -206,9 +206,9 @@ Your cluster is running, you can list the nodes in your cluster: $ ./cluster/kubectl.sh get nodes NAME LABELS STATUS -kubernetes-node-0whl kubernetes.io/hostname=kubernetes-node-0whl Ready -kubernetes-node-4jdf kubernetes.io/hostname=kubernetes-node-4jdf Ready -kubernetes-node-epbe kubernetes.io/hostname=kubernetes-node-epbe Ready +kubernetes-node-0whl kubernetes.io/hostname=kubernetes-node-0whl Ready +kubernetes-node-4jdf kubernetes.io/hostname=kubernetes-node-4jdf Ready +kubernetes-node-epbe kubernetes.io/hostname=kubernetes-node-epbe Ready ``` Now start running some containers! 
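Before starting containers, it can be handy to confirm programmatically that every node in the `kubectl get nodes` output above reports `Ready`. The following standalone Go sketch (not part of the guide; the helper name and parsing rules are assumptions based only on the tabular output shown above) illustrates one way to check:

```go
package main

import (
	"fmt"
	"strings"
)

// allNodesReady parses `kubectl get nodes` tabular output and reports
// whether every listed node's last column is "Ready".
func allNodesReady(output string) bool {
	lines := strings.Split(strings.TrimSpace(output), "\n")
	if len(lines) < 2 {
		return false // header only: no nodes registered yet
	}
	for _, line := range lines[1:] { // skip the NAME/LABELS/STATUS header
		fields := strings.Fields(line)
		if len(fields) < 3 || fields[len(fields)-1] != "Ready" {
			return false
		}
	}
	return true
}

func main() {
	sample := `NAME                  LABELS                                        STATUS
kubernetes-node-0whl  kubernetes.io/hostname=kubernetes-node-0whl   Ready
kubernetes-node-4jdf  kubernetes.io/hostname=kubernetes-node-4jdf   Ready
kubernetes-node-epbe  kubernetes.io/hostname=kubernetes-node-epbe   Ready`
	fmt.Println(allNodesReady(sample)) // true for the sample output above
}
```

This is purely illustrative; in practice you would script against `kubectl` directly or poll the API server.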
diff --git a/flaky-tests.md b/flaky-tests.md index 27c788aa..d5cc6a45 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -80,9 +80,9 @@ You can use this script to automate checking for failures, assuming your cluster ```sh echo "" > output.txt for i in {1..4}; do - echo "Checking kubernetes-minion-${i}" - echo "kubernetes-minion-${i}:" >> output.txt - gcloud compute ssh "kubernetes-minion-${i}" --command="sudo docker ps -a" >> output.txt + echo "Checking kubernetes-node-${i}" + echo "kubernetes-node-${i}:" >> output.txt + gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt done grep "Exited ([^0])" output.txt ``` -- cgit v1.2.3 From 4634819e313be2270469f45a4c4d4afe53dc6c65 Mon Sep 17 00:00:00 2001 From: Brad Erickson Date: Thu, 3 Dec 2015 15:42:10 -0800 Subject: Minion->Node rename: docs/ machine names only, except gce/aws --- event_compression.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/event_compression.md b/event_compression.md index c7982712..a8c5916b 100644 --- a/event_compression.md +++ b/event_compression.md @@ -119,17 +119,17 @@ Sample kubectl output ```console FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT REASON SOURCE MESSAGE -Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-minion-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Starting kubelet. -Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-1.c.saad-dev-vms.internal} Starting kubelet. -Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-3.c.saad-dev-vms.internal} Starting kubelet. 
-Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-minion-2.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-minion-2.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:02 +0000 Thu, 12 Feb 2015 01:13:02 +0000 1 kubernetes-node-4.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-node-4.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-node-1.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-node-1.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-node-3.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-node-3.c.saad-dev-vms.internal} Starting kubelet. +Thu, 12 Feb 2015 01:13:09 +0000 Thu, 12 Feb 2015 01:13:09 +0000 1 kubernetes-node-2.c.saad-dev-vms.internal Minion starting {kubelet kubernetes-node-2.c.saad-dev-vms.internal} Starting kubelet. Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-influx-grafana-controller-0133o Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 elasticsearch-logging-controller-fplln Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 kibana-logging-controller-gziey Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 skydns-ls6k1 Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods Thu, 12 Feb 2015 01:13:05 +0000 Thu, 12 Feb 2015 01:13:12 +0000 4 monitoring-heapster-controller-oh43e Pod failedScheduling {scheduler } Error scheduling: no nodes available to schedule pods -Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 
kibana-logging-controller-gziey BoundPod implicitly required container POD pulled {kubelet kubernetes-minion-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest" -Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-minion-4.c.saad-dev-vms.internal +Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey BoundPod implicitly required container POD pulled {kubelet kubernetes-node-4.c.saad-dev-vms.internal} Successfully pulled image "kubernetes/pause:latest" +Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1 kibana-logging-controller-gziey Pod scheduled {scheduler } Successfully assigned kibana-logging-controller-gziey to kubernetes-node-4.c.saad-dev-vms.internal ``` This demonstrates what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries. -- cgit v1.2.3 From 4d51e2295c9a213c6ee80eb6bd667743ba73268b Mon Sep 17 00:00:00 2001 From: Tamer Tas Date: Fri, 4 Dec 2015 18:16:01 +0200 Subject: Rename ConfigData proposal to ConfigMap --- config_data.md | 136 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 68 insertions(+), 68 deletions(-) diff --git a/config_data.md b/config_data.md index 253f961e..aa36f7a7 100644 --- a/config_data.md +++ b/config_data.md @@ -20,7 +20,7 @@ refer to the docs that go with that version. The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/config_data.md). +[here](http://releases.k8s.io/release-1.1/docs/proposals/configmap.md). Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
@@ -35,7 +35,7 @@ Documentation for other releases can be found at ## Abstract -This proposal proposes a new API resource, `ConfigData`, that stores data used for the configuration +This proposal proposes a new API resource, `ConfigMap`, that stores data used for the configuration of applications deployed on `Kubernetes`. The main focus points of this proposal are: @@ -50,9 +50,9 @@ A `Secret`-like API resource is needed to store configuration data that pods can Goals of this design: -1. Describe a `ConfigData` API resource -2. Describe the semantics of consuming `ConfigData` as environment variables -3. Describe the semantics of consuming `ConfigData` as files in a volume +1. Describe a `ConfigMap` API resource +2. Describe the semantics of consuming `ConfigMap` as environment variables +3. Describe the semantics of consuming `ConfigMap` as files in a volume ## Use Cases @@ -61,48 +61,48 @@ Goals of this design: 3. As a user, I want my view of configuration data in files to be eventually consistent with changes to the data -### Consuming `ConfigData` as Environment Variables +### Consuming `ConfigMap` as Environment Variables -Many programs read their configuration from environment variables. `ConfigData` should be possible -to consume in environment variables. The rough series of events for consuming `ConfigData` this way +Many programs read their configuration from environment variables. `ConfigMap` should be possible +to consume in environment variables. The rough series of events for consuming `ConfigMap` this way is: -1. A `ConfigData` object is created +1. A `ConfigMap` object is created 2. A pod that consumes the configuration data via environment variables is created 3. The pod is scheduled onto a node -4. The kubelet retrieves the `ConfigData` resource(s) referenced by the pod and starts the container +4. 
The kubelet retrieves the `ConfigMap` resource(s) referenced by the pod and starts the container processes with the appropriate data in environment variables -### Consuming `ConfigData` in Volumes +### Consuming `ConfigMap` in Volumes -Many programs read their configuration from configuration files. `ConfigData` should be possible -to consume in a volume. The rough series of events for consuming `ConfigData` this way +Many programs read their configuration from configuration files. `ConfigMap` should be possible +to consume in a volume. The rough series of events for consuming `ConfigMap` this way is: -1. A `ConfigData` object is created -2. A new pod using the `ConfigData` via the volume plugin is created +1. A `ConfigMap` object is created +2. A new pod using the `ConfigMap` via the volume plugin is created 3. The pod is scheduled onto a node 4. The Kubelet creates an instance of the volume plugin and calls its `Setup()` method -5. The volume plugin retrieves the `ConfigData` resource(s) referenced by the pod and projects +5. The volume plugin retrieves the `ConfigMap` resource(s) referenced by the pod and projects the appropriate data into the volume -### Consuming `ConfigData` Updates +### Consuming `ConfigMap` Updates Any long-running system has configuration that is mutated over time. Changes made to configuration data must be made visible to pods consuming data in volumes so that they can respond to those changes. -The `resourceVersion` of the `ConfigData` object will be updated by the API server every time the +The `resourceVersion` of the `ConfigMap` object will be updated by the API server every time the object is modified. After an update, modifications will be made visible to the consumer container: -1. A `ConfigData` object is created -2. A new pod using the `ConfigData` via the volume plugin is created +1. A `ConfigMap` object is created +2. A new pod using the `ConfigMap` via the volume plugin is created 3. The pod is scheduled onto a node 4. 
During the sync loop, the Kubelet creates an instance of the volume plugin and calls its `Setup()` method -5. The volume plugin retrieves the `ConfigData` resource(s) referenced by the pod and projects +5. The volume plugin retrieves the `ConfigMap` resource(s) referenced by the pod and projects the appropriate data into the volume -6. The `ConfigData` referenced by the pod is updated +6. The `ConfigMap` referenced by the pod is updated 7. During the next iteration of the `syncLoop`, the Kubelet creates an instance of the volume plugin and calls its `Setup()` method 8. The volume plugin projects the updated data into the volume atomically @@ -122,13 +122,13 @@ consumed in environment variables will not be updated. ### API Resource -The `ConfigData` resource will be added to the `extensions` API Group: +The `ConfigMap` resource will be added to the `extensions` API Group: ```go package api -// ConfigData holds configuration data for pods to consume. -type ConfigData struct { +// ConfigMap holds configuration data for pods to consume. +type ConfigMap struct { TypeMeta `json:",inline"` ObjectMeta `json:"metadata,omitempty"` @@ -137,19 +137,19 @@ type ConfigData struct { Data map[string]string `json:"data,omitempty"` } -type ConfigDataList struct { +type ConfigMapList struct { TypeMeta `json:",inline"` ListMeta `json:"metadata,omitempty"` - Items []ConfigData `json:"items"` + Items []ConfigMap `json:"items"` } ``` -A `Registry` implementation for `ConfigData` will be added to `pkg/registry/configdata`. +A `Registry` implementation for `ConfigMap` will be added to `pkg/registry/configmap`. 
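The `Data` field's comment states that each key must be a valid DNS_SUBDOMAIN, or a leading dot followed by a valid DNS_SUBDOMAIN. A standalone Go sketch of that rule (the regexp is an approximation of DNS subdomain validation, and `validConfigMapKey` is a hypothetical helper, not the actual Kubernetes validation code):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dnsLabel approximates a DNS label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validConfigMapKey checks the stated rule: a valid DNS_SUBDOMAIN, or a
// leading dot followed by a valid DNS_SUBDOMAIN.
func validConfigMapKey(key string) bool {
	key = strings.TrimPrefix(key, ".")
	if key == "" || len(key) > 253 {
		return false
	}
	for _, label := range strings.Split(key, ".") {
		if len(label) > 63 || !dnsLabel.MatchString(label) {
			return false
		}
	}
	return true
}

func main() {
	for _, k := range []string{"redis.conf", ".hidden", "nginx/conf"} {
		fmt.Println(k, validConfigMapKey(k))
	}
}
```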
### Environment Variables -The `EnvVarSource` will be extended with a new selector for config data: +The `EnvVarSource` will be extended with a new selector for `ConfigMap`: ```go package api @@ -158,22 +158,22 @@ package api type EnvVarSource struct { // other fields omitted - // Specifies a ConfigData key - ConfigData *ConfigDataSelector `json:"configData,omitempty"` + // Specifies a ConfigMap key + ConfigMap *ConfigMapSelector `json:"configMap,omitempty"` } -// ConfigDataSelector selects a key of a ConfigData. -type ConfigDataSelector struct { - // The name of the ConfigData to select a key from. - ConfigDataName string `json:"configDataName"` - // The key of the ConfigData to select. +// ConfigMapSelector selects a key of a ConfigMap. +type ConfigMapSelector struct { + // The name of the ConfigMap to select a key from. + ConfigMapName string `json:"configMapName"` + // The key of the ConfigMap to select. Key string `json:"key"` } ``` ### Volume Source -A new `ConfigDataVolumeSource` type of volume source containing the `ConfigData` object will be +A new `ConfigMapVolumeSource` type of volume source containing the `ConfigMap` object will be added to the `VolumeSource` struct in the API: ```go @@ -181,18 +181,18 @@ package api type VolumeSource struct { // other fields omitted - ConfigData *ConfigDataVolumeSource `json:"configData,omitempty"` + ConfigMap *ConfigMapVolumeSource `json:"configMap,omitempty"` } -// ConfigDataVolumeSource represents a volume that holds configuration data -type ConfigDataVolumeSource struct { - // A list of config data keys to project into the volume in files - Files []ConfigDataVolumeFile `json:"files"` +// ConfigMapVolumeSource represents a volume that holds configuration data +type ConfigMapVolumeSource struct { + // A list of configuration data keys to project into the volume in files + Files []ConfigMapVolumeFile `json:"files"` } -// ConfigDataVolumeFile represents a single file containing config data -type ConfigDataVolumeFile 
struct { - ConfigDataSelector `json:",inline"` +// ConfigMapVolumeFile represents a single file containing configuration data +type ConfigMapVolumeFile struct { + ConfigMapSelector `json:",inline"` // The relative path name of the file to be created. // Must not be absolute or contain the '..' path. Must be utf-8 encoded. @@ -202,15 +202,15 @@ type ConfigDataVolumeFile struct { ``` **Note:** The update logic used in the downward API volume plug-in will be extracted and re-used in -the volume plug-in for `ConfigData`. +the volume plug-in for `ConfigMap`. ## Examples -#### Consuming `ConfigData` as Environment Variables +#### Consuming `ConfigMap` as Environment Variables ```yaml apiVersion: extensions/v1beta1 -kind: ConfigData +kind: ConfigMap metadata: name: etcd-env-config data: @@ -222,7 +222,7 @@ data: etcdctl_peers: http://etcd:2379 ``` -This pod consumes the `ConfigData` as environment variables: +This pod consumes the `ConfigMap` as environment variables: ```yaml apiVersion: v1 @@ -241,38 +241,38 @@ spec: env: - name: ETCD_NUM_MEMBERS valueFrom: - configData: - configDataName: etcd-env-config + configMap: + configMapName: etcd-env-config key: number_of_members - name: ETCD_INITIAL_CLUSTER_STATE valueFrom: - configData: - configDataName: etcd-env-config + configMap: + configMapName: etcd-env-config key: initial_cluster_state - name: ETCD_DISCOVERY_TOKEN valueFrom: - configData: - configDataName: etcd-env-config + configMap: + configMapName: etcd-env-config key: discovery_token - name: ETCD_DISCOVERY_URL valueFrom: - configData: - configDataName: etcd-env-config + configMap: + configMapName: etcd-env-config key: discovery_url - name: ETCDCTL_PEERS valueFrom: - configData: - configDataName: etcd-env-config + configMap: + configMapName: etcd-env-config key: etcdctl_peers ``` -### Consuming `ConfigData` as Volumes +### Consuming `ConfigMap` as Volumes `redis-volume-config` is intended to be used as a volume containing a config file: ```yaml apiVersion: 
extensions/v1beta1 -kind: ConfigData +kind: ConfigMap metadata: name: redis-volume-config data: @@ -290,18 +290,18 @@ spec: containers: - name: redis image: kubernetes/redis - command: "redis-server /mnt/config-data/etc/redis.conf" + command: "redis-server /mnt/config-map/etc/redis.conf" ports: - containerPort: 6379 volumeMounts: - - name: config-data-volume - mountPath: /mnt/config-data + - name: config-map-volume + mountPath: /mnt/config-map volumes: - - name: config-data-volume - configData: + - name: config-map-volume + configMap: files: - path: "etc/redis.conf" - configDataName: redis-volume-config + configMapName: redis-volume-config key: redis.conf ``` @@ -311,5 +311,5 @@ In the future, we may add the ability to specify an init-container that can watc contents for updates and respond to changes when they occur. -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/config_data.md?pixel)]() +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/configmap.md?pixel)]() -- cgit v1.2.3 From 6dc63dfa2249e9fc206f9898bebfe96abbb69443 Mon Sep 17 00:00:00 2001 From: Tamer Tas Date: Fri, 4 Dec 2015 18:17:01 +0200 Subject: Rename config_data.md to configmap.md --- config_data.md | 315 --------------------------------------------------------- configmap.md | 315 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 315 insertions(+), 315 deletions(-) delete mode 100644 config_data.md create mode 100644 configmap.md diff --git a/config_data.md b/config_data.md deleted file mode 100644 index aa36f7a7..00000000 --- a/config_data.md +++ /dev/null @@ -1,315 +0,0 @@ - - - - -WARNING -WARNING -WARNING -WARNING -WARNING - -

PLEASE NOTE: This document applies to the HEAD of the source tree

- -If you are using a released version of Kubernetes, you should -refer to the docs that go with that version. - - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/configmap.md). - -Documentation for other releases can be found at -[releases.k8s.io](http://releases.k8s.io). - --- - - - - - -# Generic Configuration Object - -## Abstract - -This proposal proposes a new API resource, `ConfigMap`, that stores data used for the configuration -of applications deployed on `Kubernetes`. - -The main focus points of this proposal are: - -* Dynamic distribution of configuration data to deployed applications. -* Encapsulate configuration information and simplify `Kubernetes` deployments. -* Create a flexible configuration model for `Kubernetes`. - -## Motivation - -A `Secret`-like API resource is needed to store configuration data that pods can consume. - -Goals of this design: - -1. Describe a `ConfigMap` API resource -2. Describe the semantics of consuming `ConfigMap` as environment variables -3. Describe the semantics of consuming `ConfigMap` as files in a volume - -## Use Cases - -1. As a user, I want to be able to consume configuration data as environment variables -2. As a user, I want to be able to consume configuration data as files in a volume -3. As a user, I want my view of configuration data in files to be eventually consistent with changes - to the data - -### Consuming `ConfigMap` as Environment Variables - -Many programs read their configuration from environment variables. `ConfigMap` should be possible -to consume in environment variables. The rough series of events for consuming `ConfigMap` this way -is: - -1. A `ConfigMap` object is created -2. A pod that consumes the configuration data via environment variables is created -3. The pod is scheduled onto a node -4. 
The kubelet retrieves the `ConfigMap` resource(s) referenced by the pod and starts the container - processes with the appropriate data in environment variables - -### Consuming `ConfigMap` in Volumes - -Many programs read their configuration from configuration files. `ConfigMap` should be possible -to consume in a volume. The rough series of events for consuming `ConfigMap` this way -is: - -1. A `ConfigMap` object is created -2. A new pod using the `ConfigMap` via the volume plugin is created -3. The pod is scheduled onto a node -4. The Kubelet creates an instance of the volume plugin and calls its `Setup()` method -5. The volume plugin retrieves the `ConfigMap` resource(s) referenced by the pod and projects - the appropriate data into the volume - -### Consuming `ConfigMap` Updates - -Any long-running system has configuration that is mutated over time. Changes made to configuration -data must be made visible to pods consuming data in volumes so that they can respond to those -changes. - -The `resourceVersion` of the `ConfigMap` object will be updated by the API server every time the -object is modified. After an update, modifications will be made visible to the consumer container: - -1. A `ConfigMap` object is created -2. A new pod using the `ConfigMap` via the volume plugin is created -3. The pod is scheduled onto a node -4. During the sync loop, the Kubelet creates an instance of the volume plugin and calls its - `Setup()` method -5. The volume plugin retrieves the `ConfigMap` resource(s) referenced by the pod and projects - the appropriate data into the volume -6. The `ConfigMap` referenced by the pod is updated -7. During the next iteration of the `syncLoop`, the Kubelet creates an instance of the volume plugin - and calls its `Setup()` method -8. The volume plugin projects the updated data into the volume atomically - -It is the consuming pod's responsibility to make use of the updated data once it is made visible. 
- -Because environment variables cannot be updated without restarting a container, configuration data -consumed in environment variables will not be updated. - -### Advantages - -* Easy to consume in pods; consumer-agnostic -* Configuration data is persistent and versioned -* Consumers of configuration data in volumes can respond to changes in the data - -## Proposed Design - -### API Resource - -The `ConfigMap` resource will be added to the `extensions` API Group: - -```go -package api - -// ConfigMap holds configuration data for pods to consume. -type ConfigMap struct { - TypeMeta `json:",inline"` - ObjectMeta `json:"metadata,omitempty"` - - // Data contains the configuration data. Each key must be a valid DNS_SUBDOMAIN or leading - // dot followed by valid DNS_SUBDOMAIN. - Data map[string]string `json:"data,omitempty"` -} - -type ConfigMapList struct { - TypeMeta `json:",inline"` - ListMeta `json:"metadata,omitempty"` - - Items []ConfigMap `json:"items"` -} -``` - -A `Registry` implementation for `ConfigMap` will be added to `pkg/registry/configmap`. - -### Environment Variables - -The `EnvVarSource` will be extended with a new selector for `ConfigMap`: - -```go -package api - -// EnvVarSource represents a source for the value of an EnvVar. -type EnvVarSource struct { - // other fields omitted - - // Specifies a ConfigMap key - ConfigMap *ConfigMapSelector `json:"configMap,omitempty"` -} - -// ConfigMapSelector selects a key of a ConfigMap. -type ConfigMapSelector struct { - // The name of the ConfigMap to select a key from. - ConfigMapName string `json:"configMapName"` - // The key of the ConfigMap to select. 
- Key string `json:"key"` -} -``` - -### Volume Source - -A new `ConfigMapVolumeSource` type of volume source containing the `ConfigMap` object will be -added to the `VolumeSource` struct in the API: - -```go -package api - -type VolumeSource struct { - // other fields omitted - ConfigMap *ConfigMapVolumeSource `json:"configMap,omitempty"` -} - -// ConfigMapVolumeSource represents a volume that holds configuration data -type ConfigMapVolumeSource struct { - // A list of configuration data keys to project into the volume in files - Files []ConfigMapVolumeFile `json:"files"` -} - -// ConfigMapVolumeFile represents a single file containing configuration data -type ConfigMapVolumeFile struct { - ConfigMapSelector `json:",inline"` - - // The relative path name of the file to be created. - // Must not be absolute or contain the '..' path. Must be utf-8 encoded. - // The first item of the relative path must not start with '..' - Path string `json:"path"` -} -``` - -**Note:** The update logic used in the downward API volume plug-in will be extracted and re-used in -the volume plug-in for `ConfigMap`. 
- -## Examples - -#### Consuming `ConfigMap` as Environment Variables - -```yaml -apiVersion: extensions/v1beta1 -kind: ConfigMap -metadata: - name: etcd-env-config -data: - number_of_members: 1 - initial_cluster_state: new - initial_cluster_token: DUMMY_ETCD_INITIAL_CLUSTER_TOKEN - discovery_token: DUMMY_ETCD_DISCOVERY_TOKEN - discovery_url: http://etcd-discovery:2379 - etcdctl_peers: http://etcd:2379 -``` - -This pod consumes the `ConfigMap` as environment variables: - -```yaml -apiVersion: v1 -kind: Pod -metadata: - name: config-env-example -spec: - containers: - - name: etcd - image: openshift/etcd-20-centos7 - ports: - - containerPort: 2379 - protocol: TCP - - containerPort: 2380 - protocol: TCP - env: - - name: ETCD_NUM_MEMBERS - valueFrom: - configMap: - configMapName: etcd-env-config - key: number_of_members - - name: ETCD_INITIAL_CLUSTER_STATE - valueFrom: - configMap: - configMapName: etcd-env-config - key: initial_cluster_state - - name: ETCD_DISCOVERY_TOKEN - valueFrom: - configMap: - configMapName: etcd-env-config - key: discovery_token - - name: ETCD_DISCOVERY_URL - valueFrom: - configMap: - configMapName: etcd-env-config - key: discovery_url - - name: ETCDCTL_PEERS - valueFrom: - configMap: - configMapName: etcd-env-config - key: etcdctl_peers -``` - -### Consuming `ConfigMap` as Volumes - -`redis-volume-config` is intended to be used as a volume containing a config file: - -```yaml -apiVersion: extensions/v1beta1 -kind: ConfigMap -metadata: - name: redis-volume-config -data: - redis.conf: "pidfile /var/run/redis.pid\nport6379\ntcp-backlog 511\n databases 1\ntimeout 0\n" -``` - -The following pod consumes the `redis-volume-config` in a volume: - -```yaml -apiVersion: v1 -kind: Pod -metadata: - name: config-volume-example -spec: - containers: - - name: redis - image: kubernetes/redis - command: "redis-server /mnt/config-map/etc/redis.conf" - ports: - - containerPort: 6379 - volumeMounts: - - name: config-map-volume - mountPath: /mnt/config-map - 
volumes: - - name: config-map-volume - configMap: - files: - - path: "etc/redis.conf" - configMapName: redis-volume-config - key: redis.conf -``` - -### Future Improvements - -In the future, we may add the ability to specify an init-container that can watch the volume -contents for updates and respond to changes when they occur. - - -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/configmap.md?pixel)]() - diff --git a/configmap.md b/configmap.md new file mode 100644 index 00000000..aa36f7a7 --- /dev/null +++ b/configmap.md @@ -0,0 +1,315 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/configmap.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Generic Configuration Object + +## Abstract + +This proposal proposes a new API resource, `ConfigMap`, that stores data used for the configuration +of applications deployed on `Kubernetes`. + +The main focus points of this proposal are: + +* Dynamic distribution of configuration data to deployed applications. +* Encapsulate configuration information and simplify `Kubernetes` deployments. +* Create a flexible configuration model for `Kubernetes`. + +## Motivation + +A `Secret`-like API resource is needed to store configuration data that pods can consume. + +Goals of this design: + +1. Describe a `ConfigMap` API resource +2. Describe the semantics of consuming `ConfigMap` as environment variables +3. Describe the semantics of consuming `ConfigMap` as files in a volume + +## Use Cases + +1. As a user, I want to be able to consume configuration data as environment variables +2. As a user, I want to be able to consume configuration data as files in a volume +3. As a user, I want my view of configuration data in files to be eventually consistent with changes + to the data + +### Consuming `ConfigMap` as Environment Variables + +Many programs read their configuration from environment variables. `ConfigMap` should be possible +to consume in environment variables. The rough series of events for consuming `ConfigMap` this way +is: + +1. A `ConfigMap` object is created +2. A pod that consumes the configuration data via environment variables is created +3. The pod is scheduled onto a node +4. 
The kubelet retrieves the `ConfigMap` resource(s) referenced by the pod and starts the container + processes with the appropriate data in environment variables + +### Consuming `ConfigMap` in Volumes + +Many programs read their configuration from configuration files. `ConfigMap` should be possible +to consume in a volume. The rough series of events for consuming `ConfigMap` this way +is: + +1. A `ConfigMap` object is created +2. A new pod using the `ConfigMap` via the volume plugin is created +3. The pod is scheduled onto a node +4. The Kubelet creates an instance of the volume plugin and calls its `Setup()` method +5. The volume plugin retrieves the `ConfigMap` resource(s) referenced by the pod and projects + the appropriate data into the volume + +### Consuming `ConfigMap` Updates + +Any long-running system has configuration that is mutated over time. Changes made to configuration +data must be made visible to pods consuming data in volumes so that they can respond to those +changes. + +The `resourceVersion` of the `ConfigMap` object will be updated by the API server every time the +object is modified. After an update, modifications will be made visible to the consumer container: + +1. A `ConfigMap` object is created +2. A new pod using the `ConfigMap` via the volume plugin is created +3. The pod is scheduled onto a node +4. During the sync loop, the Kubelet creates an instance of the volume plugin and calls its + `Setup()` method +5. The volume plugin retrieves the `ConfigMap` resource(s) referenced by the pod and projects + the appropriate data into the volume +6. The `ConfigMap` referenced by the pod is updated +7. During the next iteration of the `syncLoop`, the Kubelet creates an instance of the volume plugin + and calls its `Setup()` method +8. The volume plugin projects the updated data into the volume atomically + +It is the consuming pod's responsibility to make use of the updated data once it is made visible. 
+ +Because environment variables cannot be updated without restarting a container, configuration data +consumed in environment variables will not be updated. + +### Advantages + +* Easy to consume in pods; consumer-agnostic +* Configuration data is persistent and versioned +* Consumers of configuration data in volumes can respond to changes in the data + +## Proposed Design + +### API Resource + +The `ConfigMap` resource will be added to the `extensions` API Group: + +```go +package api + +// ConfigMap holds configuration data for pods to consume. +type ConfigMap struct { + TypeMeta `json:",inline"` + ObjectMeta `json:"metadata,omitempty"` + + // Data contains the configuration data. Each key must be a valid DNS_SUBDOMAIN or leading + // dot followed by valid DNS_SUBDOMAIN. + Data map[string]string `json:"data,omitempty"` +} + +type ConfigMapList struct { + TypeMeta `json:",inline"` + ListMeta `json:"metadata,omitempty"` + + Items []ConfigMap `json:"items"` +} +``` + +A `Registry` implementation for `ConfigMap` will be added to `pkg/registry/configmap`. + +### Environment Variables + +The `EnvVarSource` will be extended with a new selector for `ConfigMap`: + +```go +package api + +// EnvVarSource represents a source for the value of an EnvVar. +type EnvVarSource struct { + // other fields omitted + + // Specifies a ConfigMap key + ConfigMap *ConfigMapSelector `json:"configMap,omitempty"` +} + +// ConfigMapSelector selects a key of a ConfigMap. +type ConfigMapSelector struct { + // The name of the ConfigMap to select a key from. + ConfigMapName string `json:"configMapName"` + // The key of the ConfigMap to select. 
+ Key string `json:"key"` +} +``` + +### Volume Source + +A new `ConfigMapVolumeSource` type of volume source containing the `ConfigMap` object will be +added to the `VolumeSource` struct in the API: + +```go +package api + +type VolumeSource struct { + // other fields omitted + ConfigMap *ConfigMapVolumeSource `json:"configMap,omitempty"` +} + +// ConfigMapVolumeSource represents a volume that holds configuration data +type ConfigMapVolumeSource struct { + // A list of configuration data keys to project into the volume in files + Files []ConfigMapVolumeFile `json:"files"` +} + +// ConfigMapVolumeFile represents a single file containing configuration data +type ConfigMapVolumeFile struct { + ConfigMapSelector `json:",inline"` + + // The relative path name of the file to be created. + // Must not be absolute or contain the '..' path. Must be utf-8 encoded. + // The first item of the relative path must not start with '..' + Path string `json:"path"` +} +``` + +**Note:** The update logic used in the downward API volume plug-in will be extracted and re-used in +the volume plug-in for `ConfigMap`. 

## Examples

### Consuming `ConfigMap` as Environment Variables

```yaml
apiVersion: extensions/v1beta1
kind: ConfigMap
metadata:
  name: etcd-env-config
data:
  number_of_members: "1"
  initial_cluster_state: new
  initial_cluster_token: DUMMY_ETCD_INITIAL_CLUSTER_TOKEN
  discovery_token: DUMMY_ETCD_DISCOVERY_TOKEN
  discovery_url: http://etcd-discovery:2379
  etcdctl_peers: http://etcd:2379
```

This pod consumes the `ConfigMap` as environment variables:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-env-example
spec:
  containers:
  - name: etcd
    image: openshift/etcd-20-centos7
    ports:
    - containerPort: 2379
      protocol: TCP
    - containerPort: 2380
      protocol: TCP
    env:
    - name: ETCD_NUM_MEMBERS
      valueFrom:
        configMap:
          configMapName: etcd-env-config
          key: number_of_members
    - name: ETCD_INITIAL_CLUSTER_STATE
      valueFrom:
        configMap:
          configMapName: etcd-env-config
          key: initial_cluster_state
    - name: ETCD_DISCOVERY_TOKEN
      valueFrom:
        configMap:
          configMapName: etcd-env-config
          key: discovery_token
    - name: ETCD_DISCOVERY_URL
      valueFrom:
        configMap:
          configMapName: etcd-env-config
          key: discovery_url
    - name: ETCDCTL_PEERS
      valueFrom:
        configMap:
          configMapName: etcd-env-config
          key: etcdctl_peers
```

### Consuming `ConfigMap` as Volumes

`redis-volume-config` is intended to be used as a volume containing a config file:

```yaml
apiVersion: extensions/v1beta1
kind: ConfigMap
metadata:
  name: redis-volume-config
data:
  redis.conf: "pidfile /var/run/redis.pid\nport 6379\ntcp-backlog 511\ndatabases 1\ntimeout 0\n"
```

The following pod consumes the `redis-volume-config` in a volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-volume-example
spec:
  containers:
    - name: redis
      image: kubernetes/redis
      command: ["redis-server", "/mnt/config-map/etc/redis.conf"]
      ports:
        - containerPort: 6379
      volumeMounts:
        - name: config-map-volume
          mountPath: /mnt/config-map
volumes: + - name: config-map-volume + configMap: + files: + - path: "etc/redis.conf" + configMapName: redis-volume-config + key: redis.conf +``` + +### Future Improvements + +In the future, we may add the ability to specify an init-container that can watch the volume +contents for updates and respond to changes when they occur. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/configmap.md?pixel)]() + -- cgit v1.2.3 From ee875e93eb126d20731d0799d99b9ec95bf0f8fd Mon Sep 17 00:00:00 2001 From: Jon Eisen Date: Fri, 4 Dec 2015 13:47:30 -0700 Subject: Add new clojure api bindings library https://github.com/yanatan16/clj-kubernetes-api --- client-libraries.md | 1 + 1 file changed, 1 insertion(+) diff --git a/client-libraries.md b/client-libraries.md index 22a59d06..a6f3e6ff 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -49,6 +49,7 @@ Documentation for other releases can be found at * [PHP](https://github.com/maclof/kubernetes-client) * [Node.js](https://github.com/tenxcloud/node-kubernetes-client) * [Perl](https://metacpan.org/pod/Net::Kubernetes) + * [Clojure](https://github.com/yanatan16/clj-kubernetes-api) -- cgit v1.2.3 From 9b60d8c88083958918bb92b6104b1fe8d4e9b9ec Mon Sep 17 00:00:00 2001 From: Tamer Tas Date: Mon, 7 Dec 2015 06:16:01 +0200 Subject: Rename githash to build_version and version to release_version --- releasing.md | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/releasing.md b/releasing.md index d4347ce4..757048ad 100644 --- a/releasing.md +++ b/releasing.md @@ -92,22 +92,24 @@ release from HEAD of the branch, (because you have to do some version-rev commits,) so choose the latest build on the release branch. (Remember, that branch should be frozen.) -Once you find some greens, you can find the git hash for a build by looking at -the Full Console Output and searching for `githash=`. 
You should see a line: +Once you find some greens, you can find the build hash for a build by looking at +the Full Console Output and searching for `build_version=`. You should see a line: ```console -githash=v1.2.0-alpha.2.164+b44c7d79d6c9bb +build_version=v1.2.0-alpha.2.164+b44c7d79d6c9bb ``` Or, if you're cutting from a release branch (i.e. doing an official release), ```console -githash=v1.1.0-beta.567+d79d6c9bbb44c7 +build_version=v1.1.0-beta.567+d79d6c9bbb44c7 ``` +Please note that `build_version` was called `githash` versions prior to v1.2. + Because Jenkins builds frequently, if you're looking between jobs (e.g. `kubernetes-e2e-gke-ci` and `kubernetes-e2e-gce`), there may be no single -`githash` that's been run on both jobs. In that case, take the a green +`build_version` that's been run on both jobs. In that case, take the a green `kubernetes-e2e-gce` build (but please check that it corresponds to a temporally similar build that's green on `kubernetes-e2e-gke-ci`). Lastly, if you're having trouble understanding why the GKE continuous integration clusters are failing @@ -117,10 +119,10 @@ oncall. Before proceeding to the next step: ```sh -export GITHASH=v1.2.0-alpha.2.164+b44c7d79d6c9bb +export BUILD_VERSION=v1.2.0-alpha.2.164+b44c7d79d6c9bb ``` -Where `v1.2.0-alpha.2.164+b44c7d79d6c9bb` is the git hash you decided on. This +Where `v1.2.0-alpha.2.164+b44c7d79d6c9bb` is the build hash you decided on. This will become your release point. ### Cutting/branching the release @@ -136,15 +138,15 @@ or `git checkout upstream/master` from an existing repo. Decide what version you're cutting and export it: -- alpha release: `export VER="vX.Y.0-alpha.W"`; -- beta release: `export VER="vX.Y.Z-beta.W"`; -- official release: `export VER="vX.Y.Z"`; -- new release series: `export VER="vX.Y"`. 
+- alpha release: `export RELEASE_VERSION="vX.Y.0-alpha.W"`; +- beta release: `export RELEASE_VERSION="vX.Y.Z-beta.W"`; +- official release: `export RELEASE_VERSION="vX.Y.Z"`; +- new release series: `export RELEASE_VERSION="vX.Y"`. Then, run ```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" +./release/cut-official-release.sh "${RELEASE_VERSION}" "${BUILD_VERSION}" ``` This will do a dry run of the release. It will give you instructions at the @@ -152,7 +154,7 @@ end for `pushd`ing into the dry-run directory and having a look around. `pushd` into the directory and make sure everythig looks as you expect: ```console -git log "${VER}" # do you see the commit you expect? +git log "${RELEASE_VERSION}" # do you see the commit you expect? make release ./cluster/kubectl.sh version -c ``` @@ -161,7 +163,7 @@ If you're satisfied with the result of the script, go back to `upstream/master` run ```console -./release/cut-official-release.sh "${VER}" "${GITHASH}" --no-dry-run +./release/cut-official-release.sh "${RELEASE_VERSION}" "${BUILD_VERSION}" --no-dry-run ``` and follow the instructions. @@ -185,10 +187,10 @@ notes, (see #17444 for more info). - Only publish a beta release if it's a standalone pre-release. (We create beta tags after we do official releases to maintain proper semantic versioning, *we don't publish these beta releases*.) Use - `./hack/cherry_pick_list.sh ${VER}` to get release notes for such a + `./hack/cherry_pick_list.sh ${RELEASE_VERSION}` to get release notes for such a release. - Official release: - - From your clone of upstream/master, run `./hack/cherry_pick_list.sh ${VER}` + - From your clone of upstream/master, run `./hack/cherry_pick_list.sh ${RELEASE_VERSION}` to get the release notes for the patch release you just created. Feel free to prune anything internal, but typically for patch releases we tend to include everything in the release notes. 
-- cgit v1.2.3 From 94fa24d4845d67416d2fc4018aaa22ef7c40cecd Mon Sep 17 00:00:00 2001 From: "Tim St. Clair" Date: Mon, 7 Dec 2015 16:42:58 -0800 Subject: Node Allocatable resources proposal --- node-allocatable.md | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++ node-allocatable.png | Bin 0 -> 17673 bytes 2 files changed, 184 insertions(+) create mode 100644 node-allocatable.md create mode 100644 node-allocatable.png diff --git a/node-allocatable.md b/node-allocatable.md new file mode 100644 index 00000000..8429eda9 --- /dev/null +++ b/node-allocatable.md @@ -0,0 +1,184 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree


If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.


The latest release of this document can be found
[here](http://releases.k8s.io/release-1.1/docs/proposals/node-allocatable.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).

--




# Node Allocatable Resources

**Issue:** https://github.com/kubernetes/kubernetes/issues/13984

## Overview

Currently Node.Status has Capacity, but no concept of node Allocatable. We need additional
parameters to serve several purposes:

1. [Kubernetes metrics](compute-resource-metrics-api.md) provides "/docker-daemon", "/kubelet",
   "/kube-proxy", "/system" etc. raw containers for monitoring system component resource usage
   patterns and detecting regressions. Eventually we want to cap system component usage to a certain
   limit / request. However this is not currently feasible for a variety of reasons, including:
   1. Docker still uses tons of computing resources (See
      [#16943](https://github.com/kubernetes/kubernetes/issues/16943))
   2. We have not yet defined the minimal system requirements, so we cannot control Kubernetes
      nodes or know about arbitrary daemons, which can make the system resources
      unmanageable. Even with a resource cap we cannot do full resource management on the
      node, but with the proposed parameters we can mitigate severe resource overcommits
   3. Usage scales with the number of pods running on the node
2. External schedulers (such as Mesos, Hadoop, etc.) might want to partition
   compute resources on a given node, limiting how much the Kubelet can use. We should provide a
   mechanism by which they can query the kubelet and reserve some resources for their own purposes.

### Scope of proposal

This proposal deals with resource reporting through the [`Allocatable` field](#allocatable) for more
reliable scheduling, and minimizing resource overcommitment. This proposal *does not* cover
resource usage enforcement (e.g. limiting kubernetes component usage), pod eviction (e.g. when
reservation grows), or running multiple Kubelets on a single node.

## Design

### Definitions

![image](node-allocatable.png)

1. **Node Capacity** - Already provided as
   [`NodeStatus.Capacity`](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/HEAD/docs/api-reference/v1/definitions.html#_v1_nodestatus),
   this is the total capacity read from the node instance, and assumed to be constant.
2. **System-Reserved** (proposed) - Compute resources reserved for processes which are not managed by
   Kubernetes. Currently this covers all the processes lumped together in the `/system` raw
   container.
3. **Kubelet Allocatable** - Compute resources available for scheduling (including scheduled &
   unscheduled resources). This value is the focus of this proposal. See [below](#api-changes) for
   more details.
4. **Kube-Reserved** (proposed) - Compute resources reserved for Kubernetes components such as the
   docker daemon, kubelet, kube proxy, etc.

### API changes

#### Allocatable

Add `Allocatable` (3) to
[`NodeStatus`](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/HEAD/docs/api-reference/v1/definitions.html#_v1_nodestatus):

```
type NodeStatus struct {
  ...
  // Allocatable represents schedulable resources of a node.
  Allocatable ResourceList `json:"allocatable,omitempty"`
  ...
}
```

Allocatable will be computed by the Kubelet and reported to the API server.
It is defined to be:

```
 [Allocatable] = [Node Capacity] - [Kube-Reserved] - [System-Reserved]
```

The scheduler will use `Allocatable` in place of `Capacity` when scheduling pods, and the Kubelet
will use it when performing admission checks.

*Note: Since kernel usage can fluctuate and is out of kubernetes control, it will be reported as a
 separate value (probably via the metrics API). Reporting kernel usage is out-of-scope for this
 proposal.*

#### Kube-Reserved

`KubeReserved` is the parameter specifying resources reserved for kubernetes components (4). It is
provided as a command-line flag to the Kubelet at startup, and therefore cannot be changed during
normal Kubelet operation (this may change in the [future](#future-work)).

The flag will be specified as a serialized `ResourceList`, with resources defined by the API
`ResourceName` and values specified in `resource.Quantity` format, e.g.:

```
--kube-reserved=cpu=500m,memory=5Mi
```

Initially we will only support CPU and memory, but will eventually support more resources. See
[#16889](https://github.com/kubernetes/kubernetes/pull/16889) for disk accounting.

If KubeReserved is not set it defaults to a sane value (TBD) calculated from machine capacity. If it
is explicitly set to 0 (along with `SystemReserved`), then `Allocatable == Capacity`, and the system
behavior is equivalent to the 1.1 behavior with scheduling based on Capacity.

#### System-Reserved

In the initial implementation, `SystemReserved` will be functionally equivalent to
[`KubeReserved`](#kube-reserved), but with a different semantic meaning. While KubeReserved
designates resources set aside for kubernetes components, SystemReserved designates resources set
aside for non-kubernetes components (currently this is reported as all the processes lumped
together in the `/system` raw container).

## Issues

### Kubernetes reservation is smaller than kubernetes component usage

**Solution**: Initially, do nothing (best effort). Let the kubernetes daemons overflow the reserved
resources and hope for the best. If the node usage is less than Allocatable, there will be some room
for overflow and the node should continue to function. If the node has been scheduled to capacity
(worst-case scenario) it may enter an unstable state, which is the current behavior in this
situation.

In the [future](#future-work) we may set a parent cgroup for kubernetes components, with limits set
according to `KubeReserved`.

### Version discrepancy

**API server / scheduler is not allocatable-resources aware:** If the Kubelet rejects a Pod but the
 scheduler expects the Kubelet to accept it, the system could get stuck in an infinite loop
 scheduling a Pod onto the node only to have Kubelet repeatedly reject it. To avoid this situation,
 we will do a 2-stage rollout of `Allocatable`. In stage 1 (targeted for 1.2), `Allocatable` will
 be reported by the Kubelet and the scheduler will be updated to use it, but Kubelet will continue
 to do admission checks based on `Capacity` (same as today). In stage 2 of the rollout (targeted
 for 1.3 or later), the Kubelet will start doing admission checks based on `Allocatable`.

**API server expects `Allocatable` but does not receive it:** If the kubelet is older and does not
 provide `Allocatable` in the `NodeStatus`, then `Allocatable` will be
 [defaulted](../../pkg/api/v1/defaults.go) to
 `Capacity` (which will yield today's behavior of scheduling based on capacity).

### 3rd party schedulers

The community should be notified that an update to schedulers is recommended, but if a scheduler is
not updated it falls under the above case of "scheduler is not allocatable-resources aware".

## Future work

1. Convert kubelet flags to Config API - Prerequisite to (2).
See
   [#12245](https://github.com/kubernetes/kubernetes/issues/12245).
2. Set cgroup limits according to `KubeReserved` - as described in the [overview](#overview)
3. Report kernel usage to be considered with scheduling decisions.




[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/node-allocatable.md?pixel)]()

diff --git a/node-allocatable.png b/node-allocatable.png
new file mode 100644
index 00000000..d6f5383e
Binary files /dev/null and b/node-allocatable.png differ
-- cgit v1.2.3


From 2d307ee9e0a5b58e803af670bb311fb7fac11936 Mon Sep 17 00:00:00 2001
From: Daniel Smith
Date: Thu, 15 Oct 2015 13:48:59 -0700
Subject: Create client package proposal

---
 client-package-structure.md | 349 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 349 insertions(+)
 create mode 100644 client-package-structure.md

diff --git a/client-package-structure.md b/client-package-structure.md
new file mode 100644
index 00000000..2739f30a
--- /dev/null
+++ b/client-package-structure.md
@@ -0,0 +1,349 @@




WARNING
WARNING
WARNING
WARNING
WARNING


PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/client-package-structure.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + + + +- [Client: layering and package structure](#client-layering-and-package-structure) + - [Desired layers](#desired-layers) + - [Transport](#transport) + - [RESTClient/request.go](#restclientrequestgo) + - [Mux layer](#mux-layer) + - [High-level: Individual typed](#high-level-individual-typed) + - [High-level, typed: Discovery](#high-level-typed-discovery) + - [High-level: Dynamic](#high-level-dynamic) + - [High-level: Client Sets](#high-level-client-sets) + - [Package Structure](#package-structure) + - [Client Guarantees (and testing)](#client-guarantees-and-testing) + + + +# Client: layering and package structure + +## Desired layers + +### Transport + +The transport layer is concerned with round-tripping requests to an apiserver +somewhere. It consumes a Config object with options appropriate for this. +(That's most of the current client.Config structure.) + +Transport delivers an object that implements http's RoundTripper interface +and/or can be used in place of http.DefaultTransport to route requests. + +Transport objects are safe for concurrent use, and are cached and reused by +subsequent layers. + +Tentative name: "Transport". + +It's expected that the transport config will be general enough that third +parties (e.g., OpenShift) will not need their own implementation, rather they +can change the certs, token, etc., to be appropriate for their own servers, +etc.. + +Action items: +* Split out of current client package into a new package. 
(@krousey) + +### RESTClient/request.go + +RESTClient consumes a Transport and a Codec (and optionally a group/version), +and produces something that implements the interface currently in request.go. +That is, with a RESTClient, you can write chains of calls like: + +`c.Get().Path(p).Param("name", "value").Do()` + +RESTClient is generically usable by any client for servers exposing REST-like +semantics. It provides helpers that benefit those following api-conventions.md, +but does not mandate them. It provides a higher level http interface that +abstracts transport, wire serialization, retry logic, and error handling. +Kubernetes-like constructs that deviate from standard HTTP should be bypassable. +Every non-trivial call made to a remote restful API from Kubernetes code should +go through a rest client. + +The group and version may be empty when constructing a RESTClient. This is valid +for executing discovery commands. The group and version may be overridable with +a chained function call. + +Ideally, no semantic behavior is built into RESTClient, and RESTClient will use +the Codec it was constructed with for all semantic operations, including turning +options objects into URL query parameters. Unfortunately, that is not true of +today's RESTClient, which may have some semantic information built in. We will +remove this. + +RESTClient should not make assumptions about the format of data produced or +consumed by the Codec. Currently, it is JSON, but we want to support binary +protocols in the future. + +The Codec would look something like this: + +```go +type Codec interface { + Encode(runtime.Object) ([]byte, error) + Decode([]byte]) (runtime.Object, error) + + // Used to version-control query parameters + EncodeParameters(optionsObject runtime.Object) (url.Values, error) + + // Not included here since the client doesn't need it, but a corresponding + // DecodeParametersInto method would be available on the server. 
+} +``` + +There should be one codec per version. RESTClient is *not* responsible for +converting between versions; if a client wishes, they can supply a Codec that +does that. But RESTClient will make the assumption that it's talking to a single +group/version, and will not contain any conversion logic. (This is a slight +change from the current state.) + +As with Transport, it is expected that 3rd party providers following the api +conventions should be able to use RESTClient, and will not need to implement +their own. + +Action items: +* Split out of the current client package. (@krousey) +* Possibly, convert to an interface (currently, it's a struct). This will allow + extending the error-checking monad that's currently in request.go up an + additional layer. +* Switch from ParamX("x") functions to using types representing the collection + of parameters and the Codec for query parameter serialization. +* Any other Kubernetes group specific behavior should also be removed from + RESTClient. + +### Mux layer + +(See TODO at end; this can probably be merged with the "client set" concept.) + +The client muxer layer has a map of group/version to cached RESTClient, and +knows how to construct a new RESTClient in case of a cache miss (using the +discovery client mentioned below). The ClientMux may need to deal with multiple +transports pointing at differing destinations (e.g. OpenShift or other 3rd party +provider API may be at a different location). + +When constructing a RESTClient generically, the muxer will just use the Codec +the high-level dynamic client would use. Alternatively, the user should be able +to pass in a Codec-- for the case where the correct types are compiled in. + +Tentative name: ClientMux + +Action items: +* Move client cache out of kubectl libraries into a more general home. +* TODO: a mux layer may not be necessary, depending on what needs to be cached. 
+ If transports are cached already, and RESTClients are extremely light-weight, + there may not need to be much code at all in this layer. + +### High-level: Individual typed + +Our current high-level client allows you to write things like +`c.Pods("namespace").Create(p)`; we will insert a level for the group. + +That is, the system will be: + +`clientset.GroupName().NamespaceSpecifier().Action()` + +Where: +* `clientset` is a thing that holds multiple individually typed clients (see + below). +* `GroupName()` returns the generated client that this section is about. +* `NamespaceSpecifier()` may take a namespace parameter or nothing. +* `Action` is one of Create/Get/Update/Delete/Watch, or appropriate actions + from the type's subresources. +* It is TBD how we'll represent subresources and their actions. This is + inconsistent in the current clients, so we'll need to define a consistent + format. Possible choices: + * Insert a `.Subresource()` before the `.Action()` + * Flatten subresources, such that they become special Actions on the parent + resource. + +The types returned/consumed by such functions will be e.g. api/v1, NOT the +current version inspecific types. The current internal-versioned client is +inconvenient for users, as it does not protect them from having to recompile +their code with every minor update. (We may continue to generate an +internal-versioned client for our own use for a while, but even for our own +components it probably makes sense to switch to specifically versioned clients.) + +We will provide this structure for each version of each group. It is infeasible +to do this manually, so we will generate this. The generator will accept both +swagger and the ordinary go types. The generator should operate on out-of-tree +sources AND out-of-tree destinations, so it will be useful for consuming +out-of-tree APIs and for others to build custom clients into their own +repositories. 
Typed clients will be constructable given a ClientMux; the typed constructor will use
the ClientMux to find or construct an appropriate RESTClient. Alternatively, a
typed client should be constructable individually given a config, from which it
will be able to construct the appropriate RESTClient.

Typed clients do not require any version negotiation. The server either supports
the client's group/version, or it does not. However, there are ways around this:
* If you want to use a typed client against a server's API endpoint and the
  server's API version doesn't match the client's API version, you can construct
  the client with a RESTClient using a Codec that does the conversion (this is
  basically what our client does now).
* Alternatively, you could use the dynamic client.

Action items:
* Move current typed clients into new directory structure (described below)
* Finish client generation logic. (@caesarxuchao, @lavalamp)

#### High-level, typed: Discovery

A `DiscoveryClient` is necessary to discover the API groups, versions, and
resources a server supports. It's constructable given a RESTClient. It is
consumed by both the ClientMux and users who want to iterate over groups,
versions, or resources. (Example: namespace controller.)

The DiscoveryClient is *not* required if you already know the group/version of
the resource you want to use: you can simply try the operation without checking
first, which is lower-latency anyway as it avoids an extra round-trip.

Action items:
* Refactor existing functions to present a sane interface, as close to that
  offered by the other typed clients as possible. (@caesarxuchao)
* Use a RESTClient to make the necessary API calls.
* Make sure that no discovery happens unless it is explicitly requested. (Make
  sure SetKubeDefaults doesn't call it, for example.)

### High-level: Dynamic

The dynamic client lets users consume APIs which are not compiled into their
binary.
It will provide the same interface as the typed client, but will take
and return `runtime.Object`s instead of typed objects. There is only one dynamic
client, so it's not necessary to generate it, although optionally we may do so
depending on whether the typed client generator makes it easy.

A dynamic client is constructable given a config, group, and version. It will
use this to construct a RESTClient with a Codec which encodes/decodes to
'Unstructured' `runtime.Object`s. The group and version may be from a previous
invocation of a DiscoveryClient, or they may be known by other means.

For now, the dynamic client will assume that a JSON encoding is allowed. In the
future, if we have binary-only APIs (unlikely?), we can add that to the
discovery information and construct an appropriate dynamic Codec.

Action items:
* A rudimentary version of this exists in kubectl's builder. It needs to be
  moved to a more general place.
* Produce a useful 'Unstructured' runtime.Object, which allows for easy
  Object/ListMeta introspection.

### High-level: Client Sets

Because there will be multiple groups with multiple versions, we will provide an
aggregation layer that combines multiple typed clients in a single object.

We do this to:
* Deliver a concrete thing for users to consume, construct, and pass around. We
  don't want people making 10 typed clients and making a random system to keep
  track of them.
* Constrain the testing matrix. Users can generate a client set at their whim
  against their cluster, but we need to make guarantees that the clients we
  shipped with v1.X.0 will work with v1.X+1.0, and vice versa. That's not
  practical unless we "bless" a particular version of each API group and ship an
  official client set with each release. (If the server supports 15 groups with
  2 versions each, that's 2^15 different possible client sets. We don't want to
  test all of them.)

A client set is generated into its own package.
The generator will take the list +of group/versions to be included. Only one version from each group will be in +the client set. + +A client set is constructable at runtime from either a ClientMux or a transport +config (for easy one-stop-shopping). + +An example: + +```go +import ( + api_v1 "k8s.io/kubernetes/pkg/client/typed/generated/v1" + ext_v1beta1 "k8s.io/kubernetes/pkg/client/typed/generated/extensions/v1beta1" + net_v1beta1 "k8s.io/kubernetes/pkg/client/typed/generated/net/v1beta1" + "k8s.io/kubernetes/pkg/client/typed/dynamic" +) + +type Client interface { + API() api_v1.Client + Extensions() ext_v1beta1.Client + Net() net_v1beta1.Client + // ... other typed clients here. + + // Included in every set + Discovery() discovery.Client + GroupVersion(group, version string) dynamic.Client +} +``` + +Note that a particular version is chosen for each group. It is a general rule +for our API structure that no client need care about more than one version of +each group at a time. + +This is the primary deliverable that people would consume. It is also generated. + +Action items: +* This needs to be built. It will replace the ClientInterface that everyone + passes around right now. + +## Package Structure + +``` +pkg/client/ +----------/transport/ # transport & associated config +----------/restclient/ +----------/clientmux/ +----------/typed/ +----------------/discovery/ +----------------/generated/ +--------------------------// +----------------------------------// +--------------------------------------------/.go +----------------/dynamic/ +----------/clientsets/ +---------------------/release-1.1/ +---------------------/release-1.2/ +---------------------/the-test-set-you-just-generated/ +``` + +`/clientsets/` will retain their contents until they reach their expire date. +e.g., when we release v1.N, we'll remove clientset v1.(N-3). 
Clients from old
+releases live on and continue to work (i.e., are tested) without any interface
+changes for multiple releases, to give users time to transition.
+
+## Client Guarantees (and testing)
+
+Once we release a clientset, we will not make interface changes to it. Users of
+that client will not have to change their code until they are deliberately
+upgrading their import. We probably will want to generate some sort of stub test
+with a clientset, to ensure that we don't change the interface.
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/client-package-structure.md?pixel)]()
+
--
cgit v1.2.3


From bea0fc0517fbd2522028692302e609742aed7b94 Mon Sep 17 00:00:00 2001
From: zhengguoyong
Date: Wed, 9 Dec 2015 16:35:48 +0800
Subject: Fix small typo

---
 rescheduler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rescheduler.md b/rescheduler.md
index 550c2270..f10eb6d3 100644
--- a/rescheduler.md
+++ b/rescheduler.md
@@ -107,7 +107,7 @@ antagonism and ask the rescheduler to move one of the antagonists to a new node.
 The vast majority of users probably only care about rescheduling for three scenarios:
 1. Move Pods around to get a PENDING Pod to schedule
-1. Redistribute Pods onto new nodes added by a cluster auto-scaler when ther are no PENDING Pods
+1. Redistribute Pods onto new nodes added by a cluster auto-scaler when there are no PENDING Pods
 1. 
Move Pods around when CPU starvation is detected on a node ## Design considerations and design space -- cgit v1.2.3 From 7da888eee00d0ca825059b7ceaf05dbdecceaf38 Mon Sep 17 00:00:00 2001 From: Filip Grzadkowski Date: Wed, 25 Nov 2015 14:50:46 +0100 Subject: Update documents for release process --- releasing.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/releasing.md b/releasing.md index 757048ad..3cefb725 100644 --- a/releasing.md +++ b/releasing.md @@ -134,7 +134,7 @@ git clone git@github.com:kubernetes/kubernetes.git cd kubernetes ``` -or `git checkout upstream/master` from an existing repo. +or `git fetch upstream && git checkout upstream/master` from an existing repo. Decide what version you're cutting and export it: @@ -210,9 +210,10 @@ release](https://github.com/kubernetes/kubernetes/releases/new): 1. fill in the release title from the draft; 1. re-run the appropriate release notes tool(s) to pick up any changes people have made; -1. find the appropriate `kubernetes.tar.gz` in GCS, download it, double check - the hash (compare to what you had in the release notes draft), and attach it - to the release; and +1. find the appropriate `kubernetes.tar.gz` in [GCS bucket](https:// +console.developers.google.com/storage/browser/kubernetes-release/release/), + download it, double check the hash (compare to what you had in the release + notes draft), and attach it to the release; and 1. publish! 
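As a self-contained sketch of the hash double-check step above (the tarball here is a stand-in; in a real release the file comes from the GCS bucket and the expected hash comes from the release notes draft):

```shell
# Stand-in for the kubernetes.tar.gz you would download from the GCS bucket.
echo "fake release payload" > kubernetes.tar.gz

# The hash recorded in the release notes draft (computed here for the sketch).
EXPECTED=$(sha1sum kubernetes.tar.gz | awk '{print $1}')

# Double-check the artifact before attaching it to the GitHub release.
ACTUAL=$(sha1sum kubernetes.tar.gz | awk '{print $1}')
if [ "$ACTUAL" = "$EXPECTED" ]; then echo "hash ok"; else echo "hash MISMATCH"; fi
```

If the two values differ, stop and re-download rather than attaching the artifact.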
## Injecting Version into Binaries

--
cgit v1.2.3


From 458e489bbdd5a92cf483836c722cfdfca497c0d5 Mon Sep 17 00:00:00 2001
From: Tim Hockin
Date: Wed, 2 Dec 2015 09:54:21 -0800
Subject: Make go version requirements clearer

---
 development.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/development.md b/development.md
index 09abe1e7..3b5443bc 100644
--- a/development.md
+++ b/development.md
@@ -33,15 +33,29 @@ Documentation for other releases can be found at
 
 # Development Guide
 
-# Releases and Official Builds
+This document is intended to be the canonical source of truth for things like
+supported toolchain versions for building Kubernetes. If you find a
+requirement that this doc does not capture, please file a bug. If you find
+other docs with references to requirements that are not simply links to this
+doc, please file a bug.
+
+This document is intended to be relative to the branch in which it is found.
+It is guaranteed that requirements will change over time for the development
+branch, but release branches of Kubernetes should not change.
+
+## Releases and Official Builds
 
 Official releases are built in Docker containers. Details are [here](http://releases.k8s.io/HEAD/build/README.md). You can do simple builds and development with just a local Docker installation. If you want to build Go locally outside of Docker, please continue below.
 
 ## Go development environment
 
-Kubernetes is written in [Go](http://golang.org) programming language. If you haven't set up Go development environment, please follow [this instruction](http://golang.org/doc/code.html) to install go tool and set up GOPATH. Ensure your version of Go is at least 1.3.
+Kubernetes is written in the [Go](http://golang.org) programming language. If you haven't set up a Go development environment, please follow [these instructions](http://golang.org/doc/code.html) to install the go tools and set up a GOPATH.
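A minimal sketch of the GOPATH workspace those instructions produce (the directory name here is illustrative; any path works):

```shell
# Create a throwaway workspace following the conventional src/bin/pkg layout.
export GOPATH="$HOME/go-kube-example"
mkdir -p "$GOPATH/src/k8s.io" "$GOPATH/bin" "$GOPATH/pkg"

# The workspace now contains the three conventional directories.
ls "$GOPATH"

# Check the toolchain if it is installed.
go version 2>/dev/null || echo "go toolchain not installed yet"
```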
+ +### Go versions + +Requires Go version 1.4.x or 1.5.x -## Git Setup +## Git setup Below, we outline one of the more common git workflows that core developers use. Other git workflows are also valid. -- cgit v1.2.3 From 5e85d2ace97fc9f4b339bf4323cffb61ee8a44ea Mon Sep 17 00:00:00 2001 From: gmarek Date: Tue, 1 Dec 2015 15:44:14 +0100 Subject: Add a proposal for monitoring cluster performance --- performance-related-monitoring.md | 149 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 149 insertions(+) create mode 100644 performance-related-monitoring.md diff --git a/performance-related-monitoring.md b/performance-related-monitoring.md new file mode 100644 index 00000000..f2752bec --- /dev/null +++ b/performance-related-monitoring.md @@ -0,0 +1,149 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/proposals/performance-related-monitoring.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Performance Monitoring
+
+## Reason for this document
+
+This document serves as a place to gather information about past performance regressions, their causes and impact, and to discuss ideas for avoiding similar regressions in the future.
+The main reason for doing this is to understand what kind of monitoring needs to be in place to keep Kubernetes fast.
+
+## Known past and present performance issues
+
+### Higher logging level causing scheduler stair stepping
+
+Issue https://github.com/kubernetes/kubernetes/issues/14216 was opened because @spiffxp observed a regression in scheduler performance in the 1.1 branch in comparison to the `old` 1.0
+cut. In the end it turned out to be caused by the `--v=4` flag (instead of the default `--v=2`) in the scheduler, together with the flag `--logtostderr`, which disables batching of
+log lines, and a number of log statements without an explicit V level. This caused weird behavior of the whole component.
+
+Because we now know that logging may have a big performance impact, we should consider instrumenting the logging mechanism and computing statistics such as the number of logged
+messages and their total and average size. Each binary should be responsible for exposing its metrics. An unaccounted-for but far too large number of days, if not weeks, of
+engineering time was lost because of this issue.
+
+### Adding per-pod probe-time, which increased the number of PodStatus updates, causing major slowdown
+
+In September 2015 we tried to add per-pod probe times to the PodStatus.
It caused (https://github.com/kubernetes/kubernetes/issues/14273) a massive increase in both the number and
+total volume of object (PodStatus) changes. It drastically increased the load on the API server, which wasn’t able to handle the new number of requests quickly enough, violating our
+response-time SLO. We had to revert this change.
+
+### Late Ready->Running PodPhase transition caused test failures as it seemed like slowdown
+
+In late September we encountered a strange problem (https://github.com/kubernetes/kubernetes/issues/14554): we observed increased latencies in small clusters (a few
+Nodes). It turned out to be caused by added latency between the PodRunning and PodReady phases. This was not a real regression, but our tests thought it was, which shows
+how careful we need to be.
+
+### Huge number of handshakes slows down API server
+
+This was a long-standing performance issue and is/was an important bottleneck for scalability (https://github.com/kubernetes/kubernetes/issues/13671). The bug directly
+causing this problem was incorrect (from golang's standpoint) handling of TCP connections. A secondary issue was that elliptic curve encryption (the only kind available in go 1.4)
+is unbelievably slow.
+
+## Proposed metrics/statistics to gather/compute to avoid problems
+
+### Cluster-level metrics
+
+Basic ideas:
+- number of Pods/ReplicationControllers/Services in the cluster
+- number of running replicas of master components (if they are replicated)
+- currently elected master of the etcd cluster (if running a distributed version)
+- number of master component restarts
+- number of lost Nodes
+
+### Logging monitoring
+
+Log spam is a serious problem and we need to keep it under control. The simplest way to check for regressions, suggested by @brendandburns, is to compute the rate at which log files
+grow in e2e tests.
+
+Basic ideas:
+- log generation rate (B/s)
+
+### REST call monitoring
+
+We do measure REST call duration in the Density test, but we need API server monitoring as well, to avoid false failures caused, e.g., by network traffic. We already have
+some metrics in place (https://github.com/kubernetes/kubernetes/blob/master/pkg/apiserver/metrics/metrics.go), but we need to revisit the list and add some more.
+
+Basic ideas:
+- number of calls per verb, client, resource type
+- latency distribution per verb, client, resource type
+- number of calls that were rejected per client, resource type, and reason (invalid version number, already at maximum number of requests in flight)
+- number of relists in various watchers
+
+### Rate limit monitoring
+
+The reverse of the REST call monitoring done in the API server. We need to know when a given component increases the pressure it puts on the API server. As a proxy for the number of
+requests sent, we can track how saturated the rate limiters are. This has the additional advantage of giving us the data needed to fine-tune the rate limiter constants.
+
+Because we have rate limiting on both ends (client and API server), we should monitor the number of inflight requests in the API server and how it relates to `max-requests-inflight`.
+
+Basic ideas:
+- percentage of the non-burst limit used,
+- amount of time in the last hour with depleted burst tokens,
+- number of inflight requests in the API server.
+
+### Network connection monitoring
+
+During development we have already observed incorrect use/reuse of HTTP connections multiple times. We should at least monitor the number of created connections.
+
+### ETCD monitoring
+
+@xiang-90 and @hongchaodeng - you probably have way more experience on what'd be good to look at from the ETCD perspective.
+
+Basic ideas:
+- ETCD memory footprint
+- number of objects per kind
+- read/write latencies per kind
+- number of requests from the API server
+- read/write counts per key (it may be too heavy though)
+
+### Resource consumption
+
+On top of all the things mentioned above, we need to monitor changes in resource usage in both cluster components (API server, Kubelet, Scheduler, etc.) and system add-ons
+(Heapster, L7 load balancer, etc.). Monitoring memory usage is tricky, because if no limits are set, the system won't apply memory pressure to processes, which makes their memory
+footprint grow constantly. We argue that monitoring usage in tests still makes sense, as tests should be repeatable, and if memory usage grows drastically between two runs
+it can most likely be attributed to some kind of regression (assuming that nothing else has changed in the environment).
+
+Basic ideas:
+- CPU usage
+- memory usage
+
+### Other saturation metrics
+
+We should monitor other aspects of the system which may indicate saturation of some component.
+
+Basic ideas:
+- queue length for queues in the system,
+- wait time for WaitGroups.
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/performance-related-monitoring.md?pixel)]()
+
--
cgit v1.2.3


From 343a552e67f238effd78a96be4979762e101a864 Mon Sep 17 00:00:00 2001
From: Justin Santa Barbara
Date: Sat, 5 Dec 2015 22:30:46 -0500
Subject: Zone scheduler: Update scheduler docs

There's not a huge amount of detail in the docs as to how the scheduler
actually works, which is probably a good thing both for readability and
because it makes it easier to tweak the zone-spreading approach in the
future, but we should include some information that we do spread across
zones if zone information is present on the nodes.
--- scheduler.md | 2 +- scheduler_algorithm.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/scheduler.md b/scheduler.md index ffc73ca1..2bdb4c16 100755 --- a/scheduler.md +++ b/scheduler.md @@ -47,7 +47,7 @@ will filter out nodes that don't have at least that much resources available (co as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node). Second, it applies a set of "priority functions" that rank the nodes that weren't filtered out by the predicate check. For example, -it tries to spread Pods across nodes while at the same time favoring the least-loaded +it tries to spread Pods across nodes and zones while at the same time favoring the least-loaded nodes (where "load" here is sum of the resource requests of the containers running on the node, divided by the node's capacity). Finally, the node with the highest priority is chosen diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index c8790af9..3888786c 100755 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -61,7 +61,7 @@ Currently, Kubernetes scheduler provides some practical priority functions, incl - `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of requests of all Pods already on the node - request of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. - `CalculateNodeLabelPriority`: Prefer nodes that have the specified label. - `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed. 
-- `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node.
+- `CalculateSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes.
 - `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
 
 The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). Similarly to predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [scheduler.md](scheduler.md) for how to customize).
--
cgit v1.2.3


From cd70be19ddd8803c66bc237a443eae9ceead286b Mon Sep 17 00:00:00 2001
From: Paul Morie
Date: Wed, 2 Dec 2015 23:56:13 -0500
Subject: Proposal: internalize ownership management of volumes into plugins

---
 volume-ownership-management.md | 141 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)
 create mode 100644 volume-ownership-management.md

diff --git a/volume-ownership-management.md b/volume-ownership-management.md
new file mode 100644
index 00000000..8dd4b8bb
--- /dev/null
+++ b/volume-ownership-management.md
@@ -0,0 +1,141 @@
+
+
+
+
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/proposals/volume-ownership-management.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+## Volume plugins and idempotency
+
+Currently, volume plugins have a `SetUp` method which is called in the context of a higher-level
+workflow within the kubelet, which has externalized the problem of managing the ownership of volumes.
+This design has a number of drawbacks that can be mitigated by completely internalizing all concerns
+of volume setup behind the volume plugin `SetUp` method.
+
+### Known issues with current externalized design
+
+1. The ownership management is currently applied repeatedly, which breaks packages that require
+   special permissions in order to work correctly
+2. There is a gap between files being mounted/created by volume plugins and when their ownership
+   is set correctly; race conditions exist around this
+3. Solving the correct application of ownership management in an externalized model is difficult
+   and makes it clear that a transaction boundary is being broken by the externalized design
+
+### Additional issues with externalization
+
+Fully externalizing any one concern of volumes is difficult for a number of reasons:
+
+1. Many types of idempotence checks exist, and are used in a variety of combinations and orders
+2. Workflow in the kubelet becomes much more complex to handle:
+  1. composition of plugins
+  2. correct timing of application of ownership management
+  3. callback to volume plugins when we know the whole `SetUp` flow is complete and correct
+  4. callback to touch sentinel files
+  5. etc.
+3. 
We want to support fully external volume plugins -- would require complex orchestration / chatty
+   remote API
+
+## Proposed implementation
+
+Since all of the ownership information is known in advance of the call to the volume plugin `SetUp`
+method, we can easily internalize these concerns into the volume plugins and pass the ownership
+information to `SetUp`.
+
+The volume `Builder` interface's `SetUp` method changes to accept the group that should own the
+volume. Plugins become responsible for ensuring that the correct group is applied. The volume
+`Attributes` struct can be modified to remove the `SupportsOwnershipManagement` field.
+
+```go
+package volume
+
+type Builder interface {
+    // other methods omitted
+
+    // SetUp prepares and mounts/unpacks the volume to a self-determined
+    // directory path and returns an error. The group ID that should own the volume
+    // is passed as a parameter. Plugins may choose to ignore the group ID directive
+    // in the event that they do not support it (example: NFS). A group ID of -1
+    // indicates that the group ownership of the volume should not be modified by the plugin.
+    //
+    // SetUp will be called multiple times and should be idempotent.
+    SetUp(gid int64) error
+}
+```
+
+Each volume plugin will have to change to support the new `SetUp` signature. The existing
+ownership management code will be refactored into a library that volume plugins can use:
+
+```go
+package volume
+
+func ManageOwnership(path string, fsGroup int64) error {
+    // 1. recursive chown of path
+    // 2. 
make path +setgid
+}
+```
+
+The workflow from the Kubelet's perspective for handling volume setup and refresh becomes:
+
+```go
+// go-ish pseudocode
+func mountExternalVolumes(pod) error {
+    podVolumes := make(kubecontainer.VolumeMap)
+    for i := range pod.Spec.Volumes {
+        volSpec := &pod.Spec.Volumes[i]
+        var fsGroup int64 = 0
+        if pod.Spec.SecurityContext != nil &&
+            pod.Spec.SecurityContext.FSGroup != nil {
+            fsGroup = *pod.Spec.SecurityContext.FSGroup
+        } else {
+            fsGroup = -1
+        }
+
+        // Try to use a plugin for this volume.
+        plugin := volume.NewSpecFromVolume(volSpec)
+        builder, err := kl.newVolumeBuilderFromPlugins(plugin, pod)
+        if err != nil {
+            return err
+        }
+        if builder == nil {
+            return errUnsupportedVolumeType
+        }
+
+        err = builder.SetUp(fsGroup)
+        if err != nil {
+            return err
+        }
+    }
+
+    return nil
+}
+```
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/volume-ownership-management.md?pixel)]()
+
--
cgit v1.2.3


From ad6bfda32161984d88dd14e8c3c43a739f4db2d4 Mon Sep 17 00:00:00 2001
From: Paul Morie
Date: Mon, 14 Dec 2015 15:03:21 -0500
Subject: Add note about type comments to API changes doc

---
 api_changes.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/api_changes.md b/api_changes.md
index 4bbb5bd4..d2f0aea7 100644
--- a/api_changes.md
+++ b/api_changes.md
@@ -320,7 +320,8 @@ before starting "all the rest".
 The struct definitions for each API are in `pkg/api/<version>/types.go`. Edit those
 files to reflect the change you want to make. Note that all types and non-inline
 fields in versioned APIs must be preceded by descriptive comments - these are used to generate
-documentation.
+documentation. Comments for types should not contain the type name; API documentation is
+generated from these comments and end-users should not be exposed to golang type names.
 
 Optional fields should have the `,omitempty` json tag; fields are interpreted as being
 required otherwise.
-- cgit v1.2.3 From 2743354deee6a23c24c668b936c2a5729ae67f8f Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 14 Nov 2015 10:22:42 -0800 Subject: api-conventions: Namespace is label, not subdomain --- api-conventions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/api-conventions.md b/api-conventions.md index cd64435a..a6314f0b 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -152,7 +152,7 @@ These fields are required for proper decoding of the object. They may be populat Every object kind MUST have the following metadata in a nested object field called "metadata": -* namespace: a namespace is a DNS compatible subdomain that objects are subdivided into. The default namespace is 'default'. See [docs/user-guide/namespaces.md](../user-guide/namespaces.md) for more. +* namespace: a namespace is a DNS compatible label that objects are subdivided into. The default namespace is 'default'. See [docs/user-guide/namespaces.md](../user-guide/namespaces.md) for more. * name: a string that uniquely identifies this object within the current namespace (see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)). This value is used in the path when retrieving an individual object. 
* uid: a unique in time and space value (typically an RFC 4122 generated identifier, see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)) used to distinguish between objects with the same name that have been deleted and recreated

--
cgit v1.2.3


From 8ecb41df7e8e98a90413409a13054ead8c04eb20 Mon Sep 17 00:00:00 2001
From: Isaac Hollander McCreery
Date: Fri, 11 Dec 2015 14:03:41 -0800
Subject: Mark a release as stable when we announce it, and stop using cherry_pick_list.sh

---
 releasing.md | 52 ++++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/releasing.md b/releasing.md
index 3cefb725..8ab678ef 100644
--- a/releasing.md
+++ b/releasing.md
@@ -170,39 +170,35 @@ and follow the instructions.
 
 ### Publishing binaries and release notes
 
+Only publish a beta release if it's a standalone pre-release (*not*
+vX.Y.Z-beta.0). We create beta tags after we do official releases to
+maintain proper semantic versioning, but we don't publish these beta releases.
+
 The script you ran above will prompt you to take any remaining steps to push
 tars, and will also give you a template for the release notes. Compose an
-email to the team with the template, and use `build/make-release-notes.sh`
-and/or `release-notes/release-notes.go` in
-[kubernetes/contrib](https://github.com/kubernetes/contrib) to make the release
-notes, (see #17444 for more info).
-
-- Alpha release:
-  - Figure out what the PR numbers for this release and last release are, and
-    get an api-token from GitHub (https://github.com/settings/tokens). From a
-    clone of kubernetes/contrib at upstream/master,
-    go run release-notes/release-notes.go --last-release-pr=<number> --current-release-pr=<number> --api-token=<token>
-    Feel free to prune.
-- Beta release:
-  - Only publish a beta release if it's a standalone pre-release. (We create
-    beta tags after we do official releases to maintain proper semantic
-    versioning, *we don't publish these beta releases*.)
Use
-    `./hack/cherry_pick_list.sh ${RELEASE_VERSION}` to get release notes for such a
-    release.
-- Official release:
-  - From your clone of upstream/master, run `./hack/cherry_pick_list.sh ${RELEASE_VERSION}`
-    to get the release notes for the patch release you just created. Feel free
-    to prune anything internal, but typically for patch releases we tend to
-    include everything in the release notes.
-  - If this is a first official release (vX.Y.0), look through the release
-    notes for all of the alpha releases since the last cycle, and include
-    anything important in release notes.
+email to the team with the template. Figure out what the PR numbers for this
+release and last release are, and get an api-token from GitHub
+(https://github.com/settings/tokens). From a clone of
+[kubernetes/contrib](https://github.com/kubernetes/contrib),
+
+```
+go run release-notes/release-notes.go --last-release-pr=<number> --current-release-pr=<number> --api-token=<token> --base=<branch>
+```
+
+where `<branch>` is `master` for alpha releases and `release-X.Y` for beta and official releases.
+
+**If this is a first official release (vX.Y.0)**, look through the release
+notes for all of the alpha releases since the last cycle, and include anything
+important in release notes.
+
+Feel free to edit the notes (e.g., cherry picks should generally just have the
+same title as the original PR).
 
 Send the email out, letting people know these are the draft release notes. If
 they want to change anything, they should update the appropriate PRs with the
 `release-note` label.
 
-When we're ready to announce the release, [create a GitHub
+When you're ready to announce the release, [create a GitHub
 release](https://github.com/kubernetes/kubernetes/releases/new):
 
 1. pick the appropriate tag;
@@ -216,6 +212,10 @@ console.developers.google.com/storage/browser/kubernetes-release/release/),
   notes draft), and attach it to the release; and
 1. publish!
+Finally, from a clone of upstream/master, *make sure* you still have +`RELEASE_VERSION` set correctly, and run `./build/mark-stable-release.sh +${RELEASE_VERSION}`. + ## Injecting Version into Binaries *Please note that this information may be out of date. The scripts are the -- cgit v1.2.3 From 12e5ddcbac266b547c34e85e9a09f6e0acf30580 Mon Sep 17 00:00:00 2001 From: Amy Unruh Date: Thu, 3 Dec 2015 15:53:33 -0800 Subject: config best practices doc edits --- coding-conventions.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/coding-conventions.md b/coding-conventions.md index df9f63e7..d51278be 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -72,7 +72,8 @@ Directory and file conventions - Package directories should generally avoid using separators as much as possible (when packages are multiple words, they usually should be in nested subdirectories). - Document directories and filenames should use dashes rather than underscores - Contrived examples that illustrate system features belong in /docs/user-guide or /docs/admin, depending on whether it is a feature primarily intended for users that deploy applications or cluster administrators, respectively. Actual application examples belong in /examples. - - Examples should also illustrate [best practices for using the system](../user-guide/config-best-practices.md) + - Examples should also illustrate + [best practices for configuration and using the system](../user-guide/config-best-practices.md) - Third-party code - Third-party Go code is managed using Godeps - Other third-party code belongs in /third_party -- cgit v1.2.3 From d2fa04145318b85c8e4998aacab1a67e5f7a7283 Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 16 Dec 2015 08:45:08 -0800 Subject: Formalize the meanings of issue priority labels. 
--- issue-priorities.md | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 issue-priorities.md diff --git a/issue-priorities.md b/issue-priorities.md new file mode 100644 index 00000000..718b573c --- /dev/null +++ b/issue-priorities.md @@ -0,0 +1,6 @@ +These are the meanings of the labels priority/p0 ... priority/p3 that we apply to issues in order to try to prioritize them relative to each other. We try to apply these priority labels consistently across the entire project, but if you notice an issue that you believe to be misprioritized, please do let us know and we will evaluate your counter-proposal. + +- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. TL's of teams responsible for making sure that all P0's in their area are being actively worked on. +- **priority/P1**: Should be staffed and worked on either currently, or very soon, ideally in time for the next release. +- **priority/P2**: Agreed that this would be good to have, but we possibly don't have anyone available to work on it right now or in the immediate future. Hopefully in the future some time we will. Community contributions would be most welcome in the mean time. +-**priority/P3**: Probably useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. -- cgit v1.2.3 From 755b7cef5baf28d9a384f50766e52889372292a3 Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 16 Dec 2015 08:59:49 -0800 Subject: Tidy up definitions of issue priority labels. 
--- issue-priorities.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/issue-priorities.md b/issue-priorities.md index 718b573c..1d3bd57b 100644 --- a/issue-priorities.md +++ b/issue-priorities.md @@ -1,6 +1,6 @@ These are the meanings of the labels priority/p0 ... priority/p3 that we apply to issues in order to try to prioritize them relative to each other. We try to apply these priority labels consistently across the entire project, but if you notice an issue that you believe to be misprioritized, please do let us know and we will evaluate your counter-proposal. -- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. TL's of teams responsible for making sure that all P0's in their area are being actively worked on. +- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. TL's of teams are responsible for making sure that all P0's in their area are being actively worked on. - **priority/P1**: Should be staffed and worked on either currently, or very soon, ideally in time for the next release. -- **priority/P2**: Agreed that this would be good to have, but we possibly don't have anyone available to work on it right now or in the immediate future. Hopefully in the future some time we will. Community contributions would be most welcome in the mean time. --**priority/P3**: Probably useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. 
+- **priority/P2**: There appears to be general agreement that this would be good to have, but we possibly don't have anyone available to work on it right now or in the immediate future. Hopefully in the future some time we will. Community contributions would be most welcome in the mean time (although it might take a while to get them reviewed if reviewers are fully occupied with higher priority issues, for example immediately before a release). +- **priority/P3**: Probably useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. -- cgit v1.2.3 From a2ccb32f3e46b2d79e28f12a7f0feb5d75a7a7c4 Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 16 Dec 2015 09:47:12 -0800 Subject: Addressed thockin's comments. --- issue-priorities.md | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 issue-priorities.md diff --git a/issue-priorities.md b/issue-priorities.md new file mode 100644 index 00000000..8b6e69f5 --- /dev/null +++ b/issue-priorities.md @@ -0,0 +1,6 @@ +These are the meanings of the labels priority/P0 ... priority/P3 that we apply to issues in order to try to prioritize them relative to each other. We try to apply these priority labels consistently across the entire project, but if you notice an issue that you believe to be misprioritized, please do let us know and we will evaluate your counter-proposal. + +- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. TL's of teams are responsible for making sure that all P0's in their area are being actively worked on. Examples include user-visible bugs in core features, broken builds or tests and critical security issues. 
+- **priority/P1**: Must be staffed and worked on either currently, or very soon, ideally in time for the next release. +- **priority/P2**: There appears to be general agreement that this would be good to have, but we don't have anyone available to work on it right now or in the immediate future. Community contributions would be most welcome in the mean time (although it might take a while to get them reviewed if reviewers are fully occupied with higher priority issues, for example immediately before a release). +- **priority/P3**: Probably useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. -- cgit v1.2.3 From 75500fd9f484b9e451b445793af3c3cb3caf0f99 Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 16 Dec 2015 09:47:12 -0800 Subject: Addressed thockin's comments. --- issue-priorities.md | 6 ------ 1 file changed, 6 deletions(-) delete mode 100644 issue-priorities.md diff --git a/issue-priorities.md b/issue-priorities.md deleted file mode 100644 index 1d3bd57b..00000000 --- a/issue-priorities.md +++ /dev/null @@ -1,6 +0,0 @@ -These are the meanings of the labels priority/p0 ... priority/p3 that we apply to issues in order to try to prioritize them relative to each other. We try to apply these priority labels consistently across the entire project, but if you notice an issue that you believe to be misprioritized, please do let us know and we will evaluate your counter-proposal. - -- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. TL's of teams are responsible for making sure that all P0's in their area are being actively worked on. 
-- **priority/P1**: Should be staffed and worked on either currently, or very soon, ideally in time for the next release. -- **priority/P2**: There appears to be general agreement that this would be good to have, but we possibly don't have anyone available to work on it right now or in the immediate future. Hopefully in the future some time we will. Community contributions would be most welcome in the mean time (although it might take a while to get them reviewed if reviewers are fully occupied with higher priority issues, for example immediately before a release). -- **priority/P3**: Probably useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. -- cgit v1.2.3 From 081c9100c770c34894536a0321eb6126771ac06e Mon Sep 17 00:00:00 2001 From: Quinton Hoole Date: Wed, 16 Dec 2015 10:39:02 -0800 Subject: Moved to existing documentation about issue priorities. --- issue-priorities.md | 6 ------ issues.md | 20 +++++++++----------- 2 files changed, 9 insertions(+), 17 deletions(-) delete mode 100644 issue-priorities.md diff --git a/issue-priorities.md b/issue-priorities.md deleted file mode 100644 index 8b6e69f5..00000000 --- a/issue-priorities.md +++ /dev/null @@ -1,6 +0,0 @@ -These are the meanings of the labels priority/P0 ... priority/P3 that we apply to issues in order to try to prioritize them relative to each other. We try to apply these priority labels consistently across the entire project, but if you notice an issue that you believe to be misprioritized, please do let us know and we will evaluate your counter-proposal. - -- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. 
TL's of teams are responsible for making sure that all P0's in their area are being actively worked on. Examples include user-visible bugs in core features, broken builds or tests and critical security issues. -- **priority/P1**: Must be staffed and worked on either currently, or very soon, ideally in time for the next release. -- **priority/P2**: There appears to be general agreement that this would be good to have, but we don't have anyone available to work on it right now or in the immediate future. Community contributions would be most welcome in the mean time (although it might take a while to get them reviewed if reviewers are fully occupied with higher priority issues, for example immediately before a release). -- **priority/P3**: Probably useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. diff --git a/issues.md b/issues.md index f2ce6949..cbad9517 100644 --- a/issues.md +++ b/issues.md @@ -33,23 +33,21 @@ Documentation for other releases can be found at GitHub Issues for the Kubernetes Project ======================================== -A list quick overview of how we will review and prioritize incoming issues at https://github.com/kubernetes/kubernetes/issues +A quick overview of how we will review and prioritize incoming issues at https://github.com/kubernetes/kubernetes/issues Priorities ---------- -We will use GitHub issue labels for prioritization. The absence of a priority label means the bug has not been reviewed and prioritized yet. +We use GitHub issue labels for prioritization. The absence of a +priority label means the bug has not been reviewed and prioritized +yet. -Definitions ------------ -* P0 - something broken for users, build broken, or critical security issue. Someone must drop everything and work on it. 
-* P1 - must fix for earliest possible binary release (every two weeks) -* P2 - should be fixed in next major release version -* P3 - default priority for lower importance bugs that we still want to track and plan to fix at some point -* design - priority/design is for issues that are used to track design discussions -* support - priority/support is used for issues tracking user support requests -* untriaged - anything without a priority/X label will be considered untriaged +We try to apply these priority labels consistently across the entire project, but if you notice an issue that you believe to be misprioritized, please do let us know and we will evaluate your counter-proposal.\ +- **priority/P0**: Must be actively worked on as someone's top priority right now. Stuff is burning. If it's not being actively worked on, someone is expected to drop what they're doing immediately to work on it. TL's of teams are responsible for making sure that all P0's in their area are being actively worked on. Examples include user-visible bugs in core features, broken builds or tests and critical security issues. +- **priority/P1**: Must be staffed and worked on either currently, or very soon, ideally in time for the next release. +- **priority/P2**: There appears to be general agreement that this would be good to have, but we don't have anyone available to work on it right now or in the immediate future. Community contributions would be most welcome in the mean time (although it might take a while to get them reviewed if reviewers are fully occupied with higher priority issues, for example immediately before a release). +- **priority/P3**: Possibly useful, but not yet enough support to actually get it done. These are mostly place-holders for potentially good ideas, so that they don't get completely forgotten, and can be referenced/deduped every time they come up. 
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]() -- cgit v1.2.3 From 9def9b378e36820a87d284e56039041bc642884a Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Thu, 17 Dec 2015 10:57:55 -0800 Subject: add the required changes in master to devel/releasing.md --- releasing.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/releasing.md b/releasing.md index 8ab678ef..d47202f2 100644 --- a/releasing.md +++ b/releasing.md @@ -46,6 +46,7 @@ release breaks down into four pieces: 1. cutting/branching the release; 1. building and pushing the binaries; and 1. publishing binaries and release notes. +1. updating the master branch. You should progress in this strict order. @@ -216,6 +217,15 @@ Finally, from a clone of upstream/master, *make sure* you still have `RELEASE_VERSION` set correctly, and run `./build/mark-stable-release.sh ${RELEASE_VERSION}`. +### Updating the master branch + +If you are cutting a new release series, please also update the master branch: +change the `latestReleaseBranch` in `cmd/mungedocs/mungedocs.go` to the new +release branch (`release-X.Y`), and then run `hack/update-generated-docs.sh`. This will +let the unversioned warning in the docs point to the latest release series. Please +send the changes as a PR titled "Update the latestReleaseBranch to release-X.Y +in the munger". + ## Injecting Version into Binaries *Please note that this information may be out of date. The scripts are the -- cgit v1.2.3 From 88882f06f45b07117ed96f6136b25c93f75aad4c Mon Sep 17 00:00:00 2001 From: Tim Hockin Date: Sat, 14 Nov 2015 12:26:04 -0800 Subject: Clean up and document validation strings Also add a detail string for Required and Forbidden. Fix tests.
--- api-conventions.md | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/api-conventions.md b/api-conventions.md index a6314f0b..1fe165a6 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -76,6 +76,7 @@ using resources with kubectl can be found in [Working with resources](../user-gu - [Naming conventions](#naming-conventions) - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) - [WebSockets and SPDY](#websockets-and-spdy) + - [Validation](#validation) @@ -787,6 +788,35 @@ There are two primary protocols in use today: Clients should use the SPDY protocols if their clients have native support, or WebSockets as a fallback. Note that WebSockets is susceptible to Head-of-Line blocking and so clients must read and process each message sequentionally. In the future, an HTTP/2 implementation will be exposed that deprecates SPDY. +## Validation + +API objects are validated upon receipt by the apiserver. Validation errors are +flagged and returned to the caller in a `Failure` status with `reason` set to +`Invalid`. In order to facilitate consistent error messages, we ask that +validation logic adheres to the following guidelines whenever possible (though +exceptional cases will exist). + +* Be as precise as possible. +* Telling users what they CAN do is more useful than telling them what they + CANNOT do. +* When asserting a requirement in the positive, use "must". Examples: "must be + greater than 0", "must match regex '[a-z]+'". Words like "should" imply that + the assertion is optional, and must be avoided. +* When asserting a formatting requirement in the negative, use "must not". + Example: "must not contain '..'". Words like "should not" imply that the + assertion is optional, and must be avoided. +* When asserting a behavioral requirement in the negative, use "may not". + Examples: "may not be specified when otherField is empty", "only `name` may be + specified". 
+* When referencing a literal string value, indicate the literal in + single-quotes. Example: "must not contain '..'". +* When referencing another field name, indicate the name in back-quotes. + Example: "must be greater than `request`". +* When specifying inequalities, use words rather than symbols. Examples: "must + be less than 256", "must be greater than or equal to 0". Do not use words + like "larger than", "bigger than", "more than", "higher than", etc. +* When specifying numeric ranges, use inclusive ranges when possible. + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api-conventions.md?pixel)]() -- cgit v1.2.3 From 50e6624e2bafaf29d658a779a9b2940400cecab3 Mon Sep 17 00:00:00 2001 From: nikhiljindal Date: Mon, 30 Nov 2015 13:17:08 -0800 Subject: Adding a doc to explain the process of updating release docs --- update-release-docs.md | 148 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 update-release-docs.md diff --git a/update-release-docs.md b/update-release-docs.md new file mode 100644 index 00000000..ea8a9b48 --- /dev/null +++ b/update-release-docs.md @@ -0,0 +1,148 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/devel/update-release-docs.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Table of Contents + + + +- [Table of Contents](#table-of-contents) +- [Overview](#overview) +- [Adding a new docs collection for a release](#adding-a-new-docs-collection-for-a-release) +- [Updating docs in an existing collection](#updating-docs-in-an-existing-collection) + - [Updating docs on HEAD](#updating-docs-on-head) + - [Updating docs in release branch](#updating-docs-in-release-branch) + - [Updating docs in gh-pages branch](#updating-docs-in-gh-pages-branch) + + + +# Overview + +This document explains how to update kubernetes release docs hosted at http://kubernetes.io/docs/. + +http://kubernetes.io is served using the [gh-pages +branch](https://github.com/kubernetes/kubernetes/tree/gh-pages) of kubernetes repo on github. +Updating docs in that branch will update http://kubernetes.io + +There are 2 scenarios which require updating docs: +* Adding a new docs collection for a release. +* Updating docs in an existing collection. + +# Adding a new docs collection for a release + +Whenever a new release series (`release-X.Y`) is cut from `master`, we push the +corresponding set of docs to `http://kubernetes.io/vX.Y/docs`. The steps are as follows: + +* Create a `_vX.Y` folder in `gh-pages` branch. +* Add `vX.Y` as a valid collection in [_config.yml](https://github.com/kubernetes/kubernetes/blob/gh-pages/_config.yml) +* Create a new `_includes/nav_vX.Y.html` file with the navigation menu. This can + be a copy of `_includes/nav_vX.Y-1.html` with links to new docs added and links + to deleted docs removed. 
Update [_layouts/docwithnav.html](https://github.com/kubernetes/kubernetes/blob/gh-pages/_layouts/docwithnav.html) to include this new navigation html file. Example PR: [#16143](https://github.com/kubernetes/kubernetes/pull/16143). +* [Pull docs from release branch](#updating-docs-in-gh-pages-branch) in `_vX.Y` + folder. + +Once these changes have been submitted, you should be able to reach the docs at +`http://kubernetes.io/vX.Y/docs/` where you can test them. + +To make `X.Y` the default version of docs: + +* Update [_config.yml](https://github.com/kubernetes/kubernetes/blob/gh-pages/_config.yml) + and [_docs/index.md](https://github.com/kubernetes/kubernetes/blob/gh-pages/_docs/index.md) + to point to the new version. Example PR: [#16416](https://github.com/kubernetes/kubernetes/pull/16416). +* Update [_includes/docversionselector.html](https://github.com/kubernetes/kubernetes/blob/gh-pages/_includes/docversionselector.html) + to make `vX.Y` the default version. +* Add "Disallow: /vX.Y-1/" to the existing [robots.txt](https://github.com/kubernetes/kubernetes/blob/gh-pages/robots.txt) + file to hide old content from web crawlers and focus SEO on new docs. Example PR: + [#16388](https://github.com/kubernetes/kubernetes/pull/16388). +* Regenerate [sitemap.xml](https://github.com/kubernetes/kubernetes/blob/gh-pages/sitemap.xml) + so that it now contains `vX.Y` links. The sitemap can be regenerated using + https://www.xml-sitemaps.com. Example PR: [#17126](https://github.com/kubernetes/kubernetes/pull/17126). +* Resubmit the updated sitemap file to [Google + webmasters](https://www.google.com/webmasters/tools/sitemap-list?siteUrl=http://kubernetes.io/) for Google to index the new links.
+* Update [_layouts/docwithnav.html](https://github.com/kubernetes/kubernetes/blob/gh-pages/_layouts/docwithnav.html) + to include [_includes/archivedocnotice.html](https://github.com/kubernetes/kubernetes/blob/gh-pages/_includes/archivedocnotice.html) + for `vX.Y-1` docs which need to be archived. +* Ping @thockin to update docs.k8s.io to redirect to `http://kubernetes.io/vX.Y/`. [#18788](https://github.com/kubernetes/kubernetes/issues/18788). + +http://kubernetes.io/docs/ should now be redirecting to `http://kubernetes.io/vX.Y/`. + +# Updating docs in an existing collection + +The high level steps to update docs in an existing collection are: + +1. Update docs on `HEAD` (master branch) +2. Cherrypick the change in the relevant release branch. +3. Update docs on `gh-pages`. + +## Updating docs on HEAD + +[Development guide](development.md) provides general instructions on how to contribute to the kubernetes github repo. +[Docs how to guide](how-to-doc.md) provides conventions to follow while writing docs. + +## Updating docs in release branch + +Once docs have been updated in the master branch, the changes need to be +cherrypicked in the latest release branch. +[Cherrypick guide](cherry-picks.md) has more details on how to cherrypick your change. + +## Updating docs in gh-pages branch + +Once the release branch has all the relevant changes, we can pull in the latest docs +in the `gh-pages` branch. +Run the following command in the `gh-pages` branch to update docs for release `X.Y`: + +``` +_tools/import_docs vX.Y _vX.Y release-X.Y release-X.Y +``` + +For example, to pull in docs for release 1.1, run: + +``` +_tools/import_docs v1.1 _v1.1 release-1.1 release-1.1 +``` + +Apart from copying over the docs, `_tools/release_docs` also does some post processing +(like updating the links to docs to point to http://kubernetes.io/docs/ instead of pointing to the github repo).
+Note that we always pull in the docs from release branch and not from master (pulling docs +from master requires some extra processing like versionizing the links and removing unversioned warnings). + +We delete all existing docs before pulling in new ones to ensure that deleted +docs go away. + +If the change added or deleted a doc, then update the corresponding `_includes/nav_vX.Y.html` file as well. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/update-release-docs.md?pixel)]() + -- cgit v1.2.3 From 3d4cf50dd255c732440474b1ddf70e96a65c8f77 Mon Sep 17 00:00:00 2001 From: nikhiljindal Date: Thu, 17 Dec 2015 15:04:42 -0800 Subject: Add instructions to run versionize-docs in cherrypick doc --- cherry-picks.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/cherry-picks.md b/cherry-picks.md index f407c949..6fae778f 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -47,6 +47,23 @@ hack/cherry_pick_pull.sh upstream/release-3.14 98765 This will walk you through the steps to propose an automated cherry pick of pull #98765 for remote branch `upstream/release-3.14`. +### Cherrypicking a doc change + +If you are cherrypicking a change which adds a doc, then you also need to run +`build/versionize-docs.sh` in the release branch to versionize that doc. 
+Ideally, just running `hack/cherry_pick_pull.sh` should be enough, but we are not there +yet: [#18861](https://github.com/kubernetes/kubernetes/issues/18861) + +To cherrypick PR 123456 to release-1.1, run the following commands after running `hack/cherry_pick_pull.sh` and before merging the PR: + +``` +$ git checkout -b automated-cherry-pick-of-#123456-upstream-release-1.1 + origin/automated-cherry-pick-of-#123456-upstream-release-1.1 +$ ./build/versionize-docs.sh release-1.1 +$ git commit -a -m "Running versionize docs" +$ git push origin automated-cherry-pick-of-#123456-upstream-release-1.1 +``` + ## Cherry Pick Review Cherry pick pull requests are reviewed differently than normal pull requests. In -- cgit v1.2.3 From ecc0cc2d5b47258f834a82fda4219767c1b0e3f8 Mon Sep 17 00:00:00 2001 From: Clayton Coleman Date: Sun, 20 Dec 2015 14:36:34 -0500 Subject: Document that int32 and int64 must be used in external types --- api-conventions.md | 1 + 1 file changed, 1 insertion(+) diff --git a/api-conventions.md b/api-conventions.md index 1fe165a6..ab049694 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -256,6 +256,7 @@ This rule maintains the invariant that all JSON/YAML keys are fields in API obje * Do not use unsigned integers, due to inconsistent support across languages and libraries. Just validate that the integer is non-negative if that's the case. * Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). * Look at similar fields in the API (e.g., ports, durations) and follow the conventions of existing fields. +* All public integer fields MUST use the Go `(u)int32` or Go `(u)int64` types, not `(u)int` (which is ambiguous depending on target platform). Internal types may use `(u)int`. 
#### Constants -- cgit v1.2.3 From f43cec8f19af7a9a2701d507bc152c44a7eb1528 Mon Sep 17 00:00:00 2001 From: Clayton Coleman Date: Sun, 20 Dec 2015 14:38:34 -0500 Subject: Document lowercase filenames --- coding-conventions.md | 1 + 1 file changed, 1 insertion(+) diff --git a/coding-conventions.md b/coding-conventions.md index d51278be..e1708633 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -68,6 +68,7 @@ Directory and file conventions - Avoid package sprawl. Find an appropriate subdirectory for new packages. (See [#4851](http://issues.k8s.io/4851) for discussion.) - Libraries with no more appropriate home belong in new package subdirectories of pkg/util - Avoid general utility packages. Packages called "util" are suspect. Instead, derive a name that describes your desired function. For example, the utility functions dealing with waiting for operations are in the "wait" package and include functionality like Poll. So the full name is wait.Poll + - All filenames should be lowercase - Go source files and directories use underscores, not dashes - Package directories should generally avoid using separators as much as possible (when packages are multiple words, they usually should be in nested subdirectories). 
- Document directories and filenames should use dashes rather than underscores -- cgit v1.2.3 From 83db13cc2e582365a830b196a582fa9ff4d5a534 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Mon, 14 Dec 2015 10:37:38 -0800 Subject: run hack/update-generated-docs.sh --- README.md | 1 + adding-an-APIGroup.md | 4 ---- api-conventions.md | 1 + api_changes.md | 1 + automation.md | 1 + cherry-picks.md | 1 + cli-roadmap.md | 1 + client-libraries.md | 1 + coding-conventions.md | 1 + collab.md | 1 + developer-guides/vagrant.md | 1 + development.md | 1 + e2e-tests.md | 1 + faster_reviews.md | 1 + flaky-tests.md | 1 + getting-builds.md | 1 + instrumentation.md | 1 + issues.md | 1 + kubectl-conventions.md | 1 + kubemark-guide.md | 4 ---- logging.md | 1 + making-release-notes.md | 1 + owners.md | 4 ---- profiling.md | 1 + pull-requests.md | 1 + releasing.md | 1 + scheduler.md | 1 + scheduler_algorithm.md | 1 + update-release-docs.md | 4 ---- writing-a-getting-started-guide.md | 1 + 30 files changed, 26 insertions(+), 16 deletions(-) diff --git a/README.md b/README.md index 87ede398..ed586cd0 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/README.md). diff --git a/adding-an-APIGroup.md b/adding-an-APIGroup.md index afef1456..8f67a0ab 100644 --- a/adding-an-APIGroup.md +++ b/adding-an-APIGroup.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/devel/adding-an-APIGroup.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/api-conventions.md b/api-conventions.md index 1fe165a6..17cda1eb 100644 --- a/api-conventions.md +++ b/api-conventions.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/api-conventions.md). diff --git a/api_changes.md b/api_changes.md index d2f0aea7..f5ffbd46 100644 --- a/api_changes.md +++ b/api_changes.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/api_changes.md). diff --git a/automation.md b/automation.md index c21f4ed6..d7cdaef1 100644 --- a/automation.md +++ b/automation.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/automation.md). diff --git a/cherry-picks.md b/cherry-picks.md index f407c949..711f1233 100644 --- a/cherry-picks.md +++ b/cherry-picks.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/cherry-picks.md). diff --git a/cli-roadmap.md b/cli-roadmap.md index de2f4a43..b2ea1894 100644 --- a/cli-roadmap.md +++ b/cli-roadmap.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/cli-roadmap.md). 
diff --git a/client-libraries.md b/client-libraries.md index a6f3e6ff..fb7cdf6b 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/client-libraries.md). diff --git a/coding-conventions.md b/coding-conventions.md index e1708633..8b264395 100644 --- a/coding-conventions.md +++ b/coding-conventions.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/coding-conventions.md). diff --git a/collab.md b/collab.md index de2ce10c..28de1035 100644 --- a/collab.md +++ b/collab.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/collab.md). diff --git a/developer-guides/vagrant.md b/developer-guides/vagrant.md index 14ccfe6b..ebb12ab1 100644 --- a/developer-guides/vagrant.md +++ b/developer-guides/vagrant.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/developer-guides/vagrant.md). diff --git a/development.md b/development.md index 3b5443bc..27ce1b8a 100644 --- a/development.md +++ b/development.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/development.md). 
diff --git a/e2e-tests.md b/e2e-tests.md index d1f909dc..902ba1c1 100644 --- a/e2e-tests.md +++ b/e2e-tests.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/e2e-tests.md). diff --git a/faster_reviews.md b/faster_reviews.md index f0cb159c..18a01fe9 100644 --- a/faster_reviews.md +++ b/faster_reviews.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/faster_reviews.md). diff --git a/flaky-tests.md b/flaky-tests.md index d5cc6a45..51f8bcac 100644 --- a/flaky-tests.md +++ b/flaky-tests.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/flaky-tests.md). diff --git a/getting-builds.md b/getting-builds.md index 375a1fac..0caacb34 100644 --- a/getting-builds.md +++ b/getting-builds.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/getting-builds.md). diff --git a/instrumentation.md b/instrumentation.md index 49f1f077..bfd74026 100644 --- a/instrumentation.md +++ b/instrumentation.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/instrumentation.md). 
diff --git a/issues.md b/issues.md index cbad9517..483747a1 100644 --- a/issues.md +++ b/issues.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/issues.md). diff --git a/kubectl-conventions.md b/kubectl-conventions.md index 3775c0b3..a3a7b6f6 100644 --- a/kubectl-conventions.md +++ b/kubectl-conventions.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/kubectl-conventions.md). diff --git a/kubemark-guide.md b/kubemark-guide.md index df0ecb96..c2addc8f 100644 --- a/kubemark-guide.md +++ b/kubemark-guide.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/devel/kubemark-guide.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/logging.md b/logging.md index 3dc22ca5..8dca0a9f 100644 --- a/logging.md +++ b/logging.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/logging.md). diff --git a/making-release-notes.md b/making-release-notes.md index 7a2d73c0..48c7d72f 100644 --- a/making-release-notes.md +++ b/making-release-notes.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/making-release-notes.md). 
diff --git a/owners.md b/owners.md index 22bb2fef..3b5a1aca 100644 --- a/owners.md +++ b/owners.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/devel/owners.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/profiling.md b/profiling.md index f05b9d74..18c87f41 100644 --- a/profiling.md +++ b/profiling.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/profiling.md). diff --git a/pull-requests.md b/pull-requests.md index b97da36e..eaffce23 100644 --- a/pull-requests.md +++ b/pull-requests.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/pull-requests.md). diff --git a/releasing.md b/releasing.md index d47202f2..d43a20cd 100644 --- a/releasing.md +++ b/releasing.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/releasing.md). diff --git a/scheduler.md b/scheduler.md index 2bdb4c16..5051bfed 100755 --- a/scheduler.md +++ b/scheduler.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/scheduler.md). 
diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index 3888786c..06c482fd 100755 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/scheduler_algorithm.md). diff --git a/update-release-docs.md b/update-release-docs.md index ea8a9b48..e94c5442 100644 --- a/update-release-docs.md +++ b/update-release-docs.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/devel/update-release-docs.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/writing-a-getting-started-guide.md b/writing-a-getting-started-guide.md index a82691a8..f6b2a4b1 100644 --- a/writing-a-getting-started-guide.md +++ b/writing-a-getting-started-guide.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/devel/writing-a-getting-started-guide.md). 
-- cgit v1.2.3 From c7513bff5a8a94e36057b28e0ff77d1208e43a88 Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Mon, 14 Dec 2015 10:37:38 -0800 Subject: run hack/update-generated-docs.sh --- api-group.md | 1 + apiserver-watch.md | 1 + autoscaling.md | 1 + client-package-structure.md | 4 ---- compute-resource-metrics-api.md | 1 + configmap.md | 4 ---- custom-metrics.md | 4 ---- deployment.md | 1 + federation.md | 1 + flannel-integration.md | 4 ---- high-availability.md | 1 + initial-resources.md | 1 + job.md | 1 + kubemark.md | 1 + metrics-plumbing.md | 4 ---- multiple-schedulers.md | 4 ---- node-allocatable.md | 4 ---- performance-related-monitoring.md | 4 ---- pod-security-context.md | 1 + rescheduler.md | 1 + resource-qos.md | 1 + scalability-testing.md | 1 + selinux.md | 4 ---- volume-ownership-management.md | 4 ---- volumes.md | 4 ---- 25 files changed, 14 insertions(+), 44 deletions(-) diff --git a/api-group.md b/api-group.md index 9b40bb81..c2f9a073 100644 --- a/api-group.md +++ b/api-group.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/api-group.md). diff --git a/apiserver-watch.md b/apiserver-watch.md index f2011f13..e112baff 100644 --- a/apiserver-watch.md +++ b/apiserver-watch.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/apiserver-watch.md). diff --git a/autoscaling.md b/autoscaling.md index 806d1ece..16fe863c 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. 
+ The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/autoscaling.md). diff --git a/client-package-structure.md b/client-package-structure.md index 2739f30a..14b32035 100644 --- a/client-package-structure.md +++ b/client-package-structure.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/client-package-structure.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/compute-resource-metrics-api.md b/compute-resource-metrics-api.md index 25ec76cd..f8ae1d3f 100644 --- a/compute-resource-metrics-api.md +++ b/compute-resource-metrics-api.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/compute-resource-metrics-api.md). diff --git a/configmap.md b/configmap.md index aa36f7a7..be0d0cc6 100644 --- a/configmap.md +++ b/configmap.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/configmap.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/custom-metrics.md b/custom-metrics.md index 6cdf1624..25e263b1 100644 --- a/custom-metrics.md +++ b/custom-metrics.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/custom-metrics.md). 
- Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/deployment.md b/deployment.md index c4d4cf88..9a3999a0 100644 --- a/deployment.md +++ b/deployment.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/deployment.md). diff --git a/federation.md b/federation.md index 0bf6c618..2b63bde4 100644 --- a/federation.md +++ b/federation.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/federation.md). diff --git a/flannel-integration.md b/flannel-integration.md index 417cab1d..c4cfc4e7 100644 --- a/flannel-integration.md +++ b/flannel-integration.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/flannel-integration.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/high-availability.md b/high-availability.md index 696c90be..dec20a66 100644 --- a/high-availability.md +++ b/high-availability.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/high-availability.md). diff --git a/initial-resources.md b/initial-resources.md index 1eace646..4c27b010 100644 --- a/initial-resources.md +++ b/initial-resources.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. 
+ The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/initial-resources.md). diff --git a/job.md b/job.md index 6f8befa3..202e39ac 100644 --- a/job.md +++ b/job.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/job.md). diff --git a/kubemark.md b/kubemark.md index fb7f0e02..ef89b4be 100644 --- a/kubemark.md +++ b/kubemark.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/kubemark.md). diff --git a/metrics-plumbing.md b/metrics-plumbing.md index 41fbed9b..8489409a 100644 --- a/metrics-plumbing.md +++ b/metrics-plumbing.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/metrics-plumbing.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/multiple-schedulers.md b/multiple-schedulers.md index 4fadd601..82920653 100644 --- a/multiple-schedulers.md +++ b/multiple-schedulers.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/multiple-schedulers.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/node-allocatable.md b/node-allocatable.md index 8429eda9..c915bb6a 100644 --- a/node-allocatable.md +++ b/node-allocatable.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/node-allocatable.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/performance-related-monitoring.md b/performance-related-monitoring.md index f2752bec..e6612fb4 100644 --- a/performance-related-monitoring.md +++ b/performance-related-monitoring.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/performance-related-monitoring.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/pod-security-context.md b/pod-security-context.md index 0bf4e78c..42cf38f7 100644 --- a/pod-security-context.md +++ b/pod-security-context.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/pod-security-context.md). diff --git a/rescheduler.md b/rescheduler.md index f10eb6d3..a1bb1c12 100644 --- a/rescheduler.md +++ b/rescheduler.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/rescheduler.md). 
diff --git a/resource-qos.md b/resource-qos.md index 1f8dacca..c81d0725 100644 --- a/resource-qos.md +++ b/resource-qos.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/resource-qos.md). diff --git a/scalability-testing.md b/scalability-testing.md index edcf5172..06c936d4 100644 --- a/scalability-testing.md +++ b/scalability-testing.md @@ -19,6 +19,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/proposals/scalability-testing.md). diff --git a/selinux.md b/selinux.md index fd9eb73c..c4c21ab6 100644 --- a/selinux.md +++ b/selinux.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/selinux.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/volume-ownership-management.md b/volume-ownership-management.md index 8dd4b8bb..8054398a 100644 --- a/volume-ownership-management.md +++ b/volume-ownership-management.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/volume-ownership-management.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). 
diff --git a/volumes.md b/volumes.md index 5be43cf5..2368bd49 100644 --- a/volumes.md +++ b/volumes.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/volumes.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). -- cgit v1.2.3 From bdd5d6654f654d21a18df8c1364f066f0149071b Mon Sep 17 00:00:00 2001 From: Chao Xu Date: Mon, 14 Dec 2015 10:37:38 -0800 Subject: run hack/update-generated-docs.sh --- README.md | 1 + access.md | 1 + admission_control.md | 1 + admission_control_limit_range.md | 1 + admission_control_resource_quota.md | 1 + architecture.md | 1 + aws_under_the_hood.md | 4 ---- clustering.md | 1 + clustering/README.md | 1 + command_execution_port_forwarding.md | 1 + daemon.md | 1 + enhance-pluggable-policy.md | 4 ---- event_compression.md | 1 + expansion.md | 1 + extending-api.md | 1 + horizontal-pod-autoscaler.md | 1 + identifiers.md | 1 + namespaces.md | 1 + networking.md | 1 + persistent-storage.md | 1 + principles.md | 1 + resources.md | 1 + scheduler_extender.md | 4 ---- secrets.md | 1 + security.md | 1 + security_context.md | 1 + service_accounts.md | 1 + simple-rolling-update.md | 1 + versioning.md | 1 + 29 files changed, 26 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index ef5a1157..e7beb90b 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/README.md). diff --git a/access.md b/access.md index 10a0c9fe..fa173392 100644 --- a/access.md +++ b/access.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. 
+ The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/access.md). diff --git a/admission_control.md b/admission_control.md index e9303728..37cf5e1f 100644 --- a/admission_control.md +++ b/admission_control.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/admission_control.md). diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md index d13a98f1..890ba37d 100644 --- a/admission_control_limit_range.md +++ b/admission_control_limit_range.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/admission_control_limit_range.md). diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md index 31d4a147..2b01ea7e 100644 --- a/admission_control_resource_quota.md +++ b/admission_control_resource_quota.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/admission_control_resource_quota.md). diff --git a/architecture.md b/architecture.md index 3bb24e44..93213066 100644 --- a/architecture.md +++ b/architecture.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/architecture.md). 
diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index a55c09e3..7d895627 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/design/aws_under_the_hood.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/clustering.md b/clustering.md index 66bd0784..01df7410 100644 --- a/clustering.md +++ b/clustering.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/clustering.md). diff --git a/clustering/README.md b/clustering/README.md index 073deb05..6f3d379c 100644 --- a/clustering/README.md +++ b/clustering/README.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/clustering/README.md). diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md index dbd7b0eb..89ed7665 100644 --- a/command_execution_port_forwarding.md +++ b/command_execution_port_forwarding.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/command_execution_port_forwarding.md). diff --git a/daemon.md b/daemon.md index 29f7e913..d8ed8d43 100644 --- a/daemon.md +++ b/daemon.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. 
+ The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/daemon.md). diff --git a/enhance-pluggable-policy.md b/enhance-pluggable-policy.md index 6a881250..1ee9bf29 100644 --- a/enhance-pluggable-policy.md +++ b/enhance-pluggable-policy.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/design/enhance-pluggable-policy.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/event_compression.md b/event_compression.md index a8c5916b..c8030559 100644 --- a/event_compression.md +++ b/event_compression.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/event_compression.md). diff --git a/expansion.md b/expansion.md index 770ec054..371f7c86 100644 --- a/expansion.md +++ b/expansion.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/expansion.md). diff --git a/extending-api.md b/extending-api.md index 1f76235f..5f5e6c0a 100644 --- a/extending-api.md +++ b/extending-api.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/extending-api.md). 
diff --git a/horizontal-pod-autoscaler.md b/horizontal-pod-autoscaler.md index 42cd27bb..7c54da06 100644 --- a/horizontal-pod-autoscaler.md +++ b/horizontal-pod-autoscaler.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/horizontal-pod-autoscaler.md). diff --git a/identifiers.md b/identifiers.md index 04ee4ab1..ca2c95df 100644 --- a/identifiers.md +++ b/identifiers.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/identifiers.md). diff --git a/namespaces.md b/namespaces.md index b5965348..45e07f72 100644 --- a/namespaces.md +++ b/namespaces.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/namespaces.md). diff --git a/networking.md b/networking.md index b110ca75..e5807b50 100644 --- a/networking.md +++ b/networking.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/networking.md). diff --git a/persistent-storage.md b/persistent-storage.md index a95ba305..7aa9bfa9 100644 --- a/persistent-storage.md +++ b/persistent-storage.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/persistent-storage.md). 
diff --git a/principles.md b/principles.md index 20343ac4..52b839fb 100644 --- a/principles.md +++ b/principles.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/principles.md). diff --git a/resources.md b/resources.md index 9b6ac51b..069ddd6c 100644 --- a/resources.md +++ b/resources.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/resources.md). diff --git a/scheduler_extender.md b/scheduler_extender.md index 0c10de59..3a55139d 100644 --- a/scheduler_extender.md +++ b/scheduler_extender.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/design/scheduler_extender.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). diff --git a/secrets.md b/secrets.md index 763c5567..a9941cb3 100644 --- a/secrets.md +++ b/secrets.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/secrets.md). diff --git a/security.md b/security.md index e845c925..db380250 100644 --- a/security.md +++ b/security.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/security.md). 
diff --git a/security_context.md b/security_context.md index 413e2a2e..8b9b8c12 100644 --- a/security_context.md +++ b/security_context.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/security_context.md). diff --git a/service_accounts.md b/service_accounts.md index fb065d1a..72c3df81 100644 --- a/service_accounts.md +++ b/service_accounts.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/service_accounts.md). diff --git a/simple-rolling-update.md b/simple-rolling-update.md index 31f31d67..e34e695c 100644 --- a/simple-rolling-update.md +++ b/simple-rolling-update.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/simple-rolling-update.md). diff --git a/versioning.md b/versioning.md index ab7d7ecb..99caa6e6 100644 --- a/versioning.md +++ b/versioning.md @@ -18,6 +18,7 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. + The latest release of this document can be found [here](http://releases.k8s.io/release-1.1/docs/design/versioning.md). 
-- cgit v1.2.3 From 6373ccfc741fb9a51c80be652f5a449b58d48db8 Mon Sep 17 00:00:00 2001 From: Filip Grzadkowski Date: Mon, 7 Dec 2015 12:05:38 +0100 Subject: Add proposal for simpler cluster deployment --- cluster-deployment.md | 204 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 204 insertions(+) create mode 100644 cluster-deployment.md diff --git a/cluster-deployment.md b/cluster-deployment.md new file mode 100644 index 00000000..6d2fc419 --- /dev/null +++ b/cluster-deployment.md @@ -0,0 +1,204 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + + +The latest release of this document can be found +[here](http://releases.k8s.io/release-1.1/docs/proposals/cluster-deployment.md). + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +# Objective + +Simplify the cluster provisioning process for a cluster with one master and multiple worker nodes. +It should be secured with SSL and have all the default add-ons. There should not be significant +differences in the provisioning process across deployment targets (cloud provider + OS distribution) +once machines meet the node specification. + +# Overview + +Cluster provisioning can be broken into a number of phases, each with their own exit criteria. +In some cases, multiple phases will be combined together to more seamlessly automate the cluster setup, +but in all cases the phases can be run sequentially to provision a functional cluster. + +It is possible that for some platforms we will provide an optimized flow that combines some of the steps +together, but that is out of scope of this document. + +# Deployment flow + +**Note**: _Exit criteria_ in the following sections are not intended to list all tests that should pass, +rather, list those that must pass. + +## Step 1: Provision cluster + +**Objective**: Create a set of machines (master + nodes) where we will deploy Kubernetes. + +For this phase to be completed successfully, the following requirements must be met for all nodes: +- Basic connectivity between nodes (i.e. nodes can all ping each other) +- Docker installed (and in production setups it should be monitored to be always running) +- One of the supported OSes + +We will provide a node specification conformance test that will verify that provisioning has been successful. 
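The proposal does not specify the conformance test itself; as a rough sketch (the host names and the dry-run `SSH` wrapper below are illustrative assumptions, not part of the proposal), it must at minimum exercise the requirements listed above:

```shell
#!/usr/bin/env bash
# Hypothetical node-spec conformance check. Host names are placeholders;
# SSH defaults to a dry run that just echoes the commands it would run.
set -euo pipefail

SSH=${SSH:-"echo ssh"}                 # set SSH=ssh to actually execute
HOSTS=${HOSTS:-"master node-1 node-2"}

for host in $HOSTS; do
  # Docker is installed and can run a test image
  $SSH "$host" docker run --rm busybox true
  # Basic connectivity: every node can reach every other node
  for peer in $HOSTS; do
    [ "$host" = "$peer" ] || $SSH "$host" ping -c 1 "$peer"
  done
done
echo "node specification conformance: OK"
```

Running it with real `ssh` against freshly provisioned machines then corresponds to checking the requirements of this step.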
+ +This step is provider-specific and will be implemented for each cloud provider + OS distribution separately +using provider-specific technology (cloud formation, deployment manager, PXE boot, etc.). +Some OS distributions may meet the provisioning criteria without needing to run any post-boot steps, as they +ship with all of the requirements for the node specification by default. + +**Substeps** (using GCE as an example): + +1. Create network +2. Create firewall rules to allow communication inside the cluster +3. Create firewall rule to allow ```ssh``` to all machines +4. Create firewall rule to allow ```https``` to master +5. Create persistent disk for master +6. Create static IP address for master +7. Create master machine +8. Create node machines +9. Install docker on all machines + +**Exit criteria**: + +1. Can ```ssh``` to all machines and run a test docker image +2. Can ```ssh``` to master and nodes and ping other machines + +## Step 2: Generate certificates + +**Objective**: Generate the security certificates used to configure secure communication between client, master, and nodes. + +TODO: Enumerate the certificates that have to be generated. + +## Step 3: Deploy master + +**Objective**: Run kubelet and all the required components (e.g. etcd, apiserver, scheduler, controllers) on the master machine. + +**Substeps**: + +1. copy certificates +2. copy manifests for static pods: + 1. etcd + 2. apiserver, controller manager, scheduler +3. run kubelet in docker container (configuration is read from apiserver Config object) +4. run kubelet-checker in docker container + +**v1.2 simplifications**: + +1. kubelet-runner.sh - we will provide a custom docker image to run kubelet; it will contain the +kubelet binary and will run it using ```nsenter``` to work around a problem with mount propagation +2. kubelet config file - we will read the kubelet configuration file from disk instead of the apiserver; it will +be generated locally and copied to all nodes. + +**Exit criteria**: + +1. 
Can run basic API calls (e.g. create, list, and delete pods) from the client side (e.g. replication +controller works - the user can create an RC object and the RC manager can create pods based on it) +2. Critical master components work: + 1. scheduler + 2. controller manager + +## Step 4: Deploy nodes + +**Objective**: Start kubelet on all nodes and configure the Kubernetes network. +Each node can be deployed separately, and the implementation should make it effectively impossible to break this assumption. + +### Step 4.1: Run kubelet + +**Substeps**: + +1. copy certificates +2. run kubelet in docker container (configuration is read from apiserver Config object) +3. run kubelet-checker in docker container + +**v1.2 simplifications**: + +1. kubelet config file - we will read the kubelet configuration file from disk instead of the apiserver; it will +be generated locally and copied to all nodes. + +**Exit criteria**: + +1. All nodes are registered, but not ready due to the lack of Kubernetes networking. + +### Step 4.2: Set up Kubernetes networking + +**Objective**: Configure the Kubernetes networking to allow routing requests to pods and services. + +To keep the default setup consistent across open-source deployments, we will use Flannel to configure +Kubernetes networking. However, the implementation of this step will make it easy to plug in different +network solutions. + +**Substeps**: + +1. copy manifest for flannel server to master machine +2. create a daemonset with flannel daemon (it will read the assigned CIDR and configure the network appropriately). + +**v1.2 simplifications**: + +1. flannel daemon will run as a standalone binary (not in docker container) +2. 
flannel server will assign CIDRs to nodes outside of kubernetes; this will require restarting kubelet
+after reconfiguring network bridge on local machine; this will also require running master and node differently
+(```--configure-cbr0=false``` on node and ```--allocate-node-cidrs=false``` on master), which breaks encapsulation
+between nodes
+
+**Exit criteria**:
+
+1. Pods correctly created, scheduled, run and accessible from all nodes.
+
+## Step 5: Add daemons
+
+**Objective**: Start all system daemons (e.g. kube-proxy)
+
+**Substeps**:
+
+1. Create daemonset for kube-proxy
+
+**Exit criteria**:
+
+1. Services work correctly on all nodes.
+
+## Step 6: Add add-ons
+
+**Objective**: Add default add-ons (e.g. dns, dashboard)
+
+**Substeps**:
+
+1. Create Deployments (and daemonsets if needed) for all add-ons
+
+## Deployment technology
+
+We will use Ansible as the default technology for deployment orchestration. It has low requirements on the cluster machines
+and seems to be popular in the kubernetes community, which will help us maintain it.
+
+For simpler UX we will provide simple bash scripts that will wrap all basic commands for deployment (e.g. ```up``` or ```down```).
+
+One disadvantage of using Ansible is that it adds a dependency on a machine which runs deployment scripts. 
We will work around
+this by distributing deployment scripts via a docker image so that the user can run the following command to create a cluster:
+
+```docker run gcr.io/google_containers/deploy_kubernetes:v1.2 up --num-nodes=3 --provider=aws```
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/cluster-deployment.md?pixel)]()
+
-- 
cgit v1.2.3


From b3849ceb4436cc722929bd742e6614678835a3ce Mon Sep 17 00:00:00 2001
From: Ed Costello
Date: Thu, 29 Oct 2015 14:36:29 -0400
Subject: Copy edits for typos

---
 api-conventions.md | 2 +-
 api_changes.md     | 4 ++--
 automation.md      | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/api-conventions.md b/api-conventions.md
index ab049694..00c2ec62 100644
--- a/api-conventions.md
+++ b/api-conventions.md
@@ -403,7 +403,7 @@ Using the `omitempty` tag causes swagger documentation to reflect that the field
 Using a pointer allows distinguishing unset from the zero value for that type. 
 There are some cases where, in principle, a pointer is not needed for an optional field
-since the zero value is forbidden, and thus imples unset. There are examples of this in the
+since the zero value is forbidden, and thus implies unset. There are examples of this in the
 codebase. 
However: - it can be difficult for implementors to anticipate all cases where an empty value might need to be diff --git a/api_changes.md b/api_changes.md index d2f0aea7..015bab3e 100644 --- a/api_changes.md +++ b/api_changes.md @@ -558,7 +558,7 @@ New feature development proceeds through a series of stages of increasing maturi - Development level - Object Versioning: no convention - - Availability: not commited to main kubernetes repo, and thus not available in offical releases + - Availability: not committed to main kubernetes repo, and thus not available in official releases - Audience: other developers closely collaborating on a feature or proof-of-concept - Upgradeability, Reliability, Completeness, and Support: no requirements or guarantees - Alpha level @@ -590,7 +590,7 @@ New feature development proceeds through a series of stages of increasing maturi tests complete; the API has had a thorough API review and is thought to be complete, though use during beta may frequently turn up API issues not thought of during review - Upgradeability: the object schema and semantics may change in a later software release; when - this happens, an upgrade path will be documentedr; in some cases, objects will be automatically + this happens, an upgrade path will be documented; in some cases, objects will be automatically converted to the new version; in other cases, a manual upgrade may be necessary; a manual upgrade may require downtime for anything relying on the new feature, and may require manual conversion of objects to the new version; when manual conversion is necessary, the diff --git a/automation.md b/automation.md index c21f4ed6..5b77425a 100644 --- a/automation.md +++ b/automation.md @@ -35,7 +35,7 @@ Documentation for other releases can be found at ## Overview -Kubernetes uses a variety of automated tools in an attempt to relieve developers of repeptitive, low +Kubernetes uses a variety of automated tools in an attempt to relieve developers of repetitive, low 
brain power work. This document attempts to describe these processes. -- cgit v1.2.3 From a5f8acafc511d5ea29ec35c8c6d3b2bf7a295e56 Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Thu, 29 Oct 2015 14:36:29 -0400 Subject: Copy edits for typos --- deployment.md | 2 +- pod-security-context.md | 2 +- resource-qos.md | 4 ++-- selinux.md | 6 +++--- volumes.md | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/deployment.md b/deployment.md index c4d4cf88..2cbda972 100644 --- a/deployment.md +++ b/deployment.md @@ -188,7 +188,7 @@ For each pending deployment, it will: and the old RCs have been ramped down to 0. 6. Cleanup. -DeploymentController is stateless so that it can recover incase it crashes during a deployment. +DeploymentController is stateless so that it can recover in case it crashes during a deployment. ### MinReadySeconds diff --git a/pod-security-context.md b/pod-security-context.md index 0bf4e78c..8e03c0ce 100644 --- a/pod-security-context.md +++ b/pod-security-context.md @@ -250,7 +250,7 @@ defined as: > 3. It must be possible to round-trip your change (convert to different API versions and back) with > no loss of information. -Previous versions of this proposal attempted to deal with backward compatiblity by defining +Previous versions of this proposal attempted to deal with backward compatibility by defining the affect of setting the pod-level fields on the container-level fields. While trying to find consensus on this design, it became apparent that this approach was going to be extremely complex to implement, explain, and support. Instead, we will approach backward compatibility as follows: diff --git a/resource-qos.md b/resource-qos.md index 1f8dacca..cdd71d02 100644 --- a/resource-qos.md +++ b/resource-qos.md @@ -119,7 +119,7 @@ Supporting other platforms: Protecting containers and guarantees: - **Control loops**: The OOM score assignment is not perfect for burstable containers, and system OOM kills are expensive. 
TODO: Add a control loop to reduce memory pressure, while ensuring guarantees for various containers. -- **Kubelet, Kube-proxy, Docker daemon protection**: If a system is overcommitted with memory guaranteed containers, then all prcoesses will have an OOM_SCORE of 0. So Docker daemon could be killed instead of a container or pod being killed. TODO: Place all user-pods into a separate cgroup, and set a limit on the memory they can consume. Initially, the limits can be based on estimated memory usage of Kubelet, Kube-proxy, and CPU limits, eventually we can monitor the resources they consume. +- **Kubelet, Kube-proxy, Docker daemon protection**: If a system is overcommitted with memory guaranteed containers, then all processes will have an OOM_SCORE of 0. So Docker daemon could be killed instead of a container or pod being killed. TODO: Place all user-pods into a separate cgroup, and set a limit on the memory they can consume. Initially, the limits can be based on estimated memory usage of Kubelet, Kube-proxy, and CPU limits, eventually we can monitor the resources they consume. - **OOM Assignment Races**: We cannot set OOM_SCORE_ADJ of a process until it has launched. This could lead to races. For example, suppose that a memory burstable container is using 70% of the system’s memory, and another burstable container is using 30% of the system’s memory. A best-effort burstable container attempts to launch on the Kubelet. Initially the best-effort container is using 2% of memory, and has an OOM_SCORE_ADJ of 20. So its OOM_SCORE is lower than the burstable pod using 70% of system memory. The burstable pod will be evicted by the best-effort pod. Short-term TODO: Implement a restart policy where best-effort pods are immediately evicted if OOM killed, but burstable pods are given a few retries. Long-term TODO: push support for OOM scores in cgroups to the upstream Linux kernel. - **Swap Memory**: The QoS proposal assumes that swap memory is disabled. 
If swap is enabled, then resource guarantees (for pods that specify resource requirements) will not hold. For example, suppose 2 guaranteed pods have reached their memory limit. They can start allocating memory on swap space. Eventually, if there isn’t enough swap space, processes in the pods might get killed. TODO: ensure that swap space is disabled on our cluster setups scripts. @@ -128,7 +128,7 @@ Killing and eviction mechanics: - **Out of Resource Eviction**: If a container in a multi-container pod fails, we might want restart the entire pod instead of just restarting the container. In some cases (e.g. if a memory best-effort container is out of resource killed), we might change pods to "failed" phase and pods might need to be evicted. TODO: Draft a policy for out of resource eviction and implement it. Maintaining CPU performance: -- **CPU-sharing Issues** Suppose that a node is running 2 container: a container A requesting for 50% of CPU (but without a CPU limit), and a container B not requesting for resoruces. Suppose that both pods try to use as much CPU as possible. After the proposal is implemented, A will get 100% of the CPU, and B will get around 0% of the CPU. However, a fairer scheme would give the Burstable container 75% of the CPU and the Best-Effort container 25% of the CPU (since resources past the Burstable container’s request are not guaranteed). TODO: think about whether this issue to be solved, implement a solution. +- **CPU-sharing Issues** Suppose that a node is running 2 container: a container A requesting for 50% of CPU (but without a CPU limit), and a container B not requesting for resources. Suppose that both pods try to use as much CPU as possible. After the proposal is implemented, A will get 100% of the CPU, and B will get around 0% of the CPU. 
However, a fairer scheme would give the Burstable container 75% of the CPU and the Best-Effort container 25% of the CPU (since resources past the Burstable container’s request are not guaranteed). TODO: think about whether this issue to be solved, implement a solution. - **CPU kills**: System tasks or daemons like the Kubelet could consume more CPU, and we won't be able to guarantee containers the CPU amount they requested. If the situation persists, we might want to kill the container. TODO: Draft a policy for CPU usage killing and implement it. - **CPU limits**: Enabling CPU limits can be problematic, because processes might be hard capped and might stall for a while. TODO: Enable CPU limits intelligently using CPU quota and core allocation. diff --git a/selinux.md b/selinux.md index fd9eb73c..e3ab2b3e 100644 --- a/selinux.md +++ b/selinux.md @@ -64,7 +64,7 @@ Goals of this design: ### Docker Docker uses a base SELinux context and calculates a unique MCS label per container. The SELinux -context of a container can be overriden with the `SecurityOpt` api that allows setting the different +context of a container can be overridden with the `SecurityOpt` api that allows setting the different parts of the SELinux context individually. Docker has functionality to relabel bind-mounts with a usable SElinux and supports two different @@ -73,7 +73,7 @@ use-cases: 1. The `:Z` bind-mount flag, which tells Docker to relabel a bind-mount with the container's SELinux context 2. The `:z` bind-mount flag, which tells Docker to relabel a bind-mount with the container's - SElinux context, but remove the MCS labels, making the volume shareable beween containers + SElinux context, but remove the MCS labels, making the volume shareable between containers We should avoid using the `:z` flag, because it relaxes the SELinux context so that any container (from an SELinux standpoint) can use the volume. 
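As a rough illustration of the difference between the two bind-mount flags discussed above, the toy model below mimics the MCS-level part of the check. This is a deliberate simplification, not real SELinux enforcement (which happens in the kernel and considers the full context); the helper names and example contexts are invented for the sketch.

```python
def relabel(container_ctx, flag):
    """Label a bind-mount would get under the ':Z' (private) or ':z' (shared) flag."""
    user, role, typ, level = container_ctx.split(":", 3)
    if flag == ":z":
        level = "s0"  # drop the per-container MCS categories -> shareable
    return ":".join([user, role, typ, level])

def may_access(container_ctx, volume_ctx):
    """Simplified MCS check: a volume label with no categories is usable by any container."""
    c_level = container_ctx.split(":", 3)[3]
    v_level = volume_ctx.split(":", 3)[3]
    return v_level == "s0" or v_level == c_level

a = "system_u:system_r:svirt_lxc_net_t:s0:c1,c2"
b = "system_u:system_r:svirt_lxc_net_t:s0:c3,c4"

private = relabel(a, ":Z")  # keeps c1,c2: only container a matches
shared = relabel(a, ":z")   # categories removed: both containers match
print(may_access(b, private), may_access(b, shared))  # False True
```

In the model, `:Z` leaves the volume usable only by the container whose MCS categories it carries, while `:z` makes it usable by every container, which is exactly why the design above avoids `:z`.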
@@ -200,7 +200,7 @@ From the above, we know that label management must be applied: Volumes should be relabeled with the correct SELinux context. Docker has this capability today; it is desireable for other container runtime implementations to provide similar functionality. -Relabeling should be an optional aspect of a volume plugin to accomodate: +Relabeling should be an optional aspect of a volume plugin to accommodate: 1. volume types for which generalized relabeling support is not sufficient 2. testing for each volume plugin individually diff --git a/volumes.md b/volumes.md index 5be43cf5..3eaf4415 100644 --- a/volumes.md +++ b/volumes.md @@ -45,7 +45,7 @@ Goals of this design: 1. Enumerate the different use-cases for volume usage in pods 2. Define the desired goal state for ownership and permission management in Kubernetes -3. Describe the changes necessary to acheive desired state +3. Describe the changes necessary to achieve desired state ## Constraints and Assumptions @@ -250,7 +250,7 @@ override the primary GID and should be safe to use in images that expect GID 0. ### Setting ownership and permissions on volumes For `EmptyDir`-based volumes and unshared storage, `chown` and `chmod` on the node are sufficient to -set ownershp and permissions. Shared storage is different because: +set ownership and permissions. Shared storage is different because: 1. Shared storage may not live on the node a pod that uses it runs on 2. Shared storage may be externally managed -- cgit v1.2.3 From f04f12d31546c069df69a9f706fef41542e51a6e Mon Sep 17 00:00:00 2001 From: Ed Costello Date: Thu, 29 Oct 2015 14:36:29 -0400 Subject: Copy edits for typos --- aws_under_the_hood.md | 6 +++--- daemon.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/aws_under_the_hood.md b/aws_under_the_hood.md index a55c09e3..d7feb8fc 100644 --- a/aws_under_the_hood.md +++ b/aws_under_the_hood.md @@ -95,7 +95,7 @@ you with sufficient instance storage for your needs. 
Note: The master uses a persistent volume ([etcd](architecture.md#etcd)) to track its state. Similar to nodes, containers are mostly run against instance -storage, except that we repoint some important data onto the peristent volume. +storage, except that we repoint some important data onto the persistent volume. The default storage driver for Docker images is aufs. Specifying btrfs (by passing the environment variable `DOCKER_STORAGE=btrfs` to kube-up) is also a good choice for a filesystem. btrfs @@ -176,7 +176,7 @@ a distribution file, and then are responsible for attaching and detaching EBS volumes from itself. The node policy is relatively minimal. The master policy is probably overly -permissive. The security concious may want to lock-down the IAM policies +permissive. The security conscious may want to lock-down the IAM policies further ([#11936](http://issues.k8s.io/11936)). We should make it easier to extend IAM permissions and also ensure that they @@ -275,7 +275,7 @@ Salt, for example). These objects can currently be manually created: * Set the `AWS_S3_BUCKET` environment variable to use an existing S3 bucket. * Set the `VPC_ID` environment variable to reuse an existing VPC. -* Set the `SUBNET_ID` environemnt variable to reuse an existing subnet. +* Set the `SUBNET_ID` environment variable to reuse an existing subnet. * If your route table has a matching `KubernetesCluster` tag, it will be reused. * If your security groups are appropriately named, they will be reused. diff --git a/daemon.md b/daemon.md index 29f7e913..a5ff3215 100644 --- a/daemon.md +++ b/daemon.md @@ -65,7 +65,7 @@ The DaemonSet supports standard API features: - Using the pod’s nodeSelector field, DaemonSets can be restricted to operate over nodes that have a certain label. For example, suppose that in a cluster some nodes are labeled ‘app=database’. You can use a DaemonSet to launch a datastore pod on exactly those nodes labeled ‘app=database’. 
- Using the pod's nodeName field, DaemonSets can be restricted to operate on a specified node. - The PodTemplateSpec used by the DaemonSet is the same as the PodTemplateSpec used by the Replication Controller. - - The initial implementation will not guarnatee that DaemonSet pods are created on nodes before other pods. + - The initial implementation will not guarantee that DaemonSet pods are created on nodes before other pods. - The initial implementation of DaemonSet does not guarantee that DaemonSet pods show up on nodes (for example because of resource limitations of the node), but makes a best effort to launch DaemonSet pods (like Replication Controllers do with pods). Subsequent revisions might ensure that DaemonSet pods show up on nodes, preempting other pods if necessary. - The DaemonSet controller adds an annotation "kubernetes.io/created-by: \" - YAML example: -- cgit v1.2.3 From d8b1f8d6aed960aa01683a736eeee0ff91dbb2b3 Mon Sep 17 00:00:00 2001 From: hurf Date: Sat, 10 Oct 2015 09:51:09 +0800 Subject: Clean up standalone conversion tool Remove kube-version-change for all its functionalities are covered by kubectl convert command. Also changed the related docs. --- adding-an-APIGroup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/adding-an-APIGroup.md b/adding-an-APIGroup.md index 8f67a0ab..0541af61 100644 --- a/adding-an-APIGroup.md +++ b/adding-an-APIGroup.md @@ -42,7 +42,7 @@ We plan on improving the way the types are factored in the future; see [#16062]( 2. Create pkg/apis/``/{register.go, ``/register.go} to register this group's API objects to the encoding/decoding scheme (e.g., [pkg/apis/extensions/register.go](../../pkg/apis/extensions/register.go) and [pkg/apis/extensions/v1beta1/register.go](../../pkg/apis/extensions/v1beta1/register.go); -3. Add a pkg/apis/``/install/install.go, which is responsible for adding the group to the `latest` package, so that other packages can access the group's meta through `latest.Group`. 
You probably only need to change the name of group and version in the [example](../../pkg/apis/extensions/install/install.go)). You need to import this `install` package in {pkg/master, pkg/client/unversioned, cmd/kube-version-change}/import_known_versions.go, if you want to make your group accessible to other packages in the kube-apiserver binary, binaries that uses the client package, or the kube-version-change tool. +3. Add a pkg/apis/``/install/install.go, which is responsible for adding the group to the `latest` package, so that other packages can access the group's meta through `latest.Group`. You probably only need to change the name of group and version in the [example](../../pkg/apis/extensions/install/install.go)). You need to import this `install` package in {pkg/master, pkg/client/unversioned}/import_known_versions.go, if you want to make your group accessible to other packages in the kube-apiserver binary, binaries that uses the client package. Step 2 and 3 are mechanical, we plan on autogenerate these using the cmd/libs/go2idl/ tool. -- cgit v1.2.3 From 11f05fc202c256e0ad84797bddcf68ea9af8e96e Mon Sep 17 00:00:00 2001 From: Brandon Philips Date: Sat, 26 Dec 2015 10:19:03 -0800 Subject: docs: apiserver-watch: minor cleanups Reading through this doc I found a few grammar things to fix. --- apiserver-watch.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/apiserver-watch.md b/apiserver-watch.md index f2011f13..8e0d2a44 100644 --- a/apiserver-watch.md +++ b/apiserver-watch.md @@ -33,9 +33,9 @@ Documentation for other releases can be found at ## Abstract -In the current system, all watch requests send to apiserver are in general -redirected to etcd. This means that for every watch request to apiserver, -apiserver opens a watch on etcd. +In the current system, most watch requests sent to apiserver are redirected to +etcd. This means that for every watch request the apiserver opens a watch on +etcd. 
The purpose of the proposal is to improve the overall performance of the system by solving the following problems: @@ -98,7 +98,7 @@ to implement the proposal. 1. Since we want the watch in apiserver to be optional for different resource types, this needs to be self-contained and hidden behind a well defined API. This should be a layer very close to etcd - in particular all registries: -"pkg/registry/generic/etcd" should be build on top of it. +"pkg/registry/generic/etcd" should be built on top of it. We will solve it by turning tools.EtcdHelper by extracting its interface and treating this interface as this API - the whole watch mechanisms in apiserver will be hidden behind that interface. @@ -168,8 +168,8 @@ the same time, we can introduce an additional etcd event type: in places like [Reflector](../../pkg/client/cache/reflector.go) - However, this might turn out to be unnecessary optimization if apiserver - will always keep up (which is possible in the new design). We will work + However, this might turn out to be unnecessary optimization if apiserver + will always keep up (which is possible in the new design). We will work out all necessary details at that point. -- cgit v1.2.3 From 4f4703bb1ad27a90b4d6263d34843159b126fd7c Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Sun, 29 Nov 2015 14:00:49 -0500 Subject: Ubernetes Lite: Volumes can dictate zone scheduling For AWS EBS, a volume can only be attached to a node in the same AZ. The scheduler must therefore detect if a volume is being attached to a pod, and ensure that the pod is scheduled on a node in the same AZ as the volume. So that the scheduler need not query the cloud provider every time, and to support decoupled operation (e.g. bare metal) we tag the volume with our placement labels. This is done automatically by means of an admission controller on AWS when a PersistentVolume is created backed by an EBS volume. Support for tagging GCE PVs will follow. Pods that specify a volume directly (i.e. 
without using a PersistentVolumeClaim) will not currently be scheduled correctly (i.e. they will be scheduled without zone-awareness). --- scheduler_algorithm.md | 1 + 1 file changed, 1 insertion(+) diff --git a/scheduler_algorithm.md b/scheduler_algorithm.md index 06c482fd..00a812a5 100755 --- a/scheduler_algorithm.md +++ b/scheduler_algorithm.md @@ -41,6 +41,7 @@ For each unscheduled Pod, the Kubernetes scheduler tries to find a node across t The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource requests of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: - `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. +- `NoVolumeZoneConflict`: Evaluate if the volumes a pod requests are available on the node, given the Zone restrictions. - `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../proposals/resource-qos.md). - `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node. - `PodFitsHost`: Filter out all nodes except the one specified in the PodSpec's NodeName field. 
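To make the filtering phase described above concrete, here is a small illustrative sketch in Python. The real scheduler is written in Go; the data shapes, field names, and the two sample predicates below are invented for the example, not the actual implementation.

```python
# Each predicate answers: can this pod run on this node at all?
# Nodes failing any predicate are filtered out before the ranking phase.

def pod_fits_resources(pod, node):
    """Free capacity (capacity minus sum of existing requests) must cover the pod's request."""
    free_cpu = node["cpu_capacity"] - sum(p["cpu_request"] for p in node["pods"])
    free_mem = node["mem_capacity"] - sum(p["mem_request"] for p in node["pods"])
    return pod["cpu_request"] <= free_cpu and pod["mem_request"] <= free_mem

def no_volume_zone_conflict(pod, node):
    """Every zone-restricted volume the pod requests must live in the node's zone."""
    return all(zone == node["zone"] for zone in pod.get("volume_zones", []))

PREDICATES = [pod_fits_resources, no_volume_zone_conflict]

def feasible_nodes(pod, nodes):
    return [n for n in nodes if all(pred(pod, n) for pred in PREDICATES)]

nodes = [
    {"name": "n1", "zone": "us-east-1a", "cpu_capacity": 4, "mem_capacity": 8,
     "pods": [{"cpu_request": 3, "mem_request": 4}]},
    {"name": "n2", "zone": "us-east-1b", "cpu_capacity": 4, "mem_capacity": 8, "pods": []},
]
pod = {"cpu_request": 2, "mem_request": 2, "volume_zones": ["us-east-1b"]}
print([n["name"] for n in feasible_nodes(pod, nodes)])  # ['n2']
```

Here `n1` is rejected twice over (insufficient free CPU, and a volume pinned to another zone), so only `n2` survives to the ranking phase.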
-- cgit v1.2.3 From 7ea61a4e4c2d63a9e7f2dc10a5b79b4e530e7396 Mon Sep 17 00:00:00 2001 From: David O'Riordan Date: Sun, 3 Jan 2016 14:37:15 +0000 Subject: Add Scala to client library list --- client-libraries.md | 1 + 1 file changed, 1 insertion(+) diff --git a/client-libraries.md b/client-libraries.md index a6f3e6ff..94453c17 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -50,6 +50,7 @@ Documentation for other releases can be found at * [Node.js](https://github.com/tenxcloud/node-kubernetes-client) * [Perl](https://metacpan.org/pod/Net::Kubernetes) * [Clojure](https://github.com/yanatan16/clj-kubernetes-api) + * [Scala](https://github.com/doriordan/skuber) -- cgit v1.2.3 From 1990cfaf48595c946a74485e9aebfcb763d59383 Mon Sep 17 00:00:00 2001 From: gmarek Date: Tue, 5 Jan 2016 14:43:29 +0100 Subject: Fix generated docs --- cluster-deployment.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/cluster-deployment.md b/cluster-deployment.md index 6d2fc419..4caf2b9c 100644 --- a/cluster-deployment.md +++ b/cluster-deployment.md @@ -18,10 +18,6 @@ If you are using a released version of Kubernetes, you should refer to the docs that go with that version. - -The latest release of this document can be found -[here](http://releases.k8s.io/release-1.1/docs/proposals/cluster-deployment.md). - Documentation for other releases can be found at [releases.k8s.io](http://releases.k8s.io). -- cgit v1.2.3 From bab0f320bbe2db65b5255da8db60b82a66db60cc Mon Sep 17 00:00:00 2001 From: Justin Santa Barbara Date: Mon, 4 Jan 2016 10:29:11 -0500 Subject: Design doc for Ubernetes Lite Documentation of the Ubernetes-Lite idea and the main design & implementation points. 
--- federation-lite.md | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 230 insertions(+) create mode 100644 federation-lite.md diff --git a/federation-lite.md b/federation-lite.md new file mode 100644 index 00000000..44fe52d4 --- /dev/null +++ b/federation-lite.md @@ -0,0 +1,230 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). +
+--
+
+
+
+
+
+# Kubernetes Multi-AZ Clusters
+
+## (a.k.a. "Ubernetes-Lite")
+
+## Introduction
+
+Full Ubernetes will offer sophisticated federation between multiple kubernetes
+clusters, offering true high-availability, multiple provider support &
+cloud-bursting, multiple region support etc. However, many users have
+expressed a desire for a "reasonably" highly available cluster that runs in
+multiple zones on GCE or availability zones in AWS, and can tolerate the failure
+of a single zone without the complexity of running multiple clusters.
+
+Ubernetes-Lite aims to deliver exactly that functionality: to run a single
+Kubernetes cluster in multiple zones. It will attempt to make reasonable
+scheduling decisions, in particular so that a replication controller's pods are
+spread across zones, and it will try to be aware of constraints - for example
+that a volume cannot be mounted on a node in a different zone.
+
+Ubernetes-Lite is deliberately limited in scope; for many advanced functions
+the answer will be "use Ubernetes (full)". For example, multiple-region
+support is not in scope. Routing affinity (e.g. so that a webserver will
+prefer to talk to a backend service in the same zone) is similarly not in
+scope.
+
+## Design
+
+These are the main requirements:
+
+1. kube-up must allow bringing up a cluster that spans multiple zones.
+1. pods in a replication controller should attempt to spread across zones.
+1. pods which require volumes should not be scheduled onto nodes in a different zone from the volume.
+1. load-balanced services should work reasonably.
+
+### kube-up support
+
+kube-up support for multiple zones will initially be considered
+advanced/experimental functionality, so the interface is not initially going to
+be particularly user-friendly. As we design the evolution of kube-up, we will
+make multiple zones better supported.
+
+For the initial implementation, kube-up must be run multiple times, once for
+each zone. 
The first kube-up will take place as normal, but then for each
+additional zone the user must run kube-up again, specifying
+`KUBE_SHARE_MASTER=true` and `KUBE_SUBNET_CIDR=172.20.x.0/24`. This will then
+create additional nodes in a different zone, but will register them with the
+existing master.
+
+### Zone spreading
+
+This will be implemented by modifying the existing scheduler priority function
+`SelectorSpread`. Currently this priority function aims to put pods in an RC
+on different hosts, but it will be extended first to spread across zones, and
+then to spread across hosts.
+
+So that the scheduler does not need to call out to the cloud provider on every
+scheduling decision, we must somehow record the zone information for each node.
+The implementation of this will be described in the implementation section.
+
+Note that zone spreading is 'best effort'; zones are just one of the factors
+in making scheduling decisions, and thus it is not guaranteed that pods will
+spread evenly across zones. However, this is likely desirable: if a zone is
+overloaded or failing, we still want to schedule the requested number of pods.
+
+### Volume affinity
+
+Most cloud providers (at least GCE and AWS) cannot attach their persistent
+volumes across zones. Thus when a pod is being scheduled, if there is a volume
+attached, that will dictate the zone. This will be implemented using a new
+scheduler predicate (a hard constraint): `VolumeZonePredicate`.
+
+When `VolumeZonePredicate` observes a pod scheduling request that includes a
+volume, if that volume is zone-specific, `VolumeZonePredicate` will exclude any
+nodes not in that zone.
+
+Again, to avoid the scheduler calling out to the cloud provider, this will rely
+on information attached to the volumes. This means that this will only support
+PersistentVolumeClaims, because direct mounts do not have a place to attach
+zone information. 
PersistentVolumes will then include zone information where
+volumes are zone-specific.
+
+### Load-balanced services should operate reasonably
+
+For both AWS & GCE, Kubernetes creates a native cloud load-balancer for each
+service of type LoadBalancer. The native cloud load-balancers on both AWS &
+GCE are region-level, and support load-balancing across instances in multiple
+zones (in the same region). For both clouds, the behaviour of the native cloud
+load-balancer is reasonable in the face of failures (indeed, this is why clouds
+provide load-balancing as a primitive).
+
+For Ubernetes-Lite we will therefore simply rely on the native cloud provider
+load balancer behaviour, and we do not anticipate substantial code changes.
+
+One notable shortcoming here is that load-balanced traffic still goes through
+kube-proxy controlled routing, and kube-proxy does not (currently) favor
+targeting a pod running on the same instance or even the same zone. This will
+likely produce a lot of unnecessary cross-zone traffic (which is likely slower
+and more expensive). This might be sufficiently low-hanging fruit that we
+choose to address it in kube-proxy / Ubernetes-Lite, but this can be addressed
+after the initial Ubernetes-Lite implementation.
+
+
+## Implementation
+
+The main implementation points are:
+
+1. how to attach zone information to Nodes and PersistentVolumes
+1. how nodes get zone information
+1. how volumes get zone information
+
+### Attaching zone information
+
+We must attach zone information to Nodes and PersistentVolumes, and possibly to
+other resources in future. There are two obvious alternatives: we can use
+labels/annotations, or we can extend the schema to include the information.
+
+For the initial implementation, we propose to use labels. The reasoning is:
+
+1. It is considerably easier to implement.
+1. 
We will reserve the two labels `failure-domain.alpha.kubernetes.io/zone` and +`failure-domain.alpha.kubernetes.io/region` for the two pieces of information +we need. By putting this under the `kubernetes.io` namespace there is no risk +of collision, and by putting it under `alpha.kubernetes.io` we clearly mark +this as an experimental feature. +1. We do not yet know whether these labels will be sufficient for all +environments, nor which entities will require zone information. Labels give us +more flexibility here. +1. Because the labels are reserved, we can move to schema-defined fields in +future using our cross-version mapping techniques. + +### Node labeling + +We do not want to require an administrator to manually label nodes. We instead +modify the kubelet to include the appropriate labels when it registers itself. +The information is easily obtained by the kubelet from the cloud provider. + +### Volume labeling + +As with nodes, we do not want to require an administrator to manually label +volumes. We will create an admission controller `PersistentVolumeLabel`. +`PersistentVolumeLabel` will intercept requests to create PersistentVolumes, +and will label them appropriately by calling in to the cloud provider. + +## AWS Specific Considerations + +The AWS implementation here is fairly straightforward. The AWS API is +region-wide, meaning that a single call will find instances and volumes in all +zones. In addition, instance ids and volume ids are unique per-region (and +hence also per-zone). I believe they are actually globally unique, but I do +not know if this is guaranteed; in any case we only need global uniqueness if +we are to span regions, which will not be supported by Ubernetes-Lite (to do +that correctly requires an Ubernetes-Full type approach). + +## GCE Specific Considerations + +The GCE implementation is more complicated than the AWS implementation because +GCE APIs are zone-scoped. 
To perform an operation, we must perform one REST call per zone and combine the results, unless we can determine in advance that an operation references a particular zone. For many operations, we can make that determination, but in some cases - such as listing all instances - we must combine results from calls in all relevant zones.

A further complexity is that GCE volume names are scoped per-zone, not per-region. Thus it is permitted to have two volumes both named `myvolume` in two different GCE zones. (Instance names are currently unique per-region, and thus are not a problem for Ubernetes-Lite.)

The volume scoping leads to a (small) behavioural change for Ubernetes-Lite on GCE. If you had two volumes both named `myvolume` in two different GCE zones, this would not be ambiguous when Kubernetes is operating only in a single zone. But, if Ubernetes-Lite is operating in multiple zones, `myvolume` is no longer sufficient to specify a volume uniquely. Worse, the fact that a volume happens to be unambiguous at a particular time is no guarantee that it will continue to be unambiguous in future, because a volume with the same name could subsequently be created in a second zone. While perhaps unlikely in practice, we cannot automatically enable Ubernetes-Lite for GCE users if this then causes volume mounts to stop working.

This suggests that (at least on GCE), Ubernetes-Lite must be optional (i.e. there must be a feature-flag). It may be that we can make this feature semi-automatic in future, by detecting whether nodes are running in multiple zones, but it seems likely that kube-up could instead simply set this flag.

For the initial implementation, creating volumes with identical names will yield undefined results. Later, we may add some way to specify the zone for a volume (and possibly require that volumes have their zone specified when running with Ubernetes-Lite). 
We could add a new `zone` field to the PersistentVolume type for GCE PD volumes, or we could use a DNS-style dotted name for the volume name (.)

Initially therefore, the GCE changes will be to:

1. change kube-up to support creation of a cluster in multiple zones
1. pass a flag enabling Ubernetes-Lite with kube-up
1. change the kubernetes cloud provider to iterate through relevant zones when resolving items
1. tag GCE PD volumes with the appropriate zone information


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/federation-lite.md?pixel)]()

--
cgit v1.2.3


From 0e671553a511e9eb1a8728e03cf39a8751fdca58 Mon Sep 17 00:00:00 2001
From: Mike Danese
Date: Wed, 30 Dec 2015 11:39:57 -0800
Subject: docs: move local getting started guide to docs/devel/

Signed-off-by: Mike Danese
---
 README.md          |   2 +
 running-locally.md | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 178 insertions(+)
 create mode 100644 running-locally.md

diff --git a/README.md b/README.md
index ed586cd0..8a01a8d6 100644
--- a/README.md
+++ b/README.md
@@ -73,6 +73,8 @@ Guide](../admin/README.md).
 * **Coding Conventions** ([coding-conventions.md](coding-conventions.md)):
   Coding style advice for contributors.
+* **Running a cluster locally** ([running-locally.md](running-locally.md)):
+  A fast and lightweight local cluster deployment for development.

 ## Developing against the Kubernetes API

diff --git a/running-locally.md b/running-locally.md
new file mode 100644
index 00000000..257b2522
--- /dev/null
+++ b/running-locally.md
@@ -0,0 +1,176 @@

WARNING
WARNING
WARNING
WARNING
WARNING


PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). +
+-- + + + + +Getting started locally +----------------------- + +**Table of Contents** + +- [Requirements](#requirements) + - [Linux](#linux) + - [Docker](#docker) + - [etcd](#etcd) + - [go](#go) +- [Clone the repository](#clone-the-repository) +- [Starting the cluster](#starting-the-cluster) +- [Running a container](#running-a-container) +- [Running a user defined pod](#running-a-user-defined-pod) +- [Troubleshooting](#troubleshooting) + - [I cannot reach service IPs on the network.](#i-cannot-reach-service-ips-on-the-network) + - [I cannot create a replication controller with replica size greater than 1! What gives?](#i-cannot-create-a-replication-controller-with-replica-size-greater-than-1--what-gives) + - [I changed Kubernetes code, how do I run it?](#i-changed-kubernetes-code-how-do-i-run-it) + - [kubectl claims to start a container but `get pods` and `docker ps` don't show it.](#kubectl-claims-to-start-a-container-but-get-pods-and-docker-ps-dont-show-it) + - [The pods fail to connect to the services by host names](#the-pods-fail-to-connect-to-the-services-by-host-names) + +### Requirements + +#### Linux + +Not running Linux? Consider running Linux in a local virtual machine with [Vagrant](../getting-started-guides/vagrant.md), or on a cloud provider like [Google Compute Engine](../getting-started-guides/gce.md). + +#### Docker + +At least [Docker](https://docs.docker.com/installation/#installation) +1.3+. Ensure the Docker daemon is running and can be contacted (try `docker +ps`). Some of the Kubernetes components need to run as root, which normally +works fine with docker. + +#### etcd + +You need an [etcd](https://github.com/coreos/etcd/releases) in your path, please make sure it is installed and in your ``$PATH``. + +#### go + +You need [go](https://golang.org/doc/install) in your path (see [here](development.md#go-versions) for supported versions), please make sure it is installed and in your ``$PATH``. 
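Before moving on, the three requirements above can be sanity-checked with a small shell snippet (a sketch; it only verifies that the binaries are on `$PATH`, not that the versions are new enough):

```shell
# Report any required binaries that are missing from $PATH.
missing=""
for bin in docker etcd go; do
  command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
done
if [ -z "$missing" ]; then
  echo "all prerequisites found"
else
  echo "missing:$missing"
fi
```

If anything is reported missing, install it and re-run the check before starting the cluster.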
+ +### Clone the repository + +In order to run kubernetes you must have the kubernetes code on the local machine. Cloning this repository is sufficient. + +```$ git clone --depth=1 https://github.com/kubernetes/kubernetes.git``` + +The `--depth=1` parameter is optional and will ensure a smaller download. + +### Starting the cluster + +In a separate tab of your terminal, run the following (since one needs sudo access to start/stop Kubernetes daemons, it is easier to run the entire script as root): + +```sh +cd kubernetes +hack/local-up-cluster.sh +``` + +This will build and start a lightweight local cluster, consisting of a master +and a single node. Type Control-C to shut it down. + +You can use the cluster/kubectl.sh script to interact with the local cluster. hack/local-up-cluster.sh will +print the commands to run to point kubectl at the local cluster. + + +### Running a container + +Your cluster is running, and you want to start running containers! + +You can now use any of the cluster/kubectl.sh commands to interact with your local setup. + +```sh +cluster/kubectl.sh get pods +cluster/kubectl.sh get services +cluster/kubectl.sh get replicationcontrollers +cluster/kubectl.sh run my-nginx --image=nginx --replicas=2 --port=80 + + +## begin wait for provision to complete, you can monitor the docker pull by opening a new terminal + sudo docker images + ## you should see it pulling the nginx image, once the above command returns it + sudo docker ps + ## you should see your container running! + exit +## end wait + +## introspect Kubernetes! +cluster/kubectl.sh get pods +cluster/kubectl.sh get services +cluster/kubectl.sh get replicationcontrollers +``` + + +### Running a user defined pod + +Note the difference between a [container](../user-guide/containers.md) +and a [pod](../user-guide/pods.md). Since you only asked for the former, Kubernetes will create a wrapper pod for you. +However you cannot view the nginx start page on localhost. 
To verify that nginx is running you need to run `curl` within the docker container (try `docker exec`).

You can control the specifications of a pod via a user defined manifest, and reach nginx through your browser on the port specified therein:

```sh
cluster/kubectl.sh create -f docs/user-guide/pod.yaml
```

Congratulations!

### Troubleshooting

#### I cannot reach service IPs on the network.

Some firewall software that uses iptables may not interact well with
kubernetes. If you have trouble around networking, try disabling any
firewall or other iptables-using systems, first. Also, you can check
if SELinux is blocking anything by running a command such as `journalctl --since yesterday | grep avc`.

By default the IP range for service cluster IPs is 10.0.*.* - depending on your
docker installation, this may conflict with IPs for containers. If you find
containers running with IPs in this range, edit hack/local-up-cluster.sh and
change the service-cluster-ip-range flag to something else.

#### I cannot create a replication controller with replica size greater than 1! What gives?

You are running a single node setup. This has the limitation of only supporting a single replica of a given pod. If you are interested in running with larger replica sizes, we encourage you to try the local vagrant setup or one of the cloud providers.

#### I changed Kubernetes code, how do I run it?

```sh
cd kubernetes
hack/build-go.sh
hack/local-up-cluster.sh
```

#### kubectl claims to start a container but `get pods` and `docker ps` don't show it.

One or more of the Kubernetes daemons might've crashed. Tail the logs of each in /tmp.

#### The pods fail to connect to the services by host names

The local-up-cluster.sh script doesn't start a DNS service. A similar situation can be found [here](http://issue.k8s.io/6667). You can start one manually. 
Related documents can be found [here](../../cluster/addons/dns/#how-do-i-configure-it) + + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/running-locally.md?pixel)]() + -- cgit v1.2.3 From 957857f1e082addf2a2013dfae8921bd4eb96a36 Mon Sep 17 00:00:00 2001 From: Haoran Wang Date: Wed, 6 Jan 2016 13:09:43 +0800 Subject: fix wrong submit-queue.go link --- automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/automation.md b/automation.md index c1851e84..99688de1 100644 --- a/automation.md +++ b/automation.md @@ -47,7 +47,7 @@ In an effort to * maintain e2e stability * load test githubs label feature -We have added an automated [submit-queue](https://github.com/kubernetes/contrib/blob/master/mungegithub/pulls/submit-queue.go) to the +We have added an automated [submit-queue](https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/submit-queue.go) to the [github "munger"](https://github.com/kubernetes/contrib/tree/master/mungegithub) for kubernetes. The submit-queue does the following: -- cgit v1.2.3 From a751615a47b630dfb5accb0108a90a34020644c9 Mon Sep 17 00:00:00 2001 From: "Tim St. Clair" Date: Wed, 6 Jan 2016 15:19:05 -0800 Subject: Add node performance measuring guide Add a development guide for measuring performance of node components. The purpose of this guide is threefold: 1. Document the nuances of measuring kubelet performance so we don't forget or need to reinvent the wheel. 2. Make it easier for new contributors to analyze performance. 3. Share tips and tricks that current team members might not be aware of. 
--- node-performance-testing.md | 147 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 147 insertions(+) create mode 100644 node-performance-testing.md diff --git a/node-performance-testing.md b/node-performance-testing.md new file mode 100644 index 00000000..8a14eedc --- /dev/null +++ b/node-performance-testing.md @@ -0,0 +1,147 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).

--


# Measuring Node Performance

This document outlines the issues and pitfalls of measuring Node performance, as well as the tools
available.

## Cluster Set-up

There are lots of factors which can affect node performance numbers, so care must be taken in
setting up the cluster to make the intended measurements. In addition to taking the following steps
into consideration, it is important to document precisely which setup was used. For example,
performance can vary wildly from commit-to-commit, so it is very important to **document which commit
or version** of Kubernetes was used, which Docker version was used, etc.

### Addon pods

Be aware of which addon pods are running on which nodes. By default Kubernetes runs 8 addon pods,
plus another 2 per node (`fluentd-elasticsearch` and `kube-proxy`) in the `kube-system`
namespace. The addon pods can be disabled for more consistent results, but doing so can also have
performance implications.

For example, Heapster polls each node regularly to collect stats data. Disabling Heapster will hide
the performance cost of serving those stats in the Kubelet.

#### Disabling Add-ons

Disabling addons is simple. Just ssh into the Kubernetes master and move the addon from
`/etc/kubernetes/addons/` to a backup location. More details [here](../../cluster/addons/).

### Which / how many pods?

Performance will vary a lot between a node with 0 pods and a node with 100 pods. In many cases
you'll want to make measurements with several different numbers of pods. On a single node cluster
scaling a replication controller makes this easy; just make sure the system reaches a steady-state
before starting the measurement. E.g. 
`kubectl scale replicationcontroller pause --replicas=100`

In most cases pause pods will yield the most consistent measurements since the system will not be
affected by pod load. However, in some special cases Kubernetes has been tuned to optimize pods that
are not doing anything, such as the cAdvisor housekeeping (stats gathering). In these cases,
performing a very light task (such as a simple network ping) can make a difference.

Finally, you should also consider which features your pods should be using. For example, if you
want to measure performance with probing, you should obviously use pods with liveness or readiness
probes configured. Likewise for volumes, number of containers, etc.

### Other Tips

**Number of nodes** - On the one hand, it can be easier to manage logs, pods, environment etc. with
  a single node to worry about. On the other hand, having multiple nodes will let you gather more
  data in parallel for more robust sampling.

## E2E Performance Test

There is an end-to-end test for collecting overall resource usage of node components:
[kubelet_perf.go](../../test/e2e/kubelet_perf.go). To
run the test, simply make sure you have an e2e cluster running (`go run hack/e2e.go -up`) and
[set up](#cluster-set-up) correctly.

Run the test with `go run hack/e2e.go -v -test
--test_args="--ginkgo.focus=resource\susage\stracking"`. You may also wish to customise the number of
pods or other parameters of the test (remember to rerun `make WHAT=test/e2e/e2e.test` after you do). 
+ +## Profiling + +Kubelet installs the [go pprof handlers](https://golang.org/pkg/net/http/pprof/), which can be +queried for CPU profiles: + +```console +$ kubectl proxy & +Starting to serve on 127.0.0.1:8001 +$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT +$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet +$ go tool pprof -web $KUBELET_BIN $OUTPUT +``` + +`pprof` can also provide heap usage, from the `/debug/pprof/heap` endpoint +(e.g. `http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap`). + +More information on go profiling can be found [here](http://blog.golang.org/profiling-go-programs). + +## Benchmarks + +Before jumping through all the hoops to measure a live Kubernetes node in a real cluster, it is +worth considering whether the data you need can be gathered through a Benchmark test. Go provides a +really simple benchmarking mechanism, just add a unit test of the form: + +```go +// In foo_test.go +func BenchmarkFoo(b *testing.B) { + b.StopTimer() + setupFoo() // Perform any global setup + b.StartTimer() + for i := 0; i < b.N; i++ { + foo() // Functionality to measure + } +} +``` + +Then: + +```console +$ go test -bench=. -benchtime=${SECONDS}s foo_test.go +``` + +More details on benchmarking [here](https://golang.org/pkg/testing/). + +## TODO + +- (taotao) Measuring docker performance +- Expand cluster set-up section +- (vishh) Measuring disk usage +- (yujuhong) Measuring memory usage +- Add section on monitoring kubelet metrics (e.g. with prometheus) + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/node-performance-testing.md?pixel)]() + -- cgit v1.2.3 From e4948237a6bcfcf015d3288ef63a51ea886830db Mon Sep 17 00:00:00 2001 From: Vishnu kannan Date: Thu, 5 Nov 2015 15:32:16 -0800 Subject: Proposal for disk accounting. 
Signed-off-by: Vishnu Kannan --- disk-accounting.md | 644 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 644 insertions(+) create mode 100644 disk-accounting.md diff --git a/disk-accounting.md b/disk-accounting.md new file mode 100644 index 00000000..7d91c8e8 --- /dev/null +++ b/disk-accounting.md @@ -0,0 +1,644 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +**Author**: Vishnu Kannan + +**Last** **Updated**: 11/16/2015 + +**Status**: Pending Review + +This proposal is an attempt to come up with a means for accounting disk usage in Kubernetes clusters that are running docker as the container runtime. Some of the principles here might apply for other runtimes too. + +### Why is disk accounting necessary? + +As of kubernetes v1.1 clusters become unusable over time due to the local disk becoming full. The kubelets on the node attempt to perform garbage collection of old containers and images, but that doesn’t prevent running pods from using up all the available disk space. + +Kubernetes users have no insight into how the disk is being consumed. + +Large images and rapid logging can lead to temporary downtime on the nodes. The node has to free up disk space by deleting images and containers. During this cleanup, existing pods can fail and new pods cannot be started. The node will also transition into an `OutOfDisk` condition, preventing more pods from being scheduled to the node. + +Automated eviction of pods that are hogging the local disk is not possible since proper accounting isn’t available. + +Since local disk is a non-compressible resource, users need means to restrict usage of local disk by pods and containers. Proper disk accounting is a prerequisite. As of today, a misconfigured low QoS class pod can end up bringing down the entire cluster by taking up all the available disk space (misconfigured logging for example) + +### Goals + +1. Account for disk usage on the nodes. + +2. Compatibility with the most common docker storage backends - devicemapper, aufs and overlayfs + +3. Provide a roadmap for enabling disk as a schedulable resource in the future. + +4. 
Provide a plugin interface for extending support to non-default filesystems and storage drivers.

### Non Goals

1. Compatibility with all storage backends. The matrix is pretty large already and the priority is to get disk accounting working on the most widely deployed platforms.

2. Support for filesystems other than ext4 and xfs.

### Introduction

Disk accounting in a Kubernetes cluster running with docker is complex because of the plethora of ways in which disk gets utilized by a container.

Disk can be consumed for:

1. Container images

2. Container’s writable layer

3. Container’s logs - when written to stdout/stderr and the default logging backend in docker is used.

4. Local volumes - hostPath, emptyDir, gitRepo, etc.

As of Kubernetes v1.1, kubelet exposes disk usage for the entire node and the container’s writable layer for the aufs docker storage driver.
This information is made available to end users via the heapster monitoring pipeline.

#### Image layers

Image layers are shared between containers (COW) and so accounting for images is complicated.

Image layers will have to be accounted as system overhead.

As of today, it is not possible to check if there is enough disk space available on the node before an image is pulled.

#### Writable Layer

Docker creates a writable layer for every container on the host. Depending on the storage driver, the location and the underlying filesystem of this layer will change.

Any files that the container creates or updates (assuming there are no volumes) will be considered as writable layer usage.

The underlying filesystem is whatever the docker storage directory resides on. It is ext4 by default on most distributions, and xfs on RHEL.

#### Container logs

Docker engine provides a pluggable logging interface. Kubernetes is currently using the default logging mode, which is `local file`. 
In this mode, the docker daemon stores bytes written by containers to their stdout or stderr, to local disk. These log files are contained in a special directory that is managed by the docker daemon. These logs are exposed via the `docker logs` interface, which is then exposed via kubelet and apiserver APIs. Currently, there is a hard requirement for persisting these log files on the disk.

#### Local Volumes

Volumes are slightly different from other local disk use cases. They are pod scoped. Their lifetime is tied to that of a pod. Due to this property, accounting of volumes will also be at the pod level.

As of now, the volume types that can use local disk directly are ‘HostPath’, ‘EmptyDir’, and ‘GitRepo’. Secrets and Downward API volumes wrap these primitive volumes.
Everything else is a network based volume.

‘HostPath’ volumes map existing directories in the host filesystem into a pod. Kubernetes manages only the mapping. It does not manage the source on the host filesystem.

In addition to this, the changes introduced by a pod on the source of a hostPath volume are not cleaned up by kubernetes once the pod exits. Due to these limitations, we will have to account hostPath volumes to system overhead. We should explicitly discourage use of HostPath in read-write mode.

`EmptyDir`, `GitRepo` and other local storage volumes map to a directory on the host root filesystem that is managed by Kubernetes (kubelet). Their contents are erased as soon as the pod exits. Tracking and potentially restricting usage for volumes is possible.

### Docker storage model

Before we start exploring solutions, let’s get familiar with how docker handles storage for images, writable layers and logs.

On all storage drivers, logs are stored under `/containers//`

The default location of the docker root directory is `/var/lib/docker`.

Volumes are handled by kubernetes. 
*Caveat: Volumes specified as part of Docker images are not handled by Kubernetes currently.*

Container images and writable layers are managed by docker and their location will change depending on the storage driver. Each image layer and writable layer is referred to by an ID. The image layers are read-only. Once saved, existing writable layers can be frozen. The save feature is not of importance to kubernetes since it works only on immutable images.

*Note: Image layer IDs can be obtained by running `docker history -q --no-trunc `*

##### Aufs

Image layers and writable layers are stored under `/var/lib/docker/aufs/diff/`.

The writable layer's ID is equivalent to the container ID.

##### Devicemapper

Each container and each image gets its own block device. Since this driver works at the block level, it is not possible to access the layers directly without mounting them. Each container gets its own block device while running.

##### Overlayfs

Image layers and writable layers are stored under `/var/lib/docker/overlay/`.

Identical files are hardlinked between images.

The image layers contain all their data under a `root` subdirectory.

Everything under `/var/lib/docker/overlay/` consists of files required for running the container, including its writable layer.

### Improve disk accounting

Disk accounting is dependent on the storage driver in docker. A common solution that works across all storage drivers isn't available.

I’m listing a few possible solutions for disk accounting below along with their limitations.

We need a plugin model for disk accounting. Some storage drivers in docker will require special plugins.

#### Container Images

As of today, the partition that is holding docker images is flagged by cadvisor, and it uses filesystem stats to identify the overall disk usage of that partition.

Isolated usage of just image layers is available today using `docker history `. 
But isolated usage isn't of much use because image layers are shared between containers and so it is not possible to charge a single pod for image disk usage.

Continuing to use the entire partition availability for garbage collection purposes in kubelet should not affect reliability.
We might garbage collect more often.
As long as we do not expose features that require persisting old containers, computing image layer usage wouldn’t be necessary.

The main goals for images are:
1. Capturing total image disk usage.
2. Checking if a new image will fit on disk.

In case we choose to compute the size of image layers alone, the following are some of the ways to achieve that.

*Note that some of the strategies mentioned below are applicable in general to other kinds of storage like volumes, etc.*

##### Docker History

It is possible to run `docker history` and then create a graph of all images and corresponding image layers.
This graph will let us figure out the disk usage of all the images.

**Pros**
* Compatible across storage drivers.

**Cons**
* Requires maintaining an internal representation of images.

##### Enhance docker

Docker handles the upload and download of image layers. It can embed enough information about each layer. If docker is enhanced to expose this information, we can statically identify space about to be occupied by read-only image layers, even before the image layers are downloaded.

A new [docker feature](https://github.com/docker/docker/pull/16450) (docker pull --dry-run) is pending review, which outputs the disk space that will be consumed by new images. Once this feature lands, we can perform feasibility checks and reject pods that will consume more disk space than is currently available on the node.

Another option is to expose disk usage of all images together as a first-class feature.

**Pros**

* Works across all storage drivers since docker abstracts the storage drivers. 
* Less code to maintain in kubelet.

**Cons**

* Not available today.

* Requires serialized image pulls.

* Metadata files are not tracked.

##### Overlayfs and Aufs

###### `du`

We can list all the image layer specific directories, excluding container directories, and run `du` on each of those directories.

**Pros**:

* This is the least-intrusive approach.

* It will work out of the box without requiring any additional configuration.

**Cons**:

* `du` can consume a lot of cpu and memory. There have been several issues reported against the kubelet in the past that were related to `du`.

* It is time consuming. Cannot be run frequently. Requires special handling to constrain resource usage - setting a lower nice value or running in a sub-container.

* Can block container deletion by keeping file descriptors open.

###### Linux gid based Disk Quota

The [disk quota](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html) feature provided by the linux kernel can be used to track the usage of image layers. Ideally, we need `project` support for disk quota, which lets us track usage of directory hierarchies using `project ids`. Unfortunately, that feature is only available for zfs filesystems. Since most of our distributions use `ext4` by default, we will have to use either `uid` or `gid` based quota tracking.

Both `uids` and `gids` are meant for security. Overloading that concept for disk tracking is painful and ugly. But, that is what we have today.

Kubelet needs to define a gid for tracking image layers and make that gid or group the owner of `/var/lib/docker/[aufs | overlayfs]` recursively. Once this is done, the quota subsystem in the kernel will report the blocks being consumed by the storage driver on the underlying partition. 
Since this number also includes the container’s writable layer, we will have to somehow subtract that usage from the overall usage of the storage driver directory. Luckily, we can use the same mechanism for tracking the container’s writable layer. Once we apply a different `gid` to the container’s writable layer, which is located under `/var/lib/docker//diff/`, the quota subsystem will not include the container’s writable layer usage.

Xfs, on the other hand, supports project quota, which lets us track disk usage of arbitrary directories using a project. Support for this feature in ext4 is being reviewed. So on xfs, we can use quota without having to clobber the writable layer's uid and gid.

**Pros**:

* Low overhead tracking provided by the kernel.

**Cons**

* Requires updates to default ownership on docker’s internal storage driver directories. We will have to deal with storage driver implementation details in any approach that is not docker native.

* Requires additional node configuration - the quota subsystem needs to be set up on the node. This can either be automated or made a requirement for the node.

* Kubelet needs to perform gid management. A range of gids has to be allocated to the kubelet for the purposes of quota management. This range must not be used for any other purposes out of band. Not required if project quota is available.

* Breaks `docker save` semantics. Since kubernetes assumes immutable images, this is not a blocker. To support quota in docker, we will need user-namespaces along with custom gid mapping for each container. This feature does not exist today. This is not an issue with project quota.

*Note: Refer to the [Appendix](#appendix) section for more real examples on using quota with docker.*

**Project Quota**

Project Quota support for ext4 is currently being reviewed upstream. If that feature lands upstream sometime soon, project IDs will be used for disk tracking instead of uids and gids. 
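Whichever mechanism reports the combined number - quota counters or `du` - the subtraction itself (total storage driver usage minus writable layer usage = image layer usage) is plain arithmetic. A toy sketch using `du` over throwaway directories (the layout is an illustrative stand-in, not the real docker directory structure):

```shell
# Simulate a storage driver directory: one image layer plus one writable layer.
root=$(mktemp -d)
mkdir -p "$root/driver/layer1" "$root/driver/container1"
dd if=/dev/zero of="$root/driver/layer1/layer.bin" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$root/driver/container1/scratch.bin" bs=1024 count=16 2>/dev/null

total_kb=$(du -sk "$root/driver" | cut -f1)           # everything under the driver dir
writable_kb=$(du -sk "$root/driver/container1" | cut -f1)  # writable layer only
echo "image layers account for $((total_kb - writable_kb)) KiB"
rm -rf "$root"
```

In the real system the per-directory numbers would come from the quota subsystem rather than `du`, but the arithmetic is the same.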
##### Devicemapper

The devicemapper storage driver will set up two volumes, metadata and data, that will be used to store image layers and container writable layers. The volumes can be real devices or loopback. A Pool device is created which uses the underlying volume for real storage.

A new thinly-provisioned volume, based on the pool, will be created for each running container.

The kernel tracks the usage of the pool device at the block device layer. The usage here includes image layers and containers’ writable layers.

Since the kubelet has to track the writable layer usage anyway, we can subtract the aggregated root filesystem usage from the overall pool device usage to get the image layers’ disk usage.

Linux quota and `du` will not work with device mapper.

A docker dry run option (mentioned above) is another possibility.

#### Container Writable Layer

##### Overlayfs / Aufs

Docker creates a separate directory for the container’s writable layer which is then overlayed on top of read-only image layers.

Both the previously mentioned options of `du` and `Linux Quota` will work for this case as well.

Kubelet can use `du` to track usage and enforce `limits` once disk becomes a schedulable resource. As mentioned earlier, `du` is resource intensive.

To use disk quota, kubelet will have to allocate a separate gid per container. Kubelet can reuse the same gid for multiple instances of the same container (restart scenario). As and when kubelet garbage collects dead containers, the usage of the container will drop.

If local disk becomes a schedulable resource, `linux quota` can be used to impose `request` and `limits` on the container writable layer.
`limits` can be enforced using hard limits. Enforcing `request` will be tricky. One option is to enforce `requests` only when the disk availability drops below a threshold (10%). Kubelet can at this point evict pods that are exceeding their requested space. 
Other options include using `soft limits` with grace periods, but that approach is complex.
+
+###### Devicemapper
+
+FIXME: How to calculate writable layer usage with devicemapper?
+
+To enforce `limits`, the volume created for the container’s writable layer filesystem can be dynamically [resized](https://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/) so that it cannot use more than `limit`. `request` will have to be enforced by the kubelet.
+
+
+#### Container logs
+
+Container logs are not storage driver specific. We can use either `du` or `quota` to track log usage per container. Log files are stored under `/var/lib/docker/containers/`.
+
+In the case of quota, we can create a separate gid for tracking log usage. This will let users track log usage and writable layer usage individually.
+
+For the purposes of enforcing limits though, the kubelet will use the sum of log and writable layer usage.
+
+In the future, we can consider adding log rotation support for these log files, either in the kubelet or via docker.
+
+
+#### Volumes
+
+Local disk based volumes map to directories on the disk. We can use `du` or `quota` to track the usage of volumes.
+
+There exists a concept called `FsGroup` today in kubernetes, which lets users specify a gid for all volumes in a pod. If that is set, we can use the `FsGroup` gid for quota purposes. This requires `limits` for volumes to be a pod-level resource though.
+
+
+### Yet to be explored
+
+* Support for filesystems other than ext4 and xfs, such as `zfs`
+
+* Support for Btrfs
+
+It should be clear at this point that we need a plugin-based model for disk accounting. Support for other filesystems, both CoW and regular, can be added as and when required. As we progress towards making accounting work on the above-mentioned storage drivers, we can come up with an abstraction for storage plugins in general.
+
+
+### Implementation Plan and Milestones
+
+#### Milestone 1 - Get accounting to just work!
+
+This milestone targets exposing the following categories of disk usage from the kubelet - infrastructure (images, sys daemons, etc.), containers (logs + writable layer) and volumes.
+
+* `du` works today. Use `du` for all the categories and ensure that it works on both aufs and overlayfs.
+
+* Add device mapper support.
+
+* Define a storage-driver-based pluggable disk accounting interface in cadvisor.
+
+* Reuse that interface for accounting volumes in kubelet.
+
+* Define a disk manager module in kubelet that will serve as a source of disk usage information for the rest of the kubelet.
+
+* Ensure that the kubelet metrics API (`/apis/metrics/v1beta1`) exposes the disk usage information. Add an integration test.
+
+
+#### Milestone 2 - node reliability
+
+Improve user experience by doing whatever is necessary to keep the node running.
+
+NOTE: The [`Out of Resource Killing`](https://github.com/kubernetes/kubernetes/issues/17186) design is a prerequisite.
+
+* Disk manager will evict pods and containers based on QoS class whenever disk availability is below a critical level.
+
+* Explore combining existing container and image garbage collection logic into the disk manager.
+
+Ideally, this phase should be completed before v1.2.
+
+
+#### Milestone 3 - Performance improvements
+
+In this milestone, we will add support for quota and make it opt-in. There should be no user-visible changes in this phase.
+
+* Add a gid allocation manager to kubelet.
+
+* Reconcile gids allocated after restart.
+
+* Configure linux quota automatically on startup. Do not set any limits in this phase.
+
+* Allocate gids for pod volumes, container writable layers and logs, and also for image layers.
+
+* Update the docker runtime plugin in kubelet to perform the necessary `chown`s and `chmod`s between container creation and startup.
+
+* Pass the allocated gids as supplementary gids to containers.
+
+* Update the disk manager in kubelet to use quota when configured.
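+The gid allocation manager mentioned above could look roughly like the following sketch. All names are hypothetical (this is not the actual kubelet implementation); the interesting parts are gid reuse across container restarts and reconciliation after a kubelet restart, since the kubelet does not checkpoint its allocations:
+
+```go
+package main
+
+import "fmt"
+
+// gidAllocator hands out gids from a range reserved for the kubelet's
+// quota management. Illustrative sketch only.
+type gidAllocator struct {
+	next, max int
+	assigned  map[string]int // owner (container/volume) -> gid
+}
+
+func newGidAllocator(first, last int) *gidAllocator {
+	return &gidAllocator{next: first, max: last, assigned: map[string]int{}}
+}
+
+// Allocate returns the gid for an owner, reusing a previous assignment so
+// a restarted container keeps its gid (and hence its quota accounting).
+func (a *gidAllocator) Allocate(owner string) (int, error) {
+	if gid, ok := a.assigned[owner]; ok {
+		return gid, nil
+	}
+	if a.next > a.max {
+		return 0, fmt.Errorf("gid range exhausted")
+	}
+	gid := a.next
+	a.next++
+	a.assigned[owner] = gid
+	return gid, nil
+}
+
+// Reconcile rebuilds state after a kubelet restart from gids observed on
+// disk (e.g., group owners of existing writable-layer directories).
+func (a *gidAllocator) Reconcile(observed map[string]int) {
+	for owner, gid := range observed {
+		a.assigned[owner] = gid
+		if gid >= a.next {
+			a.next = gid + 1
+		}
+	}
+}
+
+func main() {
+	alloc := newGidAllocator(9000, 9099)
+	g1, _ := alloc.Allocate("container-a")
+	g2, _ := alloc.Allocate("container-a") // restart: same gid is reused
+	fmt.Println(g1, g2)
+}
+```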
+
+
+#### Milestone 4 - Users manage local disks
+
+In this milestone, we will make local disk a schedulable resource.
+
+* Finalize volume accounting - is it at the pod level or per-volume?
+
+* Finalize the multi-disk management policy. Will additional disks be handled as whole units?
+
+* Set aside some space for image layers and the rest of the infra overhead - node allocatable resources include local disk.
+
+* `du` plugin triggers container or pod eviction whenever usage exceeds the limit.
+
+* Quota plugin sets hard limits equal to user-specified `limits`.
+
+* Devicemapper plugin resizes the writable layer to not exceed the container’s disk `limit`.
+
+* Disk manager evicts pods based on the `usage` - `request` delta instead of just QoS class.
+
+* Add sufficient integration testing for this feature.
+
+
+### Appendix
+
+
+#### Implementation Notes
+
+The following is a rough outline of the testing I performed to corroborate my prior design ideas.
+
+Test setup information:
+
+* Testing was performed on GCE virtual machines.
+
+* All the test VMs were using ext4.
+
+* The distribution tested against is mentioned as part of each graph driver.
+
+##### AUFS testing notes:
+
+Tested on Debian jessie.
+
+1. Set up Linux Quota following this [tutorial](https://www.google.com/url?q=https://www.howtoforge.com/tutorial/linux-quota-ubuntu-debian/&sa=D&ust=1446146816105000&usg=AFQjCNHThn4nwfj1YLoVmv5fJ6kqAQ9FlQ).
+
+2. Create a new group ‘x’ on the host and enable quota for that group:
+
+    1. `groupadd -g 9000 x`
+
+    2. `setquota -g 9000 -a 0 100 0 100` // 100 blocks (4096 bytes each)
+
+    3. `quota -g 9000 -v` // Check that quota is enabled
+
+3. Create a docker container:
+
+    4. `docker create -it busybox /bin/sh -c "dd if=/dev/zero of=/file count=10 bs=1M"`
+
+    8d8c56dcfbf5cda9f9bfec7c6615577753292d9772ab455f581951d9a92d169d
+
+4. Change the group on the writable layer directory for this container:
+
+    5. 
`chmod a+s /var/lib/docker/aufs/diff/8d8c56dcfbf5cda9f9bfec7c6615577753292d9772ab455f581951d9a92d169d`
+
+    6. `chown :x /var/lib/docker/aufs/diff/8d8c56dcfbf5cda9f9bfec7c6615577753292d9772ab455f581951d9a92d169d`
+
+5. Start the docker container:
+
+    7. `docker start 8d`
+
+    8. Check usage using quota and group ‘x’:
+
+    ```shell
+    $ quota -g x -v
+
+    Disk quotas for group x (gid 9000):
+
+    Filesystem blocks quota limit grace files quota limit grace
+
+    /dev/sda1 10248 0 0 3 0 0
+    ```
+
+    Using the same workflow, we can add new sticky group IDs to emptyDir volumes and account for their usage against pods.
+
+    Since each container requires a gid for the purposes of quota, we will have to reserve ranges of gids for use by the kubelet. Since the kubelet does not checkpoint its state, recovery of group id allocations will be an interesting problem. More on this later.
+
+Track the space occupied by images after they have been pulled locally as follows.
+
+*Note: This approach requires serialized image pulls to be of any use to the kubelet.*
+
+1. Create a group specifically for the graph driver:
+
+    1. `groupadd -g 9001 docker-images`
+
+2. Update group ownership on the ‘graph’ (tracks image metadata) and ‘storage driver’ directories:
+
+    2. `chown -R :9001 /var/lib/docker/[overlay | aufs]`
+
+    3. `chmod a+s /var/lib/docker/[overlay | aufs]`
+
+    4. `chown -R :9001 /var/lib/docker/graph`
+
+    5. `chmod a+s /var/lib/docker/graph`
+
+3. Any new images pulled or containers created will be accounted to the `docker-images` group by default.
+
+4. Once we update the group ownership on newly created containers to a different gid, the container writable layer’s disk usage gets dropped from this group.
+
+##### Overlayfs
+
+Tested on Ubuntu 15.10.
+
+Overlayfs works similarly to Aufs; only the path to the container’s writable layer directory changes.
+
+* Set up Linux Quota following this [tutorial](https://www.google.com/url?q=https://www.howtoforge.com/tutorial/linux-quota-ubuntu-debian/&sa=D&ust=1446146816105000&usg=AFQjCNHThn4nwfj1YLoVmv5fJ6kqAQ9FlQ).
+
+* Create a new group ‘x’ on the host and enable quota for that group:
+
+    * `groupadd -g 9000 x`
+
+    * `setquota -g 9000 -a 0 100 0 100` // 100 blocks (4096 bytes each)
+
+    * `quota -g 9000 -v` // Check that quota is enabled
+
+* Create a docker container:
+
+    * `docker create -it busybox /bin/sh -c "dd if=/dev/zero of=/file count=10 bs=1M"`
+
+    * `b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61`
+
+* Change the group on the writable layer’s directory for this container:
+
+    * `chmod -R a+s /var/lib/docker/overlay/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
+
+    * `chown -R :9000 /var/lib/docker/overlay/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
+
+* Check quota before and after running the container.
+
+    ```shell
+    $ quota -g x -v
+
+    Disk quotas for group x (gid 9000):
+
+    Filesystem blocks quota limit grace files quota limit grace
+
+    /dev/sda1 48 0 0 19 0 0
+    ```
+
+    * Start the docker container
+
+    * `docker start b8`
+
+    * ```shell
+    quota -g x -v
+
+    Disk quotas for group x (gid 9000):
+
+    Filesystem blocks quota limit grace files quota limit grace
+
+    /dev/sda1 10288 0 0 20 0 0
+
+    ```
+
+##### Device mapper
+
+Usage of Linux Quota should be possible for the purposes of volumes and log files.
+
+The devicemapper storage driver in docker uses ["thin targets"](https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt). Underneath there are two block devices - “data” and “metadata” - using which more block devices are created for containers. More information [here](http://www.projectatomic.io/docs/filesystems/).
+
+These devices can be loopback or real storage devices.
+
+The base device has a maximum storage capacity.
This means that the sum total of storage space occupied by images and containers cannot exceed this capacity.
+
+By default, all images and containers are created from an initial filesystem with a 10GB limit.
+
+A separate filesystem is created for each container as part of start (not create).
+
+It is possible to [resize](https://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/) the container filesystem.
+
+For the purposes of image space tracking, we can use the pool usage reported by the kernel, as shown in the testing notes below.
+
+##### Testing notes:
+
+```shell
+$ docker info
+
+...
+
+Storage Driver: devicemapper
+
+ Pool Name: docker-8:1-268480-pool
+
+ Pool Blocksize: 65.54 kB
+
+ Backing Filesystem: extfs
+
+ Data file: /dev/loop0
+
+ Metadata file: /dev/loop1
+
+ Data Space Used: 2.059 GB
+
+ Data Space Total: 107.4 GB
+
+ Data Space Available: 48.45 GB
+
+ Metadata Space Used: 1.806 MB
+
+ Metadata Space Total: 2.147 GB
+
+ Metadata Space Available: 2.146 GB
+
+ Udev Sync Supported: true
+
+ Deferred Removal Enabled: false
+
+ Data loop file: /var/lib/docker/devicemapper/devicemapper/data
+
+ Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
+
+ Library Version: 1.02.99 (2015-06-20)
+```
+
+```shell
+$ dmsetup table docker-8\:1-268480-pool
+
+0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing
+```
+
+128 is the data block size (in 512-byte sectors).
+
+Usage from the kernel for the primary block device:
+
+```shell
+$ dmsetup status docker-8\:1-268480-pool
+
+0 209715200 thin-pool 37 441/524288 31424/1638400 - rw discard_passdown queue_if_no_space -
+```
+
+Used/total data blocks - 31424/1638400
+
+Usage in MB = 31424 * 512 * 128 (block size from above) bytes = 1964 MB
+
+Capacity = 1638400 * 512 * 128 bytes = 100 GB
+
+#### Log file accounting
+
+* Set up Linux quota for a container as mentioned above.
+
+* Update group ownership on the following directories to that of the group ID created for the container. 
Adapting the examples above:
+
+    * `chmod -R a+s /var/lib/docker/containers/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
+
+    * `chown -R :9000 /var/lib/docker/containers/b8cc9fae3851f9bcefe922952b7bca0eb33aa31e68e9203ce0639fc9d3f3c61b/*`
+
+##### Testing tidbits
+
+* Ubuntu 15.10 doesn’t ship with the quota module on virtual machines. Install the [‘linux-image-extra-virtual’](http://askubuntu.com/questions/109585/quota-format-not-supported-in-kernel) package to get quota to work.
+
+* The overlay storage driver needs kernels >= 3.18. I used Ubuntu 15.10 to test Overlayfs.
+
+* If you use a non-default location for docker storage, change `/var/lib/docker` in the examples to your storage location.
+
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/disk-accounting.md?pixel)]()
+
-- cgit v1.2.3 
From 7810af0a5a69a756a76968c7e69db9ebbc15254f Mon Sep 17 00:00:00 2001
From: Yu-Ju Hong 
Date: Sat, 15 Aug 2015 08:40:19 -0700
Subject: Proposal: add pod lifecycle event generator for kubelet

---
 pleg.png                         | Bin 0 -> 49079 bytes
 pod-lifecycle-event-generator.md | 230 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 230 insertions(+)
 create mode 100644 pleg.png
 create mode 100644 pod-lifecycle-event-generator.md

diff --git a/pleg.png b/pleg.png
new file mode 100644
index 00000000..f15c5d83
Binary files /dev/null and b/pleg.png differ
diff --git a/pod-lifecycle-event-generator.md b/pod-lifecycle-event-generator.md
new file mode 100644
index 00000000..cec047de
--- /dev/null
+++ b/pod-lifecycle-event-generator.md
@@ -0,0 +1,230 @@
+
+
+
+
+WARNING
+WARNING
+WARNING
+WARNING
+WARNING
+

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+# Kubelet: Pod Lifecycle Event Generator (PLEG)
+
+In Kubernetes, Kubelet is a per-node daemon that manages the pods on the node,
+driving the pod states to match their pod specifications (specs). To achieve
+this, Kubelet needs to react to changes in both (1) pod specs and (2)
+container states. For the former, Kubelet watches for pod spec changes from
+multiple sources; for the latter, Kubelet polls the container runtime
+periodically (e.g., every 10s) for the latest states of all containers.
+
+Polling incurs non-negligible overhead as the number of pods/containers increases,
+and is exacerbated by Kubelet's parallelism -- one worker (goroutine) per pod, which
+queries the container runtime individually. Periodic, concurrent, large numbers
+of requests cause high CPU usage spikes (even when there is no spec/state
+change), poor performance, and reliability problems due to an overwhelmed container
+runtime. Ultimately, it limits Kubelet's scalability.
+
+(Related issues reported by users: [#10451](https://issues.k8s.io/10451),
+[#12099](https://issues.k8s.io/12099), [#12082](https://issues.k8s.io/12082))
+
+## Goals and Requirements
+
+The goal of this proposal is to improve Kubelet's scalability and performance
+by lowering the pod management overhead.
+
+ - Reduce unnecessary work during inactivity (no spec/state changes)
+ - Lower the concurrent requests to the container runtime.
+
+The design should be generic so that it can support different container runtimes
+(e.g., Docker and rkt).
+
+## Overview
+
+This proposal aims to replace the periodic polling with a pod lifecycle event
+watcher.
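+As a rough illustration of the idea (a simplified sketch, not the actual implementation), container state changes observed between two consecutive listings of the runtime can be diffed into per-pod events:
+
+```go
+package main
+
+import "fmt"
+
+// Event is a simplified pod lifecycle event: a container in some pod
+// started or stopped. Illustrative only; the real proposal defines a
+// richer PodLifecycleEvent type.
+type Event struct {
+	PodID, Type, ContainerID string
+}
+
+// diffSnapshots compares two relist snapshots for one pod, each mapping
+// container ID -> running, and emits the events implied by the change.
+func diffSnapshots(podID string, prev, curr map[string]bool) []Event {
+	var events []Event
+	for id, running := range curr {
+		if running && !prev[id] {
+			events = append(events, Event{podID, "ContainerStarted", id})
+		}
+	}
+	for id, running := range prev {
+		if running && !curr[id] {
+			events = append(events, Event{podID, "ContainerStopped", id})
+		}
+	}
+	return events
+}
+
+func main() {
+	prev := map[string]bool{"a": true}
+	curr := map[string]bool{"a": false, "b": true}
+	for _, e := range diffSnapshots("pod-1", prev, curr) {
+		fmt.Println(e.PodID, e.Type, e.ContainerID)
+	}
+}
+```
+
+Only the pods whose diff is non-empty need their workers woken up, which is the source of the savings over per-worker polling.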
+
+![pleg](pleg.png)
+
+## Pod Lifecycle Event
+
+A pod lifecycle event interprets the underlying container state change at the
+pod-level abstraction, making it container-runtime-agnostic. The abstraction
+shields Kubelet from the runtime specifics.
+
+```go
+type PodLifeCycleEventType string
+
+const (
+    ContainerStarted PodLifeCycleEventType = "ContainerStarted"
+    ContainerStopped PodLifeCycleEventType = "ContainerStopped"
+    NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
+    NetworkFailed PodLifeCycleEventType = "NetworkFailed"
+)
+
+// PodLifecycleEvent is an event that reflects the change of the pod state.
+type PodLifecycleEvent struct {
+    // The pod ID.
+    ID types.UID
+    // The type of the event.
+    Type PodLifeCycleEventType
+    // The accompanying data which varies based on the event type.
+    Data interface{}
+}
+```
+
+Using Docker as an example, the start of a pod infra container would be
+translated to a `NetworkSetupCompleted` pod lifecycle event.
+
+
+## Detect Changes in Container States Via Relisting
+
+In order to generate pod lifecycle events, PLEG needs to detect changes in
+container states. We can achieve this by periodically relisting all containers
+(e.g., `docker ps`). Although this is similar to Kubelet's polling today, it will
+only be performed by a single thread (PLEG). This means that we still
+benefit from not having all pod workers hitting the container runtime
+concurrently. Moreover, only the relevant pod worker would be woken up
+to perform a sync.
+
+The upside of relying on relisting is that it is container-runtime-agnostic,
+and requires no external dependency.
+
+### Relist period
+
+The shorter the relist period is, the sooner Kubelet can detect
+changes. A shorter relist period also implies higher CPU usage. Moreover, the
+relist latency depends on the underlying container runtime, and usually
+increases as the number of containers/pods grows. We should set a default
+relist period based on measurements.
Regardless of what period we set, it will
+likely be significantly shorter than the current pod sync period (10s), i.e.,
+Kubelet will detect container changes sooner.
+
+
+## Impact on the Pod Worker Control Flow
+
+Kubelet is responsible for dispatching an event to the appropriate pod
+worker based on the pod ID. Only one pod worker would be woken up for
+each event.
+
+Today, the pod syncing routine in Kubelet is idempotent as it always
+examines the pod state and the spec, and tries to drive the state to
+match the spec by performing a series of operations. It should be
+noted that this proposal does not intend to change this property --
+the sync pod routine would still perform all necessary checks,
+regardless of the event type. This trades some efficiency for
+reliability and eliminates the need to build a state machine that is
+compatible with different runtimes.
+
+## Leverage Upstream Container Events
+
+Instead of relying on relisting, PLEG can leverage other components which
+provide container events, and translate these events into pod lifecycle
+events. This will further improve Kubelet's responsiveness and reduce the
+resource usage caused by frequent relisting.
+
+The upstream container events can come from:
+
+(1). *Event stream provided by each container runtime*
+
+Docker's API exposes an [event
+stream](https://docs.docker.com/reference/api/docker_remote_api_v1.17/#monitor-docker-s-events).
+rkt does not support this yet, but will eventually
+(see [coreos/rkt#1193](https://github.com/coreos/rkt/issues/1193)).
+
+(2). *cgroups event stream by cAdvisor*
+
+cAdvisor is integrated in Kubelet to provide container stats. It watches
+container cgroups using inotify and exposes an event stream. Even though it does not
+support rkt yet, it should be straightforward to add such support.
+
+Option (1) may provide richer sets of events, but option (2) has the advantage
+of being more universal across runtimes, as long as the container runtime uses
+cgroups. Regardless of which one we choose to implement now, the container event
+stream should be easily swappable behind a clearly defined interface.
+
+Note that we cannot solely rely on the upstream container events due to the
+possibility of missing events. PLEG should still relist periodically, though
+infrequently, to ensure no events are missed.
+
+## Generate Expected Events
+
+*This is optional for PLEGs that perform only relisting, but required for
+PLEGs that watch upstream events.*
+
+A pod worker's actions could lead to pod lifecycle events (e.g.,
+create/kill a container), which the worker would not observe until
+later. The pod worker should ignore such events to avoid unnecessary
+work.
+
+For example, assume a pod has two containers, A and B. The worker
+
+ - Creates container A
+ - Receives an event `(ContainerStopped, B)`
+ - Receives an event `(ContainerStarted, A)`
+
+The worker should ignore the `(ContainerStarted, A)` event since it is
+expected. Arguably, the worker could process `(ContainerStopped, B)`
+as soon as it receives the event, before observing the creation of
+A. However, it is desirable to wait until the expected event
+`(ContainerStarted, A)` is observed to keep a consistent per-pod view
+at the worker. Therefore, the control flow of a single pod worker
+should adhere to the following rules:
+
+1. The pod worker should process the events sequentially.
+2. The pod worker should not start syncing until it observes the outcome of its
+   own actions in the last sync, to maintain a consistent view.
+
+In other words, a pod worker should record the expected events, and
+only wake up to perform the next sync once all expectations are met.
+
+ - Creates container A, records an expected event `(ContainerStarted, A)`
+ - Receives `(ContainerStopped, B)`; stores the event and goes back to sleep.
+ - Receives `(ContainerStarted, A)`; clears the expectation. Proceeds to handle
+   `(ContainerStopped, B)`.
+
+We should set an expiration time for each expected event to prevent the worker
+from being stalled indefinitely by missing events.
+
+## TODOs for v1.2
+
+For v1.2, we will add a generic PLEG which relists periodically, and leave
+adopting container events for future work. We will also *not* implement the
+optimization that generates and filters out expected events to minimize
+redundant syncs.
+
+- Add a generic PLEG using relisting. Modify the container runtime interface
+  to provide all necessary information to detect container state changes
+  in `GetPods()` (#13571).
+
+- Benchmark docker to adjust the relisting frequency.
+
+- Fix/adapt features that rely on frequent, periodic pod syncing.
+  * Liveness/Readiness probing: Create a separate probing manager with an
+    explicit container probing period [#10878](https://issues.k8s.io/10878).
+  * Instruct pod workers to set up a wake-up call if syncing failed, so that
+    they can retry.
+
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-lifecycle-event-generator.md?pixel)]()
+
-- cgit v1.2.3 
From 629e14cd12f0568c4316f1948d41eb064215dc99 Mon Sep 17 00:00:00 2001
From: Greg Taylor 
Date: Sun, 27 Dec 2015 11:50:04 -0800
Subject: Alphabetize user contributed libraries list. 
--- client-libraries.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/client-libraries.md b/client-libraries.md index a8a3f613..69661ff4 100644 --- a/client-libraries.md +++ b/client-libraries.md @@ -42,18 +42,17 @@ Documentation for other releases can be found at *Note: Libraries provided by outside parties are supported by their authors, not the core Kubernetes team* + * [Clojure](https://github.com/yanatan16/clj-kubernetes-api) * [Java (OSGi)](https://bitbucket.org/amdatulabs/amdatu-kubernetes) * [Java (Fabric8, OSGi)](https://github.com/fabric8io/kubernetes-client) - * [Ruby](https://github.com/Ch00k/kuber) - * [Ruby](https://github.com/abonas/kubeclient) - * [PHP](https://github.com/devstub/kubernetes-api-php-client) - * [PHP](https://github.com/maclof/kubernetes-client) * [Node.js](https://github.com/tenxcloud/node-kubernetes-client) * [Perl](https://metacpan.org/pod/Net::Kubernetes) - * [Clojure](https://github.com/yanatan16/clj-kubernetes-api) + * [PHP](https://github.com/devstub/kubernetes-api-php-client) + * [PHP](https://github.com/maclof/kubernetes-client) + * [Ruby](https://github.com/Ch00k/kuber) + * [Ruby](https://github.com/abonas/kubeclient) * [Scala](https://github.com/doriordan/skuber) - [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/client-libraries.md?pixel)]() -- cgit v1.2.3 From 6251b732135dc5d8a35a6a399aaaf0b462746b77 Mon Sep 17 00:00:00 2001 From: Janet Kuo Date: Tue, 8 Dec 2015 16:23:49 -0800 Subject: Proposal for deploy in kubectl --- deploy.md | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 152 insertions(+) create mode 100644 deploy.md diff --git a/deploy.md b/deploy.md new file mode 100644 index 00000000..45a59e89 --- /dev/null +++ b/deploy.md @@ -0,0 +1,152 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+
+If you are using a released version of Kubernetes, you should
+refer to the docs that go with that version.
+
+
+The latest release of this document can be found
+[here](http://releases.k8s.io/release-1.1/docs/proposals/deploy.md).
+
+Documentation for other releases can be found at
+[releases.k8s.io](http://releases.k8s.io).
+
+--
+
+
+
+
+
+
+
+- [Deploy through CLI](#deploy-through-cli)
+  - [Motivation](#motivation)
+  - [Requirements](#requirements)
+  - [Related `kubectl` Commands](#related-kubectl-commands)
+    - [`kubectl run`](#kubectl-run)
+    - [`kubectl scale` and `kubectl autoscale`](#kubectl-scale-and-kubectl-autoscale)
+    - [`kubectl rollout`](#kubectl-rollout)
+    - [`kubectl set`](#kubectl-set)
+  - [Example](#example)
+  - [Support in Deployment](#support-in-deployment)
+    - [Deployment Status](#deployment-status)
+    - [Deployment Version](#deployment-version)
+    - [Inert Deployments](#inert-deployments)
+    - [Perm-failed Deployments](#perm-failed-deployments)
+
+
+
+# Deploy through CLI
+
+## Motivation
+
+Users can use Deployments or `kubectl rolling-update` to deploy in their Kubernetes clusters. A Deployment provides declarative updates for Pods and ReplicationControllers, whereas `rolling-update` allows the users to update their earlier deployment without worrying about schemas and configurations. Users need a way that's similar to `rolling-update` to manage their Deployments more easily.
+
+`rolling-update` expects ReplicationController as the only resource type it deals with. It's not trivial to support exactly the same behavior with Deployment, which requires:
+- Print out scaling up/down events.
+- Stop the deployment if users press Ctrl-c.
+- The controller should not make any more changes once the process ends. (Delete the deployment when status.replicas=status.updatedReplicas=spec.replicas.)
+
+So, instead, this document proposes another way to support easier deployment management via the Kubernetes CLI (`kubectl`).
+
+## Requirements
+
+The following are the operations we need to support for users to easily manage deployments:
+
+- **Create**: To create deployments.
+- **Rollback**: To restore to an earlier version of a deployment.
+- **Watch the status**: To watch for the status update of deployments.
+- **Pause/resume**: To pause a deployment mid-way, and to resume it. (A use case is to support canary deployment.)
+- **Version information**: To record and show version information that's meaningful to users. This can be useful for rollback.
+
+## Related `kubectl` Commands
+
+### `kubectl run`
+
+`kubectl run` should support the creation of Deployment (already implemented) and DaemonSet resources.
+
+### `kubectl scale` and `kubectl autoscale`
+
+Users may use `kubectl scale` or `kubectl autoscale` to scale up and down Deployments (both already implemented).
+
+### `kubectl rollout`
+
+`kubectl rollout` supports both Deployment and DaemonSet. It has the following subcommands:
+- `kubectl rollout undo` works like rollback; it allows the users to roll back to a previous version of a deployment.
+- `kubectl rollout pause` allows the users to pause a deployment.
+- `kubectl rollout resume` allows the users to resume a paused deployment.
+- `kubectl rollout status` shows the status of a deployment.
+- `kubectl rollout history` shows meaningful version information of all previous deployments.
+
+### `kubectl set`
+
+`kubectl set` has the following subcommands:
+- `kubectl set env` allows the users to set environment variables of Kubernetes resources. It should support Pod, ReplicationController, ReplicaSet, Deployment, and DaemonSet.
+- `kubectl set image` allows the users to update the images of deployments. Users will use the `--container` and `--image` flags to update the image of a container.
+
+### Example
+
+With the commands introduced above, here's an example of deployment management:
+
+```console
+# Create a Deployment
+$ kubectl run nginx --image=nginx --replicas=2 --generator=deployment/v1beta1
+
+# Watch the Deployment status
+$ kubectl rollout status deployment/nginx
+
+# Update the Deployment
+$ kubectl set image deployment/nginx --container=nginx --image=nginx:
+
+# Pause the Deployment
+$ kubectl rollout pause deployment/nginx
+
+# Resume the Deployment
+$ kubectl rollout resume deployment/nginx
+
+# Check the change history (deployment versions)
+$ kubectl rollout history deployment/nginx
+
+# Roll back to a previous version.
+$ kubectl rollout undo deployment/nginx --to-version=
+```
+
+## Support in Deployment
+
+### Deployment Status
+
+Deployment status should summarize information about Pods, which includes:
+- The number of pods of each version.
+- The number of ready/not ready pods.
+
+See issue [#17164](https://github.com/kubernetes/kubernetes/issues/17164).
+
+### Deployment Version
+
+We store previous deployment version information in the deployment annotation `kubectl.kubernetes.io/deployment-version-