| author | Lee Verberne <verb@google.com> | 2017-01-18 17:27:53 -0800 |
|---|---|---|
| committer | Lee Verberne <verb@google.com> | 2017-01-18 17:27:53 -0800 |
| commit | d4789e1112bec3b75f06e331e16727babdcca2d7 (patch) | |
| tree | 029fcdf11b2bf7bb25c95b2e4ee2e699396eec30 | |
| parent | d3b09aa70d44644625c33afc5a752ca5b431cecd (diff) | |
Require shared PID namespace in CRI & plan rollout
3 files changed, 79 insertions, 71 deletions
diff --git a/contributors/design-proposals/container-runtime-interface-v1.md b/contributors/design-proposals/container-runtime-interface-v1.md
index 024b1e10..d305aaaa 100644
--- a/contributors/design-proposals/container-runtime-interface-v1.md
+++ b/contributors/design-proposals/container-runtime-interface-v1.md
@@ -86,7 +86,7 @@ container setup that are not currently trackable as Pod constraints, e.g.,
 filesystem setup, container image pulling, etc.*
 
 A container in a PodSandbox maps to an application in the Pod Spec. For Linux
-containers, they are expected to share at least network and IPC namespaces,
+containers, they are expected to share at least network, PID and IPC namespaces,
 with sharing more namespaces discussed in [#1615](https://issues.k8s.io/1615).
 
 
diff --git a/contributors/design-proposals/pod-pid-namespace-docker.md b/contributors/design-proposals/pod-pid-namespace-docker.md
deleted file mode 100644
index 924b626d..00000000
--- a/contributors/design-proposals/pod-pid-namespace-docker.md
+++ /dev/null
@@ -1,70 +0,0 @@
-# Shared PID Namespace for the Docker Runtime
-
-Pods share many namespaces, but the ability to share a PID namespace was not
-supported by Docker until version 1.12. SIG Node approved a change to the
-default behavior contingent on a brief rollout plan, which is this document.
-Please refer to [#1615](https://issues.k8s.io/1615) for full technical details.
-
-## Motivation
-
-Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615),
-and enables:
-
- 1. signaling between containers, which is useful for side cars (e.g. for
-    signaling a daemon process after rotating logs).
- 2. easier troubleshooting of pods.
- 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
-    infra container.
-
-## Goals and Non-Goals
-
-Goals include:
- - Changing default behavior in the Kubernetes Docker runtime
-
-Non-goals include:
- - Creating an init solution that works for all runtimes
- - Supporting isolated PID namespace indefinitely
- - Addressing the larger issue of requiring shared namespaces in all runtimes
-
-Kubernetes does not currently specify how runtimes must support a PID namespace,
-but many runtimes (e.g. cri-o & rkt) already support a shared namespace. This
-rolls out support for Docker.
-
-## Rollout Plan
-
-Sharing the PID namespace changes an implicit behavior of the Docker runtime
-whereby the command run by the container image is always PID 1. This is a side
-effect of isolated namespaces rather than intentional behavior, but users may
-have built upon this assumption so we should change the default behavior over
-the course of multiple releases. (The following release numbers are earliest
-possible releases and may change based on implementation and community
-feedback.)
-
- 1. Release 1.6: Enable the shared PID namespace for pods annotated with
-    `docker.kubernetes.io/shared-pid: true` (i.e. opt-in) when running with
-    Docker >= 1.12. Pods with this annotation will fail to start with older
-    Docker versions rather than failing to meet a user's expectation.
- 2. Release 1.7: Enable the shared PID namespace for pods unless annotated
-    with `docker.kubernetes.io/shared-pid: false` (i.e. opt-out) when running
-    with Docker >= 1.12.
- 3. Release 1.8: Remove the annotation. All pods receive a shared PID
-    namespace when running with Docker >= 1.12.
-
-With each step we will add a release note that clearly describes the change.
-After each release we will poll kubernetes-users to determine what, if any,
-applications were impacted by this change. If we discover a use case which
-cannot be accommodated by a shared PID namespace, we will abort step 3 and
-instead formalize a shared-pid field into the pod spec.
-
-## Alternatives Considered
-
-Changing this behavior over the course of 6 months is a bit conservative. We
-could instead change the behavior in 2 releases by omitting the first step, but
-the opt-in phase allows users to test the change with fewer surprises.
-
-[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
-
-
-<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
-[]()
-<!-- END MUNGE: GENERATED_ANALYTICS -->
diff --git a/contributors/design-proposals/pod-pid-namespace.md b/contributors/design-proposals/pod-pid-namespace.md
new file mode 100644
index 00000000..f5c48e3f
--- /dev/null
+++ b/contributors/design-proposals/pod-pid-namespace.md
@@ -0,0 +1,78 @@
+# Shared PID Namespace
+
+Pods share namespaces where possible, but a requirement for sharing the PID
+namespace has not been defined due to lack of support in Docker. Docker began
+supporting a shared PID namespace in 1.12, and other Kubernetes runtimes (rkt,
+cri-o, hyper) have already implemented a shared PID namespace.
+
+This proposal defines a shared PID namespace as a requirement of the Container
+Runtime Interface and links its rollout in Docker to that of the CRI.
+
+## Motivation
+
+Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615),
+and enables:
+
+ 1. signaling between containers, which is useful for side cars (e.g. for
+    signaling a daemon process after rotating logs).
+ 2. easier troubleshooting of pods.
+ 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
+    infra container.
+
+## Goals and Non-Goals
+
+Goals include:
+ - Changing default behavior in the Docker runtime as implemented by the CRI
+ - Making Docker behavior compatible with the other Kubernetes runtimes
+
+Non-goals include:
+ - Creating an init solution that works for all runtimes
+ - Supporting isolated PID namespace indefinitely
+
+## Modification to the Docker Runtime
+
+We will modify the Docker implementation of the CRI to use a shared PID
+namespace when running with a version of Docker >= 1.12. The legacy
+`dockertools` implementation will not be changed.
+
+Linking this change to the CRI means that Kubernetes users who care to test such
+changes can test the combined changes at once. Users who do not care to test
+such changes will be insulated by Kubernetes not recommending Docker >= 1.12
+until after switching to the CRI.
+
+Other changes that must be made to support this change:
+
+1. Ensure all containers restart if the infra container responsible for the
+   PodSandbox dies. (Note: With Docker 1.12 if the source of the PID namespace
+   dies all containers sharing that namespace are killed as well.)
+2. Modify the Infra container used by the Docker runtime to reap orphaned
+   zombies ([#36853](https://pr.k8s.io/36853)).
+
+## Rollout Plan
+
+SIG Node is planning to switch to the CRI as a default in 1.6, at which point
+users with Docker >= 1.12 will be able to test Shared namespaces. Switching
+back to isolated PID namespaces will require disabling the CRI.
+
+At some point, say 1.7, SIG Node will remove support for disabling the CRI.
+After this point users must roll back to a previous version of Kubernetes or
+Docker to achieve PID namespace isolation. This is acceptable because:
+
+* No one has been able to identify a concrete use case requiring isolated PID
+  namespaces.
+* The lack of use cases means we can't justify the complexity required to make
+  PID namespace type configurable.
+* Users will already be looking for issues due to the major version upgrade and
+  prepared for a rollback to the previous release.
+
+Alternatively, we could create a flag in the kubelet to disable shared PID
+namespace, but this wouldn't be especially useful to users of a hosted
+Kubernetes cluster.
+
+
+[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
+
+
+<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
+[]()
+<!-- END MUNGE: GENERATED_ANALYTICS -->
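Under "Modification to the Docker Runtime", the new proposal specifies that the CRI's Docker implementation will use a shared PID namespace only with Docker >= 1.12 and will leave the legacy `dockertools` path unchanged. The following minimal Python sketch captures that decision. It assumes Docker's `--pid=container:<id>` mode (the `container:<id>` value of `HostConfig.PidMode`), which Docker added in 1.12. The function names are hypothetical, and this is not the dockershim implementation.

```python
from typing import Tuple

# Assumption: Docker >= 1.12 accepts PidMode "container:<id>" to join
# another container's PID namespace.
SHARED_PID_MIN_DOCKER = (1, 12)


def parse_docker_version(version: str) -> Tuple[int, int]:
    """Parse versions like "1.12.6", "1.13.0-rc1" or "17.03.0-ce" into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    # Drop a pre-release suffix such as "-rc1" if it is attached to the minor part.
    minor = int(parts[1].split("-")[0])
    return major, minor


def pid_mode_for_container(docker_version: str, infra_container_id: str,
                           use_cri: bool) -> str:
    """Return a hypothetical PidMode value for an application container.

    "container:<id>" joins the PID namespace of the pod's infra container;
    "" keeps Docker's default, an isolated PID namespace.
    """
    if not use_cri:
        # The legacy dockertools implementation is not changed by this proposal.
        return ""
    if parse_docker_version(docker_version) < SHARED_PID_MIN_DOCKER:
        # Older Docker cannot join another container's PID namespace.
        return ""
    return f"container:{infra_container_id}"
```

An empty return value leaves Docker's isolated default in place. This matches the proposal's scoping of the change to the CRI path and to Docker >= 1.12.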
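The second required change asks the infra container to reap orphaned zombies. When application containers join the infra container's PID namespace, the infra process is PID 1 in that namespace. Orphaned processes are therefore reparented to it, and it must `wait()` on them or they linger as zombies. The sketch below shows the common pattern of a SIGCHLD handler that calls non-blocking `waitpid(-1, WNOHANG)` until nothing is left to collect. It is written in Python for illustration only; it is not the pause container from #36853, and `reap_zombies` and `run_infra_loop` are hypothetical names.

```python
import os
import signal


def reap_zombies() -> int:
    """Collect every exited child without blocking; return how many were reaped."""
    reaped = 0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process has no children at all.
            return reaped
        if pid == 0:
            # Children exist, but none has exited yet.
            return reaped
        reaped += 1


def run_infra_loop() -> None:
    """Hypothetical infra-container main loop: sleep, reaping on every SIGCHLD."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_zombies())
    # Assumption: exit promptly when the runtime stops the sandbox.
    signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(0))
    while True:
        signal.pause()
```

The reaper loops inside one handler invocation because several SIGCHLD signals that arrive close together can be coalesced into a single delivery.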
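The first motivation item, signaling between containers, relies on every container in the pod seeing the others' processes. For example, a log-rotation sidecar could send `SIGHUP` to a daemon in another container. The following Linux-only sketch finds processes by the name in `/proc/<pid>/comm` and signals them. The helper names are hypothetical, and the sketch assumes the sidecar is allowed to signal the target (same user or `CAP_KILL`).

```python
import os
import signal
from typing import List


def find_pids_by_name(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals name (comm is truncated to 15 bytes)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        if comm == name:
            pids.append(int(entry))
    return pids


def signal_by_name(name: str, sig: int = signal.SIGHUP) -> int:
    """Send sig to every visible process named name; return how many were signaled."""
    count = 0
    for pid in find_pids_by_name(name):
        try:
            os.kill(pid, sig)
            count += 1
        except ProcessLookupError:
            # The process exited before it could be signaled.
            pass
    return count
```

With isolated PID namespaces, the same lookup from a sidecar would find only the sidecar's own processes. This is why the proposal lists signaling as a benefit of sharing.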
