author Dawn Chen <dawnchen@google.com> 2017-01-25 18:05:52 -0800
committer GitHub <noreply@github.com> 2017-01-25 18:05:52 -0800
commit 160e9b655ae43cba0a745050d94bba8741b1754f
tree 998c1be1c309414c725b51b23cf815f4d9075e3a
parent d5e1d5ccb9d2bcf42f2cad5e5ced48d93c0aa4be
parent 3da9bf8cf55f519bc757976480cb0433fe0aad0c
Merge pull request #207 from verb/sharedpid-rollout
Propose rollout for Docker shared PID namespace
-rw-r--r-- contributors/design-proposals/container-runtime-interface-v1.md 2
-rw-r--r-- contributors/design-proposals/pod-pid-namespace.md 77
2 files changed, 78 insertions, 1 deletions
diff --git a/contributors/design-proposals/container-runtime-interface-v1.md b/contributors/design-proposals/container-runtime-interface-v1.md
index 024b1e10..d305aaaa 100644
--- a/contributors/design-proposals/container-runtime-interface-v1.md
+++ b/contributors/design-proposals/container-runtime-interface-v1.md
@@ -86,7 +86,7 @@ container setup that are not currently trackable as Pod constraints, e.g.,
filesystem setup, container image pulling, etc.*
A container in a PodSandbox maps to an application in the Pod Spec. For Linux
-containers, they are expected to share at least network and IPC namespaces,
+containers, they are expected to share at least network, PID and IPC namespaces,
with sharing more namespaces discussed in [#1615](https://issues.k8s.io/1615).
diff --git a/contributors/design-proposals/pod-pid-namespace.md b/contributors/design-proposals/pod-pid-namespace.md
new file mode 100644
index 00000000..43c38f22
--- /dev/null
+++ b/contributors/design-proposals/pod-pid-namespace.md
@@ -0,0 +1,77 @@
+# Shared PID Namespace
+
+Pods share namespaces where possible, but a requirement for sharing the PID
+namespace has not been defined due to lack of support in Docker. Docker began
+supporting a shared PID namespace in 1.12, and other Kubernetes runtimes (rkt,
+cri-o, hyper) have already implemented a shared PID namespace.
+
+This proposal defines a shared PID namespace as a requirement of the Container
+Runtime Interface and links its rollout in Docker to that of the CRI.
+
+## Motivation
+
+Sharing a PID namespace between containers in a pod is discussed in
+[#1615](https://issues.k8s.io/1615), and enables:
+
+ 1. signaling between containers, which is useful for sidecars (e.g. for
+ signaling a daemon process after rotating logs).
+ 2. easier troubleshooting of pods.
+ 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
+ infra container.
+
+## Goals and Non-Goals
+
+Goals include:
+ - Changing default behavior in the Docker runtime as implemented by the CRI
+ - Making Docker behavior compatible with the other Kubernetes runtimes
+
+Non-goals include:
+ - Creating an init solution that works for all runtimes
+ - Supporting isolated PID namespace indefinitely
+
+## Modification to the Docker Runtime
+
+We will modify the Docker implementation of the CRI to use a shared PID
+namespace when running with a version of Docker >= 1.12. The legacy
+`dockertools` implementation will not be changed.
+
+Linking this change to the CRI means that Kubernetes users who care to test such
+changes can test the combined changes at once. Users who do not care to test
+such changes will be insulated by Kubernetes not recommending Docker >= 1.12
+until after switching to the CRI.
+
+Other changes that must be made to support this change:
+
+1. Add a test to verify all containers restart if the infra container
+ responsible for the PodSandbox dies. (Note: With Docker 1.12, if the source
+ of the PID namespace dies, all containers sharing that namespace are killed
+ as well.)
+2. Modify the Infra container used by the Docker runtime to reap orphaned
+ zombies ([#36853](https://pr.k8s.io/36853)).
+
+## Rollout Plan
+
+SIG Node is planning to switch to the CRI as a default in 1.6, at which point
+users with Docker >= 1.12 will receive a shared PID namespace by default.
+Cluster administrators will be able to disable this behavior by providing a flag
+to the kubelet which will cause the dockershim to revert to previous behavior.
+
+The ability to disable shared PID namespaces is intended as a way to roll back
+to prior behavior in the event of unforeseen problems. It won't be possible to
+configure the behavior per-pod. We believe this is acceptable because:
+
+* We have not identified a concrete use case requiring isolated PID namespaces.
+* Making PID namespace configurable requires changing the CRI, which we would
+ like to avoid since there are no use cases.
+
+In a future release, SIG Node will recommend Docker >= 1.12. Unless a compelling
+use case for isolated PID namespaces is discovered, we will remove the ability
+to disable the shared PID namespace in the subsequent release.
+
+
+[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
+
+
+<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-pid-namespace.md?pixel)]()
+<!-- END MUNGE: GENERATED_ANALYTICS -->
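
The proposal asks the infra container to reap orphaned zombies once it becomes PID 1 of the shared namespace. The following is a minimal Python sketch of that reaping loop, not the code from the change referenced as #36853. The names `reap_zombies` and `install_reaper` are hypothetical and chosen only for illustration. The sketch assumes a POSIX system and only shows the non-blocking `waitpid` pattern; a process must be PID 1 in its PID namespace (or a subreaper) for orphans to be reparented to it.

```python
import os
import signal


def _exit_code(status):
    """Translate a raw wait status into an exit code (negative for signals)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return None


def reap_zombies():
    """Reap every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) pairs for the children reaped.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately if none exited.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # This process has no children at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped.append((pid, _exit_code(status)))
    return reaped


def install_reaper():
    """Hypothetical helper: reap whenever the kernel reports a child exit."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_zombies())
```

An init-style process would call `install_reaper()` once at startup and then sleep. The loop in `reap_zombies` matters because several children can exit before a single SIGCHLD is delivered.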