author Lee Verberne <verb@google.com> 2019-09-30 18:07:44 +0200
committer Lee Verberne <verb@google.com> 2020-01-13 17:31:04 +0100
commit 0cfe22d04a561675482fee2d0876de072a17f0b5
tree caf7d9dda04a77527a37a8810ce8a4186cc69abc /contributors
parent 5d62001346d1cdfa8b3b141bcab01d90eb108157
Redirect shareProcessNamespace proposal to KEP
Diffstat (limited to 'contributors')
-rw-r--r-- contributors/design-proposals/node/pod-pid-namespace.md 319
1 files changed, 5 insertions, 314 deletions
diff --git a/contributors/design-proposals/node/pod-pid-namespace.md b/contributors/design-proposals/node/pod-pid-namespace.md
index e73bdafe..aeac92fe 100644
--- a/contributors/design-proposals/node/pod-pid-namespace.md
+++ b/contributors/design-proposals/node/pod-pid-namespace.md
@@ -1,319 +1,10 @@
# Shared PID Namespace
-* Status: Pending
-* Version: Alpha
+* Status: Superseded
+* Version: N/A
* Implementation Owner: [@verb](https://github.com/verb)
-## Motivation
+The Shared PID Namespace proposal has moved to the
+[Shared PID Namespace KEP][shared-pid-kep].
-Pods share namespaces where possible, but support for sharing the PID namespace
-had not been defined due to lack of support in Docker. This created an implicit
-API on which certain container images now rely. This document proposes adding
-support for sharing a process namespace between containers in a pod while
-maintaining backwards compatibility with the existing implicit API.
-
-## Proposal
-
-### Goals and Non-Goals
-
-Goals include:
-
-* Backwards compatibility with container images expecting `pid == 1` semantics
-* Per-pod configuration of PID namespace sharing
-* Ability to change default sharing behavior in `v2.Pod`
-
-Non-goals include:
-
-* Creating a general purpose container init solution
-* Multiple shared PID namespaces per pod
-* Per-container configuration of PID namespace sharing
-
-### Summary
-
-We will add support for configuring pod-shared process namespaces by adding a
-new boolean field `ShareProcessNamespace` to the pod spec. The default to false
-means that each container will have a separate process namespace. When set to
-true, all containers in the pod will share a single process namespace.
-
-The Container Runtime Interface (CRI) will be updated to support three namespace
-modes: Container, Pod & Node. The Runtime Manager will translate the pod spec
-into one of these modes as follows:
-
-Pod `shareProcessNamespace` | Pod `hostPID` | CRI PID Mode
---------------------------- | ------------- | ------------
-false | false | Container
-false | true | Node
-true | false | Pod
-true | true | *Error*
-
-If a runtime does not implement a particular PID mode, it must return an error.
-For reference, Docker will support all three modes when using version >= 1.13.1.
-
-The shared PID functionality will be hidden behind a new feature gate in both
-the API server and the kubelet, and the existing `--docker-disable-shared-pid`
-flag will be removed from the kubelet, subject to [deprecation
-policy](https://kubernetes.io/docs/reference/deprecation-policy/).
-
-## User Experience
-
-### Use Cases
-
-Sharing a PID namespace between containers in a pod is discussed in
-[#1615](https://issues.k8s.io/1615) and enables:
-
-1. signaling between containers, which is useful for side cars (e.g. for
- signaling a daemon process after rotating logs).
-1. easier troubleshooting of pods.
-1. addressing [Docker's zombie
- problem](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/)
- by reaping orphaned zombies in the infra container.
-
-### Behavioral Changes
-
-Sharing a process namespace fits well with Kubernetes' pod abstraction, but it's
-a significant departure from the traditional behavior of Docker. This may break
-container images and development patterns that have come to rely on process
-isolation. Notably:
-
-1. **The main container process no longer has PID 1**. It cannot be signalled
- using `kill 1`, and attempting to do so will instead signal the
- infrastructure container and potentially restart the pod. Containers
- shipping an init system like systemd may [require additional
- flags](https://github.com/kubernetes/kubernetes/issues/48937#issuecomment-321243669).
-1. **Processes are visible to other containers in the pod**. This includes all
- information visible in `/proc`, such as passwords as arguments or
- environment variables, and process signalling. This can be somewhat
- mitigated by running processes as separate, non-root users.
-1. **Container filesystems are visible to other containers in the pod through
- the <code>/proc/$pid/root</code> magic symlink**. This makes debugging
- easier, but it also means that secrets are protected only by standard
- filesystem permissions.
-
-## Implementation
-
-### Kubernetes API Changes
-
-`v1.PodSpec` gains a new field named `ShareProcessNamespace`:
-
-```
-// PodSpec is a description of a pod.
-type PodSpec struct {
- ...
- // Use the host's pid namespace.
- // Note that HostPID and ShareProcessNamespace cannot both be set.
- // Optional: Default to false.
- // +k8s:conversion-gen=false
- // +optional
- HostPID bool `json:"hostPID,omitempty" protobuf:"varint,12,opt,name=hostPID"`
- // Share a single process namespace between all of the containers in a pod.
- // Note that HostPID and ShareProcessNamespace cannot both be set.
- // Optional: Default to false.
- // +k8s:conversion-gen=false
- // +optional
- ShareProcessNamespace *bool `json:"shareProcessNamespace,omitempty" protobuf:"varint,XX,opt,name=shareProcessNamespace"`
- ...
-```
-
-The field name deviates from that of HostPID in an attempt to [better signal the
-consequences](https://github.com/kubernetes/community/pull/1048/files#r159146536)
-of setting the option. Setting both `ShareProcessNamespace` and `HostPID` will
-cause a validation error.
-
-### Container Runtime Interface Changes
-
-Namespace options in the CRI are currently specified for both `PodSandbox` and
-`Container` creation requests via booleans in `NamespaceOption`:
-
-```
-message NamespaceOption {
- // If set, use the host's network namespace.
- bool host_network = 1;
- // If set, use the host's PID namespace.
- bool host_pid = 2;
- // If set, use the host's IPC namespace.
- bool host_ipc = 3;
-}
-```
-
-We will change `NamespaceOption` to use a `NamespaceMode` enumeration for the
-existing namespace options:
-
-```
-enum NamespaceMode {
- POD = 0;
- CONTAINER = 1;
- NODE = 2;
-}
-
-// NamespaceOption provides options for Linux namespaces.
-message NamespaceOption {
- // Network namespace for this container/sandbox.
- // Runtimes must support: POD, NODE
- NamespaceMode network = 1;
- // PID namespace for this container/sandbox.
- // Note: The CRI default is POD, but the v1.PodSpec default is CONTAINER.
- // The kubelet's runtime manager will set this to CONTAINER explicitly for v1 pods.
- // Runtimes must support: POD, CONTAINER, NODE
- NamespaceMode pid = 2;
- // IPC namespace for this container/sandbox.
- // Runtimes must support: POD, NODE
- NamespaceMode ipc = 3;
-}
-```
-
-Note that this breaks backwards compatibility in the CRI, which is still in
-alpha.
-
-The protocol default for a namespace is `POD` because that's the default for
-network and IPC, and we will consider making it the default for PID in `v2.Pod`.
-The kubelet will explicitly set `pid` to `CONTAINER` for `v1.Pod` by default so
-that the default behavior of `v1.Pod` does not change.
-
-This CRI design allows different namespace configuration for each of the
-containers in the pod and the sandbox, but currently we have no plans to support
-this in the Kubernetes API. The kubelet will translate namespace booleans from
-v1.PodSpec into a single `NamespaceMode` to be used for the sandbox and all
-regular and init containers in a pod.
-
-#### Targeting a Specific Container's Namespace
-
-Though we don't intend to support this in general pod configuration, there is a
-use case for mixed process namespaces within a single pod. [Troubleshooting
-Running Pods](troubleshooting-running-pods.md) allows inserting an ephemeral
-Debug Container in an existing, running pod. In order for this to be useful we
-want to share, within the pod, a process namespace between the new container
-performing the debugging and its existing target container.
-
-This is done with the additional `NamespaceMode` `TARGET` and field `target_id`:
-
-```
-enum NamespaceMode {
- POD = 0;
- CONTAINER = 1;
- NODE = 2;
- TARGET = 3;
-}
-
-// NamespaceOption provides options for Linux namespaces.
-message NamespaceOption {
- // Network namespace for this container/sandbox.
- // Runtimes must support: POD, NODE
- NamespaceMode network = 1;
- // PID namespace for this container/sandbox.
- // Note: The CRI default is POD, but the v1.PodSpec default is CONTAINER.
- // The kubelet's runtime manager will set this to CONTAINER explicitly for v1 pods.
- // Runtimes must support: POD, CONTAINER, NODE, TARGET
- NamespaceMode pid = 2;
- // IPC namespace for this container/sandbox.
- // Runtimes must support: POD, NODE
- NamespaceMode ipc = 3;
- // Target Container ID for NamespaceMode of TARGET. This container must be in the
- // same pod as the target container.
- string target_id = 4;
-}
-```
-
-When `NamespaceOption.pid` is set to `TARGET`, a runtime must create the new
-container in the namespace used by the container ID in `target_id`. If the
-target container has `NamespaceOption.pid` set to `POD`, then the new container
-should also use the pod namespace. If the target container has an isolated
-process namespace, then the new container will join only that container's
-namespace. Examples are provided for dockershim below.
-
-There is no mechanism in the Kubernetes API for an end-user to set `TARGET`. It
-exists for the kubelet to run automation or debugging from a container image in
-the namespace of an existing pod and container. Additionally, we choose to
-explicitly not support sharing namespaces between different pods. The kubelet
-must not generate such a reference, and the runtime should not accept it. That
-is, for pod{Container `A`, Container `B`, Sandbox `S}` and any other unrelated
-Container `C`:
-
-valid `target_id` | invalid `target_id`
------------------ | -------------------
-containerID(A) | sandboxID(S)
-containerID(B) | containerID(C)
-
-### dockershim Changes
-
-The Docker runtime implements the pod sandbox as a container running the pause
-container image. When configured for `POD` namespace sharing, the PID namespace
-of the sandbox will become the single PID namespace for the pod. This means a
-namespace of `POD` and `CONTAINER` are equivalent for the sandbox. The mapping
-of the _sandbox's_ PID mode to docker's `HostConfig.PidMode` is (`v1.Pod`
-settings provided as reference):
-
-ShareProcessNamespace | HostPID | Sandbox PID Mode | HostConfig.PidMode
---------------------- | ------- | ---------------- | ------------------
-false | false | CONTAINER | *unset*
-true | false | POD | *unset*
-false | true | NODE | "host"
-\- | \- | TARGET | *Error*
-
-For _containers_, `HostConfig.PidMode` will be set as follows:
-
-ShareProcessNamespace | HostPID | Container PID Mode | HostConfig.PidMode
---------------------- | ------- | ------------------ | ------------------
-false | false | CONTAINER | *unset*
-true | false | POD | "container:[sandbox-container-id]"
-false | true | NODE | "host"
-false | false | TARGET | "container:[target-container-id]"
-true | false | TARGET | "container:[sandbox-container-id]"
-false | true | TARGET | "host"
-
-If the Docker runtime version does not support sharing pid namespaces, a
-`CreateContainerRequest` with `namespace_options.pid` set to `POD` will return
-an error.
-
-### Deprecation of existing kubelet flag
-
-SIG Node did not anticipate the strong objections to migrating from isolated to
-shared process namespaces for Docker. The previous (now abandoned) migration
-plan introduced a kubelet flag to toggle the shared namespace behavior, but
-objections did not materialize until the flag had moved from experimental to GA.
-
-The `--docker-disable-shared-pid` (default: true) kubelet flag disables the use
-of shared process namespaces for the Docker runtime. We will immediately mark it
-as deprecated, but according to the [deprecation
-policy](https://kubernetes.io/docs/reference/deprecation-policy/) we must
-support it for 6 months.
-
-We must provide a transition path for users setting this kubelet flag to false.
-Setting this flag asserts a desire to override the default Kubernetes behavior
-for all pods. Until the flag is removed, the kubelet will honor this assertion
-by ignoring the value of `ShareProcessNamespace` and logging a warning to the
-event log.
-
-## Alternatives Considered
-
-### Explicit Container/Sandbox ID Targeting
-
-Rather than using a `NamespaceMode`, `NamespaceOption.pid` could be a string
-that explicitly targets a container or sandbox ID:
-
-```
-// NamespaceOption provides options for Linux namespaces.
-message NamespaceOption {
- ...
- // ID of Sandbox or Container to use for PID namespace, or "host"
- string pid = 2;
- ...
-}
-```
-
-This removes the need for a separate `TARGET` mode, but a mode enumeration
-better captures the intent of the option.
-
-### Defaulting to PID Namespace Sharing
-
-Other Kubernetes runtimes already share a single PID namespace between
-containers in a pod. We could easily change the Docker runtime to always share a
-PID namespace when supported by the installed Docker version, but this would
-cause problems for container images that assume they will always be PID 1.
-
-### Migration to Shared-only Namespaces
-
-Rather than adding support to the API for configuring namespaces we could allow
-changing the default behavior with pod annotations with the intention of
-removing support for isolated PID namespaces in v2.Pod. Many members of the
-community want to use the isolated namespaces as security boundary between
-containers in a pod, however.
+[shared-pid-kep]: https://git.k8s.io/enhancements/keps/sig-node/20190920-pod-pid-namespace.md
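The proposal text removed by this patch describes two translation steps. First, the kubelet collapses a pod's `shareProcessNamespace` and `hostPID` fields into a single CRI PID mode. Second, dockershim maps that mode onto Docker's `HostConfig.PidMode`, once for the sandbox and once for each container. The Go sketch below restates those tables as code, assuming the enum values listed in the proposal (POD = 0, CONTAINER = 1, NODE = 2, TARGET = 3). It is not the kubelet or dockershim source. `NamespaceMode`, its constants, `pidModeForPod`, `sandboxPidMode`, and `containerPidMode` are hypothetical names. Container IDs are plain strings, and an empty result string stands for *unset*.

```go
package main

import (
	"errors"
	"fmt"
)

// NamespaceMode is a hypothetical Go mirror of the proposed CRI enum.
// Values follow the proposal: POD = 0, CONTAINER = 1, NODE = 2, TARGET = 3.
type NamespaceMode int

const (
	ModePod NamespaceMode = iota
	ModeContainer
	ModeNode
	ModeTarget
)

// pidModeForPod applies the proposal's pod-spec table. Setting both
// fields is rejected, as the proposed validation requires.
func pidModeForPod(shareProcessNamespace, hostPID bool) (NamespaceMode, error) {
	switch {
	case shareProcessNamespace && hostPID:
		return 0, errors.New("hostPID and shareProcessNamespace cannot both be set")
	case hostPID:
		return ModeNode, nil
	case shareProcessNamespace:
		return ModePod, nil
	default:
		return ModeContainer, nil
	}
}

// sandboxPidMode maps the sandbox's PID mode to HostConfig.PidMode.
// "" means unset: the pause container gets its own namespace, which
// becomes the pod's namespace when other containers join it.
func sandboxPidMode(mode NamespaceMode) (string, error) {
	switch mode {
	case ModePod, ModeContainer:
		return "", nil
	case ModeNode:
		return "host", nil
	default:
		return "", fmt.Errorf("PID mode %d is not valid for a sandbox", mode)
	}
}

// containerPidMode maps a container's PID mode to HostConfig.PidMode.
// targetMode is consulted only for ModeTarget; it is the PID mode of
// the container identified by targetID.
func containerPidMode(mode, targetMode NamespaceMode, sandboxID, targetID string) (string, error) {
	switch mode {
	case ModeContainer:
		return "", nil
	case ModePod:
		return "container:" + sandboxID, nil
	case ModeNode:
		return "host", nil
	case ModeTarget:
		switch targetMode {
		case ModeContainer:
			return "container:" + targetID, nil // join the target's isolated namespace
		case ModePod:
			return "container:" + sandboxID, nil // target already uses the pod namespace
		case ModeNode:
			return "host", nil
		}
		return "", fmt.Errorf("unsupported target PID mode %d", targetMode)
	}
	return "", fmt.Errorf("unknown PID mode %d", mode)
}

func main() {
	mode, err := pidModeForPod(true, false)
	fmt.Println(mode == ModePod, err) // prints "true <nil>"

	pm, _ := containerPidMode(mode, ModeContainer, "sandbox123", "")
	fmt.Println(pm) // prints "container:sandbox123"
}
```

Passing the target container's own mode into `containerPidMode` is what lets a `TARGET` container land in the target's isolated namespace, the sandbox's namespace, or the host's, matching the last three rows of the proposal's dockershim container table.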