| author | Stephen Augustus <foo@agst.us> | 2018-12-01 02:40:42 -0500 |
|---|---|---|
| committer | Stephen Augustus <foo@agst.us> | 2018-12-01 02:40:42 -0500 |
| commit | 1004e56177eb12d85b6e0f6cf1ccd00431f7336b (patch) | |
| tree | e2a87f95b32e046ed32a2eea6cde661704e61fbd /keps/sig-node | |
| parent | 973b19523840d207ae206175ac2093d3b564668c (diff) | |
Add KEP tombstones
Signed-off-by: Stephen Augustus <foo@agst.us>
Diffstat (limited to 'keps/sig-node')
| -rw-r--r-- | keps/sig-node/0008-20180430-promote-sysctl-annotations-to-fields.md | 229 |
| -rw-r--r-- | keps/sig-node/0009-node-heartbeat.md | 396 |
| -rw-r--r-- | keps/sig-node/0014-runtime-class.md | 403 |
| -rw-r--r-- | keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md | 811 |
| -rw-r--r-- | keps/sig-node/compute-device-assignment.md | 154 |
5 files changed, 20 insertions, 1973 deletions
diff --git a/keps/sig-node/0008-20180430-promote-sysctl-annotations-to-fields.md b/keps/sig-node/0008-20180430-promote-sysctl-annotations-to-fields.md index 4a2090a1..cfd1f5fa 100644 --- a/keps/sig-node/0008-20180430-promote-sysctl-annotations-to-fields.md +++ b/keps/sig-node/0008-20180430-promote-sysctl-annotations-to-fields.md @@ -1,225 +1,4 @@ ---- -kep-number: 8 -title: Promote sysctl annotations to fields -authors: - - "@ingvagabund" -owning-sig: sig-node -participating-sigs: - - sig-auth -reviewers: - - "@sjenning" - - "@derekwaynecarr" -approvers: - - "@sjenning" - - "@derekwaynecarr" -editor: -creation-date: 2018-04-30 -last-updated: 2018-05-02 -status: provisional -see-also: -replaces: -superseded-by: ---- - -# Promote sysctl annotations to fields - -## Table of Contents - -* [Promote sysctl annotations to fields](#promote-sysctl-annotations-to-fields) - * [Table of Contents](#table-of-contents) - * [Summary](#summary) - * [Motivation](#motivation) - * [Promote annotations to fields](#promote-annotations-to-fields) - * [Promote --experimental-allowed-unsafe-sysctls kubelet flag to kubelet config api option](#promote---experimental-allowed-unsafe-sysctls-kubelet-flag-to-kubelet-config-api-option) - * [Gate the feature](#gate-the-feature) - * [Proposal](#proposal) - * [User Stories](#user-stories) - * [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) - * [Risks and Mitigations](#risks-and-mitigations) - * [Graduation Criteria](#graduation-criteria) - * [Implementation History](#implementation-history) - -## Summary - -Setting the `sysctl` parameters through annotations provided a successful story -for defining better constraints of running applications. -The `sysctl` feature has been tested by a number of people without any serious -complaints. Promoting the annotations to fields (i.e. to beta) is another step in moving the -`sysctl` feature closer to the stable API.
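The legacy annotation format this KEP migrates away from is a comma-separated list of `name=value` pairs (shown in the manifests below). A rough, hypothetical sketch of how such a value could be mapped onto the proposed field schema; `parseSysctlAnnotation` and the simplified `Sysctl` struct are illustrative only, not the actual Kubernetes implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// Sysctl mirrors the shape of the proposed pod field: a name/value pair.
type Sysctl struct {
	Name  string
	Value string
}

// parseSysctlAnnotation splits a legacy annotation value into pairs.
// Entries are comma-separated; values may themselves contain spaces
// (e.g. "kernel.msgmax=1 2 3") but, by assumption, not commas.
func parseSysctlAnnotation(val string) ([]Sysctl, error) {
	var out []Sysctl
	for _, kv := range strings.Split(val, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid sysctl entry %q", kv)
		}
		out = append(out, Sysctl{Name: parts[0], Value: parts[1]})
	}
	return out, nil
}

func main() {
	s, _ := parseSysctlAnnotation("net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3")
	for _, sc := range s {
		fmt.Printf("%s=%s\n", sc.Name, sc.Value)
	}
}
```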
- -Currently, the `sysctl` provides `security.alpha.kubernetes.io/sysctls` and `security.alpha.kubernetes.io/unsafe-sysctls` annotations that can be used -in the following way: - ```yaml - apiVersion: v1 - kind: Pod - metadata: - name: sysctl-example - annotations: - security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1 - security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3 - spec: - ... - ``` - - The goal is to transition into native fields on pods: - - ```yaml - apiVersion: v1 - kind: Pod - metadata: - name: sysctl-example - spec: - securityContext: - sysctls: - - name: kernel.shm_rmid_forced - value: 1 - - name: net.ipv4.route.min_pmtu - value: 1000 - unsafe: true - - name: kernel.msgmax - value: "1 2 3" - unsafe: true - ... - ``` - -The `sysctl` design document with more details and rationals is available at [design-proposals/node/sysctl.md](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/sysctl.md#pod-api-changes) - -## Motivation - -As mentioned in [contributors/devel/api_changes.md#alpha-field-in-existing-api-version](https://github.com/kubernetes/community/blob/master/contributors/devel/api_changes.md#alpha-field-in-existing-api-version): - -> Previously, annotations were used for experimental alpha features, but are no longer recommended for several reasons: -> -> They expose the cluster to "time-bomb" data added as unstructured annotations against an earlier API server (https://issue.k8s.io/30819) -> They cannot be migrated to first-class fields in the same API version (see the issues with representing a single value in multiple places in backward compatibility gotchas) -> -> The preferred approach adds an alpha field to the existing object, and ensures it is disabled by default: -> -> ... - -The annotations as a means to set `sysctl` are no longer necessary. 
-The original intent of annotations was to provide additional description of Kubernetes -objects through metadata. -It's time to separate the ability to annotate from the ability to change sysctls settings -so a cluster operator can elevate the distinction between experimental and supported usage -of the feature. - -### Promote annotations to fields - -* Introduce native `sysctl` fields in pods through `spec.securityContext.sysctl` field as: - - ```yaml - sysctl: - - name: SYSCTL_PATH_NAME - value: SYSCTL_PATH_VALUE - unsafe: true # optional field - ``` - -* Introduce native `sysctl` fields in [PSP](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) as: - - ```yaml - apiVersion: v1 - kind: PodSecurityPolicy - metadata: - name: psp-example - spec: - sysctls: - - kernel.shmmax - - kernel.shmall - - net.* - ``` - - More examples at [design-proposals/node/sysctl.md#allowing-only-certain-sysctls](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/sysctl.md#allowing-only-certain-sysctls) - -### Promote `--experimental-allowed-unsafe-sysctls` kubelet flag to kubelet config api option - -As there is no longer a need to consider the `sysctl` feature experimental, -the list of unsafe sysctls can be configured accordingly through: - -```go -// KubeletConfiguration contains the configuration for the Kubelet -type KubeletConfiguration struct { - ... - // Whitelist of unsafe sysctls or unsafe sysctl patterns (ending in *). - // Default: nil - // +optional - AllowedUnsafeSysctls []string `json:"allowedUnsafeSysctls,omitempty"` -} -``` - -Upstream issue: https://github.com/kubernetes/kubernetes/issues/61669 - -### Gate the feature - -As the `sysctl` feature stabilizes, it's time to gate the feature [1] and enable it by default. 
- -* Expected feature gate key: `Sysctls` -* Expected default value: `true` - -With the `Sysctls` feature enabled, both sysctl fields in `Pod` and `PodSecurityPolicy` -and the whitelist of unsafe sysctls are acknowledged. -If disabled, the fields and the whitelist are just ignored. - -[1] https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/ - -## Proposal - -This is where we get down to the nitty-gritty of what the proposal actually is. - -### User Stories - -* As a cluster admin, I want to have the `sysctl` feature versioned so I can ensure backward compatibility - and proper transformation between versioned and internal representations and back. -* As a cluster admin, I want to be confident the `sysctl` feature is stable enough and well supported so - applications are properly isolated. -* As a cluster admin, I want to be able to apply the `sysctl` constraints on the cluster level so - I can define the default constraints for all pods. - -### Implementation Details/Notes/Constraints - -Extending `SecurityContext` struct with `Sysctls` field: - -```go -// PodSecurityContext holds pod-level security attributes and common container settings. -// Some fields are also present in container.securityContext. Field values of -// container.securityContext take precedence over field values of PodSecurityContext. -type PodSecurityContext struct { - ... - // Sysctls is a white list of allowed sysctls in a pod spec. - Sysctls []Sysctl `json:"sysctls,omitempty"` -} -``` - -Extending `PodSecurityPolicySpec` struct with `Sysctls` field: - -```go -// PodSecurityPolicySpec defines the policy enforced on sysctls. -type PodSecurityPolicySpec struct { - ... - // Sysctls is a white list of allowed sysctls in a pod spec.
- Sysctls []Sysctl `json:"sysctls,omitempty"` -} -``` - -Following steps in [devel/api_changes.md#alpha-field-in-existing-api-version](https://github.com/kubernetes/community/blob/master/contributors/devel/api_changes.md#alpha-field-in-existing-api-version) -during implementation. - -Validation checks implemented as part of [#27180](https://github.com/kubernetes/kubernetes/pull/27180). - -### Risks and Mitigations - -We need to assure backward compatibility, i.e. object specifications with `sysctl` annotations -must still work after the graduation. - -## Graduation Criteria - -* API changes allowing to configure the pod-scoped `sysctl` via `spec.securityContext` field. -* API changes allowing to configure the cluster-scoped `sysctl` via `PodSecurityPolicy` object -* Promote `--experimental-allowed-unsafe-sysctls` kubelet flag to kubelet config api option -* feature gate enabled by default -* e2e tests - -## Implementation History - -The `sysctl` feature is tracked as part of [features#34](https://github.com/kubernetes/features/issues/34). -This is one of the goals to promote the annotations to fields. +KEPs have moved to https://git.k8s.io/enhancements/. +<!-- +This file is a placeholder to preserve links. Please remove after 6 months or the release of Kubernetes 1.15, whichever comes first. +-->
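The whitelists in this KEP (the PodSecurityPolicy `sysctls` list and the kubelet's `AllowedUnsafeSysctls`) accept either plain sysctl names or patterns ending in `*`, such as `net.*`. A minimal matching sketch under that assumed prefix semantics; `sysctlAllowed` is a hypothetical helper, not the real validation code:

```go
package main

import (
	"fmt"
	"strings"
)

// sysctlAllowed reports whether a sysctl name matches any whitelist
// entry. An entry ending in "*" is treated as a prefix pattern
// (e.g. "net.*" matches every sysctl under net.); any other entry
// must match the name exactly.
func sysctlAllowed(name string, whitelist []string) bool {
	for _, w := range whitelist {
		if strings.HasSuffix(w, "*") {
			if strings.HasPrefix(name, strings.TrimSuffix(w, "*")) {
				return true
			}
		} else if name == w {
			return true
		}
	}
	return false
}

func main() {
	wl := []string{"kernel.shmmax", "net.*"}
	fmt.Println(sysctlAllowed("net.ipv4.route.min_pmtu", wl)) // true
	fmt.Println(sysctlAllowed("kernel.msgmax", wl))           // false
}
```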
\ No newline at end of file diff --git a/keps/sig-node/0009-node-heartbeat.md b/keps/sig-node/0009-node-heartbeat.md index f80b9609..cfd1f5fa 100644 --- a/keps/sig-node/0009-node-heartbeat.md +++ b/keps/sig-node/0009-node-heartbeat.md @@ -1,392 +1,4 @@ ---- -kep-number: 8 -title: Efficient Node Heartbeat -authors: - - "@wojtek-t" - - "with input from @bgrant0607, @dchen1107, @yujuhong, @lavalamp" -owning-sig: sig-node -participating-sigs: - - sig-scalability - - sig-apimachinery - - sig-scheduling -reviewers: - - "@deads2k" - - "@lavalamp" -approvers: - - "@dchen1107" - - "@derekwaynecarr" -editor: TBD -creation-date: 2018-04-27 -last-updated: 2018-04-27 -status: implementable -see-also: - - https://github.com/kubernetes/kubernetes/issues/14733 - - https://github.com/kubernetes/kubernetes/pull/14735 -replaces: - - n/a -superseded-by: - - n/a ---- - -# Efficient Node Heartbeats - -## Table of Contents - -Table of Contents -================= - -* [Efficient Node Heartbeats](#efficient-node-heartbeats) - * [Table of Contents](#table-of-contents) - * [Summary](#summary) - * [Motivation](#motivation) - * [Goals](#goals) - * [Non-Goals](#non-goals) - * [Proposal](#proposal) - * [Risks and Mitigations](#risks-and-mitigations) - * [Graduation Criteria](#graduation-criteria) - * [Implementation History](#implementation-history) - * [Alternatives](#alternatives) - * [Dedicated “heartbeat” object instead of “leader election” one](#dedicated-heartbeat-object-instead-of-leader-election-one) - * [Events instead of dedicated heartbeat object](#events-instead-of-dedicated-heartbeat-object) - * [Reuse the Component Registration mechanisms](#reuse-the-component-registration-mechanisms) - * [Split Node object into two parts at etcd level](#split-node-object-into-two-parts-at-etcd-level) - * [Delta compression in etcd](#delta-compression-in-etcd) - * [Replace etcd with other database](#replace-etcd-with-other-database) - -## Summary - -Node heartbeats are necessary for correct 
functioning of a Kubernetes cluster. -This proposal makes them significantly cheaper from both the scalability and -performance perspectives. - -## Motivation - -While running different scalability tests we observed that in big enough clusters -(more than 2000 nodes) with a non-trivial number of images used by pods on all -nodes (10-15), we were hitting etcd limits for its database size. That effectively -means that etcd enters "alert mode" and stops accepting all write requests. - -The underlying root cause is a combination of: - -- etcd keeping both current state and transaction log with copy-on-write -- node heartbeats being potentially very large objects (note that images - are only one potential problem, the second are volumes and customers - want to mount 100+ volumes to a single node) - they may easily exceed 15kB; - even though the patch sent over the network is small, in etcd we store the - whole Node object -- Kubelet sending heartbeats every 10s - -This proposal presents a proper solution for that problem. - - -Note that currently (by default): - -- Lack of NodeStatus update for `<node-monitor-grace-period>` (default: 40s) - results in NodeController marking node as NotReady (pods are no longer - scheduled on that node) -- Lack of NodeStatus updates for `<pod-eviction-timeout>` (default: 5m) - results in NodeController starting pod evictions from that node - -We would like to preserve that behavior. - - -### Goals - -- Reduce size of etcd by making node heartbeats cheaper - -### Non-Goals - -The following are nice-to-haves, but not primary goals: - -- Reduce resource usage (cpu/memory) of control plane (e.g. due to processing - less and/or smaller objects) -- Reduce watch-related load on Node objects - -## Proposal - -We propose introducing a new `Lease` built-in API in the newly created API group -`coordination.k8s.io`. To make it easily reusable for other purposes it will -be namespaced.
Its schema will be as follows: - -``` -type Lease struct { - metav1.TypeMeta `json:",inline"` - // Standard object's metadata. - // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata - // +optional - ObjectMeta metav1.ObjectMeta `json:"metadata,omitempty"` - - // Specification of the Lease. - // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status - // +optional - Spec LeaseSpec `json:"spec,omitempty"` -} - -type LeaseSpec struct { - HolderIdentity string `json:"holderIdentity"` - LeaseDurationSeconds int32 `json:"leaseDurationSeconds"` - AcquireTime metav1.MicroTime `json:"acquireTime"` - RenewTime metav1.MicroTime `json:"renewTime"` - LeaseTransitions int32 `json:"leaseTransitions"` -} -``` - -The Spec is effectively a copy of the already existing (and thus proven) [LeaderElectionRecord][]. -The only difference is using `MicroTime` instead of `Time` for better precision. -That would hopefully allow us to get directly to Beta. - -We will use that object to represent node heartbeat - for each Node there will -be a corresponding `Lease` object with Name equal to Node name in a newly -created dedicated namespace (we considered using `kube-system` namespace but -decided that it's already too overloaded). -That namespace should be created automatically (similarly to "default" and -"kube-system", probably by NodeController) and never be deleted (so that nodes -don't require permission for it). - -We considered using CRD instead of built-in API. However, even though CRDs are -`the new way` for creating new APIs, they don't yet have versioning support -and are significantly less performant (due to lack of protobuf support yet). -We also don't know whether we could seamlessly transition storage from a CRD -to a built-in API if we ran into performance or any other problems. -As a result, we decided to proceed with a built-in API. - - -With this new API in place, we will change Kubelet so that: - -1.
Kubelet is periodically computing NodeStatus every 10s (as it is now), but that will - be independent from reporting status -1. Kubelet is reporting NodeStatus if: - - there was a meaningful change in it (initially we can probably assume that every - change is meaningful, including e.g. images on the node) - - or it didn’t report it over the last `node-status-update-period` seconds -1. Kubelet creates and periodically updates its own Lease object and frequency - of those updates is independent from NodeStatus update frequency. - -In the meantime, we will change `NodeController` to treat both updates of the NodeStatus -object as well as updates of the new `Lease` object corresponding to a given -node as a healthiness signal from a given Kubelet. This will make it work for both old -and new Kubelets. - -We should also: - -1. audit all other existing core controllers to verify if they also don’t require - similar changes in their logic ([ttl controller][] being one of the examples) -1. change controller manager to auto-register that `Lease` API -1. ensure that `Lease` resource is deleted when corresponding node is - deleted (probably via owner references) -1. [out-of-scope] migrate all LeaderElection code to use that API - -Once all the code changes are done, we will: - -1. start updating `Lease` object every 10s by default, at the same time - reducing frequency of NodeStatus updates initially to 40s by default. - We will reduce it further later. - Note that it doesn't reduce frequency by which Kubelet sends "meaningful" - changes - it only impacts the frequency of "lastHeartbeatTime" changes. - <br> TODO: That still results in higher average QPS. It should be acceptable but - needs to be verified. -1. announce that we are going to reduce frequency of NodeStatus updates further - and give people 1-2 releases to switch their code to use `Lease` - object (if they relied on frequent NodeStatus changes) -1.
further reduce NodeStatus updates frequency to no less often than once per - minute. - We can’t stop periodically updating NodeStatus as it would be an API-breaking change, - but it’s fine to reduce its frequency (though we should continue writing it at - least once per eviction period). - - -To be considered: - -1. We may consider reducing frequency of NodeStatus updates to once every 5 minutes - (instead of 1 minute). That would help with performance/scalability even more. - Caveats: - - NodeProblemDetector is currently updating (some) node conditions every 1 minute - (unconditionally, because lastHeartbeatTime always changes). To make reduction - of NodeStatus updates frequency really useful, we should also change NPD to - work in a similar mode (check periodically if condition changes, but report only - when something changed or no status was reported for a given time) and decrease - its reporting frequency too. - - In general, we recommend keeping frequencies of NodeStatus reporting in both - Kubelet and NodeProblemDetector in sync (once all changes are done) and - that should be reflected in [NPD documentation][]. - - Note that reducing frequency to 1 minute already gives us almost 6x improvement. - It seems more than enough for any foreseeable future assuming we won’t - significantly increase the size of the Node object. - Note that if we keep adding node conditions owned by other components, the - number of writes of Node object will go up. But that issue is separate from - this proposal. - -Other notes: - -1. Additional advantage of using Lease for that purpose would be the - ability to exclude it from audit profile and thus reduce the audit logs footprint.
- -[LeaderElectionRecord]: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go#L37 -[ttl controller]: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/ttl/ttl_controller.go#L155 -[NPD documentation]: https://kubernetes.io/docs/tasks/debug-application-cluster/monitor-node-health/ -[kubernetes/kubernetes#63667]: https://github.com/kubernetes/kubernetes/issues/63677 - -### Risks and Mitigations - -Increasing default frequency of NodeStatus updates may potentially break clients -relying on frequent Node object updates. However, in non-managed solutions, customers -will still be able to restore previous behavior by setting appropriate flag values. -Thus, changing defaults to what we recommend is the path to go with. - -## Graduation Criteria - -The API can be immediately promoted to Beta, as the API is effectively a copy of -already existing LeaderElectionRecord. It will be promoted to GA once it's gone -a sufficient amount of time as Beta with no changes. - -The changes in components logic (Kubelet, NodeController) should be done behind -a feature gate. We suggest making that enabled by default once the feature is -implemented. - -## Implementation History - -- RRRR-MM-DD: KEP Summary, Motivation and Proposal merged - -## Alternatives - -We considered a number of alternatives, most important mentioned below. - -### Dedicated “heartbeat” object instead of “leader election” one - -Instead of introducing and using “lease” object, we considered -introducing a dedicated “heartbeat” object for that purpose. Apart from that, -all the details about the solution remain pretty much the same. - -Pros: - -- Conceptually easier to understand what the object is for - -Cons: - -- Introduces a new, narrow-purpose API. Lease is already used by other - components, implemented using annotations on Endpoints and ConfigMaps. 
- -### Events instead of dedicated heartbeat object - -Instead of introducing a dedicated object, we considered using “Event” object -for that purpose. At the high-level the solution looks very similar. -The differences from the initial proposal are: - -- we use existing “Event” api instead of introducing a new API -- we create a dedicated namespace; events that should be treated as healthiness - signal by NodeController will be written by Kubelets (unconditionally) to that - namespace -- NodeController will be watching only Events from that namespace to avoid - processing all events in the system (the volume of all events will be huge) -- dedicated namespace also helps with security - we can give access to write to - that namespace only to Kubelets - -Pros: - -- No need to introduce new API - - We can use that approach much earlier due to that. -- We already need to optimize event throughput - separate etcd instance we have - for them may help with tuning -- Low-risk roll-forward/roll-back: no new objects is involved (node controller - starts watching events, kubelet just reduces the frequency of heartbeats) - -Cons: - -- Events are conceptually “best-effort” in the system: - - they may be silently dropped in case of problems in the system (the event recorder - library doesn’t retry on errors, e.g. to not make things worse when control-plane - is starved) - - currently, components reporting events don’t even know if it succeeded or not (the - library is built in a way that you throw the event into it and are not notified if - that was successfully submitted or not). - Kubelet sending any other update has full control on how/if retry errors. - - lack of fairness mechanisms means that even when some events are being successfully - send, there is no guarantee that any event from a given Kubelet will be submitted - over a given time period - So this would require a different mechanism of reporting those “heartbeat” events. 
-- Once we have a “request priority” concept, I think events should have the lowest one. - Even though no particular heartbeat is important, a guarantee that some heartbeats will - be successfully sent is crucial (not delivering any of them will result in unnecessary - evictions or not-scheduling to a given node). So heartbeats should be of the highest - priority. -- No core component in the system is currently watching events - - it would make system’s operation harder to explain -- Users watch Node objects for heartbeats (even though we didn’t recommend it). - Introducing a new object for the purpose of heartbeat will allow those users to - migrate, while using events for that purpose breaks that ability. (Watching events - may put us in a tough situation also for performance reasons.) -- Deleting all events (e.g. event etcd failure + playbook response) should continue to - not cause a catastrophic failure and the design will need to account for this. - -### Reuse the Component Registration mechanisms - -Kubelet is one of the control-plane components (shared controller). Some time ago, Component -Registration proposal converged into three parts: - -- Introducing an API for registering non-pod endpoints, including readiness information: #18610 -- Changing endpoints controller to also watch those endpoints -- Identifying some of those endpoints as “components” - -We could reuse that mechanism to represent Kubelets as non-pod endpoint API. - -Pros: - -- Utilizes desired API - -Cons: - -- Requires introducing that new API -- Stabilizing the API would take some time -- Implementing that API requires multiple changes in different components - -### Split Node object into two parts at etcd level - -We may stick to existing Node API and solve the problem at the storage layer.
At the -high level, this means splitting the Node object into two parts in etcd (frequently -modified one and the rest). - -Pros: - -- No need to introduce new API -- No need to change any components other than kube-apiserver - -Cons: - -- Very complicated to support watch -- Not very generic (e.g. splitting Spec and Status doesn’t help, it needs to be just - heartbeat part) -- [minor] Doesn’t reduce amount of data that should be processed in the system (writes, - reads, watches, …) - -### Delta compression in etcd - -An alternative for the above can be solving this completely at the etcd layer. To -achieve that, instead of storing full updates in etcd transaction log, we will just -store “deltas” and snapshot the whole object only every X seconds/minutes. - -Pros: - -- Doesn’t require any changes to any Kubernetes components - -Cons: - -- Computing delta is tricky (etcd doesn’t understand Kubernetes data model, and - delta between two protobuf-encoded objects is not necessary small) -- May require a major rewrite of etcd code and not even be accepted by its maintainers -- More expensive computationally to get an object in a given resource version (which - is what e.g. watch is doing) - -### Replace etcd with other database - -Instead of using etcd, we may also consider using some other open-source solution. - -Pros: - -- Doesn’t require new API - -Cons: - -- We don’t even know if there exists solution that solves our problems and can be used. -- Migration will take us years. +KEPs have moved to https://git.k8s.io/enhancements/. +<!-- +This file is a placeholder to preserve links. Please remove after 6 months or the release of Kubernetes 1.15, whichever comes first. +-->
\ No newline at end of file diff --git a/keps/sig-node/0014-runtime-class.md b/keps/sig-node/0014-runtime-class.md index 1d1cac28..cfd1f5fa 100644 --- a/keps/sig-node/0014-runtime-class.md +++ b/keps/sig-node/0014-runtime-class.md @@ -1,399 +1,4 @@ ---- -kep-number: 14 -title: Runtime Class -authors: - - "@tallclair" -owning-sig: sig-node -participating-sigs: - - sig-architecture -reviewers: - - dchen1107 - - derekwaynecarr - - yujuhong -approvers: - - dchen1107 - - derekwaynecarr -creation-date: 2018-06-19 -status: implementable ---- - -# Runtime Class - -## Table of Contents - -* [Summary](#summary) -* [Motivation](#motivation) - * [Goals](#goals) - * [Non\-Goals](#non-goals) - * [User Stories](#user-stories) -* [Proposal](#proposal) - * [API](#api) - * [Runtime Handler](#runtime-handler) - * [Versioning, Updates, and Rollouts](#versioning-updates-and-rollouts) - * [Implementation Details](#implementation-details) - * [Risks and Mitigations](#risks-and-mitigations) -* [Graduation Criteria](#graduation-criteria) -* [Implementation History](#implementation-history) -* [Appendix](#appendix) - * [Examples of runtime variation](#examples-of-runtime-variation) - -## Summary - -`RuntimeClass` is a new cluster-scoped resource that surfaces container runtime properties to the -control plane. RuntimeClasses are assigned to pods through a `runtimeClass` field on the -`PodSpec`. This provides a new mechanism for supporting multiple runtimes in a cluster and/or node. - -## Motivation - -There is growing interest in using different runtimes within a cluster. [Sandboxes][] are the -primary motivator for this right now, with both Kata containers and gVisor looking to integrate with -Kubernetes. Other runtime models such as Windows containers or even remote runtimes will also -require support in the future. RuntimeClass provides a way to select between different runtimes -configured in the cluster and surface their properties (both to the cluster & the user). 
- -In addition to selecting the runtime to use, supporting multiple runtimes raises other problems to -the control plane level, including: accounting for runtime overhead, scheduling to nodes that -support the runtime, and surfacing which optional features are supported by different -runtimes. Although these problems are not tackled by this initial proposal, RuntimeClass provides a -cluster-scoped resource tied to the runtime that can help solve these problems in a future update. - -[Sandboxes]: https://docs.google.com/document/d/1QQ5u1RBDLXWvC8K3pscTtTRThsOeBSts_imYEoRyw8A/edit - -### Goals - -- Provide a mechanism for surfacing container runtime properties to the control plane -- Support multiple runtimes per-cluster, and provide a mechanism for users to select the desired - runtime - -### Non-Goals - -- RuntimeClass is NOT RuntimeComponentConfig. -- RuntimeClass is NOT a general policy mechanism. -- RuntimeClass is NOT "NodeClass". Although different nodes may run different runtimes, in general - RuntimeClass should not be a cross product of runtime properties and node properties. - -The following goals are out-of-scope for the initial implementation, but may be explored in a future -iteration: - -- Surfacing support for optional features by runtimes, and surfacing errors caused by - incompatible features & runtimes earlier. -- Automatic runtime or feature discovery - initially RuntimeClasses are manually defined (by the - cluster admin or provider), and are asserted to be an accurate representation of the runtime. -- Scheduling in heterogeneous clusters - it is possible to operate a heterogeneous cluster - (different runtime configurations on different nodes) through scheduling primitives like - `NodeAffinity` and `Taints+Tolerations`, but the user is responsible for setting these up and - automatic runtime-aware scheduling is out-of-scope. 
-- Define standardized or conformant runtime classes - although I would like to declare some - predefined RuntimeClasses with specific properties, doing so is out-of-scope for this initial KEP. -- [Pod Overhead][] - Although RuntimeClass is likely to be the configuration mechanism of choice, - the details of how pod resource overhead will be implemented is out of scope for this KEP. -- Provide a mechanism to dynamically register or provision additional runtimes. -- Requiring specific RuntimeClasses according to policy. This should be addressed by other - cluster-level policy mechanisms, such as PodSecurityPolicy. -- "Fitting" a RuntimeClass to pod requirements - In other words, specifying runtime properties and - letting the system match an appropriate RuntimeClass, rather than explicitly assigning a - RuntimeClass by name. This approach can increase portability, but can be added seamlessly in a - future iteration. - -[Pod Overhead]: https://docs.google.com/document/d/1EJKT4gyl58-kzt2bnwkv08MIUZ6lkDpXcxkHqCvvAp4/edit - -### User Stories - -- As a cluster operator, I want to provide multiple runtime options to support a wide variety of - workloads. Examples include native linux containers, "sandboxed" containers, and windows - containers. -- As a cluster operator, I want to provide stable rolling upgrades of runtimes. For - example, rolling out an update with backwards incompatible changes or previously unsupported - features. -- As an application developer, I want to select the runtime that best fits my workload. -- As an application developer, I don't want to study the nitty-gritty details of different runtime - implementations, but rather choose from pre-configured classes. -- As an application developer, I want my application to be portable across clusters that use similar - but different variants of a "class" of runtimes. 
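The selection flow in these user stories can be pictured end to end: a pod names a RuntimeClass, and the kubelet resolves that name to a configured CRI runtime handler. The mapping data and `resolveHandler` below are illustrative assumptions, not the actual kubelet implementation:

```go
package main

import "fmt"

// runtimeClasses models the cluster-scoped RuntimeClass name -> CRI
// runtime handler mapping described in this KEP (sample data only).
var runtimeClasses = map[string]string{
	"gvisor":          "gvisor",
	"kata-containers": "kata-containers",
	"sandboxed":       "gvisor", // default sandbox choice for this cluster
}

// resolveHandler sketches how a pod's runtimeClassName could map to the
// handler passed along in RunPodSandboxRequest. An unset name keeps the
// default behavior (empty handler string); an unknown name is an error
// surfaced before the pod runs.
func resolveHandler(runtimeClassName string) (string, error) {
	if runtimeClassName == "" {
		return "", nil // empty handler = CRI implementation default
	}
	h, ok := runtimeClasses[runtimeClassName]
	if !ok {
		return "", fmt.Errorf("RuntimeClass %q not found", runtimeClassName)
	}
	return h, nil
}

func main() {
	h, _ := resolveHandler("sandboxed")
	fmt.Println(h) // gvisor
}
```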
- -## Proposal - -The initial design includes: - -- `RuntimeClass` API resource definition -- `RuntimeClass` pod field for specifying the RuntimeClass the pod should be run with -- Kubelet implementation for fetching & interpreting the RuntimeClass -- CRI API & implementation for passing along the [RuntimeHandler](#runtime-handler). - -### API - -`RuntimeClass` is a new cluster-scoped resource in the `node.k8s.io` API group. - -> _The `node.k8s.io` API group would eventually hold the Node resource when `core` is retired. -> Alternatives considered: `runtime.k8s.io`, `cluster.k8s.io`_ - -_(This is a simplified declaration, syntactic details will be covered in the API PR review)_ - -```go -type RuntimeClass struct { - metav1.TypeMeta - // ObjectMeta minimally includes the RuntimeClass name, which is used to reference the class. - // Namespace should be left blank. - metav1.ObjectMeta - - Spec RuntimeClassSpec -} - -type RuntimeClassSpec struct { - // RuntimeHandler specifies the underlying runtime the CRI calls to handle pod and/or container - // creation. The possible values are specific to a given configuration & CRI implementation. - // The empty string is equivalent to the default behavior. - // +optional - RuntimeHandler string -} -``` - -The runtime is selected by the pod by specifying the RuntimeClass in the PodSpec. Once the pod is -scheduled, the RuntimeClass cannot be changed. - -```go -type PodSpec struct { - ... - // RuntimeClassName refers to a RuntimeClass object with the same name, - // which should be used to run this pod. - // +optional - RuntimeClassName string - ... -} -``` - -The `legacy` RuntimeClass name is reserved. The legacy RuntimeClass is defined to be fully backwards -compatible with current Kubernetes. This means that the legacy runtime does not specify any -RuntimeHandler or perform any feature validation (all features are "supported"). - -```go -const ( - // RuntimeClassNameLegacy is a reserved RuntimeClass name. 
The legacy - // RuntimeClass does not specify a runtime handler or perform any - // feature validation. - RuntimeClassNameLegacy = "legacy" -) -``` - -An unspecified RuntimeClassName `""` is equivalent to the `legacy` RuntimeClass, though the field is -not defaulted to `legacy` (to leave room for configurable defaults in a future update). - -#### Examples - -Suppose we operate a cluster that lets users choose between native runc containers, and gvisor and -kata-container sandboxes. We might create the following runtime classes: - -```yaml -kind: RuntimeClass -apiVersion: node.k8s.io/v1alpha1 -metadata: - name: native # equivalent to 'legacy' for now -spec: - runtimeHandler: runc ---- -kind: RuntimeClass -apiVersion: node.k8s.io/v1alpha1 -metadata: - name: gvisor -spec: - runtimeHandler: gvisor ----- -kind: RuntimeClass -apiVersion: node.k8s.io/v1alpha1 -metadata: - name: kata-containers -spec: - runtimeHandler: kata-containers ----- -# provides the default sandbox runtime when users don't care about which they're getting. -kind: RuntimeClass -apiVersion: node.k8s.io/v1alpha1 -metadata: - name: sandboxed -spec: - runtimeHandler: gvisor -``` - -Then when a user creates a workload, they can choose the desired runtime class to use (or not, if -they want the default). - -```yaml -apiVersion: extensions/v1beta1 -kind: Deployment -metadata: - name: sandboxed-nginx -spec: - replicas: 2 - selector: - matchLabels: - app: sandboxed-nginx - template: - metadata: - labels: - app: sandboxed-nginx - spec: - runtimeClassName: sandboxed # <---- Reference the desired RuntimeClass - containers: - - name: nginx - image: nginx - ports: - - containerPort: 80 - protocol: TCP -``` - -#### Runtime Handler - -The `RuntimeHandler` is passed to the CRI as part of the `RunPodSandboxRequest`: - -```proto -message RunPodSandboxRequest { - // Configuration for creating a PodSandbox. - PodSandboxConfig config = 1; - // Named runtime configuration to use for this PodSandbox. 
- string RuntimeHandler = 2;
-}
-```
-
-The RuntimeHandler is provided as a mechanism for CRI implementations to select between different
-predetermined configurations. The initial use case is replacing the experimental pod annotations
-currently used for selecting a sandboxed runtime by various CRI implementations:
-
-| CRI Runtime | Pod Annotation |
-| ------------|-------------------------------------------------------------|
-| CRIO | io.kubernetes.cri-o.TrustedSandbox: "false" |
-| containerd | io.kubernetes.cri.untrusted-workload: "true" |
-| frakti | runtime.frakti.alpha.kubernetes.io/OSContainer: "true"<br>runtime.frakti.alpha.kubernetes.io/Unikernel: "true" |
-| windows | experimental.windows.kubernetes.io/isolation-type: "hyperv" |
-
-These implementations could stick with this binary scheme ("trusted" and "untrusted"), but the preferred
-approach is a non-binary one wherein arbitrary handlers can be configured with a name that can be
-matched against the specified RuntimeHandler. For example, containerd might have a configuration
-corresponding to a "kata-runtime" handler:
-
-```
-[plugins.cri.containerd.kata-runtime]
-  runtime_type = "io.containerd.runtime.v1.linux"
-  runtime_engine = "/opt/kata/bin/kata-runtime"
-  runtime_root = ""
-```
-
-This non-binary approach is more flexible: it can still map to a binary RuntimeClass selection
-(e.g. `sandboxed` or `untrusted` RuntimeClasses), but can also support multiple parallel sandbox
-types (e.g. `kata-containers` or `gvisor` RuntimeClasses).
-
-### Versioning, Updates, and Rollouts
-
-Getting upgrades and rollouts right is a very nuanced and complicated problem. For the initial alpha
-implementation, we will kick the can down the road by making the `RuntimeClassSpec` **immutable**,
-thereby requiring changes to be pushed as a newly named RuntimeClass instance.
This means that pods -must be updated to reference the new RuntimeClass, and comes with the advantage of native support -for rolling updates through the same mechanisms as any other application update. The -`RuntimeClassName` pod field is also immutable post scheduling. - -This conservative approach is preferred since it's much easier to relax constraints in a backwards -compatible way than tighten them. We should revisit this decision prior to graduating RuntimeClass -to beta. - -### Implementation Details - -The Kubelet uses an Informer to keep a local cache of all RuntimeClass objects. When a new pod is -added, the Kubelet resolves the Pod's RuntimeClass against the local RuntimeClass cache. Once -resolved, the RuntimeHandler field is passed to the CRI as part of the -[`RunPodSandboxRequest`][runpodsandbox]. At that point, the interpretation of the RuntimeHandler is -left to the CRI implementation, but it should be cached if needed for subsequent calls. - -If the RuntimeClass cannot be resolved (e.g. doesn't exist) at Pod creation, then the request will -be rejected in admission (controller to be detailed in a following update). If the RuntimeClass -cannot be resolved by the Kubelet when `RunPodSandbox` should be called, then the Kubelet will fail -the Pod. The admission check on a replica recreation will prevent the scheduler from thrashing. If -the `RuntimeHandler` is not recognized by the CRI implementation, then `RunPodSandbox` will return -an error. - -[runpodsandbox]: https://github.com/kubernetes/kubernetes/blob/b05a61e299777c2030fbcf27a396aff21b35f01b/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L344 - -### Risks and Mitigations - -**Scope creep.** RuntimeClass has a fairly broad charter, but it should not become a default -dumping ground for every new feature exposed by the node. For each feature, careful consideration -should be made about whether it belongs on the Pod, Node, RuntimeClass, or some other resource. 
The
-[non-goals](#non-goals) should be kept in mind when considering RuntimeClass features.
-
-**Becoming a general policy mechanism.** RuntimeClass should not be used as a replacement for
-PodSecurityPolicy. The use cases for defining multiple RuntimeClasses for the same underlying
-runtime implementation should be extremely limited (generally only around updates & rollouts). To
-enforce this, no authorization or restrictions are placed directly on RuntimeClass use; in order to
-restrict a user to a specific RuntimeClass, you must use another policy mechanism such as
-PodSecurityPolicy.
-
-**Pushing complexity to the user.** RuntimeClass is a new resource in order to hide the complexity
-of runtime configuration from most users (aside from the cluster admin or provisioner). However, we
-are still side-stepping the issue of precisely defining specific types of runtimes like
-"Sandboxed". It is still up for debate whether precisely defining such runtime categories
-is even possible. RuntimeClass allows us to decouple this specification from the implementation, but
-it is still something I hope we can address in a future iteration through the concept of pre-defined
-or "conformant" RuntimeClasses.
-
-**Non-portability.** We are already in a world of non-portability for many features (see [examples
-of runtime variation](#examples-of-runtime-variation)). Future improvements to RuntimeClass can help
-address this issue by formally declaring supported features, or matching the runtime that supports a
-given workload automatically. Another issue is that pods need to refer to a RuntimeClass by name,
-which may not be defined in every cluster. This is something that can be addressed through
-pre-defined runtime classes (see previous risk), and/or by "fitting" pod requirements to compatible
-RuntimeClasses.
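The Kubelet-side resolution flow described under Implementation Details can be sketched as plain logic. This is a simplified, hypothetical model — a map stands in for the Kubelet's informer-backed RuntimeClass cache, and the names (`runtimeClassCache`, `resolveHandler`) are illustrative, not the real Kubelet API:

```go
package main

import (
	"errors"
	"fmt"
)

// runtimeClassCache stands in for the Kubelet's informer-backed local
// cache of RuntimeClass objects, mapping class name -> RuntimeHandler.
type runtimeClassCache map[string]string

// resolveHandler maps a pod's RuntimeClassName to the CRI RuntimeHandler.
// An empty name is treated like the reserved "legacy" class: no handler
// is specified and the CRI default behavior is used.
func resolveHandler(cache runtimeClassCache, runtimeClassName string) (string, error) {
	if runtimeClassName == "" || runtimeClassName == "legacy" {
		return "", nil // empty handler == default runtime behavior
	}
	handler, ok := cache[runtimeClassName]
	if !ok {
		// Unresolvable class: the pod is rejected in admission, or
		// failed by the Kubelet before RunPodSandbox is called.
		return "", errors.New("RuntimeClass not found: " + runtimeClassName)
	}
	return handler, nil
}

func main() {
	cache := runtimeClassCache{"sandboxed": "gvisor", "kata-containers": "kata-containers"}
	h, _ := resolveHandler(cache, "sandboxed")
	fmt.Println(h) // handler passed along in RunPodSandboxRequest
	_, err := resolveHandler(cache, "missing")
	fmt.Println(err != nil)
}
```

The interesting property is the failure path: an unknown class produces an error before any sandbox is created, which is what keeps the scheduler from thrashing on replica recreation.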
-
-## Graduation Criteria
-
-Alpha:
-
-- Everything described in the current proposal:
-  - Introduce the RuntimeClass API resource
-  - Add a RuntimeClassName field to the PodSpec
-  - Add a RuntimeHandler field to the CRI `RunPodSandboxRequest`
-  - Lookup the RuntimeClass for pods & plumb through the RuntimeHandler in the Kubelet (feature
-    gated)
-- RuntimeClass support in at least one CRI runtime & dockershim
-  - Runtime Handlers can be statically configured by the runtime, and referenced via RuntimeClass
-  - An error is reported when the handler is unknown or unsupported
-- Testing
-  - [CRI validation tests][cri-validation]
-  - Kubernetes E2E tests (only validating single runtime handler cases)
-
-[cri-validation]: https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/validation.md
-
-Beta:
-
-- Most runtimes support RuntimeClass, and the current [untrusted annotations](#runtime-handler) are
-  deprecated.
-- RuntimeClasses are configured in the E2E environment with test coverage of a non-legacy RuntimeClass
-- The update & upgrade story is revisited, and a longer-term approach is implemented as necessary.
-- The cluster admin can choose which RuntimeClass is the default in a cluster.
-- Additional requirements TBD
-
-## Implementation History
-
-- 2018-06-11: SIG-Node decision to move forward with proposal
-- 2018-06-19: Initial KEP published.
-
-## Appendix
-
-### Examples of runtime variation
-
-- Linux Security Module (LSM) choice - Kubernetes supports both AppArmor & SELinux options on pods,
-  but those are mutually exclusive, and support of either is not required by the runtime. The
-  default configuration is also not well defined.
-- Seccomp-bpf - Kubernetes has alpha support for specifying a seccomp profile, but the default is
-  defined by the runtime, and support is not guaranteed.
-- Windows containers - isolation features are very OS-specific, and most of the current features are
-  limited to linux.
As we build out Windows container support, we'll need to add windows-specific
-  features as well.
-- Host namespaces (Network, PID, IPC) may not be supported by virtualization-based runtimes
-  (e.g. Kata-containers & gVisor).
-- Per-pod and Per-container resource overhead varies by runtime.
-- Device support (e.g. GPUs) varies wildly by runtime & nodes.
-- Supported volume types vary by node - it remains TBD whether this information belongs in
-  RuntimeClass.
-- The list of default capabilities is defined in Docker, but not Kubernetes. Future runtimes may
-  have differing defaults, or support a subset of capabilities.
-- `Privileged` mode is not well defined, and thus may have differing implementations.
-- Support for resource over-commit and dynamic resource sizing (e.g. Burstable vs Guaranteed
-  workloads)
+KEPs have moved to https://git.k8s.io/enhancements/.
+<!--
+This file is a placeholder to preserve links. Please remove after 6 months or the release of Kubernetes 1.15, whichever comes first.
+-->
\ No newline at end of file diff --git a/keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md b/keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md index a6c5aaba..cfd1f5fa 100644 --- a/keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md +++ b/keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md @@ -1,807 +1,4 @@ ---- -kep-number: 0 -title: Quotas for Ephemeral Storage -authors: - - "@RobertKrawitz" -owning-sig: sig-xxx -participating-sigs: - - sig-node -reviewers: - - TBD -approvers: - - "@dchen1107" - - "@derekwaynecarr" -editor: TBD -creation-date: yyyy-mm-dd -last-updated: yyyy-mm-dd -status: provisional -see-also: -replaces: -superseded-by: ---- - -# Quotas for Ephemeral Storage - -## Table of Contents -<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-generate-toc again --> -**Table of Contents** - -- [Quotas for Ephemeral Storage](#quotas-for-ephemeral-storage) - - [Table of Contents](#table-of-contents) - - [Summary](#summary) - - [Project Quotas](#project-quotas) - - [Motivation](#motivation) - - [Goals](#goals) - - [Non-Goals](#non-goals) - - [Future Work](#future-work) - - [Proposal](#proposal) - - [Control over Use of Quotas](#control-over-use-of-quotas) - - [Operation Flow -- Applying a Quota](#operation-flow----applying-a-quota) - - [Operation Flow -- Retrieving Storage Consumption](#operation-flow----retrieving-storage-consumption) - - [Operation Flow -- Removing a Quota.](#operation-flow----removing-a-quota) - - [Operation Notes](#operation-notes) - - [Selecting a Project ID](#selecting-a-project-id) - - [Determine Whether a Project ID Applies To a Directory](#determine-whether-a-project-id-applies-to-a-directory) - - [Return a Project ID To the System](#return-a-project-id-to-the-system) - - [Implementation Details/Notes/Constraints [optional]](#implementation-detailsnotesconstraints-optional) - - [Notes on Implementation](#notes-on-implementation) - - [Notes on Code 
Changes](#notes-on-code-changes) - - [Testing Strategy](#testing-strategy) - - [Risks and Mitigations](#risks-and-mitigations) - - [Graduation Criteria](#graduation-criteria) - - [Implementation History](#implementation-history) - - [Drawbacks [optional]](#drawbacks-optional) - - [Alternatives [optional]](#alternatives-optional) - - [Alternative quota-based implementation](#alternative-quota-based-implementation) - - [Alternative loop filesystem-based implementation](#alternative-loop-filesystem-based-implementation) - - [Infrastructure Needed [optional]](#infrastructure-needed-optional) - - [References](#references) - - [Bugs Opened Against Filesystem Quotas](#bugs-opened-against-filesystem-quotas) - - [CVE](#cve) - - [Other Security Issues Without CVE](#other-security-issues-without-cve) - - [Other Linux Quota-Related Bugs Since 2012](#other-linux-quota-related-bugs-since-2012) - -<!-- markdown-toc end --> - -[Tools for generating]: https://github.com/ekalinin/github-markdown-toc - -## Summary - -This proposal applies to the use of quotas for ephemeral-storage -metrics gathering. Use of quotas for ephemeral-storage limit -enforcement is a [non-goal](#non-goals), but as the architecture and -code will be very similar, there are comments interspersed related to -enforcement. _These comments will be italicized_. - -Local storage capacity isolation, aka ephemeral-storage, was -introduced into Kubernetes via -<https://github.com/kubernetes/features/issues/361>. It provides -support for capacity isolation of shared storage between pods, such -that a pod can be limited in its consumption of shared resources and -can be evicted if its consumption of shared storage exceeds that -limit. The limits and requests for shared ephemeral-storage are -similar to those for memory and CPU consumption. - -The current mechanism relies on periodically walking each ephemeral -volume (emptydir, logdir, or container writable layer) and summing the -space consumption. 
This method is slow, can be fooled, and has high -latency (i. e. a pod could consume a lot of storage prior to the -kubelet being aware of its overage and terminating it). - -The mechanism proposed here utilizes filesystem project quotas to -provide monitoring of resource consumption _and optionally enforcement -of limits._ Project quotas, initially in XFS and more recently ported -to ext4fs, offer a kernel-based means of monitoring _and restricting_ -filesystem consumption that can be applied to one or more directories. - -A prototype is in progress; see <https://github.com/kubernetes/kubernetes/pull/66928>. - -### Project Quotas - -Project quotas are a form of filesystem quota that apply to arbitrary -groups of files, as opposed to file user or group ownership. They -were first implemented in XFS, as described here: -<http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/xfs-quotas.html>. - -Project quotas for ext4fs were [proposed in late -2014](https://lwn.net/Articles/623835/) and added to the Linux kernel -in early 2016, with -commit -[391f2a16b74b95da2f05a607f53213fc8ed24b8e](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=391f2a16b74b95da2f05a607f53213fc8ed24b8e). -They were designed to be compatible with XFS project quotas. - -Each inode contains a 32-bit project ID, to which optionally quotas -(hard and soft limits for blocks and inodes) may be applied. The -total blocks and inodes for all files with the given project ID are -maintained by the kernel. Project quotas can be managed from -userspace by means of the `xfs_quota(8)` command in foreign filesystem -(`-f`) mode; the traditional Linux quota tools do not manipulate -project quotas. Programmatically, they are managed by the `quotactl(2)` -system call, using in part the standard quota commands and in part the -XFS quota commands; the man page implies incorrectly that the XFS -quota commands apply only to XFS filesystems. 
- -The project ID applied to a directory is inherited by files created -under it. Files cannot be (hard) linked across directories with -different project IDs. A file's project ID cannot be changed by a -non-privileged user, but a privileged user may use the `xfs_io(8)` -command to change the project ID of a file. - -Filesystems using project quotas may be mounted with quotas either -enforced or not; the non-enforcing mode tracks usage without enforcing -it. A non-enforcing project quota may be implemented on a filesystem -mounted with enforcing quotas by setting a quota too large to be hit. -The maximum size that can be set varies with the filesystem; on a -64-bit filesystem it is 2^63-1 bytes for XFS and 2^58-1 bytes for -ext4fs. - -Conventionally, project quota mappings are stored in `/etc/projects` and -`/etc/projid`; these files exist for user convenience and do not have -any direct importance to the kernel. `/etc/projects` contains a mapping -from project ID to directory/file; this can be a one to many mapping -(the same project ID can apply to multiple directories or files, but -any given directory/file can be assigned only one project ID). -`/etc/projid` contains a mapping from named projects to project IDs. - -This proposal utilizes hard project quotas for both monitoring _and -enforcement_. Soft quotas are of no utility; they allow for temporary -overage that, after a programmable period of time, is converted to the -hard quota limit. - - -## Motivation - -The mechanism presently used to monitor storage consumption involves -use of `du` and `find` to periodically gather information about -storage and inode consumption of volumes. This mechanism suffers from -a number of drawbacks: - -* It is slow. If a volume contains a large number of files, walking - the directory can take a significant amount of time. 
There has been
-  at least one known report of nodes becoming not ready due to volume
-  metrics: <https://github.com/kubernetes/kubernetes/issues/62917>
-* It is possible to conceal a file from the walker by creating it and
-  removing it while holding an open file descriptor on it. POSIX
-  behavior is to not remove the file until the last open file
-  descriptor pointing to it is removed. This has legitimate uses; it
-  ensures that a temporary file is deleted when the processes using it
-  exit, and it minimizes the attack surface by not having a file that
-  can be found by an attacker. The following pod does this; it will
-  never be caught by the present mechanism:
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: "diskhog"
-spec:
-  containers:
-  - name: "perl"
-    resources:
-      limits:
-        ephemeral-storage: "2048Ki"
-    image: "perl"
-    command:
-    - perl
-    - -e
-    - >
-      my $file = "/data/a/a"; open OUT, ">$file" or die "Cannot open $file: $!\n"; unlink "$file" or die "cannot unlink $file: $!\n"; my $a="0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"; foreach my $i (0..200000000) { print OUT $a; }; sleep 999999
-    volumeMounts:
-    - name: a
-      mountPath: /data/a
-  volumes:
-  - name: a
-    emptyDir: {}
-```
-* It is reactive rather than proactive. It does not prevent a pod
-  from overshooting its limit; at best it catches it after the fact.
-  On a fast storage medium, such as NVMe, a pod may write 50 GB or
-  more of data before the housekeeping performed once per minute
-  catches up to it. If the primary volume is the root partition, this
-  will completely fill the partition, possibly causing serious
-  problems elsewhere on the system. This proposal does not address
-  this issue; _a future enforcing project would_.
-
-In many environments, these issues may not matter, but shared
-multi-tenant environments need these issues addressed.
-
-### Goals
-
-These goals apply only to local ephemeral storage, as described in
-<https://github.com/kubernetes/features/issues/361>.
-
-* Primary: improve performance of monitoring by using project quotas
-  in a non-enforcing way to collect information about storage
-  utilization of ephemeral volumes.
-* Primary: detect storage used by pods that is concealed by deleted
-  files being held open.
-* Primary: this will not interfere with the more common user and group
-  quotas.
-
-### Non-Goals
-
-* Application to storage other than local ephemeral storage.
-* Application to container copy on write layers. That will be managed
-  by the container runtime. For a future project, we should work with
-  the runtimes to use quotas for their monitoring.
-* Elimination of eviction as a means of enforcing ephemeral-storage
-  limits. Pods that hit their ephemeral-storage limit will still be
-  evicted by the kubelet even if their storage has been capped by
-  enforcing quotas.
-* Enforcing node allocatable (limit over the sum of all pods' disk
-  usage, including e. g. images).
-* Enforcing limits on total pod storage consumption by any means, such
-  that the pod would be hard restricted to the desired storage limit.
-
-### Future Work
-
-* _Enforce limits on per-volume storage consumption by using
-  enforced project quotas._
-
-## Proposal
-
-This proposal applies project quotas to emptydir volumes on qualifying
-filesystems (ext4fs and xfs with project quotas enabled). Project
-quotas are applied by selecting an unused project ID (a 32-bit
-unsigned integer), setting a limit on space and/or inode consumption,
-and attaching the ID to one or more files. By default (and as
-utilized herein), if a project ID is attached to a directory, it is
-inherited by any files created under that directory.
- -_If we elect to use the quota as enforcing, we impose a quota -consistent with the desired limit._ If we elect to use it as -non-enforcing, we impose a large quota that in practice cannot be -exceeded (2^63-1 bytes for XFS, 2^58-1 bytes for ext4fs). - -### Control over Use of Quotas - -At present, two feature gates control operation of quotas: - -* `LocalStorageCapacityIsolation` must be enabled for any use of - quotas. - -* `LocalStorageCapacityIsolationFSMonitoring` must be enabled in addition. If this is - enabled, quotas are used for monitoring, but not enforcement. At - present, this defaults to False, but the intention is that this will - default to True by initial release. - -* _`LocalStorageCapacityIsolationFSEnforcement` must be enabled, in addition to - `LocalStorageCapacityIsolationFSMonitoring`, to use quotas for enforcement._ - -### Operation Flow -- Applying a Quota - -* Caller (emptydir volume manager or container runtime) creates an - emptydir volume, with an empty directory at a location of its - choice. -* Caller requests that a quota be applied to a directory. -* Determine whether a quota can be imposed on the directory, by asking - each quota provider (one per filesystem type) whether it can apply a - quota to the directory. If no provider claims the directory, an - error status is returned to the caller. -* Select an unused project ID ([see below](#selecting-a-project-id)). -* Set the desired limit on the project ID, in a filesystem-dependent - manner ([see below](#notes-on-implementation)). -* Apply the project ID to the directory in question, in a - filesystem-dependent manner. - -An error at any point results in no quota being applied and no change -to the state of the system. The caller in general should not assume a -priori that the attempt will be successful. It could choose to reject -a request if a quota cannot be applied, but at this time it will -simply ignore the error and proceed as today. 
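The apply-quota flow above reduces to a small piece of control logic. Here is an illustrative Go sketch — the `quotaProvider` interface and `fakeXFS` type are hypothetical stand-ins for the per-filesystem providers, not the prototype's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// quotaProvider stands in for a per-filesystem provider that can claim
// a directory and apply a project quota to it. Names are illustrative.
type quotaProvider interface {
	CanHandle(dir string) bool
	ApplyQuota(dir string, projectID uint32, limitBytes int64) error
}

// applyQuota asks each registered provider whether it claims the
// directory; if none does, the caller gets an error, makes no change to
// the system, and proceeds without a quota (as today).
func applyQuota(providers []quotaProvider, dir string, projectID uint32, limitBytes int64) error {
	for _, p := range providers {
		if p.CanHandle(dir) {
			return p.ApplyQuota(dir, projectID, limitBytes)
		}
	}
	return errors.New("no quota provider for " + dir)
}

// fakeXFS is a stand-in provider used only for this demonstration.
type fakeXFS struct{ applied map[string]uint32 }

func (f *fakeXFS) CanHandle(dir string) bool { return true }
func (f *fakeXFS) ApplyQuota(dir string, id uint32, limit int64) error {
	f.applied[dir] = id
	return nil
}

func main() {
	fs := &fakeXFS{applied: map[string]uint32{}}
	// A non-enforcing quota uses a limit too large to hit (2^62 here,
	// purely for illustration; the text cites 2^63-1 for XFS and
	// 2^58-1 for ext4fs as the actual maxima).
	dir := "/var/lib/kubelet/pods/x/volumes/emptydir"
	err := applyQuota([]quotaProvider{fs}, dir, 1048577, 1<<62)
	fmt.Println(err == nil, fs.applied[dir])
}
```

The error path is the important part: failure at any step leaves the system unchanged, matching the flow's "no quota applied and no change to the state of the system" guarantee.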
- -### Operation Flow -- Retrieving Storage Consumption - -* Caller (kubelet metrics code, cadvisor, container runtime) asks the - quota code to compute the amount of storage used under the - directory. -* Determine whether a quota applies to the directory, in a - filesystem-dependent manner ([see below](#notes-on-implementation)). -* If so, determine how much storage or how many inodes are utilized, - in a filesystem dependent manner. - -If the quota code is unable to retrieve the consumption, it returns an -error status and it is up to the caller to utilize a fallback -mechanism (such as the directory walk performed today). - -### Operation Flow -- Removing a Quota. - -* Caller requests that the quota be removed from a directory. -* Determine whether a project quota applies to the directory. -* Remove the limit from the project ID associated with the directory. -* Remove the association between the directory and the project ID. -* Return the project ID to the system to allow its use elsewhere ([see - below](#return-a-project-id-to-the-system)). -* Caller may delete the directory and its contents (normally it will). - -### Operation Notes - -#### Selecting a Project ID - -Project IDs are a shared space within a filesystem. If the same -project ID is assigned to multiple directories, the space consumption -reported by the quota will be the sum of that of all of the -directories. Hence, it is important to ensure that each directory is -assigned a unique project ID (unless it is desired to pool the storage -use of multiple directories). - -The canonical mechanism to record persistently that a project ID is -reserved is to store it in the `/etc/projid` (`projid[5]`) and/or -`/etc/projects` (`projects(5)`) files. However, it is possible to utilize -project IDs without recording them in those files; they exist for -administrative convenience but neither the kernel nor the filesystem -is aware of them. 
Other ways can be used to determine whether a
-project ID is in active use on a given filesystem:
-
-* The quota values (in blocks and/or inodes) assigned to the project
-  ID are non-zero.
-* The storage consumption (in blocks and/or inodes) reported under the
-  project ID is non-zero.
-
-The algorithm to be used is as follows:
-
-* Lock this instance of the quota code against re-entrancy.
-* open and `flock()` the `/etc/projects` and `/etc/projid` files, so that
-  other uses of this code are excluded.
-* Start from a high number (the prototype uses 1048577).
-* Iterate from there, performing the following tests:
-  * Is the ID reserved by this instance of the quota code?
-  * Is the ID present in `/etc/projects`?
-  * Is the ID present in `/etc/projid`?
-  * Are the quota values and/or consumption reported by the kernel
-    non-zero? This test is restricted to 128 iterations to ensure
-    that a bug here or elsewhere does not result in an infinite loop
-    looking for a quota ID.
-* If an ID has been found:
-  * Add it to an in-memory copy of `/etc/projects` and `/etc/projid` so
-    that any other uses of project quotas do not reuse it.
-  * Write temporary copies of `/etc/projects` and `/etc/projid` that are
-    `flock()`ed
-  * If successful, rename the temporary files appropriately (if
-    rename of one succeeds but the other fails, we have a problem
-    that we cannot recover from, and the files may be inconsistent).
-* Unlock `/etc/projid` and `/etc/projects`.
-* Unlock this instance of the quota code.
-
-A minor variation of this is used if we want to reuse an existing
-quota ID.
-
-#### Determine Whether a Project ID Applies To a Directory
-
-It is possible to determine whether a directory has a project ID
-applied to it by requesting (via the `quotactl(2)` system call) the
-project ID associated with the directory. While the specifics are
-filesystem-dependent, the basic method is the same for at least XFS
-and ext4fs.
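The ID-selection loop described above can be sketched as pure logic. In this illustrative Go fragment, injected predicates stand in for the `/etc/projects`/`/etc/projid` lookups and the kernel quota check (the function and parameter names are hypothetical, not the prototype's):

```go
package main

import (
	"errors"
	"fmt"
)

// findUnusedProjectID scans upward from start, skipping IDs that either
// predicate reports as in use. The kernel check is capped at 128 hits so
// that a bug here or elsewhere cannot cause an unbounded search.
func findUnusedProjectID(start uint32, inFiles func(uint32) bool, kernelBusy func(uint32) bool) (uint32, error) {
	kernelHits := 0
	for id := start; id != 0; id++ { // id wraps to 0 on 32-bit overflow
		if inFiles(id) { // reserved in /etc/projects or /etc/projid
			continue
		}
		if kernelBusy(id) { // non-zero quota values or consumption
			kernelHits++
			if kernelHits >= 128 {
				return 0, errors.New("too many busy project IDs; giving up")
			}
			continue
		}
		return id, nil
	}
	return 0, errors.New("project ID space exhausted")
}

func main() {
	reserved := map[uint32]bool{1048577: true, 1048578: true}
	id, err := findUnusedProjectID(1048577,
		func(id uint32) bool { return reserved[id] },
		func(id uint32) bool { return false },
	)
	fmt.Println(id, err == nil)
}
```

In the real algorithm this loop runs only while both files are `flock()`ed, so the returned ID stays unique across concurrent users of the quota code.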
-
-It is not possible to determine in a constant number of operations the
-directory or directories to which a project ID is applied. It is
-possible to determine whether a given project ID has been applied to an
-existing directory or files (although those will not be known); the
-reported consumption will be non-zero.
-
-The code records internally the project ID applied to a directory, but
-it cannot always rely on this. In particular, if the kubelet has
-exited and has been restarted (and hence the quota applying to the
-directory should be removed), the map from directory to project ID is
-lost. If it cannot find a map entry, it falls back on the approach
-discussed above.
-
-#### Return a Project ID To the System
-
-The algorithm used to return a project ID to the system is very
-similar to the algorithm used to select a project ID, except that no
-new project ID needs to be selected. It performs the same sequence of
-locking `/etc/projects` and `/etc/projid`, editing copies of the files,
-and restoring them.
-
-If the project ID is applied to multiple directories and the code can
-determine that, it will not remove the project ID from `/etc/projid`
-until the last reference is removed. While it is not anticipated in
-this KEP that this mode of operation will be used, at least initially,
-this can be detected even on kubelet restart by looking at the
-reference count in `/etc/projects`.
-
-
-### Implementation Details/Notes/Constraints [optional]
-
-#### Notes on Implementation
-
-The primary new interface defined is the quota interface in
-`pkg/volume/util/quota/quota.go`. This defines five operations:
-
-* Does the specified directory support quotas?
-
-* Assign a quota to a directory. If a non-empty pod UID is provided,
-  the quota assigned is that of any other directories under this pod
-  UID; if an empty pod UID is provided, a unique quota is assigned.
-
-* Retrieve the consumption of the specified directory.
If the quota - code cannot handle it efficiently, it returns an error and the - caller falls back on existing mechanism. - -* Retrieve the inode consumption of the specified directory; same - description as above. - -* Remove quota from a directory. If a non-empty pod UID is passed, it - is checked against that recorded in-memory (if any). The quota is - removed from the specified directory. This can be used even if - AssignQuota has not been used; it inspects the directory and removes - the quota from it. This permits stale quotas from an interrupted - kubelet to be cleaned up. - -Two implementations are provided: `quota_linux.go` (for Linux) and -`quota_unsupported.go` (for other operating systems). The latter -returns an error for all requests. - -As the quota mechanism is intended to support multiple filesystems, -and different filesystems require different low level code for -manipulating quotas, a provider is supplied that finds an appropriate -quota applier implementation for the filesystem in question. The low -level quota applier provides similar operations to the top level quota -code, with two exceptions: - -* No operation exists to determine whether a quota can be applied - (that is handled by the provider). - -* An additional operation is provided to determine whether a given - quota ID is in use within the filesystem (outside of `/etc/projects` - and `/etc/projid`). - -The two quota providers in the initial implementation are in -`pkg/volume/util/quota/extfs` and `pkg/volume/util/quota/xfs`. While -some quota operations do require different system calls, a lot of the -code is common, and factored into -`pkg/volume/util/quota/common/quota_linux_common_impl.go`. - -#### Notes on Code Changes - -The prototype for this project is mostly self-contained within -`pkg/volume/util/quota` and a few changes to -`pkg/volume/empty_dir/empty_dir.go`. 
However, a few changes were
-required elsewhere:
-
-* The operation executor needs to pass the desired size limit to the
-  volume plugin where appropriate so that the volume plugin can impose
-  a quota. The limit is passed as 0 (do not use quotas), a positive
-  number (impose an enforcing quota if possible, measured in bytes),
-  or -1 (impose a non-enforcing quota, if possible) on the volume.
-
-  This requires changes to
-  `pkg/volume/util/operationexecutor/operation_executor.go` (to add
-  `DesiredSizeLimit` to `VolumeToMount`),
-  `pkg/kubelet/volumemanager/cache/desired_state_of_world.go`, and
-  `pkg/kubelet/eviction/helpers.go` (the latter in order to determine
-  whether the volume is a local ephemeral one).
-
-* The volume manager (in `pkg/volume/volume.go`) changes the
-  `Mounter.SetUp` and `Mounter.SetUpAt` interfaces to take a new
-  `MounterArgs` type rather than an `FsGroup` (`*int64`). This is to
-  allow passing the desired size and pod UID (in the event we choose
-  to implement quotas shared between multiple volumes; [see
-  below](#alternative-quota-based-implementation)). This required
-  small changes to all volume plugins and their tests, but will in the
-  future allow adding additional data without having to change code
-  other than that which uses the new information.
-
-#### Testing Strategy
-
-The quota code is by and large not very amenable to unit tests. While
-there are simple unit tests for parsing the mounts file, and there
-could be tests for parsing the projects and projid files, the real
-work (and risk) involves interactions with the kernel and with
-multiple instances of this code (e.g. in the kubelet and the runtime
-manager, particularly under stress). It also requires setup in the
-form of a prepared filesystem. It would be better served by
-appropriate end-to-end tests. 
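The tri-state size limit passed from the operation executor to the volume plugin can be sketched as follows. This is an illustrative helper only, not the actual Kubernetes code; the names `quotaMode` and `interpretSizeLimit` are assumptions introduced here:

```go
package main

import "fmt"

// quotaMode mirrors the convention described above: the operation executor
// passes 0 (do not use quotas), a positive byte count (enforcing quota), or
// -1 (non-enforcing quota) down to the volume plugin.
type quotaMode int

const (
	quotaNone quotaMode = iota
	quotaEnforcing
	quotaNonEnforcing
)

// interpretSizeLimit decodes the desired size limit into a quota decision
// and, for the enforcing case, the limit in bytes.
func interpretSizeLimit(limit int64) (quotaMode, int64) {
	switch {
	case limit == 0:
		return quotaNone, 0
	case limit > 0:
		return quotaEnforcing, limit
	default: // negative; by convention -1
		return quotaNonEnforcing, 0
	}
}

func main() {
	mode, bytes := interpretSizeLimit(1 << 20)
	fmt.Println(mode == quotaEnforcing, bytes) // true 1048576
	mode, _ = interpretSizeLimit(-1)
	fmt.Println(mode == quotaNonEnforcing) // true
}
```

Keeping the convention in a single decoding point like this would make the 0 / positive / -1 encoding easy to unit-test, even though (as noted above) the quota system calls themselves are better exercised by end-to-end tests.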
-
-### Risks and Mitigations
-
-* The SIG raised the possibility of a container being unable to exit
-  should we enforce quotas, and the quota interferes with writing the
-  log. This can be mitigated either by not applying a quota to the
-  log directory and using the du mechanism, or by applying a separate
-  non-enforcing quota to the log directory.
-
-  As log directories are write-only by the container, and consumption
-  can be limited by other means (as the log is filtered by the
-  runtime), I do not consider the ability to write uncapped to the log
-  to be a serious exposure.
-
-  Note in addition that even without quotas it is possible for writes
-  to fail due to lack of filesystem space, which is effectively (and
-  in some cases operationally) indistinguishable from exceeding quota,
-  so even at present code must be able to handle those situations.
-
-* Filesystem quotas may impact performance to an unknown degree.
-  Information on that is hard to come by in general, and one of the
-  reasons for using quotas is indeed to improve performance. If this
-  is a problem in the field, merely turning off quotas (or selectively
-  disabling project quotas) on the filesystem in question will avoid
-  the problem. Against the possibility that this cannot be done
-  (because project quotas are needed for other purposes), we should
-  provide a way to disable use of quotas altogether via a feature
-  gate.
-
-  A report <https://blog.pythonanywhere.com/110/> notes that an
-  unclean shutdown on Linux kernel versions between 3.11 and 3.17 can
-  result in a prolonged downtime while quota information is restored.
-  Unfortunately, [the link referenced
-  here](http://oss.sgi.com/pipermail/xfs/2015-March/040879.html) is no
-  longer available.
-
-* Bugs in the quota code could result in a variety of regression
-  behavior. For example, if a quota is incorrectly applied it could
-  result in an inability to write any data at all to the volume. This
-  could be mitigated by use of non-enforcing quotas. 
XFS in particular - offers the `pqnoenforce` mount option that makes all quotas - non-enforcing. - - -## Graduation Criteria - -How will we know that this has succeeded? Gathering user feedback is -crucial for building high quality experiences and SIGs have the -important responsibility of setting milestones for stability and -completeness. Hopefully the content previously contained in [umbrella -issues][] will be tracked in the `Graduation Criteria` section. - -[umbrella issues]: N/A - -## Implementation History - -Major milestones in the life cycle of a KEP should be tracked in -`Implementation History`. Major milestones might include - -- the `Summary` and `Motivation` sections being merged signaling SIG - acceptance -- the `Proposal` section being merged signaling agreement on a - proposed design -- the date implementation started -- the first Kubernetes release where an initial version of the KEP was - available -- the version of Kubernetes where the KEP graduated to general - availability -- when the KEP was retired or superseded - -## Drawbacks [optional] - -* Use of quotas, particularly the less commonly used project quotas, - requires additional action on the part of the administrator. In - particular: - * ext4fs filesystems must be created with additional options that - are not enabled by default: -``` -mkfs.ext4 -O quota,project -Q usrquota,grpquota,prjquota _device_ -``` - * An additional option (`prjquota`) must be applied in `/etc/fstab` - * If the root filesystem is to be quota-enabled, it must be set in - the grub options. -* Use of project quotas for this purpose will preclude future use - within containers. 
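As noted in the drawbacks above, project quotas only work if the filesystem was mounted with the right option (`prjquota`, or its XFS synonym `pquota`). A minimal sketch of how code might validate the mount-options field of an fstab-style entry; this helper is hypothetical and not part of the KEP's implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// hasProjectQuotaOption reports whether a comma-separated mount-options
// field (as found in /etc/fstab or /proc/mounts) enables project quotas.
// `prjquota` is the common spelling; XFS also accepts `pquota`.
func hasProjectQuotaOption(options string) bool {
	for _, opt := range strings.Split(options, ",") {
		if opt == "prjquota" || opt == "pquota" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasProjectQuotaOption("rw,relatime,prjquota")) // true
	fmt.Println(hasProjectQuotaOption("rw,relatime"))          // false
}
```

A check like this could let the quota code fail fast (and fall back to the du mechanism) on filesystems where the administrator has not enabled project quotas.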
- -## Alternatives [optional] - -I have considered two classes of alternatives: - -* Alternatives based on quotas, with different implementation - -* Alternatives based on loop filesystems without use of quotas - -### Alternative quota-based implementation - -Within the basic framework of using quotas to monitor and potentially -enforce storage utilization, there are a number of possible options: - -* Utilize per-volume non-enforcing quotas to monitor storage (the - first stage of this proposal). - - This mostly preserves the current behavior, but with more efficient - determination of storage utilization and the possibility of building - further on it. The one change from current behavior is the ability - to detect space used by deleted files. - -* Utilize per-volume enforcing quotas to monitor and enforce storage - (the second stage of this proposal). - - This allows partial enforcement of storage limits. As local storage - capacity isolation works at the level of the pod, and we have no - control of user utilization of ephemeral volumes, we would have to - give each volume a quota of the full limit. For example, if a pod - had a limit of 1 MB but had four ephemeral volumes mounted, it would - be possible for storage utilization to reach (at least temporarily) - 4MB before being capped. - -* Utilize per-pod enforcing user or group quotas to enforce storage - consumption, and per-volume non-enforcing quotas for monitoring. - - This would offer the best of both worlds: a fully capped storage - limit combined with efficient reporting. However, it would require - each pod to run under a distinct UID or GID. This may prevent pods - from using setuid or setgid or their variants, and would interfere - with any other use of group or user quotas within Kubernetes. - -* Utilize per-pod enforcing quotas to monitor and enforce storage. 
- - This allows for full enforcement of storage limits, at the expense - of being able to efficiently monitor per-volume storage - consumption. As there have already been reports of monitoring - causing trouble, I do not advise this option. - - A variant of this would report (1/N) storage for each covered - volume, so with a pod with a 4MiB quota and 1MiB total consumption, - spread across 4 ephemeral volumes, each volume would report a - consumption of 256 KiB. Another variant would change the API to - report statistics for all ephemeral volumes combined. I do not - advise this option. - -### Alternative loop filesystem-based implementation - -Another way of isolating storage is to utilize filesystems of -pre-determined size, using the loop filesystem facility within Linux. -It is possible to create a file and run `mkfs(8)` on it, and then to -mount that filesystem on the desired directory. This both limits the -storage available within that directory and enables quick retrieval of -it via `statfs(2)`. - -Cleanup of such a filesystem involves unmounting it and removing the -backing file. - -The backing file can be created as a sparse file, and the `discard` -option can be used to return unused space to the system, allowing for -thin provisioning. - -I conducted preliminary investigations into this. While at first it -appeared promising, it turned out to have multiple critical flaws: - -* If the filesystem is mounted without the `discard` option, it can - grow to the full size of the backing file, negating any possibility - of thin provisioning. If the file is created dense in the first - place, there is never any possibility of thin provisioning without - use of `discard`. - - If the backing file is created densely, it additionally may require - significant time to create if the ephemeral limit is large. 
-
-* If the filesystem is mounted `nosync`, and is sparse, it is possible
-  for writes to succeed and then fail later with I/O errors when
-  synced to the backing storage. This will lead to data corruption
-  that cannot be detected at the time of write.
-
-  This can easily be reproduced by e.g. creating a 64MB filesystem
-  and within it creating a 128MB sparse file and building a filesystem
-  on it. When that filesystem is in turn mounted, writes to it will
-  succeed, but I/O errors will be seen in the log and the file will be
-  incomplete:
-
-```
-# mkdir /var/tmp/d1 /var/tmp/d2
-# dd if=/dev/zero of=/var/tmp/fs1 bs=4096 count=1 seek=16383
-# mkfs.ext4 /var/tmp/fs1
-# mount -o nosync -t ext4 /var/tmp/fs1 /var/tmp/d1
-# dd if=/dev/zero of=/var/tmp/d1/fs2 bs=4096 count=1 seek=32767
-# mkfs.ext4 /var/tmp/d1/fs2
-# mount -o nosync -t ext4 /var/tmp/d1/fs2 /var/tmp/d2
-# dd if=/dev/zero of=/var/tmp/d2/test bs=4096 count=24576
-  ...will normally succeed...
-# sync
-  ...fails with I/O error!...
-```
-
-* If the filesystem is mounted `sync`, all writes to it are
-  immediately committed to the backing store, and the `dd` operation
-  above fails as soon as it fills up `/var/tmp/d1`. However,
-  performance is drastically slowed, particularly with small writes;
-  with 1K writes, I observed performance degradation in some cases
-  exceeding three orders of magnitude.
-
-  I performed a test comparing writing 64 MB to a base (partitioned)
-  filesystem, to a loop filesystem without `sync`, and a loop
-  filesystem with `sync`. Total I/O was sufficient to run for at least
-  5 seconds in each case. All filesystems involved were XFS. Loop
-  filesystems were 128 MB and dense. Times are in seconds. The
-  erratic behavior (e.g. the 65536 case) was observed repeatedly,
-  although the exact amount of time and which I/O sizes were affected
-  varied. The underlying device was an HP EX920 1TB NVMe SSD. 
-
-| I/O Size | Partition | Loop w/o sync | Loop w/sync |
-| ---: | ---: | ---: | ---: |
-| 1024 | 0.104 | 0.120 | 140.390 |
-| 4096 | 0.045 | 0.077 | 21.850 |
-| 16384 | 0.045 | 0.067 | 5.550 |
-| 65536 | 0.044 | 0.061 | 20.440 |
-| 262144 | 0.043 | 0.087 | 0.545 |
-| 1048576 | 0.043 | 0.055 | 7.490 |
-| 4194304 | 0.043 | 0.053 | 0.587 |
-
-  The only potentially viable combination in my view would be a dense
-  loop filesystem without sync, but that would render any thin
-  provisioning impossible.
-
-## Infrastructure Needed [optional]
-
-* Decision: who is responsible for quota management of all volume
-  types (and especially ephemeral volumes of all types). At present,
-  emptydir volumes are managed by the kubelet and logdirs and writable
-  layers by either the kubelet or the runtime, depending upon the
-  choice of runtime. Beyond the specific proposal that the runtime
-  should manage quotas for volumes it creates, there are broader
-  issues that I request assistance from the SIG in addressing.
-
-* Location of the quota code. If the quotas for different volume
-  types are to be managed by different components, each such component
-  needs access to the quota code. The code is substantial and should
-  not be copied; it would more appropriately be vendored.
-
-## References
-
-### Bugs Opened Against Filesystem Quotas
-
-The following is a list of known security issues referencing
-filesystem quotas on Linux, and other bugs referencing filesystem
-quotas in Linux since 2012. These bugs are not necessarily in the
-quota system.
-
-#### CVE
-
-* *CVE-2012-2133* Use-after-free vulnerability in the Linux kernel
-  before 3.3.6, when huge pages are enabled, allows local users to
-  cause a denial of service (system crash) or possibly gain privileges
-  by interacting with a hugetlbfs filesystem, as demonstrated by a
-  umount operation that triggers improper handling of quota data.
-
-  The issue is actually related to huge pages, not quotas
-  specifically. 
The demonstration of the vulnerability resulted in
-incorrect handling of quota data.
-
-* *CVE-2012-3417* The good_client function in rquotad (rquota_svc.c)
-  in Linux DiskQuota (aka quota) before 3.17 invokes the hosts_ctl
-  function the first time without a host name, which might allow
-  remote attackers to bypass TCP Wrappers rules in hosts.deny (related
-  to rpc.rquotad).
-
-  This issue is related to remote quota handling, which is not the use
-  case for the proposal at hand.
-
-#### Other Security Issues Without CVE
-
-* [Linux Kernel Quota Flaw Lets Local Users Exceed Quota Limits and
-  Create Large Files](https://securitytracker.com/id/1002610)
-
-  A setuid root binary inheriting file descriptors from an
-  unprivileged user process may write to the file without respecting
-  quota limits. If this issue is still present, it would allow a
-  setuid process to exceed any enforcing limits, but does not affect
-  the quota accounting (use of quotas for monitoring).
-
-### Other Linux Quota-Related Bugs Since 2012
-
-* [ext4: report delalloc reserve as non-free in statfs mangled by
-  project quota](https://lore.kernel.org/patchwork/patch/884530/)
-
-  This bug, fixed in Feb. 2018, properly accounts for reserved but not
-  committed space in project quotas. At this point I have not
-  determined the impact of this issue.
-
-* [XFS quota doesn't work after rebooting because of
-  crash](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461730)
-
-  This bug resulted in XFS quotas not working after a crash or forced
-  reboot. Under this proposal, Kubernetes would fall back to du for
-  monitoring should a bug of this nature manifest itself again.
-
-* [quota can show incorrect filesystem
-  name](https://bugzilla.redhat.com/show_bug.cgi?id=1326527)
-
-  This issue, which will not be fixed, results in the quota command
-  possibly printing an incorrect filesystem name when used on remote
-  filesystems. 
It is a display issue with the quota command, not a - quota bug at all, and does not result in incorrect quota information - being reported. As this proposal does not utilize the quota command - or rely on filesystem name, or currently use quotas on remote - filesystems, it should not be affected by this bug. - -In addition, the e2fsprogs have had numerous fixes over the years. +KEPs have moved to https://git.k8s.io/enhancements/. +<!-- +This file is a placeholder to preserve links. Please remove after 6 months or the release of Kubernetes 1.15, whichever comes first. +-->
\ No newline at end of file diff --git a/keps/sig-node/compute-device-assignment.md b/keps/sig-node/compute-device-assignment.md index 1ce72617..cfd1f5fa 100644 --- a/keps/sig-node/compute-device-assignment.md +++ b/keps/sig-node/compute-device-assignment.md @@ -1,150 +1,4 @@ ---- -kep-number: 18 -title: Kubelet endpoint for device assignment observation details -authors: - - "@dashpole" - - "@vikaschoudhary16" -owning-sig: sig-node -reviewers: - - "@thockin" - - "@derekwaynecarr" - - "@dchen1107" - - "@vishh" -approvers: - - "@sig-node-leads" -editors: - - "@dashpole" - - "@vikaschoudhary16" -creation-date: "2018-07-19" -last-updated: "2018-07-19" -status: provisional ---- -# Kubelet endpoint for device assignment observation details - -Table of Contents -================= -* [Abstract](#abstract) -* [Background](#background) -* [Objectives](#objectives) -* [User Journeys](#user-journeys) - * [Device Monitoring Agents](#device-monitoring-agents) -* [Changes](#changes) -* [Potential Future Improvements](#potential-future-improvements) -* [Alternatives Considered](#alternatives-considered) - -## Abstract -In this document we will discuss the motivation and code changes required for introducing a kubelet endpoint to expose device to container bindings. - -## Background -[Device Monitoring](https://docs.google.com/document/d/1NYnqw-HDQ6Y3L_mk85Q3wkxDtGNWTxpsedsgw4NgWpg/edit?usp=sharing) requires external agents to be able to determine the set of devices in-use by containers and attach pod and container metadata for these devices. 
-
-## Objectives
-
-* To remove current device-specific knowledge from the kubelet, such as [accelerator metrics](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/stats/v1alpha1/types.go#L229)
-* To enable future use-cases requiring device-specific knowledge to be out-of-tree
-
-## User Journeys
-
-### Device Monitoring Agents
-
-* As a _Cluster Administrator_, I provide a set of devices from various vendors in my cluster. Each vendor independently maintains their own agent, so I run monitoring agents only for devices I provide. Each agent adheres to the [node monitoring guidelines](https://docs.google.com/document/d/1_CdNWIjPBqVDMvu82aJICQsSCbh2BR-y9a8uXjQm4TI/edit?usp=sharing), so I can use a compatible monitoring pipeline to collect and analyze metrics from a variety of agents, even though they are maintained by different vendors.
-* As a _Device Vendor_, I manufacture devices and I have deep domain expertise in how to run and monitor them. Because I maintain my own Device Plugin implementation, as well as Device Monitoring Agent, I can provide consumers of my devices an easy way to consume and monitor my devices without requiring open-source contributions. The Device Monitoring Agent doesn't have any dependencies on the Device Plugin, so I can decouple monitoring from device lifecycle management. My Device Monitoring Agent works by periodically querying the `/devices/<ResourceName>` endpoint to discover which devices are being used, and to get the container/pod metadata associated with the metrics:
-
-
-
-
-## Changes
-
-Add a v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns information about the kubelet's assignment of devices to containers. It obtains this information from the internal state of the kubelet's Device Manager. 
The GRPC Service returns a single ListPodResourcesResponse, which is shown in proto below:
-```protobuf
-// PodResources is a service provided by the kubelet that provides information about the
-// node resources consumed by pods and containers on the node
-service PodResources {
-    rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
-}
-
-// ListPodResourcesRequest is the request made to the PodResources service
-message ListPodResourcesRequest {}
-
-// ListPodResourcesResponse is the response returned by List function
-message ListPodResourcesResponse {
-    repeated PodResources pod_resources = 1;
-}
-
-// PodResources contains information about the node resources assigned to a pod
-message PodResources {
-    string name = 1;
-    string namespace = 2;
-    repeated ContainerResources containers = 3;
-}
-
-// ContainerResources contains information about the resources assigned to a container
-message ContainerResources {
-    string name = 1;
-    repeated ContainerDevices devices = 2;
-}
-
-// ContainerDevices contains information about the devices assigned to a container
-message ContainerDevices {
-    string resource_name = 1;
-    repeated string device_ids = 2;
-}
-```
-
-### Potential Future Improvements
-
-* Add `ListAndWatch()` function to the GRPC endpoint so monitoring agents don't need to poll.
-* Add identifiers for other resources used by pods to the `PodResources` message.
-  * For example, persistent volume location on disk
-
-## Alternatives Considered
-
-### Add v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns a list of [CreateContainerRequest](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L734)s used to create containers. 
-
-* Pros:
-  * Reuse an existing API for describing containers rather than inventing a new one
-* Cons:
-  * It ties the endpoint to the CreateContainerRequest, and may prevent us from adding other information we want in the future
-  * It does not contain any additional information that will be useful to monitoring agents other than devices, and contains lots of irrelevant information for this use-case.
-* Notes:
-  * Does not include any reference to resource names. Monitoring agents must identify devices by the device or environment variables passed to the pod or container.
-
-### Add a field to Pod Status.
-* Pros:
-  * Allows for observation of container to device bindings local to the node through the `/pods` endpoint
-* Cons:
-  * Only consumed locally, which doesn't justify an API change
-  * Device Bindings are immutable after allocation, and are _debatably_ observable (they can be "observed" from the local checkpoint file). Device bindings are generally a poor fit for status.
-
-### Use the Kubelet Device Manager Checkpoint file
-* Allows for observability of device to container bindings through what exists in the checkpoint file
-  * Requires adding additional metadata to the checkpoint file as required by the monitoring agent
-* Requires implementing versioning for the checkpoint file, and handling version skew between readers and the kubelet
-* Future modifications to the checkpoint file are more difficult.
-
-### Add a field to the Pod Spec:
-* A new object `ComputeDevice` will be defined and a new variable `ComputeDevices` will be added in the `Container` (Spec) object which will represent a list of `ComputeDevice` objects. 
-
-```golang
-// ComputeDevice describes the devices assigned to this container for a given ResourceName
-type ComputeDevice struct {
-    // DeviceIDs is the list of devices assigned to this container
-    DeviceIDs []string
-    // ResourceName is the name of the compute resource
-    ResourceName string
-}
-
-// Container represents a single container that is expected to be run on the host.
-type Container struct {
-    ...
-    // ComputeDevices contains the devices assigned to this container
-    // This field is alpha-level and is only honored by servers that enable the ComputeDevices feature.
-    // +optional
-    ComputeDevices []ComputeDevice
-    ...
-}
-```
-* During Kubelet pod admission, if `ComputeDevices` is found non-empty, the specified devices will be allocated; otherwise behaviour will remain the same as it is today.
-* Before starting the pod, the kubelet writes the assigned `ComputeDevices` back to the pod spec.
-  * Note: Writing to the API Server and waiting to observe the updated pod spec in the kubelet's pod watch may add significant latency to pod startup.
-* Allows devices to potentially be assigned by a custom scheduler.
-* Serves as a permanent record of device assignments for the kubelet, and eliminates the need for the kubelet to maintain this state locally.
-
+KEPs have moved to https://git.k8s.io/enhancements/.
+<!--
+This file is a placeholder to preserve links. Please remove after 6 months or the release of Kubernetes 1.15, whichever comes first.
+-->
\ No newline at end of file
