author    Dawn Chen <dawnchen@google.com>  2015-08-06 16:07:42 -0700
committer Dawn Chen <dawnchen@google.com>  2015-08-06 16:07:42 -0700
commit    9422c11630e7694b6b835e9fd70a4e21bbdd03ae (patch)
tree      b3e4b14bd1e48e7ca309b02606e233c58bd47f30
parent    0cbc94352d55aa105d05145dba2ec4327f0a05b2 (diff)
parent    94ec57fba832c57a013d9acc9bff51d8b4a42ce3 (diff)
Merge pull request #12291 from derekwaynecarr/resource_quota_requests
Update resource quota design to align with requests and limits
-rw-r--r--  admission_control_resource_quota.md | 148
1 file changed, 91 insertions(+), 57 deletions(-)
diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md
index 136603d2..bb7c6e0a 100644
--- a/admission_control_resource_quota.md
+++ b/admission_control_resource_quota.md
@@ -35,13 +35,17 @@ Documentation for other releases can be found at
## Background
-This document proposes a system for enforcing hard resource usage limits per namespace as part of admission control.
+This document describes a system for enforcing hard resource usage limits per namespace as part of admission control.
-## Model Changes
+## Use cases
-A new resource, **ResourceQuota**, is introduced to enumerate hard resource limits in a Kubernetes namespace.
+1. Ability to enumerate resource usage limits per namespace.
+2. Ability to monitor resource usage for tracked resources.
+3. Ability to reject resource usage exceeding hard quotas.
-A new resource, **ResourceQuotaUsage**, is introduced to support atomic updates of a **ResourceQuota** status.
+## Data Model
+
+The **ResourceQuota** object is scoped to a **Namespace**.
```go
// The following identify resource constants for Kubernetes object types
@@ -54,109 +58,139 @@ const (
ResourceReplicationControllers ResourceName = "replicationcontrollers"
// ResourceQuotas, number
ResourceQuotas ResourceName = "resourcequotas"
+ // ResourceSecrets, number
+ ResourceSecrets ResourceName = "secrets"
+ // ResourcePersistentVolumeClaims, number
+ ResourcePersistentVolumeClaims ResourceName = "persistentvolumeclaims"
)
// ResourceQuotaSpec defines the desired hard limits to enforce for Quota
type ResourceQuotaSpec struct {
// Hard is the set of desired hard limits for each named resource
- Hard ResourceList `json:"hard,omitempty"`
+ Hard ResourceList `json:"hard,omitempty" description:"hard is the set of desired hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
}
// ResourceQuotaStatus defines the enforced hard limits and observed use
type ResourceQuotaStatus struct {
// Hard is the set of enforced hard limits for each named resource
- Hard ResourceList `json:"hard,omitempty"`
+ Hard ResourceList `json:"hard,omitempty" description:"hard is the set of enforced hard limits for each named resource; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
// Used is the current observed total usage of the resource in the namespace
- Used ResourceList `json:"used,omitempty"`
+ Used ResourceList `json:"used,omitempty" description:"used is the current observed total usage of the resource in the namespace"`
}
// ResourceQuota sets aggregate quota restrictions enforced per namespace
type ResourceQuota struct {
TypeMeta `json:",inline"`
- ObjectMeta `json:"metadata,omitempty"`
+ ObjectMeta `json:"metadata,omitempty" description:"standard object metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`
// Spec defines the desired quota
- Spec ResourceQuotaSpec `json:"spec,omitempty"`
-
- // Status defines the actual enforced quota and its current usage
- Status ResourceQuotaStatus `json:"status,omitempty"`
-}
-
-// ResourceQuotaUsage captures system observed quota status per namespace
-// It is used to enforce atomic updates of a backing ResourceQuota.Status field in storage
-type ResourceQuotaUsage struct {
- TypeMeta `json:",inline"`
- ObjectMeta `json:"metadata,omitempty"`
+ Spec ResourceQuotaSpec `json:"spec,omitempty" description:"spec defines the desired quota; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
// Status defines the actual enforced quota and its current usage
- Status ResourceQuotaStatus `json:"status,omitempty"`
+ Status ResourceQuotaStatus `json:"status,omitempty" description:"status defines the actual enforced quota and current usage; http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#spec-and-status"`
}
// ResourceQuotaList is a list of ResourceQuota items
type ResourceQuotaList struct {
TypeMeta `json:",inline"`
- ListMeta `json:"metadata,omitempty"`
+ ListMeta `json:"metadata,omitempty" description:"standard list metadata; see http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata"`
// Items is a list of ResourceQuota objects
- Items []ResourceQuota `json:"items"`
+ Items []ResourceQuota `json:"items" description:"items is a list of ResourceQuota objects; see http://releases.k8s.io/HEAD/docs/design/admission_control_resource_quota.md#admissioncontrol-plugin-resourcequota"`
}
```
-## AdmissionControl plugin: ResourceQuota
+## Quota Tracked Resources
-The **ResourceQuota** plug-in introspects all incoming admission requests.
+The following resources are supported by the quota system.
-It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
-namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
-
-The following resource limits are imposed as part of core Kubernetes at the namespace level:
-
-| ResourceName | Description |
+| Resource | Description |
| ------------ | ----------- |
-| cpu | Total cpu usage |
-| memory | Total memory usage |
-| pods | Total number of pods |
+| cpu | Total requested cpu usage |
+| memory | Total requested memory usage |
+| pods | Total number of active pods, i.e. pods whose phase is pending or running |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
+| secrets | Total number of secrets |
+| persistentvolumeclaims | Total number of persistent volume claims |
-Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.
+If a third party wants to track additional resources, it must follow the resource naming conventions prescribed
+by Kubernetes. This means the resource must have a fully-qualified name (e.g. mycompany.org/shinynewresource).
-This means the resource must have a fully-qualified name (i.e. mycompany.org/shinynewresource)
+## Resource Requirements: Requests vs Limits
-If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
-**ResourceQuotaUsage** document to the server to atomically update the observed usage based on the previously read
-**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
-into the system.
+If a resource supports the ability to distinguish between a request and a limit,
+the quota tracking system will only charge the request value against the quota usage. If a resource
+is tracked by quota, and no request value is provided, the associated entity is rejected as part of admission.
-To optimize system performance, it is encouraged that all resource quotas are tracked on the same **ResourceQuota** document. As a result,
-its encouraged to actually impose a cap on the total number of individual quotas that are tracked in the **Namespace** to 1 by explicitly
-capping it in **ResourceQuota** document.
+As an example, consider the following scenarios for tracking quota on CPU:
-## kube-apiserver
+| Pod | Container | Request CPU | Limit CPU | Result |
+| --- | --------- | ----------- | --------- | ------ |
+| X | C1 | 100m | 500m | The quota usage is incremented 100m |
+| Y | C2 | 100m | none | The quota usage is incremented 100m |
+| Y | C2 | none | 500m | The quota usage is incremented 500m since request will default to limit |
+| Z | C3 | none | none | The pod is rejected since it does not enumerate a request. |
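The request-defaulting rules in the table above can be sketched in Go. The `Container` type and `chargeableCPU` function below are illustrative names for this sketch, not the actual Kubernetes API types or admission code:

```go
package main

import (
	"errors"
	"fmt"
)

// Container models only the fields relevant to quota costing; these are
// illustrative names, not the actual Kubernetes API types.
type Container struct {
	Name       string
	RequestCPU int64 // millicores; 0 means "not specified"
	LimitCPU   int64 // millicores; 0 means "not specified"
}

// chargeableCPU returns the millicores to charge against quota for one
// container: charge the request; default a missing request to the limit;
// reject when neither is set.
func chargeableCPU(c Container) (int64, error) {
	switch {
	case c.RequestCPU > 0:
		return c.RequestCPU, nil
	case c.LimitCPU > 0:
		// Request defaults to limit when only a limit is given.
		return c.LimitCPU, nil
	default:
		return 0, errors.New("container " + c.Name + " does not enumerate a cpu request")
	}
}

func main() {
	for _, c := range []Container{
		{Name: "C1", RequestCPU: 100, LimitCPU: 500},
		{Name: "C2", RequestCPU: 100},
		{Name: "C2", LimitCPU: 500},
		{Name: "C3"},
	} {
		if charge, err := chargeableCPU(c); err != nil {
			fmt.Printf("%s: rejected: %v\n", c.Name, err)
		} else {
			fmt.Printf("%s: quota usage incremented %dm\n", c.Name, charge)
		}
	}
}
```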
-The server is updated to be aware of **ResourceQuota** objects.
+The rationale for charging for the requested amount of a resource rather than the limit is the belief
+that a user should only be charged for what they are scheduled against in the cluster. In addition,
+attempting to charge against actual consumption, where request < actual < limit, is considered highly
+volatile.
-The quota is only enforced if the kube-apiserver is started as follows:
+As a consequence of this decision, a user is able to spread their usage of a resource across multiple tiers
+of service. Let's demonstrate this via an example with a 4 cpu quota.
-```console
-$ kube-apiserver -admission_control=ResourceQuota
-```
+The quota may be allocated as follows:
+
+| Pod | Container | Request CPU | Limit CPU | Tier | Quota Usage |
+| --- | --------- | ----------- | --------- | ---- | ----------- |
+| X | C1 | 1 | 4 | Burstable | 1 |
+| Y | C2 | 2 | 2 | Guaranteed | 2 |
+| Z | C3 | 1 | 3 | Burstable | 1 |
-## kube-controller-manager
+It is possible that the pods may consume 9 cpu over a given time period, depending on the available cpu of the
+nodes that held pods X and Z, but since we scheduled X and Z relative to the request, we only track the requested
+value against their allocated quota. If one wants to restrict the ratio between the request and limit,
+it is encouraged that the user define a **LimitRange** with **LimitRequestRatio** to control burst-out behavior.
+This would, in effect, let an administrator keep the difference between request and limit more in line with
+tracked usage if desired.
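A minimal sketch of the ratio check such a **LimitRange** would enforce, assuming an administrator-chosen bound of 2.0 (the constant and function names here are hypothetical, not the LimitRange implementation):

```go
package main

import "fmt"

// maxLimitRequestRatio stands in for a LimitRange's LimitRequestRatio;
// the value 2.0 is an arbitrary example, not a Kubernetes default.
const maxLimitRequestRatio = 2.0

// withinBurstBound reports whether a container's limit/request ratio stays
// within the administrator's bound, keeping the gap between charged
// (request) and permitted (limit) usage in check.
func withinBurstBound(requestMilli, limitMilli int64) bool {
	if requestMilli == 0 {
		return false // a request is required for the ratio to be defined
	}
	return float64(limitMilli)/float64(requestMilli) <= maxLimitRequestRatio
}

func main() {
	// Pod X from the table (request 1, limit 4) bursts 4x and would be rejected.
	fmt.Println(withinBurstBound(1000, 4000)) // false
	// Pod Y (request 2, limit 2) is guaranteed and passes.
	fmt.Println(withinBurstBound(2000, 2000)) // true
}
```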
-A new controller is defined that runs a synch loop to calculate quota usage across the namespace.
+## Status API
-**ResourceQuota** usage is only calculated if a namespace has a **ResourceQuota** object.
+A REST API endpoint to update the status section of the **ResourceQuota** is exposed. It requires an atomic compare-and-swap
+in order to keep resource usage tracking consistent.
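The compare-and-swap behavior can be illustrated with a toy in-memory server; this is a sketch of the idea only, not the kube-apiserver's storage code, and all names below are invented for the example:

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("resourceVersion conflict")

// statusServer is a toy stand-in for the /status endpoint: an update is
// accepted only when the caller presents the resourceVersion it last read.
type statusServer struct {
	resourceVersion int
	usedPods        int64
}

func (s *statusServer) read() (rv int, used int64) {
	return s.resourceVersion, s.usedPods
}

// casStatus performs the compare-and-swap: stale readers get a conflict
// and must re-read before retrying.
func (s *statusServer) casStatus(readRV int, newUsed int64) error {
	if readRV != s.resourceVersion {
		return errConflict
	}
	s.resourceVersion++
	s.usedPods = newUsed
	return nil
}

// incrementPods retries until its read-modify-write lands on the latest
// version, so concurrent writers cannot silently lose updates.
func incrementPods(s *statusServer) {
	for {
		rv, used := s.read()
		if s.casStatus(rv, used+1) == nil {
			return
		}
	}
}

func main() {
	s := &statusServer{}
	incrementPods(s)
	incrementPods(s)
	fmt.Println(s.usedPods) // 2
}
```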
-If the observed usage is different than the recorded usage, the controller sends a **ResourceQuotaUsage** resource
-to the server to atomically update.
+## Resource Quota Controller
-The synchronization loop frequency will control how quickly DELETE actions are recorded in the system and usage is ticked down.
+A resource quota controller monitors observed usage for tracked resources in the **Namespace**.
+
+If there is an observed difference between the current usage stats and the current **ResourceQuota.Status**, the controller
+posts an update of the currently observed usage metrics to the **ResourceQuota** via the /status endpoint.
+
+The resource quota controller is the only component capable of monitoring and recording usage updates after a DELETE operation
+since admission control is incapable of guaranteeing a DELETE request actually succeeded.
+
+## AdmissionControl plugin: ResourceQuota
+
+The **ResourceQuota** plug-in introspects all incoming admission requests.
+
+To enable the plug-in and support for ResourceQuota, the kube-apiserver must be configured as follows:
+
+```console
+$ kube-apiserver -admission_control=ResourceQuota
+```
+
+It makes decisions by evaluating the incoming object against all defined **ResourceQuota.Status.Hard** resource limits in the request
+namespace. If acceptance of the resource would cause the total usage of a named resource to exceed its hard limit, the request is denied.
+
+If the incoming request does not cause the total usage to exceed any of the enumerated hard resource limits, the plug-in will post a
+**ResourceQuota.Status** document to the server to atomically update the observed usage based on the previously read
+**ResourceQuota.ResourceVersion**. This keeps incremental usage atomically consistent, but does introduce a bottleneck (intentionally)
+into the system.
-To optimize the synchronization loop, this controller will WATCH on Pod resources to track DELETE events, and in response, recalculate
-usage. This is because a Pod deletion will have the most impact on observed cpu and memory usage in the system, and we anticipate
-this being the resource most closely running at the prescribed quota limits.
+To optimize system performance, it is encouraged that all resource quotas be tracked on the same **ResourceQuota** document
+in a **Namespace**. As a result, it is encouraged to cap the total number of individual quotas tracked in the **Namespace**
+to 1 by setting a hard limit of 1 on the resourcequotas resource in the **ResourceQuota** document itself.
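The admission decision described above amounts to checking, for each tracked resource, that used + delta does not exceed the hard limit. A hedged sketch of that check (illustrative names and map-based types, not the plug-in's actual code):

```go
package main

import "fmt"

// admitCheck evaluates an incoming object's resource delta against a single
// quota's hard limits and current usage; names here are illustrative only.
func admitCheck(hard, used, delta map[string]int64) (bool, string) {
	for name, d := range delta {
		limit, tracked := hard[name]
		if !tracked {
			continue // resources absent from Hard are not constrained
		}
		if used[name]+d > limit {
			return false, fmt.Sprintf("exceeded quota for %s", name)
		}
	}
	return true, ""
}

func main() {
	hard := map[string]int64{"pods": 10, "cpu": 4000} // cpu in millicores
	used := map[string]int64{"pods": 9, "cpu": 3800}

	ok, _ := admitCheck(hard, used, map[string]int64{"pods": 1, "cpu": 100})
	fmt.Println(ok) // true: 10 pods and 3900m both fit

	ok, reason := admitCheck(hard, used, map[string]int64{"pods": 1, "cpu": 500})
	fmt.Println(ok, reason) // false: 4300m exceeds the 4000m hard limit
}
```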
## kubectl