author    k8s-ci-robot <k8s-ci-robot@users.noreply.github.com>  2018-08-30 13:06:13 -0700
committer GitHub <noreply@github.com>  2018-08-30 13:06:13 -0700
commit    19dc4bed5b05c23a69e631f96573f8ed2e0774cb (patch)
tree      21de1478de48c6350bfe09c7c4e5d049a73fa2fd
parent    1622ad3278330629014104e9af5c641feb382137 (diff)
parent    7cb010a67a0339f0550af088e8185d6b9e8dc60f (diff)
Merge pull request #2588 from vishh/wg-ml
Extending resource quota to support node labels
-rw-r--r--  keps/NEXT_KEP_NUMBER                      2
-rw-r--r--  keps/sig-scheduling/node-labels-quota.md  137
2 files changed, 138 insertions(+), 1 deletion(-)
diff --git a/keps/NEXT_KEP_NUMBER b/keps/NEXT_KEP_NUMBER
index f64f5d8d..9902f178 100644
--- a/keps/NEXT_KEP_NUMBER
+++ b/keps/NEXT_KEP_NUMBER
@@ -1 +1 @@
-27
+28
diff --git a/keps/sig-scheduling/node-labels-quota.md b/keps/sig-scheduling/node-labels-quota.md
new file mode 100644
index 00000000..1d44a58d
--- /dev/null
+++ b/keps/sig-scheduling/node-labels-quota.md
@@ -0,0 +1,137 @@
+---
+kep-number: 27
+title: Resource Quota based on Node Labels
+authors:
+ - "@vishh"
+ - "@bsalamat"
+owning-sig: sig-scheduling
+participating-sigs: sig-architecture
+reviewers:
+ - "@derekwaynecarr"
+ - "@davidopp"
+approvers:
+ - TBD
+editor: TBD
+creation-date: 2018-08-23
+status: provisional
+---
+
+# Resource Quota based on Node Labels
+
+## Summary
+
+Allowing Resource Quota to be applied to pods based on their node selector configuration opens up a flexible interface for addressing some immediate and potential future use cases.
+
+## Motivation
+
+As a kubernetes cluster administrator, I'd like to:
+
+1. Restrict namespaces to the specific HW types they can consume. Nodes are expected to be homogeneous with respect to specific types of HW, and the HW type will be exposed as node labels.
+ * A concrete example - An intern should only use the cheapest GPU available in my cluster, while researchers can consume the latest or most expensive GPUs.
+2. Restrict compute resources consumed by namespaces on different zones or dedicated node pools.
+3. Restrict compute resources consumed by namespaces based on policy (FIPS, HIPAA, etc) compliance on individual nodes.
+
+This proposal presents flexible solution(s) for addressing these use cases without introducing much additional complexity to core kubernetes.
+
+## Potential solutions
+
+This proposal currently identifies three possible solutions, with the first one being the _preferred_ solution.
+
+### Solution A - Extend Resource Quota Scopes
+
+Resource Quota already includes a built-in extension mechanism called [Resource Scopes](https://github.com/kubernetes/api/blob/master/core/v1/types.go#L4746).
+It is possible to add a new Resource Scope called “NodeAffinityKey” (or something similar) that will allow Resource Quota limits to apply based on the node selector and/or node affinity fields specified in the pod spec.
+
+Here’s an illustration of a sample object with these new fields:
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: hipaa-nodes
+  namespace: team-1
+spec:
+  hard:
+    cpu: 1000
+    memory: 100Gi
+  scopeSelector:
+    scopeName: NodeAffinityKey
+    operator: In
+    values: ["hipaa-compliant: true"]
+```
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: nvidia-tesla-v100-quota
+  namespace: team-1
+spec:
+  hard:
+    nvidia.com/gpu: 128
+  scopeSelector:
+    scopeName: NodeAffinityKey
+    operator: In
+    values: ["nvidia.com/gpu-type:nvidia-tesla-v100"]
+```
+
+As is the case today, it is possible for quotas to overlap with this feature.
+All applicable quotas have to be satisfied for a pod to be admitted.
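+
+For illustration, here is a hypothetical pod (the pod name, image, and request sizes are placeholders, not part of this proposal) whose node selector would match the `hipaa-nodes` quota above, assuming the NodeAffinityKey scope matches on the concatenated label key and value; its requests would then be charged against that quota:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: billing-api        # hypothetical pod name
+  namespace: team-1
+spec:
+  nodeSelector:
+    hipaa-compliant: "true"  # concatenates to the scope value "hipaa-compliant: true" above
+  containers:
+  - name: app
+    image: example.com/billing-api:1.0  # placeholder image
+    resources:
+      requests:
+        cpu: "2"
+        memory: 4Gi
+```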
+
+The [Quota configuration object](https://github.com/kubernetes/kubernetes/blob/7f23a743e8c23ac6489340bbb34fa6f1d392db9d/plugin/pkg/admission/resourcequota/apis/resourcequota/types.go#L32) will also support the new scope, making it possible to prevent pods from running on nodes that match a label selector unless a corresponding quota object has been created.
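+
+As a minimal sketch of what that might look like, assuming the existing `limitedResources`/`matchScopes` mechanism of the ResourceQuota admission plugin configuration is extended to accept the new scope (the NodeAffinityKey scope and its value format are part of this proposal, not the current API):
+
+```yaml
+# Hypothetical admission plugin configuration: pods selecting
+# hipaa-compliant nodes are rejected unless a covering quota exists.
+apiVersion: resourcequota.admission.k8s.io/v1beta1
+kind: Configuration
+limitedResources:
+- resource: pods
+  matchScopes:
+  - scopeName: NodeAffinityKey
+    operator: In
+    values: ["hipaa-compliant: true"]
+```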
+
+#### Pros
+
+- Supports arbitrary properties as part of quota, as long as they are exposed as node labels.
+- Little added cognitive burden - follows existing API paradigms.
+- Implementation is straightforward.
+- Doesn’t compromise portability - Quota remains an administrator burden.
+
+#### Cons
+
+- Requires property labels to become standardized if portability is desired. This is required anyway, irrespective of how the properties are exposed outside of the node, for scheduling portability.
+- Label keys and values are concatenated into a single string. Given that most selector use cases for quota will be deterministic (one key -> one value), the proposed API schema might be adequate.
+
+### Solution B - Extend Resource Quota to include an explicit Node Selector field
+
+This solution is similar to the previous one, except that instead of re-using scopes, it adds an explicit Node Selector field to the Resource Quota object.
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: hipaa-nodes
+  namespace: team-1
+spec:
+  hard:
+    cpu: 1000
+    memory: 100Gi
+  podNodeSelector:
+    matchExpressions:
+    - key: hipaa-compliant
+      operator: In
+      values: ["true"]
+```
+
+Users should already be familiar with the Node Selector spec illustrated here as it is used in pod and volume topology specifications.
+However, this solution introduces a field that is only applicable to a few of the resource types that Resource Quota can be used to control.
+
+### Solution C - CRD for expressing Resource Quota for extended resources
+
+The idea behind this solution is to let individual kubernetes vendors create additional CRDs for expressing per-namespace quota for their resources, along with a controller that uses mutating webhooks to enforce quota on pods at creation and deletion.
+The controller can also keep track of “in use” quota for the resource it owns, similar to the built-in resource quota object.
+The schema for quota is controlled by the resource vendor, and the onus of maintaining compatibility and portability is on them.
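+
+As a purely illustrative sketch, a GPU vendor might define a custom quota object along these lines (the `GpuQuota` kind, its API group, and its fields are hypothetical and not part of this proposal):
+
+```yaml
+# Hypothetical vendor-defined quota CRD instance; enforced by the
+# vendor's own controller and admission webhook.
+apiVersion: quota.example-vendor.com/v1alpha1
+kind: GpuQuota
+metadata:
+  name: team-1-gpu-quota
+  namespace: team-1
+spec:
+  hard:
+    nvidia.com/gpu: 128
+  gpuType: nvidia-tesla-v100  # which GPU type this quota applies to
+```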
+
+#### Pros
+
+- Maximum flexibility
+ - Use arbitrary specifications associated with a pod to define quota policies
+ - The spec for quota itself can be arbitrarily complex
+- Develop and maintain outside of upstream
+
+#### Cons
+
+- Added administrator burden. An admin needs to manage multiple types of quota objects based on the HW being consumed.
+- It is not trivial to develop an external CRD given the lack of some critical validation, versioning, and lifecycle primitives.
+- Tracking quota is non-trivial - perhaps a canonical (example) quota controller might help ease the pain.
+- Hard to generate available and in-use quota reports for users - existing quota support in ecosystem components (kubectl, for example) will not recognize this new quota object.