| field | value | date |
|---|---|---|
| author | k8s-ci-robot <k8s-ci-robot@users.noreply.github.com> | 2018-08-30 13:06:13 -0700 |
| committer | GitHub <noreply@github.com> | 2018-08-30 13:06:13 -0700 |
| commit | 19dc4bed5b05c23a69e631f96573f8ed2e0774cb (patch) | |
| tree | 21de1478de48c6350bfe09c7c4e5d049a73fa2fd | |
| parent | 1622ad3278330629014104e9af5c641feb382137 (diff) | |
| parent | 7cb010a67a0339f0550af088e8185d6b9e8dc60f (diff) | |
Merge pull request #2588 from vishh/wg-ml
Extending resource quota to support node labels
| mode | file | lines |
|---|---|---|
| -rw-r--r-- | keps/NEXT_KEP_NUMBER | 2 |
| -rw-r--r-- | keps/sig-scheduling/node-labels-quota.md | 137 |

2 files changed, 138 insertions, 1 deletion
diff --git a/keps/NEXT_KEP_NUMBER b/keps/NEXT_KEP_NUMBER
index f64f5d8d..9902f178 100644
--- a/keps/NEXT_KEP_NUMBER
+++ b/keps/NEXT_KEP_NUMBER
@@ -1 +1 @@
-27
+28

diff --git a/keps/sig-scheduling/node-labels-quota.md b/keps/sig-scheduling/node-labels-quota.md
new file mode 100644
index 00000000..1d44a58d
--- /dev/null
+++ b/keps/sig-scheduling/node-labels-quota.md
@@ -0,0 +1,137 @@

---
kep-number: 27
title: Resource Quota based on Node Labels
authors:
  - "@vishh"
  - "@bsalamat"
owning-sig: sig-scheduling
participating-sigs: sig-architecture
reviewers:
  - "@derekwaynecarr"
  - "@davidopp"
approvers:
  - TBD
editor: TBD
creation-date: 2018-08-23
status: provisional
---

# Resource Quota based on Node Labels

## Summary

Allowing Resource Quota to be applied to pods based on their node selector configuration opens up a flexible interface for addressing some immediate and potential future use cases.

## Motivation

As a Kubernetes cluster administrator, I'd like to:

1. Restrict namespaces to the specific HW types they can consume. Nodes are expected to be homogeneous with respect to specific types of HW, and the HW type will be exposed as node labels.
   * A concrete example: an intern should only use the cheapest GPUs available in my cluster, while researchers can consume the latest or most expensive GPUs.
2. Restrict compute resources consumed by namespaces in different zones or on dedicated node pools.
3. Restrict compute resources consumed by namespaces based on policy compliance (FIPS, HIPAA, etc.) of individual nodes.

This proposal presents flexible solutions for addressing these use cases without introducing much additional complexity to core Kubernetes.

## Potential solutions

This proposal currently identifies three possible solutions, with the first one being the _preferred_ solution.

### Solution A - Extend Resource Quota Scopes

Resource Quota already includes a built-in extension mechanism called [Resource Scopes](https://github.com/kubernetes/api/blob/master/core/v1/types.go#L4746).
It is possible to add a new Resource Scope called "NodeAffinityKey" (or something similar) that will allow Resource Quota limits to apply based on the node selector and/or affinity fields specified in the pod spec.

Here's an illustration of sample objects with these new fields:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: hipaa-nodes
  namespace: team-1
spec:
  hard:
    cpu: 1000
    memory: 100Gi
  scopeSelector:
    scopeName: NodeAffinityKey
    operator: In
    values: ["hipaa-compliant: true"]
```

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: nvidia-tesla-v100-quota
  namespace: team-1
spec:
  hard:
    nvidia.com/gpu: 128
  scopeSelector:
    scopeName: NodeAffinityKey
    operator: In
    values: ["nvidia.com/gpu-type: nvidia-tesla-v100"]
```

It is possible for quotas to overlap with this feature, as is the case today.
All matching quotas have to be satisfied for a pod to be admitted.
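For illustration, a pod such as the following (the pod name, image, and resource amounts are hypothetical) selects both of the label keys used in the quota objects above, so under the proposed semantics it would be counted against both quotas and admitted only if both have sufficient headroom:

```yaml
# Hypothetical pod. Its node selector matches the NodeAffinityKey scope of
# both ResourceQuota objects above, so both quotas must be satisfied before
# the pod is admitted.
apiVersion: v1
kind: Pod
metadata:
  name: training-job
  namespace: team-1
spec:
  nodeSelector:
    hipaa-compliant: "true"
    nvidia.com/gpu-type: nvidia-tesla-v100
  containers:
  - name: trainer
    image: example.com/trainer:latest
    resources:
      requests:
        cpu: "4"          # charged against the hipaa-nodes quota
        memory: 16Gi      # charged against the hipaa-nodes quota
      limits:
        nvidia.com/gpu: 1 # charged against the nvidia-tesla-v100-quota
```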
The [Quota configuration object](https://github.com/kubernetes/kubernetes/blob/7f23a743e8c23ac6489340bbb34fa6f1d392db9d/plugin/pkg/admission/resourcequota/apis/resourcequota/types.go#L32) will also support the new scope, allowing administrators to prevent pods from running on nodes that match a label selector unless a corresponding quota object has been created.

#### Pros

- Supports arbitrary properties as part of quota, as long as they are exposed as node labels.
- Little added cognitive burden - follows existing API paradigms.
- Implementation is straightforward.
- Doesn't compromise workload portability - quota remains an administrator concern.

#### Cons

- Requires property labels to become standardized if portability is desired. This is required anyway, irrespective of how the properties are exposed outside of the node, for scheduling portability.
- Label keys and values are concatenated. Given that most selector use cases for quota will be deterministic (one-to-one), the proposed API schema might be adequate.

### Solution B - Extend Resource Quota to include an explicit Node Selector field

This solution is similar to the previous one, except that instead of re-using scopes, an explicit Node Selector field is added to the Resource Quota object.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: hipaa-nodes
  namespace: team-1
spec:
  hard:
    cpu: 1000
    memory: 100Gi
  podNodeSelector:
    matchExpressions:
    - key: hipaa-compliant
      operator: In
      values: ["true"]
```

Users should already be familiar with the node selector spec illustrated here, as it is used in pod and volume topology specifications.
However, this solution introduces a field that is only applicable to a few of the resource types that Resource Quota can be used to control.

### Solution C - CRD for expressing Resource Quota for extended resources

The idea behind this solution is to let individual Kubernetes vendors create additional CRDs that allow for expressing per-namespace quota for their resource, along with a controller that uses mutating webhooks to apply quota to pods on creation and deletion.
The controller can also keep track of "in use" quota for the resource it owns, similar to the built-in resource quota object.
The schema for quota is controlled by the resource vendor, and the onus of maintaining compatibility and portability is on them (a rough sketch of such a CRD is shown after the pros and cons below).

#### Pros

- Maximum flexibility
  - Use arbitrary specifications associated with a pod to define quota policies
  - The spec for quota itself can be arbitrarily complex
- Can be developed and maintained outside of upstream

#### Cons

- Added administrator burden. An admin needs to manage multiple types of quota objects based on the HW being consumed.
- It is not trivial to develop an external CRD given the lack of some critical validation, versioning, and lifecycle primitives.
- Tracking quota is non-trivial - perhaps a canonical (example) quota controller might help ease the pain.
- Hard to generate available and in-use quota reports for users - existing quota support in ecosystem components (kubectl, for example) will not cover this new quota object.
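To make Solution C more concrete, here is a rough sketch of what a vendor-defined quota object might look like. The API group, kind, and fields are entirely hypothetical; as the proposal notes, each vendor would define and version its own schema:

```yaml
# Hypothetical vendor CRD instance (Solution C). The group, kind, and schema
# are illustrative only; the vendor's controller would enforce it via a
# webhook and maintain status, similar to the built-in ResourceQuota.
apiVersion: quota.gpu-vendor.example.com/v1alpha1
kind: GpuQuota
metadata:
  name: tesla-v100-quota
  namespace: team-1
spec:
  # Only pods selecting this GPU type are counted against this quota.
  gpuType: nvidia-tesla-v100
  hard:
    nvidia.com/gpu: 128
status:
  # "In use" accounting kept up to date by the vendor's controller.
  used:
    nvidia.com/gpu: 96
```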