| author | Derek Carr <decarr@redhat.com> | 2017-08-25 20:55:42 -0400 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2017-08-25 20:55:42 -0400 |
| commit | 398dc7e3f7aa8a399953ac3df3e657fe22b54698 (patch) | |
| tree | f32d6e3b674a5c81ba32b246a4edb8d32922dcbb | |
| parent | de0f3c3606744fa787d5c681b657bc841c343330 (diff) | |
| parent | 0d66c855d66276247a810043a0a880e7c947edd6 (diff) | |
Merge pull request #946 from dashpole/priority_eviction
Modify Eviction Strategy to take Priority into account
| -rw-r--r-- | contributors/design-proposals/kubelet-eviction.md | 39 |
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/contributors/design-proposals/kubelet-eviction.md b/contributors/design-proposals/kubelet-eviction.md
index 68b39ec1..1700babe 100644
--- a/contributors/design-proposals/kubelet-eviction.md
+++ b/contributors/design-proposals/kubelet-eviction.md
@@ -241,6 +241,22 @@ the `kubelet` will select a subsequent pod.
 
 ## Eviction Strategy
 
+The `kubelet` will implement an eviction strategy oriented around
+[Priority](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-priority-api.md)
+and pod usage relative to requests. It will target pods that are the lowest
+Priority, and are the largest consumers of the starved resource relative to
+their scheduling request.
+
+It will target pods whose usage of the starved resource exceeds its requests.
+Of those pods, it will rank by a function of priority, and usage - requests.
+Roughly speaking, if a pod has twice the priority of another pod, it will
+receive half the penalty for usage above requests. If system daemons are
+exceeding their allocation (see [Strategy Caveat](#strategy-caveat) below),
+and all pods are using less than their requests, then it will evict a pod
+whose usage is less than requests, based on the function of priority, and
+usage - requests.
+
+Prior to v1.8:
 The `kubelet` will implement a default eviction strategy oriented around
 the pod quality of service class.
 
@@ -258,14 +274,16 @@ starved resource.
 relative to their request are killed first. If no pod has exceeded its request,
 the strategy targets the largest consumer of the starved resource.
 
-A guaranteed pod is guaranteed to never be evicted because of another pod's
-resource consumption. That said, guarantees are only as good as the underlying
-foundation they are built upon. If a system daemon
+### Strategy Caveat
+
+A pod consuming less resources than its requests is guaranteed to never be
+evicted because of another pod's resource consumption. That said, guarantees
+are only as good as the underlying foundation they are built upon. If a system daemon
 (i.e. `kubelet`, `docker`, `journald`, etc.) is consuming more resources than
-were reserved via `system-reserved` or `kube-reserved` allocations, and the node
-only has guaranteed pod(s) remaining, then the node must choose to evict a
-guaranteed pod in order to preserve node stability, and to limit the impact
-of the unexpected consumption to other guaranteed pod(s).
+were reserved via `system-reserved` or `kube-reserved` allocations, then the node
+must choose to evict a pod, even if it is consuming less than its requests.
+It must take action in order to preserve node stability, and to limit the impact
+of the unexpected consumption to other well-behaved pod(s).
 
 ## Disk based evictions
 
@@ -458,13 +476,6 @@ for eviction. Instead `DaemonSet` should ideally include Guaranteed pods only.
 The pod eviction may evict more pods than needed due to stats collection timing
 gap. This can be mitigated by adding the ability to get root container stats on
 an on-demand basis (https://github.com/google/cadvisor/issues/1247) in the future.
 
-### How kubelet ranks pods for eviction in response to inode exhaustion
-
-At this time, it is not possible to know how many inodes were consumed by a particular container. If the `kubelet` observes
-inode exhaustion, it will evict pods by ranking them by quality of service. The following issue has been opened in cadvisor
-to track per container inode consumption (https://github.com/google/cadvisor/issues/1422) which would allow us to rank pods
-by inode consumption. For example, this would let us identify a container that created large numbers of 0 byte files, and evict
-that pod over others.
 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
 []()
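The priority-weighted ranking the new text describes ("twice the priority, half the penalty for usage above requests") can be sketched in Go. This is an illustrative sketch only, not the kubelet's actual implementation; `podInfo` and `evictionScore` are hypothetical names, and priorities are assumed to be positive integers to keep the division simple:

```go
package main

import (
	"fmt"
	"sort"
)

// podInfo is a hypothetical stand-in for the per-pod stats the kubelet tracks.
type podInfo struct {
	name     string
	priority int64 // pod Priority value (assumed >= 1 in this sketch)
	usage    int64 // current usage of the starved resource, in bytes
	request  int64 // scheduling request for that resource, in bytes
}

// evictionScore penalizes usage above requests, scaled down by priority:
// a pod with twice the priority receives half the penalty. Pods using less
// than their requests get a negative score, so they rank behind every pod
// that exceeds its requests and are evicted only as a last resort.
func evictionScore(p podInfo) float64 {
	return float64(p.usage-p.request) / float64(p.priority)
}

func main() {
	pods := []podInfo{
		{name: "high-pri-hog", priority: 2, usage: 300 << 20, request: 100 << 20},
		{name: "low-pri-hog", priority: 1, usage: 300 << 20, request: 100 << 20},
		{name: "well-behaved", priority: 1, usage: 50 << 20, request: 100 << 20},
	}
	// Sort descending by score: the first entry is evicted first.
	sort.Slice(pods, func(i, j int) bool {
		return evictionScore(pods[i]) > evictionScore(pods[j])
	})
	for _, p := range pods {
		fmt.Println(p.name)
	}
	// Prints: low-pri-hog, then high-pri-hog, then well-behaved.
}
```

Note how the same score function also covers the Strategy Caveat case: when system daemons overrun `system-reserved`/`kube-reserved` and every pod is under its requests, all scores are negative, and the least negative (lowest priority, closest to its request) pod is chosen.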
