| field | value | timestamp |
|---|---|---|
| author | wojtekt <wojtekt@google.com> | 2019-06-01 21:41:21 +0200 |
| committer | wojtekt <wojtekt@google.com> | 2019-06-01 22:04:15 +0200 |
| commit | f0c6e48b7b5746508074a4e9791c8635d39bc985 | |
| tree | 75a676b6ee4997fdc5005c2c344d26ff6fb87b8e /sig-scalability | |
| parent | f042a6d212fcea419289d3876bccd59e89ca7f01 | |
Remove burst SLIs/SLOs
Diffstat (limited to 'sig-scalability')
| mode | file | deletions |
|---|---|---|
| -rw-r--r-- | sig-scalability/slos/slos.md | 24 |
| -rw-r--r-- | sig-scalability/slos/system_throughput.md | 34 |

2 files changed, 0 insertions, 58 deletions
````diff
diff --git a/sig-scalability/slos/slos.md b/sig-scalability/slos/slos.md
index b418abd1..2cb707f8 100644
--- a/sig-scalability/slos/slos.md
+++ b/sig-scalability/slos/slos.md
@@ -63,18 +63,6 @@ we will not provide any guarantees for users.
 
 [Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
 [Service Level Agreement]: https://en.wikipedia.org/wiki/Service-level_agreement
 
-## Types of SLOs
-
-While SLIs are very generic and don't really depend on anything (they just
-define what and how we measure), it's not the case for SLOs.
-SLOs provide guarantees, and satisfying them may depend on meeting some
-specific requirements.
-
-As a result, we build our SLOs in "you promise, we promise" format.
-That means that we provide you a guarantee only if you satisfy the requirement
-that we put on you.
-
-As a consequence we introduce the two types of SLOs.
 
 ### Steady state SLOs
@@ -87,12 +75,6 @@ We define system to be in steady state when the cluster churn per second is <= 2
 churn = #(Pod spec creations/updates/deletions) + #(user originated requests) in a given second
 ```
 
-### Burst SLO
-
-With burst SLOs, we provide guarantees on how system behaves under the heavy load
-(when user wants the system to do something as quickly as possible not caring too
-much about response time).
-
 ## Environment
 
 In order to meet the SLOs, system must run in the environment satisfying
@@ -145,12 +127,6 @@ sliding window.
 However, for the purpose of SLO itself, it basically means "fraction of good minutes
 per day" being within threshold.
 
-### Burst SLIs/SLOs
-
-| Status | SLI | SLO | User stories, test scenarios, ... |
-| --- | --- | --- | --- |
-| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes | [Details](./system_throughput.md) |
-
 ### Other SLIs
 
 | Status | SLI | User stories, ... |
diff --git a/sig-scalability/slos/system_throughput.md b/sig-scalability/slos/system_throughput.md
deleted file mode 100644
index eb3fc6af..00000000
--- a/sig-scalability/slos/system_throughput.md
+++ /dev/null
@@ -1,34 +0,0 @@
-## System throughput SLI/SLO details
-
-### Definition
-
-| Status | SLI | SLO |
-| --- | --- | --- |
-| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes |
-
-### User stories
-- As a user, I want a guarantee that my workload of X pods can be started
-  within a given time
-- As a user, I want to understand how quickly I can react to a dramatic
-  change in workload profile when my workload exhibits very bursty behavior
-  (e.g. shop during Black Friday Sale)
-- As a user, I want a guarantee how quickly I can recreate the whole setup
-  in case of a serious disaster which brings the whole cluster down.
-
-### Test scenario
-- Start with a healthy (all nodes ready, all cluster addons already running)
-  cluster with N (>0) running pause pods per node.
-- Create a number of `Namespaces` and a number of `Deployments` in each of them.
-- All `Namespaces` should be isomorphic, possibly excluding last one which should
-  run all pods that didn't fit in the previous ones.
-- Single namespace should run 5000 `Pods` in the following configuration:
-  - one big `Deployment` running ~1/3 of all `Pods` from this `namespace`
-  - medium `Deployments`, each with 120 `Pods`, in total running ~1/3 of all
-    `Pods` from this `namespace`
-  - small `Deployments`, each with 10 `Pods`, in total running ~1/3 of all `Pods`
-    from this `Namespace`
-- Each `Deployment` should be covered by a single `Service`.
-- Each `Pod` in any `Deployment` contains two pause containers, one `Secret`
-  other than default `ServiceAccount` and one `ConfigMap`. Additionally it has
-  resource requests set and doesn't use any advanced scheduling features or
-  init containers.
````
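To make the arithmetic of the removed test scenario concrete, here is a minimal Python sketch. It is not part of the commit; the function name, constant names, and the example node count are illustrative. It shows how the burst SLI's 30\*#nodes pod count maps onto the 5000-pods-per-namespace layout described above:

```python
# A minimal sketch of the workload arithmetic in the removed test scenario.
# Not from the commit; names and the example node count are illustrative.
import math

PODS_PER_NAMESPACE = 5000     # "Single namespace should run 5000 Pods"
MEDIUM_DEPLOYMENT_PODS = 120  # pods per medium Deployment
SMALL_DEPLOYMENT_PODS = 10    # pods per small Deployment


def workload_shape(num_nodes: int) -> dict:
    """Pod and namespace counts for the removed burst SLI, which measured
    the time to start 30 * #nodes pods."""
    total_pods = 30 * num_nodes
    # Namespaces are isomorphic; the last one absorbs any remainder.
    namespaces = math.ceil(total_pods / PODS_PER_NAMESPACE)
    third = PODS_PER_NAMESPACE // 3  # each size class runs ~1/3 of a namespace
    return {
        "total_pods": total_pods,
        "namespaces": namespaces,
        # Breakdown of one full (5000-pod) namespace:
        "big_deployment_pods": third,                          # one big Deployment
        "medium_deployments": third // MEDIUM_DEPLOYMENT_PODS, # 13 Deployments of 120 pods
        "small_deployments": third // SMALL_DEPLOYMENT_PODS,   # 166 Deployments of 10 pods
    }


# e.g. a 5000-node cluster: 150000 pods spread across 30 namespaces
print(workload_shape(5000))
```

At 5000 nodes the benchmark workload comes out to 150,000 pods, i.e. 30 full namespaces, which gives a sense of the scale the removed SLI was meant to cover.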