author	wojtekt <wojtekt@google.com>	2019-06-01 21:41:21 +0200
committer	wojtekt <wojtekt@google.com>	2019-06-01 22:04:15 +0200
commit	f0c6e48b7b5746508074a4e9791c8635d39bc985 (patch)
tree	75a676b6ee4997fdc5005c2c344d26ff6fb87b8e
parent	f042a6d212fcea419289d3876bccd59e89ca7f01 (diff)
Remove burst SLIs/SLOs
-rw-r--r--	sig-scalability/slos/slos.md	24
-rw-r--r--	sig-scalability/slos/system_throughput.md	34
2 files changed, 0 insertions, 58 deletions
diff --git a/sig-scalability/slos/slos.md b/sig-scalability/slos/slos.md
index b418abd1..2cb707f8 100644
--- a/sig-scalability/slos/slos.md
+++ b/sig-scalability/slos/slos.md
@@ -63,18 +63,6 @@ we will not provide any guarantees for users.
[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
[Service Level Agreement]: https://en.wikipedia.org/wiki/Service-level_agreement
-## Types of SLOs
-
-While SLIs are very generic and don't really depend on anything (they just
-define what and how we measure), that is not the case for SLOs.
-SLOs provide guarantees, and satisfying them may depend on meeting some
-specific requirements.
-
-As a result, we build our SLOs in a "you promise, we promise" format.
-That means that we provide a guarantee only if you satisfy the requirements
-we put on you.
-
-As a consequence, we introduce two types of SLOs.
### Steady state SLOs
@@ -87,12 +75,6 @@ We define system to be in steady state when the cluster churn per second is <= 2
churn = #(Pod spec creations/updates/deletions) + #(user-originated requests) in a given second
```
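To make the churn formula concrete, here is a minimal Go sketch of the arithmetic; the per-second counts below are invented for illustration and are not from the original document:

```go
package main

import "fmt"

func main() {
	// One hypothetical second of cluster activity (counts are made up):
	podSpecChanges := 3 // Pod spec creations + updates + deletions this second
	userRequests := 1   // user-originated API requests this second

	// churn = #(Pod spec creations/updates/deletions) + #(user-originated requests)
	churn := podSpecChanges + userRequests
	fmt.Println("churn for this second:", churn) // prints 4
}
```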
-### Burst SLO
-
-With burst SLOs, we provide guarantees on how the system behaves under heavy
-load (when the user wants the system to do something as quickly as possible,
-not caring too much about response time).
-
## Environment
In order to meet the SLOs, the system must run in an environment satisfying
@@ -145,12 +127,6 @@ sliding window. However, for the purpose of SLO itself, it basically means
"fraction of good minutes per day" being within threshold.
-### Burst SLIs/SLOs
-
-| Status | SLI | SLO | User stories, test scenarios, ... |
-| --- | --- | --- | --- |
-| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes | [Details](./system_throughput.md) |
-
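For example, on a 100-node cluster the burst SLI above covers the time to start 30 * 100 = 3000 pods; the concrete X-minute threshold was still left unspecified (WIP) at the time of removal.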
### Other SLIs
| Status | SLI | User stories, ... |
diff --git a/sig-scalability/slos/system_throughput.md b/sig-scalability/slos/system_throughput.md
deleted file mode 100644
index eb3fc6af..00000000
--- a/sig-scalability/slos/system_throughput.md
+++ /dev/null
@@ -1,34 +0,0 @@
-## System throughput SLI/SLO details
-
-### Definition
-
-| Status | SLI | SLO |
-| --- | --- | --- |
-| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes |
-
-### User stories
-- As a user, I want a guarantee that my workload of X pods can be started
-  within a given time
-- As a user, I want to understand how quickly I can react to a dramatic
-  change in workload profile when my workload exhibits very bursty behavior
-  (e.g. a shop during a Black Friday sale)
-- As a user, I want a guarantee of how quickly I can recreate the whole setup
-  in case of a serious disaster which brings the whole cluster down.
-
-### Test scenario
-- Start with a healthy (all nodes ready, all cluster addons already running)
-  cluster with N (>0) running pause pods per node.
-- Create a number of `Namespaces` and a number of `Deployments` in each of them.
-- All `Namespaces` should be isomorphic, possibly excluding the last one, which
-  should run all pods that didn't fit in the previous ones.
-- A single namespace should run 5000 `Pods` in the following configuration
-  (see the sketch after this list):
-  - one big `Deployment` running ~1/3 of all `Pods` from this `Namespace`
-  - medium `Deployments`, each with 120 `Pods`, in total running ~1/3 of all
-    `Pods` from this `Namespace`
-  - small `Deployments`, each with 10 `Pods`, in total running ~1/3 of all
-    `Pods` from this `Namespace`
-- Each `Deployment` should be covered by a single `Service`.
-- Each `Pod` in any `Deployment` contains two pause containers and mounts one
-  `Secret` (other than the default `ServiceAccount` one) and one `ConfigMap`.
-  Additionally, it has resource requests set and doesn't use any advanced
-  scheduling features or init containers.
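As a rough illustration of the per-`Namespace` arithmetic this scenario implies, here is a minimal Go sketch; the integer rounding of the ~1/3 splits is an assumption, since the source only specifies approximate thirds:

```go
package main

import "fmt"

func main() {
	// 5000 Pods per Namespace, split into rough thirds across one big
	// Deployment, 120-Pod medium Deployments, and 10-Pod small Deployments.
	const podsPerNamespace = 5000
	third := podsPerNamespace / 3 // ~1666 Pods per tier

	mediumDeployments := third / 120 // 120-Pod Deployments -> 13 of them
	smallDeployments := third / 10   // 10-Pod Deployments  -> 166 of them

	fmt.Printf("big:    1 Deployment x %d Pods\n", third)
	fmt.Printf("medium: %d Deployments x 120 Pods = %d Pods\n",
		mediumDeployments, mediumDeployments*120)
	fmt.Printf("small:  %d Deployments x 10 Pods = %d Pods\n",
		smallDeployments, smallDeployments*10)
}
```

The leftover Pods from this rounding are consistent with the scenario's note that the last `Namespace` absorbs whatever didn't fit in the previous ones.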