author	wojtekt <wojtekt@google.com>	2019-06-01 21:41:21 +0200
committer	wojtekt <wojtekt@google.com>	2019-06-01 22:04:15 +0200
commit	f0c6e48b7b5746508074a4e9791c8635d39bc985 (patch)
tree	75a676b6ee4997fdc5005c2c344d26ff6fb87b8e
parent	f042a6d212fcea419289d3876bccd59e89ca7f01 (diff)
Remove burst SLIs/SLOs
-rw-r--r--	sig-scalability/slos/slos.md	24
-rw-r--r--	sig-scalability/slos/system_throughput.md	34
2 files changed, 0 insertions, 58 deletions
diff --git a/sig-scalability/slos/slos.md b/sig-scalability/slos/slos.md
index b418abd1..2cb707f8 100644
--- a/sig-scalability/slos/slos.md
+++ b/sig-scalability/slos/slos.md
@@ -63,18 +63,6 @@ we will not provide any guarantees for users.
[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
[Service Level Agreement]: https://en.wikipedia.org/wiki/Service-level_agreement
-## Types of SLOs
-
-While SLIs are very generic and don't really depend on anything (they just
-define what and how we measure), that is not the case for SLOs.
-SLOs provide guarantees, and satisfying them may depend on meeting some
-specific requirements.
-
-As a result, we build our SLOs in a "you promise, we promise" format.
-That means that we provide a guarantee only if you satisfy the requirements
-we put on you.
-
-As a consequence, we introduce two types of SLOs.
### Steady state SLOs
@@ -87,12 +75,6 @@ We define system to be in steady state when the cluster churn per second is <= 2
churn = #(Pod spec creations/updates/deletions) + #(user-originated requests) in a given second
```
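To make the churn formula concrete, here is a minimal Go sketch of the arithmetic; the per-second counts below are invented for illustration and are not from the original document:

```go
package main

import "fmt"

func main() {
	// One hypothetical second of cluster activity (counts are made up):
	podSpecChanges := 3 // Pod spec creations + updates + deletions this second
	userRequests := 1   // user-originated API requests this second

	// churn = #(Pod spec creations/updates/deletions) + #(user-originated requests)
	churn := podSpecChanges + userRequests
	fmt.Println("churn for this second:", churn) // prints 4
}
```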
-### Burst SLO
-
-With burst SLOs, we provide guarantees on how the system behaves under heavy
-load (when the user wants the system to do something as quickly as possible,
-not caring too much about response time).
-
## Environment
In order to meet the SLOs, the system must run in an environment satisfying
@@ -145,12 +127,6 @@ sliding window. However, for the purpose of SLO itself, it basically means
"fraction of good minutes per day" being within threshold.
-### Burst SLIs/SLOs
-
-| Status | SLI | SLO | User stories, test scenarios, ... |
-| --- | --- | --- | --- |
-| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes | [Details](./system_throughput.md) |
-
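For example, on a 100-node cluster the burst SLI above covers the time to start 30 * 100 = 3000 pods; the concrete X-minute threshold was still left unspecified (WIP) at the time of removal.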
### Other SLIs
| Status | SLI | User stories, ... |
diff --git a/sig-scalability/slos/system_throughput.md b/sig-scalability/slos/system_throughput.md
deleted file mode 100644
index eb3fc6af..00000000
--- a/sig-scalability/slos/system_throughput.md
+++ /dev/null
@@ -1,34 +0,0 @@
-## System throughput SLI/SLO details
-
-### Definition
-
-| Status | SLI | SLO |
-| --- | --- | --- |
-| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes |
-
-### User stories
-- As a user, I want a guarantee that my workload of X pods can be started
-  within a given time
-- As a user, I want to understand how quickly I can react to a dramatic
-  change in workload profile when my workload exhibits very bursty behavior
-  (e.g. a shop during a Black Friday sale)
-- As a user, I want a guarantee of how quickly I can recreate the whole setup
-  in case of a serious disaster which brings the whole cluster down.
-
-### Test scenario
-- Start with a healthy (all nodes ready, all cluster addons already running)
-  cluster with N (>0) running pause pods per node.
-- Create a number of `Namespaces` and a number of `Deployments` in each of them.
-- All `Namespaces` should be isomorphic, possibly excluding the last one, which
-  should run all pods that didn't fit in the previous ones.
-- A single namespace should run 5000 `Pods` in the following configuration
-  (see the sketch after this list):
-  - one big `Deployment` running ~1/3 of all `Pods` from this `Namespace`
-  - medium `Deployments`, each with 120 `Pods`, in total running ~1/3 of all
-    `Pods` from this `Namespace`
-  - small `Deployments`, each with 10 `Pods`, in total running ~1/3 of all
-    `Pods` from this `Namespace`
-- Each `Deployment` should be covered by a single `Service`.
-- Each `Pod` in any `Deployment` contains two pause containers and mounts one
-  `Secret` (other than the default `ServiceAccount` one) and one `ConfigMap`.
-  Additionally, it has resource requests set and doesn't use any advanced
-  scheduling features or init containers.
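As a rough illustration of the per-`Namespace` arithmetic this scenario implies, here is a minimal Go sketch; the integer rounding of the ~1/3 splits is an assumption, since the source only specifies approximate thirds:

```go
package main

import "fmt"

func main() {
	// 5000 Pods per Namespace, split into rough thirds across one big
	// Deployment, 120-Pod medium Deployments, and 10-Pod small Deployments.
	const podsPerNamespace = 5000
	third := podsPerNamespace / 3 // ~1666 Pods per tier

	mediumDeployments := third / 120 // 120-Pod Deployments -> 13 of them
	smallDeployments := third / 10   // 10-Pod Deployments  -> 166 of them

	fmt.Printf("big:    1 Deployment x %d Pods\n", third)
	fmt.Printf("medium: %d Deployments x 120 Pods = %d Pods\n",
		mediumDeployments, mediumDeployments*120)
	fmt.Printf("small:  %d Deployments x 10 Pods = %d Pods\n",
		smallDeployments, smallDeployments*10)
}
```

The leftover Pods from this rounding are consistent with the scenario's note that the last `Namespace` absorbs whatever didn't fit in the previous ones.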