-rw-r--r--  sig-scalability/slos/api_call_latency.md        42
-rw-r--r--  sig-scalability/slos/api_extensions_latency.md   6
-rw-r--r--  sig-scalability/slos/slos.md                     38
-rw-r--r--  sig-scalability/slos/system_throughput.md         2
-rw-r--r--  sig-scalability/slos/watch_latency.md            17
5 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/sig-scalability/slos/api_call_latency.md b/sig-scalability/slos/api_call_latency.md
new file mode 100644
index 00000000..f4373703
--- /dev/null
+++ b/sig-scalability/slos/api_call_latency.md
@@ -0,0 +1,42 @@
+## API call latency SLIs/SLOs details
+
+### User stories
+- As a user of vanilla Kubernetes, I want some guarantee of how quickly I get
+a response from an API call.
+- As an administrator of a Kubernetes cluster, if I know the characteristics
+of the external dependencies of my apiserver (e.g. custom admission plugins,
+webhooks and initializers), I want to be able to provide guarantees for API
+call latency to users of my cluster.
+
+### Other notes
+- We obviously can't give any guarantee in general, because cluster
+administrators are allowed to register custom admission plugins, webhooks
+and/or initializers, which we don't have any control over and which obviously
+impact API call latencies.
+- As a result, we define the SLIs to be very generic (valid no matter how your
+cluster is set up), but we provide an SLO only for default installations (where
+we have control over what the apiserver is doing). This avoids giving the false
+impression that we provide a guarantee no matter how the cluster is set up and
+what is installed on top of it.
+- At the same time, API calls are part of pretty much every non-trivial workflow
+in Kubernetes, so this metric is a building block for less trivial SLIs and
+SLOs.
+- The SLO for the latency of read-only API calls of a given type may have a
+significant buffer in its threshold. In fact, the latency of a request should
+be proportional to the amount of work to do (which is the number of objects of
+a given type in a given scope) plus some constant overhead. For better tracking
+of performance, we may want to define a purely internal SLI of "latency per
+object" (see the sketch below), but that isn't in our near-term plans.
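+
+The SLIs above are measured as a 99th percentile of request latency over the
+last 5 minutes and reported as one point per cluster-day. The Go sketch below
+is illustrative only (the real signal comes from apiserver instrumentation,
+not from code like this); it assumes latency samples are available as
+(end time, latency, object count) tuples and also shows the hypothetical
+"latency per object" normalization mentioned above.
+
+```go
+package slis
+
+import (
+    "sort"
+    "time"
+)
+
+// Sample is one observed API call as a test harness could record it.
+// This representation is an assumption of this sketch, not an apiserver API.
+type Sample struct {
+    End     time.Time     // when the last byte of the response was sent
+    Latency time.Duration // request latency as defined by the SLI
+    Objects int           // number of objects processed (relevant for LISTs)
+}
+
+// Percentile99 returns the 99th percentile latency over the samples that
+// finished within the trailing window (here: 5 minutes) ending at now.
+// For SLO reporting, the same computation run over a whole cluster-day is
+// reduced to a single "satisfied or not" point per day.
+func Percentile99(samples []Sample, now time.Time, window time.Duration) time.Duration {
+    var lats []time.Duration
+    for _, s := range samples {
+        if s.End.After(now.Add(-window)) && !s.End.After(now) {
+            lats = append(lats, s.Latency)
+        }
+    }
+    if len(lats) == 0 {
+        return 0
+    }
+    sort.Slice(lats, func(i, j int) bool { return lats[i] < lats[j] })
+    // Nearest-rank 99th percentile: index = ceil(0.99 * n) - 1.
+    return lats[(99*len(lats)+99)/100-1]
+}
+
+// LatencyPerObject is the purely internal, not-yet-planned "latency per
+// object" SLI mentioned above: latency minus an assumed constant overhead,
+// normalized by the number of objects processed.
+func LatencyPerObject(s Sample, constantOverhead time.Duration) time.Duration {
+    if s.Objects == 0 {
+        return s.Latency - constantOverhead
+    }
+    return (s.Latency - constantOverhead) / time.Duration(s.Objects)
+}
+```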
+
+### Caveats
+- The SLO has to be satisfied independently of the encoding used in
+user-originated requests. This makes the mix of clients important while
+testing. However, we assume that all `core` components communicate with the
+apiserver using protocol buffers.
+- In case of GET requests, the user has an option to opt in to accepting
+potentially stale data (served from cache), and the SLO again has to be
+satisfied independently of that. This makes the careful choice of requests in
+tests important (see the sketch below).
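+
+The snippet below is a minimal client-go sketch of both caveats: it forces a
+specific wire encoding on the client and opts in to a potentially stale,
+cache-served read via `resourceVersion=0`. The kubeconfig path, namespace and
+pod name are made up, and older client-go versions take no context argument.
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+
+    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+    "k8s.io/client-go/kubernetes"
+    "k8s.io/client-go/tools/clientcmd"
+)
+
+func main() {
+    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // hypothetical path
+    if err != nil {
+        panic(err)
+    }
+    // `core` components talk to the apiserver using protocol buffers; many
+    // user clients default to JSON. The SLO has to hold for both encodings.
+    config.ContentType = "application/vnd.kubernetes.protobuf"
+    config.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
+
+    client, err := kubernetes.NewForConfig(config)
+    if err != nil {
+        panic(err)
+    }
+
+    // ResourceVersion "0" opts in to a potentially stale read served from the
+    // apiserver cache; the SLO has to hold with and without it.
+    pod, err := client.CoreV1().Pods("default").Get(
+        context.TODO(), "some-pod", metav1.GetOptions{ResourceVersion: "0"})
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(pod.Name)
+}
+```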
+
+### Test scenario
+
+__TODO: Describe test scenario.__
diff --git a/sig-scalability/slos/api_extensions_latency.md b/sig-scalability/slos/api_extensions_latency.md
new file mode 100644
index 00000000..2681422c
--- /dev/null
+++ b/sig-scalability/slos/api_extensions_latency.md
@@ -0,0 +1,6 @@
+## API call extension points latency SLIs details
+
+### User stories
+- As an administrator, if API calls are slow, I would like to know whether this
+is because of slow extension points (admission plugins, webhooks, initializers)
+and, if so, which ones are responsible for it (see the sketch below).
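+
+As an illustration of the SLI, the sketch below wraps an extension point so
+that its latency is recorded per extension name. The `AdmitFunc` type and the
+`record` callback are stand-ins invented for this sketch; the real measurement
+would come from apiserver instrumentation rather than user code.
+
+```go
+package extensions
+
+import (
+    "context"
+    "time"
+)
+
+// AdmitFunc is a stand-in for a single extension point (admission plugin,
+// webhook call or initializer); it is not the real apiserver interface.
+type AdmitFunc func(ctx context.Context, obj interface{}) error
+
+// WithLatencyRecording wraps an extension point so that every invocation
+// reports its duration under the extension's name. The record callback would
+// typically feed a latency histogram, from which a 99th percentile over the
+// last 5 minutes can be computed.
+func WithLatencyRecording(name string, admit AdmitFunc, record func(name string, d time.Duration)) AdmitFunc {
+    return func(ctx context.Context, obj interface{}) error {
+        start := time.Now()
+        err := admit(ctx, obj)
+        record(name, time.Since(start))
+        return err
+    }
+}
+```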
diff --git a/sig-scalability/slos/slos.md b/sig-scalability/slos/slos.md
index 49d26c6a..34aba2f8 100644
--- a/sig-scalability/slos/slos.md
+++ b/sig-scalability/slos/slos.md
@@ -92,13 +92,39 @@ of the cases.
We are looking into extending the set of SLIs/SLOs to cover more parts of
Kubernetes.
+```
+Prerequisite: Kubernetes cluster is available and serving.
+```
+
### Steady state SLIs/SLOs
| Status | SLI | SLO | User stories, test scenarios, ... |
| --- | --- | --- | --- |
+| __Official__ | Latency<sup>[1](#footnote1)</sup> of mutating<sup>[2](#footnote2)</sup> API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[3](#footnote3)</sup> <= 1s | [Details](./api_call_latency.md) |
+| __Official__ | Latency<sup>[1](#footnote1)</sup> of non-streaming read-only<sup>[4](#footnote4)</sup> API calls for every (resource, scope<sup>[5](#footnote5)</sup>) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
+
+<a name="footnote1">\[1\]</a>By latency of an API call in this doc we mean the
+time from the moment when the apiserver gets the request to the last byte of
+the response sent to the user.
+
+<a name="footnote2">\[2\]</a>By mutating API calls we mean POST, PUT, DELETE
+and PATCH.
+
+<a name="footnote3">\[3\]</a> For the purpose of visualization it will be a
+sliding window. However, for the purpose of reporting the SLO, it means one
+point per day (whether the SLO was satisfied on a given day or not).
+
+<a name="footnote4">\[4\]</a>By non-streaming read-only API calls we mean GET
+requests without the `watch=true` option set. (Note that internally in
+Kubernetes this translates to both GET and LIST calls.)
+
+<a name="footnote5">\[5\]</a>A scope of a request can be either (a) `resource`
+if the request is about a single object, (b) `namespace` if it is about objects
+from a single namespace or (c) `cluster` if it spans objects from multiple
+namespaces (see the sketch below).
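+
+As an illustration of the scope classification above, the sketch below buckets
+a read-only request; `RequestInfo` here is a simplified stand-in invented for
+the sketch, not the real apiserver type.
+
+```go
+package slo
+
+// RequestInfo is a simplified stand-in for the information the apiserver has
+// about a read-only request; it is not the real apiserver type.
+type RequestInfo struct {
+    Name      string // non-empty for single-object requests
+    Namespace string // non-empty for namespace-scoped requests
+}
+
+// Scope buckets a read-only request into one of the three scopes above.
+func Scope(r RequestInfo) string {
+    switch {
+    case r.Name != "":
+        return "resource" // a single object
+    case r.Namespace != "":
+        return "namespace" // objects of a resource within one namespace
+    default:
+        return "cluster" // spans objects from multiple namespaces
+    }
+}
+```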
+
__TODO: Migrate existing SLIs/SLOs here:__
-- __API-machinery ones__
- __Pod startup time__
### Burst SLIs/SLOs
@@ -106,3 +132,13 @@ __TODO: Migrate existing SLIs/SLOs here:__
| Status | SLI | SLO | User stories, test scenarios, ... |
| --- | --- | --- | --- |
| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes | [Details](./system_throughput.md) |
+
+### Other SLIs
+
+| Status | SLI | User stories, ... |
+| --- | --- | --- |
+| WIP | Watch latency for every resource (from the moment when the object is stored in the database to when it's ready to be sent to all watchers), measured as 99th percentile over last 5 minutes | TODO |
+| WIP | Admission latency for each admission plugin type, measured as 99th percentile over last 5 minutes | [Details](./api_extensions_latency.md) |
+| WIP | Webhook call latency for each webhook type, measured as 99th percentile over last 5 minutes | [Details](./api_extensions_latency.md) |
+| WIP | Initializer latency for each initializer, measured as 99th percentile over last 5 minutes | [Details](./api_extensions_latency.md) |
+
diff --git a/sig-scalability/slos/system_throughput.md b/sig-scalability/slos/system_throughput.md
index 369a6cba..5691b46d 100644
--- a/sig-scalability/slos/system_throughput.md
+++ b/sig-scalability/slos/system_throughput.md
@@ -1,3 +1,5 @@
+## System throughput SLI/SLO details
+
### User stories
- As a user, I want a guarantee that my workload of X pods can be started
within a given time
diff --git a/sig-scalability/slos/watch_latency.md b/sig-scalability/slos/watch_latency.md
new file mode 100644
index 00000000..1aa3d488
--- /dev/null
+++ b/sig-scalability/slos/watch_latency.md
@@ -0,0 +1,17 @@
+## Watch latency SLI details
+
+### User stories
+- As an administrator, if Kubernetes is slow, I would like to know whether the
+root cause is slow api-machinery (slow watch) or something further down the
+path (lack of network bandwidth, slow or cpu-starved controllers, ...).
+
+### Other notes
+- Pretty much all control loops in Kubernetes are watch-based. As a result,
+a slow watch means a slow system in general.
+- Note that the way we measure it silently assumes no clock skew in the case
+of a cluster with multiple masters (see the sketch below).
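+
+The SLI itself has to be measured inside the apiserver (from storing the object
+to readiness to send it to watchers). The client-go sketch below only
+approximates an upper bound from the outside, since it also includes the write
+itself plus network and client overhead; the kubeconfig path, namespace and
+object name are made up, and older client-go versions take no context argument.
+Because a single process does both the write and the watch, the clock-skew
+caveat above does not apply here.
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+    "time"
+
+    corev1 "k8s.io/api/core/v1"
+    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+    "k8s.io/apimachinery/pkg/watch"
+    "k8s.io/client-go/kubernetes"
+    "k8s.io/client-go/tools/clientcmd"
+)
+
+func main() {
+    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // hypothetical path
+    if err != nil {
+        panic(err)
+    }
+    client, err := kubernetes.NewForConfig(config)
+    if err != nil {
+        panic(err)
+    }
+
+    ctx := context.TODO()
+    name := "watch-latency-probe"
+
+    // Start watching before creating the object so the ADDED event is not missed.
+    w, err := client.CoreV1().ConfigMaps("default").Watch(ctx,
+        metav1.ListOptions{FieldSelector: "metadata.name=" + name})
+    if err != nil {
+        panic(err)
+    }
+    defer w.Stop()
+
+    created := time.Now()
+    _, err = client.CoreV1().ConfigMaps("default").Create(ctx,
+        &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: name}},
+        metav1.CreateOptions{})
+    if err != nil {
+        panic(err)
+    }
+
+    for ev := range w.ResultChan() {
+        if ev.Type == watch.Added {
+            // Upper bound on watch latency: timed from the Create call, so it
+            // also includes the write itself and network/client overhead.
+            fmt.Println("observed watch latency:", time.Since(created))
+            return
+        }
+    }
+}
+```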
+
+### TODOs
+- Longer term, we would like to provide some guarantees on watch latency
+(e.g. 99th percentile of SLI per cluster-day <= Xms). However, we are not
+there yet.