Scalability questionnaire questions proposal

author: wojtekt <wojtekt@google.com> 2019-10-21 13:11:33 +0200
committer: wojtekt <wojtekt@google.com> 2019-10-28 09:01:27 +0100
commit: 762a1cffe6ba573ae8fef03bbbcba35117f8ce9e (patch)
tree: 7928be1bddfce61396ce7c605670d24d8b807a23
parent: c80c4128c64f6e62d5829340bd49dd74721f44ef (diff)
1 files changed, 29 insertions, 2 deletions
diff --git a/sig-architecture/production-readiness.md b/sig-architecture/production-readiness.md
index 17ed49dc..9f953678 100644
--- a/sig-architecture/production-readiness.md
+++ b/sig-architecture/production-readiness.md
@@ -8,8 +8,7 @@ cause increased failures in production.
 ## Status
 
 The process and questionnaire are currently under development as part of the
-[PRR KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/20190731-production-readiness-review-process.md), with a target that reviews will be needed for features
-going into 1.18.
+[PRR KEP][], with a target that reviews will be needed for features going into 1.18.
 
 During the 1.17 cycle, the PRR team will be piloting the questionnaire and other
 aspects of the process.
@@ -28,6 +27,30 @@ aspects of the process.
     happens if it is subsequently upgraded again?
   - Are there tests for this?
 * Scalability
+  - Will enabling / using the feature result in any new API calls?
+    Describe them with their impact keeping in mind the [supported limits][]
+    (e.g. 5000 nodes per cluster, 100 pods/s churn) focusing mostly on:
+     - components listing and/or watching resources they didn't before
+     - API calls that may be triggered by changes of some Kubernetes
+       resources (e.g. update object X based on changes of object Y)
+     - periodic API calls to reconcile state (e.g. periodic fetching state,
+       heartbeats, leader election, etc.)
+  - Will enabling / using the feature result in supporting new API types?
+    How many objects of that type will be supported (and how that translates
+    to limitations for users)?
+  - Will enabling / using the feature result in increasing size or count
+    of the existing API objects?
+  - Will enabling / using the feature result in increasing time taken
+    by any operations covered by [existing SLIs/SLOs][] (e.g. by adding
+    additional work, introducing new steps in between, etc.)?
+    Please describe the details if so.
+  - Will enabling / using the feature result in non-negligible increase
+    of resource usage (CPU, RAM, disk IO, ...) in any components?
+    Things to keep in mind include: additional in-memory state, additional
+    non-trivial computations, excessive access to disks (including increased
+    log volume), significant amount of data sent and/or received over
+    network, etc. Think through this in both small and large cases, again
+    with respect to the [supported limits][].
 * Rollout, Upgrade, and Rollback Planning
 * Dependencies
   - Does this feature depend on any specific services running in the cluster
@@ -49,3 +72,7 @@ aspects of the process.
   - What are the most useful log messages and what logging levels do they require?
   - What steps should be taken if SLOs are not being met to determine the
     problem?
+
+[PRR KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/20190731-production-readiness-review-process.md
+[supported limits]: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
+[existing SLIs/SLOs]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#kubernetes-slisslos
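
The first scalability question asks authors to describe new API calls against the supported limits. A rough back-of-envelope estimate is often enough to answer it. The sketch below is illustrative only: the 5000-node and 100 pods/s figures come from the questionnaire text, while the 10-second period, the two-updates-per-pod-change factor, and the function names are hypothetical assumptions, not part of the proposal.

```python
def periodic_qps(num_clients: int, period_seconds: float) -> float:
    """Steady-state API calls per second when each client calls once per period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return num_clients / period_seconds


def triggered_qps(changes_per_second: float, calls_per_change: float) -> float:
    """API calls per second triggered by changes to other objects."""
    return changes_per_second * calls_per_change


# Figures quoted in the questionnaire as examples of supported limits.
NODES = 5000
POD_CHURN_PER_SECOND = 100

# Hypothetical feature behaviour, assumed for illustration only.
HEARTBEAT_PERIOD_SECONDS = 10.0   # each node-level component calls every 10 s
UPDATES_PER_POD_CHANGE = 2        # each pod change causes two object updates

heartbeat_load = periodic_qps(NODES, HEARTBEAT_PERIOD_SECONDS)
churn_load = triggered_qps(POD_CHURN_PER_SECOND, UPDATES_PER_POD_CHANGE)

print(f"periodic calls:  {heartbeat_load:.0f} QPS")
print(f"triggered calls: {churn_load:.0f} QPS")
print(f"total new load:  {heartbeat_load + churn_load:.0f} QPS")
```

Running the same calculation for a small cluster (for example, 10 nodes) alongside the large one covers the "both small and large cases" that the resource-usage question asks about.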