author: wojtekt <wojtekt@google.com> 2019-10-21 13:11:33 +0200
committer: wojtekt <wojtekt@google.com> 2019-10-28 09:01:27 +0100
commit: 762a1cffe6ba573ae8fef03bbbcba35117f8ce9e
tree: 7928be1bddfce61396ce7c605670d24d8b807a23
parent: c80c4128c64f6e62d5829340bd49dd74721f44ef
Scalability questionnaire questions proposal
-rw-r--r-- sig-architecture/production-readiness.md 31
1 files changed, 29 insertions, 2 deletions
diff --git a/sig-architecture/production-readiness.md b/sig-architecture/production-readiness.md
index 17ed49dc..9f953678 100644
--- a/sig-architecture/production-readiness.md
+++ b/sig-architecture/production-readiness.md
@@ -8,8 +8,7 @@ cause increased failures in production.
## Status
The process and questionnaire are currently under development as part of the
-[PRR KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/20190731-production-readiness-review-process.md), with a target that reviews will be needed for features
-going into 1.18.
+[PRR KEP][], with a target that reviews will be needed for features going into 1.18.
During the 1.17 cycle, the PRR team will be piloting the questionnaire and other
aspects of the process.
@@ -28,6 +27,30 @@ aspects of the process.
happens if it is subsequently upgraded again?
- Are there tests for this?
* Scalability
+ - Will enabling / using the feature result in any new API calls?
+ Describe them with their impact keeping in mind the [supported limits][]
+ (e.g. 5000 nodes per cluster, 100 pods/s churn) focusing mostly on:
+ - components listing and/or watching resources they didn't before
+ - API calls that may be triggered by changes of some Kubernetes
+ resources (e.g. update object X based on changes of object Y)
+ - periodic API calls to reconcile state (e.g. periodic fetching state,
+ heartbeats, leader election, etc.)
+ - Will enabling / using the feature result in supporting new API types?
+ How many objects of that type will be supported (and how that translates
+ to limitations for users)?
+ - Will enabling / using the feature result in increasing size or count
+ of the existing API objects?
+ - Will enabling / using the feature result in increasing time taken
+ by any operations covered by [existing SLIs/SLOs][] (e.g. by adding
+ additional work, introducing new steps in between, etc.)?
+ Please describe the details if so.
+ - Will enabling / using the feature result in non-negligible increase
+ of resource usage (CPU, RAM, disk IO, ...) in any components?
+ Things to keep in mind include: additional in-memory state, additional
+ non-trivial computations, excessive access to disks (including increased
+ log volume), significant amount of data sent and/or received over
+ network, etc. Think through this in both small and large cases, again
+ with respect to the [supported limits][].
* Rollout, Upgrade, and Rollback Planning
* Dependencies
- Does this feature depend on any specific services running in the cluster
@@ -49,3 +72,7 @@ aspects of the process.
- What are the most useful log messages and what logging levels do they require?
- What steps should be taken if SLOs are not being met to determine the
problem?
+
+[PRR KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/20190731-production-readiness-review-process.md
+[supported limits]: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
+[existing SLIs/SLOs]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#kubernetes-slisslos
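
The first scalability question added in this diff asks authors to describe new API calls against the supported limits (e.g. 5000 nodes per cluster, 100 pods/s churn). As a rough illustration of how such an answer might be quantified, here is a minimal back-of-the-envelope sketch. The class names, the 10-second heartbeat period, and the two-calls-per-pod-change figure are hypothetical values chosen for the example; they do not come from the questionnaire or from any Kubernetes component.

```python
from dataclasses import dataclass


@dataclass
class PeriodicCall:
    """An API call issued by each caller once per period (e.g. a heartbeat)."""
    name: str
    callers: int            # number of components issuing the call
    period_seconds: float   # time between two calls from one caller

    def qps(self) -> float:
        return self.callers / self.period_seconds


@dataclass
class TriggeredCall:
    """An API call triggered by changes to some other resource."""
    name: str
    events_per_second: float  # rate of triggering changes
    calls_per_event: float    # API calls issued per change

    def qps(self) -> float:
        return self.events_per_second * self.calls_per_event


def total_qps(calls) -> float:
    """Sum the steady-state call rate over all new call sources."""
    return sum(c.qps() for c in calls)


# Hypothetical feature evaluated at the limits quoted in the questionnaire:
# 5000 nodes per cluster and 100 pods/s churn.
calls = [
    PeriodicCall("per-node heartbeat (assumed 10 s period)",
                 callers=5000, period_seconds=10.0),
    TriggeredCall("update object X on pod change (assumed 2 calls per change)",
                  events_per_second=100.0, calls_per_event=2.0),
]

estimate = total_qps(calls)
print(f"{estimate:.0f} additional API calls/s at the supported limits")
```

An estimate like this only covers steady-state call rate. Object size, watch fan-out, and latency against the existing SLIs/SLOs would still need to be addressed separately in the questionnaire answers.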