| author | wojtekt <wojtekt@google.com> | 2019-10-21 13:11:33 +0200 |
|---|---|---|
| committer | wojtekt <wojtekt@google.com> | 2019-10-28 09:01:27 +0100 |
| commit | 762a1cffe6ba573ae8fef03bbbcba35117f8ce9e (patch) | |
| tree | 7928be1bddfce61396ce7c605670d24d8b807a23 /sig-architecture | |
| parent | c80c4128c64f6e62d5829340bd49dd74721f44ef (diff) | |
Scalability questionnaire questions proposal
Diffstat (limited to 'sig-architecture')
| -rw-r--r-- | sig-architecture/production-readiness.md | 31 |
1 files changed, 29 insertions, 2 deletions
diff --git a/sig-architecture/production-readiness.md b/sig-architecture/production-readiness.md
index 17ed49dc..9f953678 100644
--- a/sig-architecture/production-readiness.md
+++ b/sig-architecture/production-readiness.md
@@ -8,8 +8,7 @@ cause increased failures in production.
 ## Status
 
 The process and questoinnaire are currently under development as part of the
-[PRR KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/20190731-production-readiness-review-process.md), with a target that reviews will be needed for features
-going into 1.18.
+[PRR KEP][], with a target that reviews will be needed for features going into 1.18.
 
 During the 1.17 cycle, the PRR team will be piloting the questionnaire and other
 aspects of the process.
@@ -28,6 +27,30 @@ aspects of the process.
     happens if it is subsequently upgraded again?
   - Are there tests for this?
 * Scalability
+  - Will enabling / using the feature result in any new API calls?
+    Describe them with their impact keeping in mind the [supported limits][]
+    (e.g. 5000 nodes per cluster, 100 pods/s churn) focusing mostly on:
+    - components listing and/or watching resources they didn't before
+    - API calls that may be triggered by changes of some Kubernetes
+      resources (e.g. update object X based on changes of object Y)
+    - periodic API calls to reconcile state (e.g. periodic fetching state,
+      heartbeats, leader election, etc.)
+  - Will enabling / using the feature result in supporting new API types?
+    How many objects of that type will be supported (and how that translates
+    to limitations for users)?
+  - Will enabling / using the feature result in increasing size or count
+    of the existing API objects?
+  - Will enabling / using the feature result in increasing time taken
+    by any operations covered by [existing SLIs/SLOs][] (e.g. by adding
+    additional work, introducing new steps in between, etc.)?
+    Please describe the details if so.
+  - Will enabling / using the feature result in non-negligible increase
+    of resource usage (CPU, RAM, disk IO, ...) in any components?
+    Things to keep in mind include: additional in-memory state, additional
+    non-trivial computations, excessive access to disks (including increased
+    log volume), significant amount of data sent and/or received over
+    network, etc. Think through this in both small and large cases, again
+    with respect to the [supported limits][].
 * Rollout, Upgrade, and Rollback Planning
 * Dependencies
   - Does this feature depend on any specific services running in the cluster
@@ -49,3 +72,7 @@ aspects of the process.
   - What are the most useful log messages and what logging levels do they require?
   - What steps should be taken if SLOs are not being met to determine the problem?
+
+[PRR KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/20190731-production-readiness-review-process.md
+[supported limits]: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
+[existing SLIs/SLOs]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md#kubernetes-slisslos
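
The first added question asks authors to describe new periodic API calls against the supported limits. A minimal sketch of that arithmetic might look like the following. The helper name `periodic_call_qps` and the 10-second interval are hypothetical and chosen only for illustration; the 5000-node figure is the example limit quoted in the added text.

```python
# Hypothetical back-of-the-envelope estimate. The interval used below is an
# illustrative assumption, not a value taken from the questionnaire.
SUPPORTED_NODES = 5000  # "5000 nodes per cluster", quoted in the added text


def periodic_call_qps(callers: int, interval_seconds: float) -> float:
    """Return steady-state requests per second when each caller issues
    one request every interval_seconds."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return callers / interval_seconds


# Assumed example: every node issues one new heartbeat-style call every 10 s.
extra_qps = periodic_call_qps(SUPPORTED_NODES, 10.0)
print(f"{extra_qps:.0f} additional requests/s at {SUPPORTED_NODES} nodes")
```

Running the same estimate for a small cluster as well as the largest supported one matches the questionnaire's request to think through both small and large cases.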
