-rw-r--r--  sig-architecture/production-readiness.md  89
1 file changed, 79 insertions, 10 deletions
diff --git a/sig-architecture/production-readiness.md b/sig-architecture/production-readiness.md
index 4b6fc244..764dd161 100644
--- a/sig-architecture/production-readiness.md
+++ b/sig-architecture/production-readiness.md
@@ -5,21 +5,90 @@ Kubernetes are observable, scalable and supportable, can be safely operated in
production environments, and can be disabled or rolled back in the event they
cause increased failures in production.
-More details may be found in the [PRR KEP].
+Production readiness reviews are done by a separate team, apart from the SIG
+leads (SIG lead approval is still required as well). It is useful to have the
+viewpoint of a team that is less familiar with the intimate details of the SIG
+but *is* familiar with Kubernetes and with operating Kubernetes in production.
+Our dry runs in 1.19 and 1.20 showed that this slightly "outsider" view helps
+identify items that would otherwise be missed.
-## Status
+More background may be found in the [PRR KEP].
-As of 1.19, production readiness reviews are required, and are part of the KEP
-process. The PRR questionnaire previously found here has been incorporated into
-the [KEP template]. The template details the specific questions that must be
-answered, depending on the stage of the feature.
+## Status
As of 1.21, PRRs are now blocking. PRR _approval_ is required for the enhancement
-to be part of the release.
+to be part of the release. This means that any KEP targeting the release, at any
+stage, and either in or moving to the `implementable` state, requires production
+readiness approval by the *Enhancements Freeze Date*.
+
+Note that some of the questions in the [KEP template] should be answered in both
+the KEP's README.md and the `kep.yaml`, in order to support automated checks on
+the PRR. The template points out which questions these are.
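+
+As a rough illustration, the PRR answers that are duplicated into `kep.yaml`
+look something like the excerpt below. This is a sketch assuming the field
+names used by the [KEP template] at the time of writing (check the template
+for the current set); the feature gate, component list, and metric name are
+hypothetical placeholders.
+
+```yaml
+# kep.yaml (excerpt): PRR answers mirrored from the README.md so that
+# they can be checked automatically.
+
+# Answered at alpha: the feature gate and the components on which it
+# must be enabled. "MyFeature" is a hypothetical gate name.
+feature-gates:
+  - name: MyFeature
+    components:
+      - kube-apiserver
+      - kube-controller-manager
+# Whether the feature can be disabled (rolled back) once enabled.
+disable-supported: true
+
+# Answered at beta: metrics an operator can use to judge the feature's
+# health. "my_feature_metric" is a hypothetical metric name.
+metrics:
+  - my_feature_metric
+```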
+
+## Submitting a KEP for production readiness approval
+
+The KEP template production readiness questionnaire should be filled out by the
+KEP authors, and reviewed by the SIG leads. Once the leads are satisfied with
+both the overall KEP (i.e., it is ready to move to `implementable` state) and
+the PRR answers, the authors may request PRR approval:
+
+* Assign a PRR approver from the `prod-readiness-approvers` list in the
+ [OWNERS_ALIASES] file. This may be done earlier as well, to get early feedback
+ or just to let the approver know. Reach out on the `#prod-readiness` Slack
+ channel or just pick someone from the list.
+* Update the `kep.yaml`, setting the `stage`, `latest-milestone`, and the
+ `milestone` struct (which captures per-stage release versions).
+* Create a `prod-readiness/<sig>/<KEP number>.yaml` file, listing the PRR
+ approver's GitHub handle for the specific stage.
+* See this [example PRR approval request PR].
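+
+For illustration, a request targeting alpha in 1.21 might touch the two files
+roughly as follows. This sketch assumes the field layout used by the KEP
+template and by existing `prod-readiness` files at the time of writing; the
+SIG directory, KEP number, release versions, and approver handle are all
+hypothetical, and the [example PRR approval request PR] remains the better
+reference.
+
+```yaml
+# kep.yaml (excerpt): the stage targeted in the current release, the most
+# recent milestone worked on, and the release for each stage.
+stage: alpha
+latest-milestone: "v1.21"
+milestone:
+  alpha: "v1.21"
+  beta: "v1.22"
+  stable: "v1.24"
+---
+# prod-readiness/sig-example/1234.yaml (hypothetical SIG and KEP number):
+# the PRR approver assigned for each stage.
+kep-number: 1234
+alpha:
+  approver: "@example-approver"
+```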
+
+The PRR approvers use the `kepctl` tool to identify all outstanding requests for
+PRR approval, and are responsible for providing timely feedback so that KEPs are
+not delayed. If your need is urgent, it doesn't hurt to also ping the approver
+on Slack, but it is not necessary.
+
+## Common feedback from reviewers
+
+Some common issues we see:
+* Missing metrics. Often metrics are overlooked, or sometimes they are defined
+ but not wired up (implemented). Please see this [example metrics PR] to better
+ understand how to add a metric.
+* Be sure to distinguish between when to use a *metric* and when to use an
+ *event*. Metrics are for cluster operators and should primarily be about system
+ status and issues, whereas events can be used to surface user errors. That is,
+ in general the audience for metrics is the cluster operator, whereas the
+ audience for events is the user (the creator of the resource). Users won't
+ usually see metrics, and cluster operators don't generally need to know about
+ user mistakes.
+
+## Becoming a prod readiness reviewer or approver
+
+The prod readiness team is open and eager to add new members. The ideal
+production readiness approver is someone with deep knowledge of Kubernetes
+overall and the ability to think from the point of view of a cluster operator.
+For example, experience as an SRE for a fleet of Kubernetes clusters is an
+excellent background.
+
+To become a reviewer:
+ * Let the PRR team know you are interested, either on Slack (`#prod-readiness`)
+ or by attending the bi-weekly meeting.
+ * Study previous PRR comments and the production readiness responses in
+ existing KEPs.
+ * Choose some KEPs requiring PRR and perform a review. Put "shadow prod readiness review"
+ in your review comments so that the assigned PRR approver knows your intent.
+ * After two release cycles, if you have shown good judgement and provided
+ quality reviews, you can propose yourself as an approver by submitting a PR to
+ add your GitHub handle to the `prod-readiness-approvers` alias in [OWNERS_ALIASES].
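+
+The change itself should be small. Assuming the usual `OWNERS_ALIASES`
+layout (a top-level `aliases` map from alias name to a list of GitHub
+handles, written without the leading `@`), it might look like the excerpt
+below, where `your-github-handle` is a placeholder.
+
+```yaml
+# OWNERS_ALIASES (excerpt)
+aliases:
+  prod-readiness-approvers:
+    # ...existing approvers...
+    - your-github-handle  # hypothetical: the handle being added
+```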
+
+## Finding KEPs needing prod readiness review
+
+The prod readiness team uses the [kepctl query] command-line tool to identify
+KEP PRs that need review. For example:
-Note that some of the questions should be answered in both the KEP's README.md
-and the `kep.yaml`, in order to support automated checks on the PRR. The
-template points out these as needed.
+
+```shell
+./cmd/kepctl/kepctl query --sig '.*' --prr '@johnbelamaric' --include-prs \
+  --gh-token-path ~/gh-token --status implementable,provisional
+```
[PRR KEP]: https://git.k8s.io/enhancements/keps/sig-architecture/1194-prod-readiness
[KEP template]: https://git.k8s.io/enhancements/keps/NNNN-kep-template
+[OWNERS_ALIASES]: https://git.k8s.io/enhancements/OWNERS_ALIASES
+[example PRR approval request PR]: https://github.com/kubernetes/enhancements/pull/2274/files
+[example metrics PR]: https://github.com/kubernetes/kubernetes/pull/97814
+[kepctl query]: https://git.k8s.io/enhancements/cmd/kepctl