author Christian Bell <csbell@google.com> 2017-07-14 11:05:22 -0700
committer GitHub <noreply@github.com> 2017-07-14 11:05:22 -0700
commit 7d705d6802e54c37f739d1b05deac0995ce54ea4
tree bcc203f4afb741212bdf4ea658942d15cd15c127
parent 789f252a71e18ad9e853bf34748d4bb53f279ba8
parent fa184860fec58529a465bd3e80df70a718f70c43
Merge pull request #678 from madhusudancs/federation-buildcop
First draft of federation buildcop guide/playbook.
-rw-r--r-- contributors/devel/on-call-federation-build-cop.md | 167
1 files changed, 167 insertions, 0 deletions
diff --git a/contributors/devel/on-call-federation-build-cop.md b/contributors/devel/on-call-federation-build-cop.md
new file mode 100644
index 00000000..bf8b427a
--- /dev/null
+++ b/contributors/devel/on-call-federation-build-cop.md
@@ -0,0 +1,167 @@
+# Federation Buildcop Guide and Playbook
+
+Federation runs two classes of tests: CI and Presubmits.
+
+## CI
+
+* These tests run on the HEADs of master and release branches (starting
+ from Kubernetes v1.6).
+* As a result, they run on code that's already merged.
+* As the name suggests, they run continuously. Currently, they are
+ configured to run
+ [at least once every 30 minutes](https://github.com/kubernetes/test-infra/blob/22c38cfb64137086373e1b89d5e7d98766560747/prow/config.yaml#L3686).
+* Federation CI tests run as
+ [periodic jobs on prow](https://github.com/kubernetes/test-infra/blob/22c38cfb64137086373e1b89d5e7d98766560747/prow/config.yaml#L3686).
+* CI jobs always run sequentially: no CI job ever has two instances
+ of itself running at the same time.
+
+### Configuration
+
+Configuration steps are described in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/README.md#how-to-work-with-jenkins-jobs
+
+The configuration of the CI tests is stored in:
+
+* Jenkins config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/job-configs/kubernetes-jenkins/bootstrap-ci.yaml
+* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json
+* Test grid config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/testgrid/config/config.yaml
+* Job specific config: https://github.com/kubernetes/test-infra/tree/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs
+
+### Results
+
+Results of all the federation CI tests, including the soak tests, are
+listed in the corresponding tabs on the Cluster Federation page in the
+testgrid.
+https://k8s-testgrid.appspot.com/cluster-federation
+
+### Playbook
+
+#### Triggering a new run
+
+Please ping someone who has access to the Jenkins UI/dashboard and ask
+them to log in and click the "Build Now" link on the Jenkins page
+corresponding to the CI job you want to manually start.
+
+#### Quota cleanup
+
+Please ping someone who has access to the GCP project. Ask them to
+look at the quotas and delete any leaked resources by clicking the
+corresponding delete buttons in the Google Cloud
+Console.
+
+
+## Presubmit
+
+* We only have one presubmit test, but it is configured very
+ differently than the CI tests.
+* The presubmit test is currently configured to run on the master
+ branch and any release branch that's 1.7 or newer.
+* Federation presubmit infrastructure is composed of two separate test
+  jobs:
+  * Deploy job: This job runs in the background and recycles federated
+    clusters every time it runs. Although this job supports federation
+    presubmit tests, it is configured as a CI/Soak job. More on
+    configuration later. Since recycling federated clusters is an
+    expensive operation, we do not want to run this often. Hence, this
+    job is configured to run once every 24 hours, around midnight
+    Pacific time.
+  * Test job: This is the job that runs federation presubmit tests on
+    every PR in the core repository, i.e.
+    [kubernetes/kubernetes](https://github.com/kubernetes/kubernetes).
+    These jobs can run in parallel on the PRs in the repository.
+
+### Two-jobs setup
+
+The deploy job runs once every 24 hours at around midnight Pacific
+time. It is configured to turn up and tear down 3 federated clusters.
+It starts out by downloading the latest Kubernetes release built from
+[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes)
+master. It then tears down the existing federated clusters and turns
+up new ones. As the clusters are created, their kubeconfigs are
+written to a local kubeconfig file where the job runs. Once all the
+clusters are successfully turned up, the local kubeconfig is then
+copied to a pre-configured GCS bucket. Any existing kubeconfig in the
+bucket will be overwritten.
+
+The test job, on the other hand, starts by copying the latest kubeconfig
+from the pre-configured GCS bucket. It uses this kubeconfig to deploy
+a new federation control plane on one of the clusters in the
+kubeconfig. It then joins all the clusters in the kubeconfig, including
+the host cluster where the federation control plane is deployed, as
+members of the newly created federation control plane. The test job then
+runs the federation presubmit tests on this control plane and tears down
+the control plane at the end.
+
+Since federated clusters are recycled only once every 24 hours, all
+presubmit runs in that period share the federated clusters. And since
+there could be multiple presubmit tests running in parallel, each
+instance of the test gets its own namespace where it deploys the
+federation control plane. Federation control planes deployed in
+separate namespaces are independent and do not interfere with one
+another in any way.
+
+### Configuration
+
+The two jobs are configured differently.
+
+#### Deploy job
+
+The deploy job is configured as a CI/Soak job in Jenkins.
+Configuration steps are described in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/README.md#how-to-work-with-jenkins-jobs
+
+The configuration of the deploy job is stored in:
+
+* Jenkins config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/job-configs/kubernetes-jenkins/bootstrap-ci-soak.yaml#L76
+* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json#L3996
+* Test grid config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/testgrid/config/config.yaml#L152
+* Job specific config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/ci-kubernetes-pull-gce-federation-deploy.env
+
+#### Test job
+
+The test job is
+[configured in prow](https://github.com/kubernetes/test-infra/blob/35ceb37e999bb0589218708262634951b79dfe05/prow/config.yaml#L236),
+but it runs in Jenkins mode. The configuration steps are described in
+https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/README.md#create-a-new-job
+
+The configuration of the test job is stored in:
+
+* Prow config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/prow/config.yaml#L244
+* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json#L4691
+* Job specific config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/pull-kubernetes-federation-e2e-gce.env
+
+### Results
+
+Aggregated results are available on the Gubernator dashboard page for
+the federation presubmit tests.
+
+https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-federation-e2e-gce
+
+### Metrics
+
+We track the flakiness metrics of all the presubmit jobs and
+individual tests that run against PRs in
+[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes).
+
+* The metrics that we track are documented in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/metrics/README.md#metrics.
+* Job-level metrics are available at http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json.
+* As of this writing, federation presubmits have a [success rate of
+ 93.4%](http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json).
+
+### Playbook
+
+#### Triggering a new deploy job run
+
+Please ping someone who has access to the Jenkins UI/dashboard and ask
+them to log in and click the "Build Now" link on the Jenkins page
+corresponding to the deploy job.
+
+#### Triggering a new test run
+
+Use the `/test` command on the PR to retrigger the test. The exact
+incantation is: `/test pull-kubernetes-federation-e2e-gce`
+
+#### Quota cleanup
+
+Please ping someone who has access to the `k8s-jkns-pr-bldr-e2e-gce-fdrtn`
+GCP project. Ask them to look at the quotas and delete any leaked
+resources by clicking the corresponding delete buttons in the Google
+Cloud Console.