author Kubernetes Submit Queue <k8s-merge-robot@users.noreply.github.com> 2017-11-19 23:33:47 -0800
committer GitHub <noreply@github.com> 2017-11-19 23:33:47 -0800
commit 3a991bd962d86628034834bf998489df53eedaac
tree c08324e0af10f7ed9511f4365b2b58c9f5fb9d97
parent 164e6b51cc8311acd11ff242ac412fbf7a5c31ae
parent 3b1d9698799cdbdcf91877569a7ed34463e92c62
Merge pull request #1411 from shyamjvs/update-scale-jobs-automation-schedule
Automatic merge from submit-queue.

Update scalability-validation doc to reflect current job schedule

Following from https://github.com/kubernetes/test-infra/pull/5532

/cc @porridge

-rw-r--r-- contributors/devel/release/scalability-validation.md 59
1 files changed, 35 insertions, 24 deletions
diff --git a/contributors/devel/release/scalability-validation.md b/contributors/devel/release/scalability-validation.md
index 5aae1cbd..08ecf09b 100644
--- a/contributors/devel/release/scalability-validation.md
+++ b/contributors/devel/release/scalability-validation.md
@@ -41,37 +41,47 @@ We need to run them on 5k-node clusters, but they’re:
- Blocking other large tests (quota limitations + only one large test project available viz. 'kubernetes-scale')
So we don’t want to run them too frequently. On the other hand, running them too infrequently means late identification and piling up of regressions. So we choose the following middle ground:
-(B = release-blocking, NB = not release-blocking)
-
-- Performance tests on 2k-node cluster in GCE/GKE alternatingly each week (NB)
- - would catch most scale-related regressions
- - would catch bugs in both GCE and GKE
-- Performance tests on 5k-node cluster in GCE each week (B)
- - would catch scale-related bugs missed by 2k-node test (should be rare)
-- Correctness tests on 2k-node cluster in GCE/GKE alternatingly each week (NB)
- - correctness tests failing on just large clusters is rare, so weekly is enough
- - would catch bugs in both GCE and GKE
-- Correctness tests on 5k-node cluster in GCE/GKE alternatingly each week (B for GCE)
- - would catch bugs left out by 2k-node (should be rare)
- - would verify 5k-node clusters on GKE but without blocking the release
+
+- Performance tests on 2k-node/5k-node GCE clusters alternately from Mon-Sat
+ - would give us one performance run each day to help catch regressions fast
+ - running 2k-node on alternate days gives time for 5k-node correctness tests to run on those days
+ - many of the performance regressions on 5k-node should also be seen on 2k-node (albeit probably to a smaller degree)
+- Correctness tests on 2k-node/5k-node GCE clusters alternately from Mon-Sat
+ - would give us one correctness run each day to help catch regressions fast
+ - running 2k-node on alternating days gives time for 5k-node performance tests to run on those days
+ - many of the correctness regressions on 5k-node should also be seen on 2k-node
+- Performance tests on 2k-node GKE cluster on Sun
+ - would give us a performance run for Sunday too
+ - would also help verify performance of GKE
+- Correctness tests on 2k-node GKE cluster on Sun
+ - would give us a correctness run for Sunday too
+ - would also help verify correctness of GKE
Here's the proposed schedule (may be fine-tuned later based on test health / release schedule):
+(B = release-blocking job)
+
+| Day | Performance job | Correctness job |
+| ------------- |:-------------:| -----:|
+| Mon | 5k-node performance @ 00:01 PT (B) | 2k-node correctness @ 22:01 PT |
+| Tue | 2k-node performance @ 05:01 PT | 5k-node correctness @ 14:01 PT (B) |
+| Wed | 5k-node performance @ 00:01 PT (B) | 2k-node correctness @ 22:01 PT |
+| Thu | 2k-node performance @ 05:01 PT | 5k-node correctness @ 14:01 PT (B) |
+| Fri | 5k-node performance @ 00:01 PT (B) | 2k-node correctness @ 22:01 PT |
+| Sat | 2k-node performance @ 05:01 PT | 5k-node correctness @ 14:01 PT (B) |
+| Sun | 'GKE' 2k-node performance @ 05:01 PT | 'GKE' 2k-node correctness @ 15:01 PT |
-| | Mon | Tue | Wed | Thu | Fri | Sat | Sun
-| ---- | :----: | :----: | :----:| :----: | :----: | :----: | :----:
-| Week 2k | GCE 5k perf (B) | | | GCE 5k corr (B) | | GKE 2k perf+corr (NB) |
-| Week 2k+1 | GCE 5k perf (B) | | | GKE 5k corr (NB) | | GCE 2k perf+corr (NB) |
+Note: The above schedule is subject to change based on job health, release requirements, etc. You should find it up-to-date in this [calendar].
Why this schedule?
-- 5k tests might need special attention in case of failures so they should run on weekdays
-- Running 5k perf test on Mon gives the whole week for debugging in case of failures
-- Running 5k corr test on Thu gives Tue-Wed to debug any setup issues found on Mon
-- Running 5k corr tests alternatingly on GKE and GCE validates both setups
-- 2k perf and corr tests can be run in parallel to save time
-- Running 2k tests on weekend will free up a weekday for the project to run other tests
+- 5k tests might need special attention in case of failures, so they should mostly run on weekdays (EDIT: given that they're quite stable now, we're trying to run them on weekends too)
+- Running a large-scale performance job and a large-scale correctness job each day would:
+ - help catch regressions on a daily basis
+ - help verify fixes with low latency
+ - ensure a good release signal
+- Running large-scale tests on GKE once a week would also help verify the GKE setup, ideally with no real loss of signal
-Why GKE tests?
+Why run GKE tests at all?
Google is currently using a single project for scalability testing, on both GCE and GKE. As a result we need to schedule them together. There's a plan for CNCF becoming responsible for funding k8s testing, and GCE/GKE tests would be separated to different projects when that happens, with only GCE being funded by them. This ensures fairness across all cloud providers.
@@ -123,3 +133,4 @@ Responsibilities lying with other SIGs/teams as applicable (could be sig-scalabi
[here]: https://docs.google.com/document/d/15rD6XBtKyvXXifkRAsAVFBqEGApQxDRWM3H1bZSBsKQ
[#47865]: https://github.com/kubernetes/kubernetes/issues/47865
[test_owners.csv]: https://github.com/kubernetes/kubernetes/blob/master/test/test_owners.csv
+[calendar]: https://calendar.google.com/calendar?cid=Z29vZ2xlLmNvbV9tNHA3bG1jODVubGlmazFxYzRnNTRqZjg4a0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t
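The properties claimed for the new schedule (one performance and one correctness run per day, at most one 5k-node job per day on the shared project, only 5k-node GCE jobs release-blocking, GKE only on Sunday) can be checked mechanically whenever the table is edited. Below is a minimal sketch, assuming the schedule is transcribed by hand from the "+" side of the table in this diff; `Job`, `SCHEDULE` and `check_schedule` are hypothetical names for illustration, not part of test-infra, and the real job definitions (referenced via test-infra PR #5532) are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    day: str         # "Mon" .. "Sun"
    kind: str        # "performance" or "correctness"
    nodes: int       # cluster size
    provider: str    # "GCE" or "GKE"
    start: str       # start time in PT, "HH:MM"
    blocking: bool   # marked (B) = release-blocking in the table


# Hypothetical hand transcription of the proposed schedule table.
SCHEDULE = [
    Job("Mon", "performance", 5000, "GCE", "00:01", True),
    Job("Mon", "correctness", 2000, "GCE", "22:01", False),
    Job("Tue", "performance", 2000, "GCE", "05:01", False),
    Job("Tue", "correctness", 5000, "GCE", "14:01", True),
    Job("Wed", "performance", 5000, "GCE", "00:01", True),
    Job("Wed", "correctness", 2000, "GCE", "22:01", False),
    Job("Thu", "performance", 2000, "GCE", "05:01", False),
    Job("Thu", "correctness", 5000, "GCE", "14:01", True),
    Job("Fri", "performance", 5000, "GCE", "00:01", True),
    Job("Fri", "correctness", 2000, "GCE", "22:01", False),
    Job("Sat", "performance", 2000, "GCE", "05:01", False),
    Job("Sat", "correctness", 5000, "GCE", "14:01", True),
    Job("Sun", "performance", 2000, "GKE", "05:01", False),
    Job("Sun", "correctness", 2000, "GKE", "15:01", False),
]

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def check_schedule(jobs):
    """Return a list of human-readable violations (empty if none)."""
    problems = []
    for day in DAYS:
        todays = [j for j in jobs if j.day == day]
        # Each day should have exactly one performance and one correctness run.
        kinds = sorted(j.kind for j in todays)
        if kinds != ["correctness", "performance"]:
            problems.append(f"{day}: expected one performance and one correctness job, got {kinds}")
        # At most one 5k-node job per day, since they share one large test project.
        if sum(1 for j in todays if j.nodes == 5000) > 1:
            problems.append(f"{day}: more than one 5k-node job")
    for j in jobs:
        # Only the 5k-node GCE jobs carry (B) in the table.
        if j.blocking != (j.nodes == 5000 and j.provider == "GCE"):
            problems.append(f"{j.day} {j.kind}: unexpected blocking flag")
        if j.provider == "GKE" and j.day != "Sun":
            problems.append(f"{j.day} {j.kind}: GKE job outside Sunday")
    return problems


if __name__ == "__main__":
    issues = check_schedule(SCHEDULE)
    print("schedule OK" if not issues else "\n".join(issues))
```

Keeping the checks as a list of violations (rather than failing on the first one) makes it easy to see every rule a proposed change to the calendar would break at once.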