From bfed3e323e91ec90d2215ee3d1cf59b316f16e36 Mon Sep 17 00:00:00 2001
From: Bob Wise
Date: Fri, 16 Jun 2017 14:24:27 -0700
Subject: Update goals.md

---
 sig-scalability/goals.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sig-scalability/goals.md b/sig-scalability/goals.md
index 7866bde8..ae59340f 100644
--- a/sig-scalability/goals.md
+++ b/sig-scalability/goals.md
@@ -70,6 +70,10 @@ NOTES:
 * Goal: 90 minutes
 * Time to restore a fully saturated large cluster is important for cluster-wide failure recovery, and/or related emergency capacity provisioning (e.g. building and populating a new cluster to replace capacity in a failed one). This number also needs to correlate with max pods per cluster, and max scheduler throughput (500,000 pods / 100 pods per second ~ 90 minutes). We believe that this fulfills most real-world recovery requirements. The required time to recovery is usually driven primarily by trying to reduce the probability of multiple uncorrelated cluster failures (e.g. "one of our 3 clusters has failed. We're just fine unless another one fails before we've repaired/replaced the first failed one").
 
+## Control Plane Configurations for Testing
+
+Configuration of the control plane for cluster testing varies by provider, and there are multiple reasonable configurations. Discussion and guidelines for control plane configuration options and standards are documented [here](provider-configs.md).
+
 ## Open Questions
 
 1. **What, if any, reasonable use cases exist for very large numbers of very small nodes (e.g. for isolation reasons - multitenant)? Based on comments so far, it seems that the answer is yes, and needs to be addressed.**
-- 
cgit v1.2.3