author	Michail Kargakis <mkargaki@redhat.com>	2017-04-27 18:45:01 +0200
committer	GitHub <noreply@github.com>	2017-04-27 18:45:01 +0200
commit	7bcff32eca6bbe4c68a4767f2baf5d41957ff83a (patch)
tree	4b09102a84eda5811cc2fd1ea800d8eb2c15012b
parent	44914b05c70516fdc2d1b80a202f8fa3f9bc9eea (diff)
parent	ca952a81dd7ffeb55a5ea627890d6092ca7bc1a8 (diff)
Merge pull request #477 from kargakis/alternative-deployment-proposal
Refine the Deployment proposal and switch hashing algorithm
-rw-r--r--	contributors/design-proposals/deployment.md	187
1 file changed, 111 insertions, 76 deletions
diff --git a/contributors/design-proposals/deployment.md b/contributors/design-proposals/deployment.md
index f12ffc9e..b064f47d 100644
--- a/contributors/design-proposals/deployment.md
+++ b/contributors/design-proposals/deployment.md
@@ -1,16 +1,24 @@
# Deployment
+Authors:
+- Brian Grant (@bgrant0607)
+- Clayton Coleman (@smarterclayton)
+- Dan Mace (@ironcladlou)
+- David Oppenheimer (@davidopp)
+- Janet Kuo (@janetkuo)
+- Michail Kargakis (@kargakis)
+- Nikhil Jindal (@nikhiljindal)
+
## Abstract
A proposal for implementing a new resource - Deployment - which will enable
-declarative config updates for Pods and ReplicationControllers.
-
-Users will be able to create a Deployment, which will spin up
-a ReplicationController to bring up the desired pods.
-Users can also target the Deployment at existing ReplicationControllers, in
-which case the new RC will replace the existing ones. The exact mechanics of
-replacement depends on the DeploymentStrategy chosen by the user.
-DeploymentStrategies are explained in detail in a later section.
+declarative config updates for ReplicaSets. Users will be able to create a
+Deployment, which will spin up a ReplicaSet to bring up the desired Pods.
+Users can also target the Deployment at an existing ReplicaSet, either by
+rolling back an existing Deployment or by creating a new Deployment that
+adopts an existing ReplicaSet. The exact mechanics of replacement depend on
+the DeploymentStrategy chosen by the user. DeploymentStrategies are explained
+in detail in a later section.
## Implementation
@@ -33,10 +41,10 @@ type Deployment struct {
type DeploymentSpec struct {
// Number of desired pods. This is a pointer to distinguish between explicit
// zero and not specified. Defaults to 1.
- Replicas *int
+ Replicas *int32
- // Label selector for pods. Existing ReplicationControllers whose pods are
- // selected by this will be scaled down. New ReplicationControllers will be
+ // Label selector for pods. Existing ReplicaSets whose pods are
+ // selected by this will be scaled down. New ReplicaSets will be
// created with this selector, with a unique label `pod-template-hash`.
// If Selector is empty, it is defaulted to the labels present on the Pod template.
Selector map[string]string
@@ -46,14 +54,17 @@ type DeploymentSpec struct {
// The deployment strategy to use to replace existing pods with new ones.
Strategy DeploymentStrategy
+
+ // Minimum number of seconds for which a newly created pod should be ready
+ // without any of its containers crashing, for it to be considered available.
+ // Defaults to 0 (pod will be considered available as soon as it is ready)
+ MinReadySeconds int32
}
type DeploymentStrategy struct {
// Type of deployment. Can be "Recreate" or "RollingUpdate".
Type DeploymentStrategyType
- // TODO: Update this to follow our convention for oneOf, whatever we decide it
- // to be.
// Rolling update config params. Present only if DeploymentStrategyType =
// RollingUpdate.
RollingUpdate *RollingUpdateDeploymentStrategy
@@ -65,7 +76,8 @@ const (
// Kill all existing pods before creating new ones.
RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate"
- // Replace the old RCs by new one using rolling update i.e gradually scale down the old RCs and scale up the new one.
+ // Replace the old ReplicaSets with a new one using a rolling update, i.e. gradually scale
+ // down the old ReplicaSets and scale up the new one.
RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
)
@@ -94,20 +106,20 @@ type RollingUpdateDeploymentStrategy struct {
// new RC can be scaled up further, ensuring that total number of pods running
// at any time during the update is at most 130% of original pods.
MaxSurge IntOrString
-
- // Minimum number of seconds for which a newly created pod should be ready
- // without any of its container crashing, for it to be considered available.
- // Defaults to 0 (pod will be considered available as soon as it is ready)
- MinReadySeconds int
}
type DeploymentStatus struct {
// Total number of ready pods targeted by this deployment (this
// includes both the old and new pods).
- Replicas int
+ Replicas int32
// Total number of new ready pods with the desired template spec.
- UpdatedReplicas int
+ UpdatedReplicas int32
+
+ // Monotonically increasing counter that tracks hash collisions for
+ // the Deployment. Used as a collision avoidance mechanism by the
+ // Deployment controller.
+ Uniquifier *int64
}
```
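+
+The MaxSurge parameter above (and its counterpart MaxUnavailable) accepts either an
+absolute number or a percentage of the desired pods. Below is a minimal sketch of how
+such a value could be resolved against the desired replica count to derive the bounds
+used during a rolling update; the resolveIntOrPercent helper and its rounding choices
+are illustrative assumptions, not the actual IntOrString implementation.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math"
+	"strconv"
+	"strings"
+)
+
+// resolveIntOrPercent resolves a value that is either an absolute number ("3")
+// or a percentage ("30%") against the desired replica count. roundUp selects
+// ceiling vs floor for fractional percentages.
+func resolveIntOrPercent(val string, replicas int32, roundUp bool) (int32, error) {
+	if strings.HasSuffix(val, "%") {
+		p, err := strconv.Atoi(strings.TrimSuffix(val, "%"))
+		if err != nil {
+			return 0, err
+		}
+		f := float64(replicas) * float64(p) / 100.0
+		if roundUp {
+			return int32(math.Ceil(f)), nil
+		}
+		return int32(math.Floor(f)), nil
+	}
+	n, err := strconv.Atoi(val)
+	return int32(n), err
+}
+
+func main() {
+	replicas := int32(10)
+	surge, _ := resolveIntOrPercent("30%", replicas, true)      // extra pods allowed above desired
+	unavailable, _ := resolveIntOrPercent("0", replicas, false) // pods allowed to be unavailable
+	// With maxSurge=30% and maxUnavailable=0 this prints an upper bound of 13
+	// total pods (130% of 10) and a lower bound of 10 available pods.
+	fmt.Printf("upper bound: %d pods, lower bound: %d available pods\n",
+		replicas+surge, replicas-unavailable)
+}
+```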
@@ -116,38 +128,42 @@ type DeploymentStatus struct {
#### Deployment Controller
-The DeploymentController will make Deployments happen.
-It will watch Deployment objects in etcd.
-For each pending deployment, it will:
+The DeploymentController will process Deployments and create, update, and delete
+ReplicaSets accordingly. For each creation or update of a Deployment, it will:
-1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
- - For now, we will do this in the client - list all RCs and then filter the
+1. Find all RSs (ReplicaSets) whose label selector is a superset of
+   DeploymentSpec.Selector (a filtering sketch follows this list).
+ - For now, we will do this in the client - list all RSs and then filter the
ones we want. Eventually, we want to expose this in the API.
-2. The new RC can have the same selector as the old RC and hence we add a unique
- selector to all these RCs (and the corresponding label to their pods) to ensure
- that they do not select the newly created pods (or old pods get selected by
- new RC).
+2. The new RS can have the same selector as the old RS and hence we add a unique
+ selector to all these RSs (and the corresponding label to their pods) to ensure
+ that they do not select the newly created pods (or old pods get selected by the
+ new RS).
- The label key will be "pod-template-hash".
- - The label value will be hash of the podTemplateSpec for that RC without
- this label. This value will be unique for all RCs, since PodTemplateSpec should be unique.
- - If the RCs and pods dont already have this label and selector:
- - We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
+ - The label value will be the hash of {podTemplateSpec+uniquifier} where podTemplateSpec
+ is the one that the new RS uses and uniquifier is a counter in the DeploymentStatus
+ that increments every time a [hash collision](#hashing-collisions) happens (hash
+ collisions should be rare with fnv).
+  - If the RSs and pods don't already have this label and selector:
+ - We will first add this to RS.PodTemplateSpec.Metadata.Labels for all RSs to
ensure that all new pods that they create will have this label.
- - Then we will add this label to their existing pods and then add this as a selector
- to that RC.
-3. Find if there exists an RC for which value of "pod-template-hash" label
+    - Then we will add this label to their existing pods.
+    - Finally, we will flip the RS selector to use the new label.
+   This process can potentially be abstracted into a new endpoint for controllers [1].
+3. Find if there exists an RS for which the value of the "pod-template-hash" label
is the same as the hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
- this is the RC that will be ramped up. If there is no such RC, then we create
+ this is the RS that will be ramped up. If there is no such RS, then we create
a new one using DeploymentSpec and then add a "pod-template-hash" label
- to it. RCSpec.replicas = 0 for a newly created RC.
-4. Scale up the new RC and scale down the olds ones as per the DeploymentStrategy.
- - Raise an event if we detect an error, like new pods failing to come up.
-5. Go back to step 1 unless the new RC has been ramped up to desired replicas
- and the old RCs have been ramped down to 0.
-6. Cleanup.
+   to it. The size of the new RS depends on the DeploymentStrategyType used.
+4. Scale up the new RS and scale down the old ones as per the DeploymentStrategy.
+   Raise events appropriately (on both failure and success).
+5. Go back to step 1 unless the new RS has been ramped up to desired replicas
+ and the old RSs have been ramped down to 0.
+6. Clean up old RSs as per revisionHistoryLimit.
DeploymentController is stateless so that it can recover in case it crashes during a deployment.
+[1] See https://github.com/kubernetes/kubernetes/issues/36897
+
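+A minimal sketch of the filtering in step 1, treating "superset" as: the RS
+selector contains every key/value pair of the Deployment's selector (map-based
+selectors only; function and variable names are illustrative):
+
+```go
+package main
+
+import "fmt"
+
+// selectorIsSuperset reports whether rsSelector contains every key/value pair
+// of deploymentSelector, i.e. it is at least as specific.
+func selectorIsSuperset(rsSelector, deploymentSelector map[string]string) bool {
+	for k, v := range deploymentSelector {
+		if rsSelector[k] != v {
+			return false
+		}
+	}
+	return true
+}
+
+func main() {
+	deploySel := map[string]string{"app": "nginx"}
+	rsSel := map[string]string{"app": "nginx", "pod-template-hash": "3f5a8c"}
+	// true: this RS is selected for management by the Deployment.
+	fmt.Println(selectorIsSuperset(rsSel, deploySel))
+}
+```
+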
### MinReadySeconds
We will implement MinReadySeconds using the Ready condition in Pod. We will add
@@ -163,52 +179,71 @@ LastTransitionTime to PodCondition.
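+
+A minimal sketch of the availability check this implies: a pod counts as available
+once it is Ready and has remained Ready for at least MinReadySeconds (simplified
+types; field and function names are illustrative, not the actual API).
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// podCondition is a simplified stand-in for a Pod's Ready condition.
+type podCondition struct {
+	Ready              bool
+	LastTransitionTime time.Time
+}
+
+// isAvailable reports whether a pod counts as available: it must be Ready and
+// must have stayed Ready for at least minReadySeconds.
+func isAvailable(c podCondition, minReadySeconds int32, now time.Time) bool {
+	if !c.Ready {
+		return false
+	}
+	minReady := time.Duration(minReadySeconds) * time.Second
+	return !c.LastTransitionTime.Add(minReady).After(now)
+}
+
+func main() {
+	now := time.Now()
+	cond := podCondition{Ready: true, LastTransitionTime: now.Add(-5 * time.Second)}
+	fmt.Println(isAvailable(cond, 10, now)) // false: Ready for only 5s of the required 10s
+	fmt.Println(isAvailable(cond, 0, now))  // true: with MinReadySeconds=0, Ready implies available
+}
+```
+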
### Updating
-Users can update an ongoing deployment before it is completed.
-In this case, the existing deployment will be stalled and the new one will
+Users can update an ongoing Deployment before it is completed.
+In this case, the existing rollout will be stalled and the new one will
begin.
-For ex: consider the following case:
-- User creates a deployment to rolling-update 10 pods with image:v1 to
+For example, consider the following case:
+- User updates a Deployment to rolling-update 10 pods with image:v1 to
pods with image:v2.
-- User then updates this deployment to create pods with image:v3,
- when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC
+- User then updates this Deployment to create pods with image:v3,
+ when the image:v2 RS had been ramped up to 5 pods and the image:v1 RS
had been ramped down to 5 pods.
-- When Deployment Controller observes the new deployment, it will create
- a new RC for creating pods with image:v3. It will then start ramping up this
- new RC to 10 pods and will ramp down both the existing RCs to 0.
+- When Deployment Controller observes the new update, it will create
+ a new RS for creating pods with image:v3. It will then start ramping up this
+ new RS to 10 pods and will ramp down both the existing RSs to 0.
### Deleting
-Users can pause/cancel a deployment by deleting it before it is completed.
-Recreating the same deployment will resume it.
-For ex: consider the following case:
-- User creates a deployment to rolling-update 10 pods with image:v1 to
- pods with image:v2.
-- User then deletes this deployment while the old and new RCs are at 5 replicas each.
- User will end up with 2 RCs with 5 replicas each.
-User can then create the same deployment again in which case, DeploymentController will
-notice that the second RC exists already which it can ramp up while ramping down
+Users can pause/cancel a rollout by doing a non-cascading deletion of the Deployment
+before it is complete. Recreating the same Deployment will resume it.
+For example, consider the following case:
+- User creates a Deployment to perform a rolling update of 10 pods from image:v1 to
+ image:v2.
+- User then deletes the Deployment while the old and new RSs are at 5 replicas each.
+ User will end up with 2 RSs with 5 replicas each.
+User can then re-create the same Deployment, in which case DeploymentController will
+notice that the second RS exists already which it can ramp up while ramping down
the first one.
### Rollback
-We want to allow the user to rollback a deployment. To rollback a
-completed (or ongoing) deployment, user can create (or update) a deployment with
-DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec.
+We want to allow the user to roll back a Deployment. To roll back a completed (or
+ongoing) Deployment, users can use `kubectl rollout undo` or update the Deployment
+directly via its spec.rollbackTo.revision field, specifying either the revision they
+want to roll back to or no revision at all, in which case the Deployment will be
+rolled back to its previous revision.
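+
+The rollback field referenced above does not appear in the API snippet earlier in
+this proposal; the following is a rough sketch of what it might look like (the exact
+field shape is an assumption based on the spec.rollbackTo.revision reference):
+
+```go
+// RollbackConfig is a sketch of the field referenced as spec.rollbackTo; it
+// would hang off DeploymentSpec as `RollbackTo *RollbackConfig`.
+type RollbackConfig struct {
+	// The revision to roll back to. If 0, roll back to the previous revision.
+	Revision int64
+}
+```
+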
## Deployment Strategies
-DeploymentStrategy specifies how the new RC should replace existing RCs.
-To begin with, we will support 2 types of deployment:
-* Recreate: We kill all existing RCs and then bring up the new one. This results
- in quick deployment but there is a downtime when old pods are down but
+DeploymentStrategy specifies how the new RS should replace existing RSs.
+To begin with, we will support 2 types of DeploymentStrategy:
+* Recreate: We kill all existing RSs and then bring up the new one. This results
+  in a quick rollout, but there is a period of downtime when the old pods are down and
the new ones have not come up yet.
-* Rolling update: We gradually scale down old RCs while scaling up the new one.
- This results in a slower deployment, but there is no downtime. At all times
- during the deployment, there are a few pods available (old or new). The number
- of available pods and when is a pod considered "available" can be configured
- using RollingUpdateDeploymentStrategy.
-
-In future, we want to support more deployment types.
+* Rolling update: We gradually scale down old RSs while scaling up the new one.
+  This results in a slower rollout, but it can avoid downtime altogether. Depending
+  on the strategy parameters, it is possible to have available pods (old or new) at
+  all times during the rollout. The number of available pods and when a pod is
+  considered "available" can be configured using RollingUpdateDeploymentStrategy.
+
+## Hashing collisions
+
+Hash collisions do occur in practice with the existing hashing algorithm [1]. We
+need to switch to a less collision-prone algorithm such as fnv. Preliminary
+benchmarks [2] show that while fnv is a bit slower than adler, it is far less prone
+to collisions. Also, the hash of an API object is subject to API changes, which
+means that the name for a ReplicaSet may differ between minor Kubernetes versions.
+
+For both of the aforementioned cases, we will use a field in the DeploymentStatus,
+called Uniquifier, to create a unique hash value when a hash collision happens.
+The Deployment controller will compute the hash value of {template+uniquifier},
+and will use the resulting hash in the ReplicaSet names and selectors. One side
+effect of this hash collision avoidance mechanism is that we don't need to
+migrate ReplicaSets that were created with adler.
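+
+A minimal sketch of this collision-avoidance hashing (the serialization of the pod
+template and the exact encoding of the hash into names and labels are illustrative
+assumptions):
+
+```go
+package main
+
+import (
+	"encoding/binary"
+	"fmt"
+	"hash/fnv"
+)
+
+// podTemplateHash hashes the serialized pod template together with the
+// uniquifier from DeploymentStatus. When a collision is detected, the
+// controller bumps the uniquifier and recomputes, yielding a different hash
+// for the same template.
+func podTemplateHash(serializedTemplate []byte, uniquifier int64) uint32 {
+	h := fnv.New32a()
+	h.Write(serializedTemplate)
+	if uniquifier > 0 {
+		var buf [8]byte
+		binary.BigEndian.PutUint64(buf[:], uint64(uniquifier))
+		h.Write(buf[:])
+	}
+	return h.Sum32()
+}
+
+func main() {
+	tmpl := []byte(`{"containers":[{"name":"nginx","image":"nginx:1.7.9"}]}`)
+	first := podTemplateHash(tmpl, 0)
+	bumped := podTemplateHash(tmpl, 1) // recomputed after a detected collision
+	// The resulting value is used as the pod-template-hash label and as part of
+	// the ReplicaSet name.
+	fmt.Printf("pod-template-hash: %d, after collision: %d\n", first, bumped)
+}
+```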
+
+[1] https://github.com/kubernetes/kubernetes/issues/29735
+
+[2] https://github.com/kubernetes/kubernetes/pull/39527
## Future