Renamed BackoffDeadlineSeconds to BackoffSeconds

author: Maciej Szulik <maszulik@redhat.com> 2017-08-23 22:23:41 +0200
committer: Maciej Szulik <maszulik@redhat.com> 2017-08-23 22:23:41 +0200
commit: f719b7e3835cae255fcf23b372f212766a5438b3 (patch)
tree: 04d6fd5b21f2f6d8e763a5f53ec6040165ecf5af
parent: 5b6d2ce880cad0deb7199e09b1a7ef86ec5f6660 (diff)
1 files changed, 13 insertions, 10 deletions
diff --git a/contributors/design-proposals/job.md b/contributors/design-proposals/job.md
index b23c3e02..c5a21927 100644
--- a/contributors/design-proposals/job.md
+++ b/contributors/design-proposals/job.md
@@ -34,14 +34,12 @@ which is mistakenly taken as Job's restart policy ([#30243](https://github.com/k
 [#[43964](https://github.com/kubernetes/kubernetes/issues/43964)]).  There are
 situations where one wants to fail a Job after some amount of retries over a certain
 period of time, due to a logical error in configuration etc.  To do so we are going
-to introduce following fields, which will control the backoff policy: a number of
-retries and a time to retry (counted from the first failure).  The two fields will
-allow fine-grained control over the backoff policy, limiting the number of retries
-over a specified period of time.  If only one of the two fields is supplied,
-a backoff with an intervening duration of ten seconds and a factor of two will be
-applied, such that either:
-* the number of retries will not exceed a specified count, if present, or
-* the maximum time elapsed will not exceed the specified duration, if present.
+to introduce the following fields, which will control the backoff policy: a number of
+retries and an initial time of retry.  The two fields will allow fine-grained control
+over the backoff policy.  Each of the two fields will use a default value if none
+is provided: `BackoffLimit` defaults to 6 and `BackoffSeconds` to 10s.
+This will result in the following retry sequence: 10s, 20s, 40s, 1m20s, 2m40s,
+5m20s, after which the job will be considered failed.
 
 Additionally, to help debug the issue with a Job, and limit the impact of having
 too many failed pods left around (as mentioned in [#30243](https://github.com/kubernetes/kubernetes/issues/30243)),
@@ -56,6 +54,9 @@ is set on a `PodTemplate`.  The only difference applies to how failures are coun
 For restart policy `Never` we count actual pod failures (reflected in `.status.failed`
 field). With restart policy `OnFailure` we take an approximate value of pod restarts
 (as reported in `.status.containerStatuses[*].restartCount`).
+When `.spec.parallelism` is set to a value higher than 1, the failures are an
+overall number (as coming from `.status.failed`) because the controller does not
+hold information about failures coming from separate pods.
 
 
 ## Implementation
@@ -118,10 +119,12 @@ type JobSpec struct {
     ActiveDeadlineSeconds *int64
 
     // Optional number of retries before marking this job failed.
+    // Defaults to 6.
     BackoffLimit *int32
 
-    // Optional time (in seconds) specifying how long a job should be retried before marking it failed.
-    BackoffDeadlineSeconds *int64
+    // Optional time (in seconds) specifying how long the initial backoff will last.
+    // Defaults to 10s.
+    BackoffSeconds *int64
 
     // Optional number of failed pods to retain.
     FailedPodsLimit *int32