author    Maciej Szulik <maszulik@redhat.com>  2017-08-25 17:29:05 +0200
committer Maciej Szulik <maszulik@redhat.com>  2017-08-25 17:29:05 +0200
commit    f65a602d626479e93820324c4ffbe9a203da0dd4
tree      917446d186c70037b134cab0ade9b29d4162d05a
parent    f719b7e3835cae255fcf23b372f212766a5438b3
Which backoff fields apply to which restart policy
 contributors/design-proposals/job.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/contributors/design-proposals/job.md b/contributors/design-proposals/job.md
index c5a21927..4028de33 100644
--- a/contributors/design-proposals/job.md
+++ b/contributors/design-proposals/job.md
@@ -49,11 +49,14 @@ limits described above are set. By default it will take value of 1, to allow debugging
job issues, but not to flood the cluster with too many failed jobs and their
accompanying pods.
-All of the above fields will be optional and will apply no matter which `restartPolicy`
-is set on a `PodTemplate`. The only difference applies to how failures are counted.
-For restart policy `Never` we count actual pod failures (reflected in `.status.failed`
-field). With restart policy `OnFailure` we take an approximate value of pod restarts
-(as reported in `.status.containerStatuses[*].restartCount`).
+All of the above fields will be optional and will apply when `restartPolicy` is
+set to `Never` on a `PodTemplate`. With restart policy `OnFailure` only `BackoffLimit`
+applies. The reason for that is that failed pods are already restarted by the
+kubelet with an [exponential backoff](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy).
+Additionally, failures are counted differently depending on `restartPolicy`
+setting. For `Never` we count actual pod failures (reflected in `.status.failed`
+field). With `OnFailure`, we take an approximate value of pod restarts (as reported
+in `.status.containerStatuses[*].restartCount`).
When `.spec.parallelism` is set to a value higher than 1, the failures are an
overall number (as coming from `.status.failed`) because the controller does not
hold information about failures coming from separate pods.
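
For illustration only (not part of the patch above), a minimal Job manifest exercising the described behavior might look like the sketch below; the name, image, and command are placeholders, and `backoffLimit` is the field whose semantics this commit clarifies:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backoff-example        # hypothetical name
spec:
  backoffLimit: 6              # with restartPolicy: Never, retries are counted
                               # via actual pod failures (.status.failed)
  template:
    spec:
      restartPolicy: Never     # all of the backoff fields apply under this policy;
                               # with OnFailure, only backoffLimit would apply
      containers:
      - name: worker
        image: busybox         # placeholder image
        command: ["sh", "-c", "exit 1"]  # always fails, to exercise the backoff
```

With `restartPolicy: OnFailure`, the kubelet restarts the containers in place with its own exponential backoff, and the controller approximates the failure count from `.status.containerStatuses[*].restartCount` instead.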