authorEric Tune <etune@google.com>2016-05-13 14:56:05 -0700
committerEric Tune <etune@google.com>2016-08-04 07:29:06 -0700
commitd375dada887becfc654f62674758e7f878448e73 (patch)
tree0a4ea9ee89ca2fab326e9935af8714e849ee4743
parentd6039caa0b411cc457d8fb64e85bdbc5352b0b10 (diff)
ScheduledJob: proposal updates
-rw-r--r--scheduledjob.md195
1 files changed, 167 insertions, 28 deletions
diff --git a/scheduledjob.md b/scheduledjob.md
index 0625f09d..5a474b0a 100644
--- a/scheduledjob.md
+++ b/scheduledjob.md
@@ -62,13 +62,26 @@ report generation and the like. Each of these tasks should be allowed to run
repeatedly (once a day/month, etc.) or once at a given point in time.
-## Implementation
+## Design Overview
-### ScheduledJob resource
+Users create a ScheduledJob object. One ScheduledJob object
+is like one line of a crontab file. It has a schedule of when to run,
+in [Cron](https://en.wikipedia.org/wiki/Cron) format.
+
+
+The ScheduledJob controller creates a [Job](job.md) object
+about once per execution time of the schedule (e.g. once per
+day for a daily schedule). We say "about" because there are certain
+circumstances where two jobs might be created, or no job might be
+created. We attempt to make these rare, but do not completely prevent
+them. Therefore, Jobs should be idempotent.
-The ScheduledJob controller relies heavily on the [Job API](job.md)
-for running actual jobs, on top of which it adds information regarding the date
-and time part according to [Cron](https://en.wikipedia.org/wiki/Cron) format.
+The Job object is responsible for any retrying of Pods, and any parallelism
+among pods it creates, and determining the success or failure of the set of
+pods. The ScheduledJob does not examine pods at all.
+
+
+### ScheduledJob resource
The new `ScheduledJob` object will have the following contents:
@@ -173,31 +186,12 @@ type ScheduledJobStatus struct {
}
```
-### Modifications to Job resource
+Users must use a generated selector for the job.
-In order to distinguish Job runs, we need to add `UniqueLabelKey` field to `JobSpec`.
-This field will be used for creating unique label selectors.
+## Modifications to Job resource
-```go
-type JobSpec {
-
- //...
-
- // Key of the selector that is added to prevent concurrently running Jobs
- // selecting their pods.
- // Users can set this to an empty string to indicate that the system should
- // not add any selector and label. If unspecified, system uses
- // "scheduledjob.kubernetes.io/podTemplateHash".
- // Value of this key is hash of ScheduledJobSpec.PodTemplateSpec.
- // No label is added if this is set to an empty string.
- UniqueLabelKey *string
-}
-```
-
-Although at Job level empty string is perfectly valid, `ScheduledJob` cannot have
-empty selector, it needs to be defined, either by user or generated automatically.
-For this to happen, validation will be tightened at ScheduledJob level for this
-field to be either nil or non-empty string.
+TODO for beta: forbid manual selector since that could cause confusion between
+subsequent jobs.
### Running ScheduledJobs using kubectl
@@ -216,6 +210,151 @@ In the above example:
July 7th, 2pm. This value will be validated according to the same rules which
apply to `.spec.schedule`.
+## Fields Added to Job Template
+
+When the controller creates a Job from the JobTemplateSpec in the ScheduledJob, it
+adds the following fields to the Job:
+
+- a name, based on the ScheduledJob's name, but with a suffix to distinguish
+ multiple executions, which may overlap.
+- the standard created-by annotation on the Job, pointing to the SJ that created it
+ The standard key is `kubernetes.io/created-by`. The value is a serialized JSON object, like
+ `{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ScheduledJob","namespace":"default",`
+ `"name":"nightly-earnings-report","uid":"5ef034e0-1890-11e6-8935-42010af0003e","apiVersion":...`
+ This serialization contains the UID of the parent. This is used to match the Job to the SJ that created
+ it.
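
The annotation above can be built with ordinary JSON serialization. The following Go sketch mirrors the fields shown in the example; the struct and helper names here are illustrative, not the actual Kubernetes types, and the `apiVersion` of the reference is an assumed value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Reference mirrors the fields shown in the annotation example above.
// These struct definitions are illustrative, not the real Kubernetes types.
type Reference struct {
	Kind       string `json:"kind"`
	Namespace  string `json:"namespace"`
	Name       string `json:"name"`
	UID        string `json:"uid"`
	APIVersion string `json:"apiVersion"`
}

type SerializedReference struct {
	Kind       string    `json:"kind"`
	APIVersion string    `json:"apiVersion"`
	Reference  Reference `json:"reference"`
}

// createdByAnnotation returns the value for the kubernetes.io/created-by
// key, pointing at the ScheduledJob that created the Job.
func createdByAnnotation(sjName, namespace, uid string) (string, error) {
	ref := SerializedReference{
		Kind:       "SerializedReference",
		APIVersion: "v1",
		Reference: Reference{
			Kind:       "ScheduledJob",
			Namespace:  namespace,
			Name:       sjName,
			UID:        uid,
			APIVersion: "batch/v2alpha1", // assumed version, for illustration
		},
	}
	b, err := json.Marshal(ref)
	return string(b), err
}

func main() {
	v, _ := createdByAnnotation("nightly-earnings-report", "default",
		"5ef034e0-1890-11e6-8935-42010af0003e")
	fmt.Println(v)
}
```

Because the serialized reference carries the parent's UID, matching a Job back to its ScheduledJob survives delete-and-replace of a ScheduledJob with the same name.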
+
+## Updates to ScheduledJobs
+
+If the schedule is updated on a ScheduledJob, it will:
+- continue to use the Status.Active list of jobs to detect conflicts.
+- try to fulfill all recently-passed times for the new schedule, by starting
+ new jobs. But it will not try to fulfill times prior to the
+ Status.LastScheduledTime.
+ - Example: If you have a schedule to run every 30 minutes, and change that to hourly, then the previously started
+ top-of-the-hour run, in Status.Active, will be seen and no new job started.
+ - Example: If you have a schedule to run every hour, change that to 30-minutely, at 31 minutes past the hour,
+ one run will be started immediately for the starting time that has just passed.
+
+If the job template of a ScheduledJob is updated, then future executions use the new template
+but old ones still satisfy the schedule and are not re-run just because the template changed.
+
+If you delete and replace a ScheduledJob with one of the same name, it will:
+- not use any old Status.Active, and not consider any existing running or terminated jobs from the previous
+  ScheduledJob (with a different UID) at all when determining conflicts, what needs to be started, etc.
+- If there is an existing Job with the same time-based hash in its name (see below), then
+  new instances of that job will not be able to be created. So, delete it if you want to re-run.
+- not "re-run" jobs for "start times" before the creation time of the new ScheduledJob object.
+- not consider executions from the previous UID when making decisions about what executions to
+  start, or status, etc.
+- lose the history of the old SJ.
+
+To preserve status, you can suspend the old one, and make one with a new name, or make a note of the old status.
+
+
+## Fault-Tolerance
+
+### Starting Jobs in the face of controller failures
+
+If the process running the scheduledJob controller fails,
+and takes a while to restart, the scheduledJob controller
+may miss the time window, after which it is too late to start a job.
+
+With a single scheduledJob controller process, we cannot give
+very strong assurances about not missing starting jobs.
+
+With a suggested HA configuration, there are multiple controller
+processes, and they use master election to determine which one
+is active at any time.
+
+If the ScheduledJob's StartingDeadlineSeconds is long enough, and the
+lease for the master lock is short enough, and other controller
+processes are running, then a Job will be started.
+
+TODO: consider hard-coding the minimum StartingDeadlineSeconds
+at say 1 minute. Then we can offer a clearer guarantee,
+assuming we know what the setting of the lock lease duration is.
+
+### Ensuring jobs are run at most once
+
+There are three problems here:
+
+- ensure at most one Job created per "start time" of a schedule.
+- ensure that at most one Pod is created per Job
+- ensure at most one container start occurs per Pod
+
+#### Ensuring one Job
+
+Multiple jobs might be created in the following sequence:
+
+1. scheduled job controller sends request to start Job J1 to fulfill start time T.
+1. the create request is accepted by the apiserver and enqueued but not yet written to etcd.
+1. scheduled job controller crashes
+1. new scheduled job controller starts, and lists the existing jobs, and does not see one created.
+1. it creates a new one.
+1. the first one eventually gets written to etcd.
+1. there are now two jobs for the same start time.
+
+We can solve this in several ways:
+
+1. with three-phase protocol, e.g.:
+ 1. controller creates a "suspended" job.
+ 1. controller writes an annotation in the SJ saying that it created a job for this time.
+ 1. controller unsuspends that job.
+1. by picking a deterministic name, so that at most one object create can succeed.
+
+#### Ensuring one Pod
+
+The Job object does not currently have a way to ask for this.
+Even if it did, the Job controller is not written to support it.
+The same race as in the previous section applies here as well.
+
+#### Ensuring one container invocation per Pod
+
+The kubelet is not written to ensure at most one container start per pod.
+
+#### Decision
+
+This is too hard to do for the alpha version. We will await user
+feedback to see if the "at most once" property is needed in the beta version.
+
+This is awkward but possible for a containerized application to ensure on its
+own, as it needs to know which ScheduledJob name and start time it is from, and
+then record the attempt in a shared storage system. We should ensure it can
+extract this data from its annotations using the downward API.
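
When annotations are exposed through a downward API volume, they arrive as one `key="value"` line per annotation, with the value Go-quoted. A hedged sketch of how an application might read them back (the parsing helper is hypothetical; the file format assumption is the downward API's annotation-file layout):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDownwardAnnotations parses the key="escaped value" lines that a
// downward API volume writes for pod annotations. Illustrative helper;
// assumes one annotation per line with a Go-quoted value.
func parseDownwardAnnotations(data string) map[string]string {
	out := map[string]string{}
	for _, line := range strings.Split(data, "\n") {
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		if unq, err := strconv.Unquote(v); err == nil {
			out[k] = unq
		}
	}
	return out
}

func main() {
	// In a real pod this would be read from the mounted annotations file.
	data := `kubernetes.io/created-by="{\"kind\":\"SerializedReference\"}"`
	fmt.Println(parseDownwardAnnotations(data)["kubernetes.io/created-by"])
}
```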
+
+## Name of Jobs
+
+A ScheduledJob creates one Job each time a Job should run.
+Since there may be concurrent jobs, and since we might want to keep failed
+non-overlapping Jobs around as a debugging record, each Job created by the same ScheduledJob
+needs a distinct name.
+
+To make the Jobs from the same ScheduledJob distinct, we could use a random string,
+in the way that pods have a `generateName`. For example, a scheduledJob named `nightly-earnings-report`
+in namespace `ns1` might create a job `nightly-earnings-report-3m4d3`, and later create
+a job called `nightly-earnings-report-6k7ts`. This is consistent with pods, but
+does not give the user much information.
+
+Alternatively, we can use the time as a uniquifier. For example, the same scheduledJob could
+create a job called `nightly-earnings-report-2016-May-19`.
+However, for Jobs that run more than once per day, we would need to represent
+the time as well as the date. Standard date formats (e.g. RFC 3339) use colons for time.
+Kubernetes names cannot include colons. Using a non-standard date format without colons
+will annoy some users.
+
+Also, date strings are much longer than random suffixes, which means that
+the pods will also have long names, and that we are more likely to exceed the
+253 character name limit when combining the scheduled-job name,
+the time suffix, and pod random suffix.
+
+One option would be to compute a hash of the nominal start time of the job,
+and use that as a suffix. This would not provide the user with an indication
+of the start time, but it would prevent creation of the same execution
+by two instances (replicated or restarting) of the controller process.
+
+We chose to use the hashed-date suffix approach.
## Future evolution