diff options
| author | k8s-ci-robot <k8s-ci-robot@users.noreply.github.com> | 2018-10-15 14:22:37 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2018-10-15 14:22:37 -0700 |
| commit | 17248af3560a52b2c09ef667449a8a225e000416 (patch) | |
| tree | fb7ef200cca9500ebcad617c5fcd251ce259ed29 | |
| parent | 3067c78e2236290ea088d85af29edb18dfa23f20 (diff) | |
| parent | e6e3075eebda8db142757cf570c98dc7117990b1 (diff) | |
Merge pull request #2524 from caesarxuchao/storage
[KEP] storage api version migration
| -rw-r--r-- | keps/NEXT_KEP_NUMBER | 2 | ||||
| -rw-r--r-- | keps/sig-api-machinery/0030-storage-migration.md | 289 |
2 files changed, 290 insertions, 1 deletions
diff --git a/keps/NEXT_KEP_NUMBER b/keps/NEXT_KEP_NUMBER index 64bb6b74..e85087af 100644 --- a/keps/NEXT_KEP_NUMBER +++ b/keps/NEXT_KEP_NUMBER @@ -1 +1 @@ -30 +31 diff --git a/keps/sig-api-machinery/0030-storage-migration.md b/keps/sig-api-machinery/0030-storage-migration.md new file mode 100644 index 00000000..88d57852 --- /dev/null +++ b/keps/sig-api-machinery/0030-storage-migration.md @@ -0,0 +1,289 @@ +--- +kep-number: 30 +title: Migrating API objects to latest storage version +authors: + - "@xuchao" +owning-sig: sig-api-machinery +reviewers: + - "@deads2k" + - "@yliaog" + - "@lavalamp" +approvers: + - "@deads2k" + - "@lavalamp" +creation-date: 2018-08-06 +last-updated: 2018-10-11 +status: provisional +--- + +# Migrating API objects to latest storage version + +## Table of Contents + + * [Migrating API objects to latest storage version](#migrating-api-objects-to-latest-storage-version) + * [Table of Contents](#table-of-contents) + * [Summary](#summary) + * [Motivation](#motivation) + * [Goals](#goals) + * [Proposal](#proposal) + * [Alpha workflow](#alpha-workflow) + * [Alpha API](#alpha-api) + * [Failure recovery](#failure-recovery) + * [Beta workflow - Automation](#beta-workflow---automation) + * [Risks and Mitigations](#risks-and-mitigations) + * [Graduation Criteria](#graduation-criteria) + * [Alternatives](#alternatives) + * [update-storage-objects.sh](#update-storage-objectssh) + +## Summary + +We propose a solution to migrate the stored API objects in Kubernetes clusters. +In 2018 Q4, we will deliver a tool of alpha quality. The tool extends and +improves based on the [oc adm migrate storage][] command. We will integrate the +storage migration into the Kubernetes upgrade process in 2019 Q1. We will make +the migration automatically triggered in 2019. + +[oc adm migrate storage]:https://www.mankier.com/1/oc-adm-migrate-storage + +## Motivation + +"Today it is possible to create API objects (e.g., HPAs) in one version of +Kubernetes, go through multiple upgrade cycles without touching those objects, +and eventually arrive at a version of Kubernetes that can’t interpret the stored +resource and crashes. See k8s.io/pr/52185."[1][]. We propose a solution to the +problem. + +[1]:https://docs.google.com/document/d/1eoS1K40HLMl4zUyw5pnC05dEF3mzFLp5TPEEt4PFvsM + +### Goals + +A successful storage version migration tool must: +* work for Kubernetes built-in APIs, custom resources (CR), and aggregated APIs. +* do not add burden to cluster administrators or Kubernetes distributions. +* only cause insignificant load to apiservers. For example, if the master has + 10GB memory, the migration tool should generate less than 10 qps of single + object operations(TODO: measure the memory consumption of PUT operations; + study how well the default 10 Mbps bandwidth limit in the oc command work). +* work for big clusters that have ~10^6 instances of some resource types. +* make progress in flaky environment, e.g., flaky apiservers, or the migration + process get preempted. +* allow system administrators to track the migration progress. + +As to the deliverables, +* in the short term, providing system administrators with a tool to migrate + the Kubernetes built-in API objects to the proper storage versions. +* in the long term, automating the migration of Kubernetes built-in APIs, CR, + aggregated APIs without further burdening system administrators or Kubernetes + distributions. + +## Proposal + +### Alpha workflow + +At the alpha stage, the migrator needs to be manually launched, and does not +handle custom resources or aggregated resources. + +After all the kube-apiservers are at the desired version, the cluster +administrator runs `kubectl apply -f migrator-initializer-<k8s-versio>.yaml`. +The apply command +* creates a *kube-storage-migration* namespace +* creates a *storage-migrator* service account +* creates a *system:storage-migrator* cluster role that can *get*, *list*, and + *update* all resources, and in addition, *create* and *delete* CRDs. +* creates a cluster role binding to bind the created service account with the + cluster role +* creates a **migrator-initializer** job running with the + *storage-migrator* service account. + +The **migrator-initializer** job +* deletes any existing deployment of **kube-migrator controller** +* creates a **kube-migrator controller** deployment running with the + *storage-migrator* service account. +* generates a comprehensive list of resource types via the discovery API +* discovers all custom resources via listing CRDs +* discovers all aggregated resources via listing all `apiservices` that have + `.spec.service != null` +* removes the custom resources and aggregated resources from the comprehensive + resource list. The list now only contains Kubernetes built-in resources. +* removes resources that share the same storage. At the alpha stage, the + information is hard-coded, like in this [list][]. +* creates `migration` CRD (see the [API section][] for the schema) if it does + not exist. +* creates `migration` CRs for all remaining resources in the list. The + `ownerReferences` of the `migration` objects are set to the **kube-migrator + controller** deployment. Thus, the old `migration`s are deleted with the old + deployment in the first step. + +The control loop of **kube-migrator controller** does the following: +* runs a reflector to watch for the instances of the `migration` CR. The list + function used to construct the reflector sorts the `migration`s so that the + *Running* `migration` will be processed first. +* syncs one `migration` at a time to avoid overloading the apiserver, + * if `migration.status` is nil, or `migration.status.conditions` shows + *Running*, it creates a **migration worker** goroutine to migrate the + resource type. + * adds the *Running* condition to `migration.status.conditions`. + * waits until the **migration worker** goroutine finishes, adds either the + *Succeeded* or *Failed* condition to `migration.status.conditions` and sets + the *Running* condition to false. + +The **migration worker** runs the equivalence of `oc adm migrate storage +--include=<resource type>` to migrate a resource type. The **migration worker** +uses API chunking to retrieve partial lists of a resource type and thus can +migrate a small chunk at a time. It stores the [continue token] in the owner +`migration.spec.continueToken`. With the inconsistent continue token +introduced in [#67284][], the **migration worker** does not need to worry about +expired continue token. + +[list]:https://github.com/openshift/origin/blob/2a8633598ef0dcfa4589d1e9e944447373ac00d7/pkg/oc/cli/admin/migrate/storage/storage.go#L120-L184 +[#67284]:https://github.com/kubernetes/kubernetes/pull/67284 +[API section]:#alpha-api + +The cluster admin can run the `kubectl wait --for=condition=Succeeded +migrations` to wait for all migrations to succeed. + +Users can run `kubectl create` to create `migration`s to request migrating +custom resources and aggregated resources. + +### Alpha API + +We introduce the `storageVersionMigration` API to record the intention and the +progress of a migration. Throughout this doc, we abbreviated it as `migration` +for simplicity. The API will be a CRD defined in the `migration.k8s.io` group. + +Read the [workflow section][] to understand how the API is used. + +```golang +type StorageVersionMigration struct { + metav1.TypeMeta + // For readers of this KEP, metadata.generateName will be "<resource>.<group>" + // of the resource being migrated. + metav1.ObjectMeta + Spec StorageVersionMigrationSpec + Status StorageVersionMigrationStatus +} + +// Note that the spec only contains an immutable field in the alpha version. To +// request another round of migration for the resource, clients need to create +// another `migration` CR. +type StorageVersionMigrationSpec { + // Resource is the resource that is being migrated. The migrator sends + // requests to the endpoint tied to the Resource. + // Immutable. + Resource GroupVersionResource + // ContinueToken is the token to use in the list options to get the next chunk + // of objects to migrate. When the .status.conditions indicates the + // migration is "Running", users can use this token to check the progress of + // the migration. + // +optional + ContinueToken string +} + +type MigrationConditionType string + +const ( + // MigrationRunning indicates that a migrator job is running. + MigrationRunning MigrationConditionType = "Running" + // MigrationSucceed indicates that the migration has completed successfully. + MigrationSucceeded MigrationConditionType = "Succeeded" + // MigrationFailed indicates that the migration has failed. + MigrationFailed MigrationConditionType = "Failed" +) + +type MigrationCondition struct { + // Type of the condition + Type MigrationConditionType + // Status of the condition, one of True, False, Unknown. + Status corev1.ConditionStatus + // The last time this condition was updated. + LastUpdateTime metav1.Time + // The reason for the condition's last transition. + Reason string + // A human readable message indicating details about the transition. + Message string +} + +type StorageVersionMigrationStatus { + // Conditions represents the latest available observations of the migration's + // current state. + Conditions []MigrationCondition +} +``` + +[continue token]:https://github.com/kubernetes/kubernetes/blob/972e1549776955456d9808b619d136ee95ebb388/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L82 +[workflow section]:#alpha-workflow + +### Failure recovery + +As stated in the goals section, the migration has to make progress even if the +environment is flaky. This section describes how the migrator recovers from +failure. + +Kubernetes **replicaset controller** restarts the **migration controller** `pod` +if it fails. Because the migration states, including the continue token, are + stored in the `migration` object, the **migration controller** can resume from + where it left off. + +[workflow section]:#alpha-workflow + +### Beta workflow - Automation + +It is a beta goal to automate the migration workflow. That is, migration does +not need to be triggered manually by cluster admins, or by custom control loops +of Kubernetes distributions. + +The automated migration should work for Kubernetes built-in resource types, +custom resources, and aggregated resources. + +The trigger can be implemented as a separate control loop. It watches for the +triggering signal, and creates `migration` to notify the **kube-migrator +controller** to migrate a resource. + +We haven't reached consensus on what signal would trigger storage migration. We +will revisit this section during beta design. + +### Risks and Mitigations + +The migration process does not change the objects, so it will not pollute +existing data. + +If the rate limiting is not tuned well, the migration can overload the +apiserver. Users can delete the migration controller and the migration +jobs to mitigate. + +Before upgrading or downgrading the cluster, the cluster administrator must run +`kubectl wait --for=condition=Succeeded migrations` to make sure all +migrations have completed. Otherwise the apiserver can crash, because it cannot +interpret the serialized data in etcd. To mitigate, the cluster administrator +can rollback the apiserver to the old version, and wait for the migration to +complete. Even if the apiserver does not crash after upgrading or downgrading, +the `migration` objects are not accurate anymore, because the default storage +versions might have changed after upgrading or downgrading, but no one +increments the `migration.spec.generation`. Administrator needs to re-run the +`kubectl run migrate --image=migrator-initializer --restart=OnFailure` command +to recover. + +TODO: it is safe to rollback an apiserver to the previous configuration without +waiting for the migration to complete. It is only unsafe to roll-forward or +rollback twice. We need to design how to record the previous configuration. + +## Graduation Criteria + +* alpha: delivering a tool that implements the "alpha workflow" and "failure + recovery" sections. ETA is 2018 Q4. + +* beta: implementing the "beta workflow" and integrating the storage migration + into Kubernetes upgrade tests. + +* GA: TBD. + +We will revisit this section in 2018 Q4. + +## Alternatives + +### update-storage-objects.sh + +The Kubernetes repo has an update-storage-objects.sh script. It is not +production ready: no rate limiting, hard-coded resource types, no persisted +migration states. We will delete it, leaving a breadcrumb for any users to +follow to the new tool. |
