summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJan Safranek <jsafrane@redhat.com>2016-10-19 12:01:50 +0200
committerJan Safranek <jsafrane@redhat.com>2016-10-19 12:01:50 +0200
commit5820fd350847e7a2e6960717f2bbc416af6328af (patch)
tree9df73e50beee4db33c923bcf15f8b5de0d0aff48
parent336c069ee01df0e238e38541b5044de7df73ff14 (diff)
Proposal for external dynamic provisioners
-rw-r--r--volume-provisioning.md332
1 files changed, 301 insertions, 31 deletions
diff --git a/volume-provisioning.md b/volume-provisioning.md
index 68b30b69..6e8ea6b2 100644
--- a/volume-provisioning.md
+++ b/volume-provisioning.md
@@ -64,10 +64,7 @@ types of volumes within a single cloud.
One of our goals is to enable administrators to create out-of-tree
provisioners, that is, provisioners whose code does not live in the Kubernetes
-project. Our experience since the 1.2 release with dynamic provisioning has
-shown that it is impossible to anticipate every aspect and manner of
-provisioning that administrators will want to perform. The proposed design
-should not prevent future work to allow out-of-tree provisioners.
+project.
## Design
@@ -75,37 +72,60 @@ This design represents the minimally viable changes required to provision based
We propose that:
-1. For the base impelementation storage class and volume selectors are mutually exclusive.
-
-2. An api object will be incubated in storage.k8s.io/v1beta1 to hold the a `StorageClass`
+1. Both for in-tree and out-of-tree storage provisioners, the PV created by the
+ provisioners must match the PVC that led to its creations. If a provisioner
+ is unable to provision such a matching PV, it reports an error to the
+ user.
+
+2. The above point applies also to PVC label selector. If user submits a PVC
+ with a label selector, the provisioner must provision a PV with matching
+ labels. This directly implies that the provisioner understands meaning
+ behind these labels - if user submits a claim with selector that wants
+ a PV with label "region" not in "[east,west]", the provisioner must
+ understand what label "region" means, what available regions are there and
+ choose e.g. "north".
+
+ In other words, provisioners should either refuse to provision a volume for
+ a PVC that has a selector, or select few labels that are allowed in
+ selectors (such as the "region" example above), implement necessary logic
+ for their parsing, document them and refuse any selector that references
+ unknown labels.
+
+3. An api object will be incubated in storage.k8s.io/v1beta1 to hold the a `StorageClass`
API resource. Each StorageClass object contains parameters required by the provisioner to provision volumes of that class. These parameters are opaque to the user.
-3. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
+4. `PersistentVolume.Spec.Class` attribute is added to volumes. This attribute
is optional and specifies which `StorageClass` instance represents
storage characteristics of a particular PV.
During incubation, `Class` is an annotation and not
actual attribute.
-4. `PersistentVolume` instances do not require labels by the provisioner.
+5. `PersistentVolume` instances do not require labels by the provisioner.
-5. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
+6. `PersistentVolumeClaim.Spec.Class` attribute is added to claims. This
attribute specifies that only a volume with equal
`PersistentVolume.Spec.Class` value can satisfy a claim.
During incubation, `Class` is just an annotation and not
actual attribute.
-6. The existing provisioner plugin implementations be modified to accept
+7. The existing provisioner plugin implementations be modified to accept
parameters as specified via `StorageClass`.
-7. The persistent volume controller modified to invoke provisioners using `StorageClass` configuration and bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with equivalent `PersistentVolume.Spec.Class`
+8. The persistent volume controller modified to invoke provisioners using `StorageClass` configuration and bind claims with `PersistentVolumeClaim.Spec.Class` to volumes with equivalent `PersistentVolume.Spec.Class`
-8. The existing alpha dynamic provisioning feature be phased out in the
+9. The existing alpha dynamic provisioning feature be phased out in the
next release.
### Controller workflow for provisioning volumes
+0. Kubernetes administator can configure name of a default StorageClass. This
+ StorageClass instance is then used when user requests a dynamically
+ provisioned volume, but does not specify a StorageClass. In other words,
+ `claim.Spec.Class == ""`
+ (or annotation `volume.beta.kubernetes.io/storage-class == ""`).
+
1. When a new claim is submitted, the controller attempts to find an existing
volume that will fulfill the claim.
@@ -125,30 +145,280 @@ We propose that:
periodically retries finding a matching volume or storage class again until
a match is found. The claim is `Pending` during this period.
-4. With StorageClass instance, the controller finds volume plugin specified by
- StorageClass.Provisioner.
+4. With StorageClass instance, the controller updates the claim:
+ * `claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] = storageClass.Provisioner`
+
+* **In-tree provisioning**
+
+ The controller tries to find an internal volume plugin referenced by
+ `storageClass.Provisioner`. If it is found:
+
+ 5. The internal provisioner implements interface`ProvisionableVolumePlugin`,
+ which has a method called `NewProvisioner` that returns a new provisioner.
+
+ 6. The controller calls volume plugin `Provision` with Parameters
+ from the `StorageClass` configuration object.
+
+ 7. If `Provision` returns an error, the controller generates an event on the
+ claim and goes back to step 1., i.e. it will retry provisioning
+ periodically.
+
+ 8. If `Provision` returns no error, the controller creates the returned
+ `api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
+ and makes it already bound to the claim
+
+ 1. If the create operation for the `api.PersistentVolume` fails, it is
+ retried
+
+ 2. If the create operation does not succeed in reasonable time, the
+ controller attempts to delete the provisioned volume and creates an event
+ on the claim
+
+Existing behavior is un-changed for claims that do not specify
+`claim.Spec.Class`.
+
+* **Out of tree provisioning**
+
+ Following step 4. above, the controller tries to find internal plugin for the
+ `StorageClass`. If it is not found, it does not do anything, it just
+ periodically goes to step 1., i.e. tries to find available matching PV.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+ "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+ interpreted as described in RFC 2119.
+
+ External provisioner must have these features:
+
+ * It MUST have a distinct name, following Kubernetenes plugin naming scheme
+ `<vendor name>/<provisioner name>`, e.g. `gluster.org/gluster-volume`.
+
+ * The provisioner SHOULD send events on a claim to report any errors
+ related to provisioning a volume for the claim. This way, users get the same
+ experience as with internal provisioners.
-5. All provisioners are in-tree; they implement an interface called
- `ProvisionableVolumePlugin`, which has a method called `NewProvisioner`
- that returns a new provisioner.
+ * The provisioner MUST implement also a deleter. It must be able to delete
+ storage assets it created. It MUST NOT assume that any other internal or
+ external plugin is present.
-6. The controller calls volume plugin `Provision` with Parameters from the `StorageClass` configuration object.
+ The external provisioner runs in a separate process which watches claims, be
+ it an external storage appliance, a daemon or a Kubernetes pod. For every
+ claim creation or update, it implements these steps:
-7. If `Provision` returns an error, the controller generates an event on the
- claim and goes back to step 1., i.e. it will retry provisioning periodically
+ 1. The provisioner inspects if
+ `claim.Annotations["volume.beta.kubernetes.io/storage-provisioner"] == <provisioner name>`.
+ All other claims MUST be ignored.
-8. If `Provision` returns no error, the controller creates the returned
- `api.PersistentVolume`, fills its `Class` attribute with `claim.Spec.Class`
- and makes it already bound to the claim
+ 2. The provisioner MUST check that the claim is unbound, i.e. its
+ `claim.Spec.VolumeName` is empty. Bound volumes MUST be ignored.
- 1. If the create operation for the `api.PersistentVolume` fails, it is
- retried
+ *Race condition when the provisioner provisions a new PV for a claim and
+ at the same time Kubernetes binds the same claim to another PV that was
+ just created by admin is discussed below.*
- 2. If the create operation does not succeed in reasonable time, the
- controller attempts to delete the provisioned volume and creates an event
- on the claim
+ 3. It tries to find a StorageClass instance referenced by annotation
+ `claim.Annotations["volume.beta.kubernetes.io/storage-class"]`. If not
+ found, it SHOULD report an error (by sending an event to the claim) and it
+ SHOULD retry periodically with step i.
-Existing behavior is un-changed for claims that do not specify `claim.Spec.Class`.
+ 4. The provisioner MUST parse arguments in the `StorageClass` and
+ `claim.Spec.Selector` and provisions appropriate storage asset that matches
+ both the parameters and the selector.
+ When it encounters unknown parameters in `storageClass.Parameters` or
+ `claim.Spec.Selector` or the combination of these parameters is impossible
+ to achieve, it SHOULD report an error and it MUST NOT provision a volume.
+ All errors found during parsing or provisioning SHOULD be send as events
+ on the claim and the provisioner SHOULD retry periodically with step i.
+
+ As parsing (and understanding) claim selectors is hard, the sentence
+ "MUST parse ... `claim.Spec.Selector`" will in typical case lead to simple
+ refusal of claims that have any selector:
+
+ ```go
+ if pvc.Spec.Selector != nil {
+ return Error("can't parse PVC selector!")
+ }
+ ```
+
+ 5. When the volume is provisioned, the provisioner MUST create a new PV
+ representing the storage asset and save it in Kubernetes. When this fails,
+ it SHOULD retry creating the PV again few times. If all attempts fail, it
+ MUST delete the storage asset. All errors SHOULD be sent as events to the
+ claim.
+
+ The created PV MUST have these properties:
+
+ * `pv.Spec.ClaimRef` MUST point to the claim that led to its creation
+ (including the claim UID).
+
+ *This way, the PV will be bound to the claim.*
+
+ * `pv.Annotations["pv.kubernetes.io/provisioned-by"]` MUST be set to name
+ of the external provisioner. This provisioner will be used to delete the
+ volume.
+
+ *The provisioner/delete should not assume there is any other
+ provisioner/deleter available that would delete the volume.*
+
+ * `pv.Annotations["volume.beta.kubernetes.io/storage-class"]` MUST be set
+ to name of the storage class requested by the claim.
+
+ *So the created PV matches the claim.*
+
+ * The provisioner MAY store any other information to the created PV as
+ annotations. It SHOULD save any information that is needed to delete the
+ storage asset there, as appropriate StorageClass instance may not exist
+ when the volume will be deleted. However, references to Secret instance
+ or direct username/password to a remote storage appliance MUST NOT be
+ stored there, see issue #34822.
+
+ * `pv.Labels` MUST be set to match `claim.spec.selector`. The provisioner
+ MAY add additional labels.
+
+ *So the created PV matches the claim.*
+
+ * `pv.Spec` MUST be set to match requirements in `claim.Spec`, especially
+ access mode and PV size. The provisioned volume size MUST NOT be smaller
+ than size requested in the claim, however it MAY be larger.
+
+ *So the created PV matches the claim.*
+
+ * `pv.Spec.PersistentVolumeSource` MUST be set to point to the created
+ storage asset.
+
+ * `pv.Spec.PersistentVolumeReclaimPolicy` SHOULD be set to `Delete` unless
+ user manually configures other reclaim policy.
+
+ * `pv.Name` MUST be unique. Internal provisioners use name based on
+ `claim.UID` to produce conflicts when two provisioners accidentally
+ provision a PV for the same claim, however external provisioners can use
+ any mechanism to generate an unique PV name.
+
+ Example of a claim that is to be provisioned by an external provisioner for
+ `foo.org/foo-volume`:
+
+ ```yaml
+ apiVersion: v1
+ kind: PersistentVolumeClaim
+ metadata:
+ annotations:
+ volume.beta.kubernetes.io/storage-class: myClass
+ volume.beta.kubernetes.io/storage-provisioner: foo.org/foo-volume
+ name: fooclaim
+ namespace: default
+ resourceVersion: "53"
+ uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
+ spec:
+ accessModes:
+ - ReadWriteOnce
+ resources:
+ requests:
+ storage: 4Gi
+ # volumeName: must be empty!
+ ```
+
+ Example of the created PV:
+
+ ```yaml
+ apiVersion: v1
+ kind: PersistentVolume
+ metadata:
+ annotations:
+ pv.kubernetes.io/provisioned-by: foo.org/foo-volume
+ volume.beta.kubernetes.io/storage-class: myClass
+ foo.org/provisioner: "any other annotations as needed"
+ labels:
+ foo.org/my-label: "any labels as needed"
+ generateName: "foo-volume-"
+ spec:
+ accessModes:
+ - ReadWriteOnce
+ awsElasticBlockStore:
+ fsType: ext4
+ volumeID: aws://us-east-1d/vol-de401a79
+ capacity:
+ storage: 4Gi
+ claimRef:
+ apiVersion: v1
+ kind: PersistentVolumeClaim
+ name: fooclaim
+ namespace: default
+ resourceVersion: "53"
+ uid: 5a294561-7e5b-11e6-a20e-0eb6048532a3
+ persistentVolumeReclaimPolicy: Delete
+ ```
+
+ As result, Kubernetes has a PV that represents the storage asset and is bound
+ to the claim. When everything went well, Kubernetes completed binding of the
+ claim to the PV.
+
+ Kubernetes was not blocked in any way during the provisioning and could
+ either bound the claim to another PV that was created by user or even the
+ claim may have been deleted by the user. In both cases, Kubernetes will mark
+ the PV to be delete using the protocol below.
+
+ The external provisioner MAY save any annotations to the claim that is
+ provisioned, however the claim may be modified or even deleted by the user at
+ any time.
+
+
+### Controller workflow for deleting volumes
+
+When the controller decides that a volume should be deleted it performs these
+steps:
+
+1. The controller changes `pv.Status.Phase` to `Released`.
+
+2. The controller looks for `pv.Annotations["pv.kubernetes.io/provisioned-by"]`.
+ If found, it uses this provisioner/deleter to delete the volume.
+
+3. If the volume is not annotated by `pv.kubernetes.io/provisioned-by`, the
+ controller inspects `pv.Spec` and finds in-tree deleter for the volume.
+
+4. If the deleter found by steps 2. or 3. is internal, it calls it and deletes
+ the storage asset together with the PV that represents it.
+
+5. If the deleter is not known to Kubernetes, it does not do anything.
+
+6. External deleters MUST watch for PV changes. When
+ `pv.Status.Phase == Released && pv.Annotations['pv.kubernetes.io/provisioned-by'] == <deleter name>`,
+ the deleter:
+
+ * It MUST check reclaim policy of the PV and ignore all PVs whose
+ `Spec.PersistentVolumeReclaimPolicy` is not `Delete`.
+
+ * It MUST delete the storage asset.
+
+ * Only after the storage asset was successfully deleted, it MUST delete the
+ PV object in Kubernetes.
+
+ * Any error SHOULD be sent as an event on the PV being deleted and the
+ deleter SHOULD retry to delete the volume periodically.
+
+ * The deleter SHOULD NOT use any information from StorageClass instance
+ referenced by the PV. This is different to internal deleters, which
+ need to be StorageClass instance present at the time of deletion to read
+ Secret instances (see Gluster provisioner for example), however we would
+ like to phase out this behavior.
+
+ Note that watching `pv.Status` has been frowned upon in the past, however in
+ this particular case we could use it quite reliably to trigger deletion.
+ It's not trivial to find out if a PV is not needed and should be deleted.
+ *Alternatively, an annotation could be used.*
+
+### Security considerations
+
+Both internal and external provisioners and deleters may need access to
+credentials (e.g. username+password) of an external storage appliance to
+provision and delete volumes.
+
+* For internal provisioners, a Secret instance in a well secured namespace
+should be used. Pointer to the Secret instance shall be parameter of the
+StorageClass and it MUST NOT be copied around the system e.g. in annotations
+of PVs. See issue #34822.
+
+* External provisioners running in pod should have appropriate credentials
+mouted as Secret inside pods that run the provisioner. Namespace with the pods
+and Secret instance should be well secured.
### `StorageClass` API
@@ -253,7 +523,7 @@ parameters:
0. Annotation `volume.alpha.kubernetes.io/storage-class` is used instead of `claim.Spec.Class` and `volume.Spec.Class` during incubation.
-1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive. User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners.
+1. `claim.Spec.Selector` and `claim.Spec.Class` are mutually exclusive for now (1.4). User can either match existing volumes with `Selector` XOR match existing volumes with `Class` and get dynamic provisioning by using `Class`. This simplifies initial PR and also provisioners. This limitation may be lifted in future releases.
# Cloud Providers