| author | Nikhil Jindal <nikhiljindal@google.com> | 2017-04-10 12:54:24 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2017-04-10 12:54:24 -0700 |
| commit | 943e9b7888f0119decdb6f8f4e4d501fb37d7968 (patch) | |
| tree | 7f328d00b0417fe92086eca916300a878d9f243d | |
| parent | b59fc8016fe12a5898758b5d17db4d0a652b5016 (diff) | |
| parent | 7e5f2752001cf5f4768965a75bd151e89e28381f (diff) | |
Merge pull request #292 from tsandall/federated-placement-policy
Proposal: policy-based federated resource placement
| -rw-r--r-- | contributors/design-proposals/federated-placement-policy.md | 371 |
1 file changed, 371 insertions, 0 deletions
diff --git a/contributors/design-proposals/federated-placement-policy.md b/contributors/design-proposals/federated-placement-policy.md
new file mode 100644
index 00000000..8b3d4b49
--- /dev/null
+++ b/contributors/design-proposals/federated-placement-policy.md
@@ -0,0 +1,371 @@

# Policy-based Federated Resource Placement

This document proposes a design for policy-based control over placement of
Federated resources.

Tickets:

- https://github.com/kubernetes/kubernetes/issues/39982

Authors:

- Torin Sandall (torin@styra.com, tsandall@github) and Tim Hinrichs
  (tim@styra.com).
- Based on discussions with Quinton Hoole (quinton.hoole@huawei.com,
  quinton-hoole@github) and Nikhil Jindal (nikhiljindal@github).

## Background

Resource placement is a policy-rich problem affecting many deployments.
Placement may be based on company conventions, external regulation, pricing and
performance requirements, etc. Furthermore, placement policies evolve over time
and vary across organizations. As a result, it is difficult to anticipate the
policy requirements of all users.

A simple example of a placement policy is:

> Certain apps must be deployed on clusters in EU zones with sufficient PCI
> compliance.

The [Kubernetes Cluster
Federation](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federation.md#policy-engine-and-migrationreplication-controllers)
design proposal includes a pluggable policy engine component that decides how
applications/resources are placed across federated clusters.

Currently, the placement decision can be controlled for Federated ReplicaSets
using the `federation.kubernetes.io/replica-set-preferences` annotation. In the
future, the [Cluster
Selector](https://github.com/kubernetes/kubernetes/issues/29887) annotation will
provide control over placement of other resources. The proposed design supports
policy-based control over both of these annotations (as well as others).
This proposal is based on a POC built using the Open Policy Agent project. [This
short video (7m)](https://www.youtube.com/watch?v=hRz13baBhfg) provides an
overview and demo of the POC.

## Design

The proposed design uses the [Open Policy Agent](http://www.openpolicyagent.org)
project (OPA) to realize the policy engine component from the Federation design
proposal. OPA is an open-source, general-purpose policy engine that includes a
declarative policy language and APIs to answer policy queries.

The proposed design allows administrators to author placement policies and have
them automatically enforced when resources are created or updated. The design
also covers support for automatic remediation of resource placement when policy
(or the relevant state of the world) changes.

In the proposed design, the policy engine (OPA) is deployed on top of Kubernetes
in the same cluster as the Federation Control Plane.

The proposed design is divided into the following sections:

1. Control over the initial placement decision (admission controller)
1. Remediation of resource placement (opa-kube-sync/remediator)
1. Replication of Kubernetes resources (opa-kube-sync/replicator)
1. Management and storage of policies (ConfigMap)

### 1. Initial Placement Decision

To provide policy-based control over the initial placement decision, we propose
a new admission controller that integrates with OPA.

When admitting requests, the admission controller executes an HTTP API call
against OPA. The API call passes the JSON representation of the resource in the
message body.

The response from OPA contains the desired value for the resource's annotations
(defined in policy by the administrator). The admission controller updates the
annotations on the resource and admits the request.

The admission controller updates the resource by **merging** the annotations in
the response with existing annotations on the resource.
If there are overlapping
annotation keys, the admission controller replaces the existing value with the
value from the response.

#### Example Policy Engine Query:

```http
POST /v1/data/io/k8s/federation/admission HTTP/1.1
Content-Type: application/json
```

```json
{
  "input": {
    "apiVersion": "extensions/v1beta1",
    "kind": "ReplicaSet",
    "metadata": {
      "annotations": {
        "policy.federation.alpha.kubernetes.io/eu-jurisdiction-required": "true",
        "policy.federation.alpha.kubernetes.io/pci-compliance-level": "2"
      },
      "creationTimestamp": "2017-01-23T16:25:14Z",
      "generation": 1,
      "labels": {
        "app": "nginx-eu"
      },
      "name": "nginx-eu",
      "namespace": "default",
      "resourceVersion": "364993",
      "selfLink": "/apis/extensions/v1beta1/namespaces/default/replicasets/nginx-eu",
      "uid": "84fab96d-e188-11e6-ac83-0a580a54020e"
    },
    "spec": {
      "replicas": 4,
      "selector": {...},
      "template": {...}
    }
  }
}
```

#### Example Policy Engine Response:

```http
HTTP/1.1 200 OK
Content-Type: application/json
```

```json
{
  "result": {
    "annotations": {
      "federation.kubernetes.io/replica-set-preferences": {
        "clusters": {
          "gce-europe-west1": {
            "weight": 1
          },
          "gce-europe-west2": {
            "weight": 1
          }
        },
        "rebalance": true
      }
    }
  }
}
```

> This example shows the policy engine returning the replica-set-preferences.
> The policy engine could similarly return a desired value for other annotations
> such as the Cluster Selector annotation.

#### Conflicts

A conflict arises if the developer and the policy define different values for an
annotation. In this case, the developer's intent is provided as a policy query
input and the policy author's intent is encoded in the policy itself. Since the
policy is the only place where both the developer's and the policy author's
intents are known, the policy (or policy engine) should be responsible for
resolving the conflict.

There are a few options for handling conflicts.
As a concrete example, this is
how a policy author could handle invalid clusters/conflicts:

```
package io.k8s.federation.admission

errors["requested replica-set-preferences includes invalid clusters"] {
    invalid_clusters = developer_clusters - policy_defined_clusters
    invalid_clusters != set()
}

annotations["replica-set-preferences"] = value {
    value = developer_clusters & policy_defined_clusters
}

# Not shown here:
#
# policy_defined_clusters[...] { ... }
# developer_clusters[...] { ... }
```

The admission controller will execute a query against
`/io/k8s/federation/admission`, and if the policy detects an invalid cluster,
the "errors" key in the response will contain a non-empty array. In this case,
the admission controller will deny the request.

```http
HTTP/1.1 200 OK
Content-Type: application/json
```

```json
{
  "result": {
    "errors": [
      "requested replica-set-preferences includes invalid clusters"
    ],
    "annotations": {
      "federation.kubernetes.io/replica-set-preferences": {
        ...
      }
    }
  }
}
```

This example shows how the policy could handle conflicts when the author's
intent is to define clusters that MAY be used. If the author's intent is to
define which clusters MUST be used, then the logic would not use intersection.

#### Configuration

The admission controller requires configuration for the OPA endpoint:

```
{
  "EnforceSchedulingPolicy": {
    "url": "https://opa.federation.svc.cluster.local:8181/v1/data/io/k8s/federation/annotations",
    "token": "super-secret-token-value"
  }
}
```

- `url` specifies the URL of the policy engine API to query. The query response
  contains the annotations to apply to the resource.
- `token` specifies a static token to use for authentication when contacting the
  policy engine. In the future, other authentication schemes may be supported.
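To make the merge-and-deny behavior concrete, here is a minimal Python sketch of what the admission controller does with a policy engine response. The function names and structure are illustrative assumptions, not part of the proposal; note that structured values must be serialized to JSON strings, since Kubernetes annotation values are strings.

```python
import json


def merge_annotations(existing, returned):
    """Merge policy-engine annotations into a resource's existing annotations.

    On overlapping keys the policy engine's value wins. Structured values are
    serialized to JSON strings (annotation values must be strings).
    """
    merged = dict(existing)
    for key, value in returned.items():
        merged[key] = value if isinstance(value, str) else json.dumps(value)
    return merged


def admit(resource, result):
    """Apply a policy query result to a resource.

    Returns (admitted, resource). A non-empty "errors" array in the result
    denies the request; otherwise the returned annotations are merged in.
    """
    if result.get("errors"):
        return False, resource
    metadata = resource.setdefault("metadata", {})
    metadata["annotations"] = merge_annotations(
        metadata.get("annotations", {}), result.get("annotations", {}))
    return True, resource
```

Given the example response shown earlier, `admit` would attach a JSON-encoded replica-set-preferences annotation; a response whose `errors` array is non-empty causes the request to be denied.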
The configuration file is provided to the federation-apiserver with the
`--admission-control-config-file` command line argument.

The admission controller is enabled in the federation-apiserver by providing the
`--admission-control` command line argument, e.g.,
`--admission-control=AlwaysAdmit,EnforceSchedulingPolicy`.

The admission controller will be enabled by default.

#### Error Handling

The admission controller is designed to **fail closed** if policies have been
created.

Request handling may fail because of:

- Serialization errors
- Request timeouts or other network errors
- Authentication or authorization errors from the policy engine
- Other unexpected errors from the policy engine

In the event of request timeouts (or other network errors) or back-pressure
hints from the policy engine, the admission controller should retry after
applying a backoff. The admission controller should also create an event so that
developers can identify why their resources are not being scheduled.

Policies are stored as ConfigMap resources in a well-known namespace. This
allows the admission controller to check whether one or more policies exist. If
one or more policies exist, the admission controller will fail closed; otherwise
it will **fail open**.

### 2. Remediation of Resource Placement

When policy changes, or the environment in which resources are deployed changes
(e.g., a cluster's PCI compliance rating gets up- or down-graded), resources
might need to be moved in order to obey the placement policy. Sometimes
administrators may decide to remediate manually; other times they may want
Kubernetes to remediate automatically.

To automatically reschedule resources onto desired clusters, we introduce a
remediator component (**opa-kube-sync**) that is deployed as a sidecar with OPA.

The notifications sent to the remediator by OPA specify the new value for
annotations such as replica-set-preferences.
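As an illustration, a notification carrying new annotation values could be turned into a JSON merge patch body along the following lines. This is a Python sketch; the helper name and the shape of the notification payload are assumptions, not part of the proposal.

```python
import json


def annotation_patch(annotations):
    """Build a JSON merge patch body that updates only a resource's
    annotations.

    Structured values are serialized to JSON strings, since Kubernetes
    annotation values must be strings.
    """
    encoded = {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in annotations.items()
    }
    return {"metadata": {"annotations": encoded}}
```

A body like this could then be sent in a PATCH request (e.g., with `Content-Type: application/merge-patch+json`) to the affected resource's URL on the federation-apiserver.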
When the remediator component (in the sidecar) receives a notification, it
sends a PATCH request to the federation-apiserver to update the affected
resource. This way, the actual rebalancing of ReplicaSets is still handled by
the [Rescheduling
Algorithm](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federated-replicasets.md)
in the Federated ReplicaSet controller.

The remediator component must be deployed with a kubeconfig for the
federation-apiserver so that it can identify itself when sending the PATCH
requests. We can use the same mechanism that is used for the
federation-controller-manager (which also needs to identify itself when sending
requests to the federation-apiserver).

### 3. Replication of Kubernetes Resources

Administrators must be able to author policies that refer to properties of
Kubernetes resources. For example, consider the following sample policy (in
English):

> Certain apps must be deployed on clusters in EU zones with sufficient PCI
> compliance.

The policy definition must refer to the geographic region and PCI compliance
rating of federated clusters. Today, the geographic region is stored as an
attribute on the cluster resource, and the PCI compliance rating is an example
of data that may be included in a label or annotation.

When the policy engine is queried for a placement decision (e.g., by the
admission controller), it must have access to the data representing the
federated clusters.

To provide OPA with the data representing federated clusters, as well as other
Kubernetes resource types (such as federated ReplicaSets), we use a sidecar
container that is deployed alongside OPA. The sidecar ("opa-kube-sync") is
responsible for replicating Kubernetes resources into OPA.

The sidecar/replicator component will implement the (somewhat common) list/watch
pattern against the federation-apiserver:

- Initially, it will GET all resources of a particular type.
- Subsequently, it will GET with the **watch** and **resourceVersion**
  parameters set and process add, remove, and update events accordingly.

Each resource received by the sidecar/replicator component will be pushed into
OPA. The sidecar will likely rely on one of the existing Kubernetes Go client
libraries to handle the low-level list/watch behavior.

As new resource types are introduced in the federation-apiserver, the
sidecar/replicator component will need to be updated to support them. As a
result, the sidecar/replicator component must be designed so that it is easy to
add support for new resource types.

Eventually, the sidecar/replicator component may allow admins to configure which
resource types are replicated. As an optimization, the sidecar may eventually
analyze policies to determine which resource properties are required for policy
evaluation. This would allow it to replicate the minimum amount of data into
OPA.

### 4. Policy Management

Policies are written in a text-based, declarative language supported by OPA. The
policies can be loaded into the policy engine either on startup or via HTTP
APIs.

To avoid introducing additional persistent state, we propose storing policies
in ConfigMap resources in the Federation Control Plane inside a well-known
namespace (e.g., `kube-federation-scheduling-policy`). The ConfigMap resources
will be replicated into the policy engine by the sidecar.

The sidecar can establish a watch on the ConfigMap resources in the Federation
Control Plane. This will enable hot-reloading of policies whenever they change.

## Applicability to Other Policy Engines

This proposal was designed based on a POC with OPA, but it can be applied to
other policy engines as well. The admission and remediation components comprise
two main pieces of functionality: (i) applying annotation values to federated
resources and (ii) asking the policy engine for annotation values.
The code for applying annotation values is completely independent of the policy
engine. The code that asks the policy engine for annotation values exists in
both the admission and remediation components. In the POC, asking OPA for
annotation values amounts to a simple RESTful API call that any other policy
engine could implement.

## Future Work

- This proposal uses ConfigMaps to store and manage policies. In the future, we
  want to introduce a first-class **Policy** API resource.
\ No newline at end of file
