From 20c3b985b71af5cf2b03b751bf0dda925d7de482 Mon Sep 17 00:00:00 2001 From: Laszlo Janosi Date: Fri, 15 Jun 2018 09:26:26 +0000 Subject: New KEP document at keps/sig-network/0015-20180614-SCTP-support.md --- keps/sig-network/0015-20180614-SCTP-support.md | 291 +++++++++++++++++++++++++ 1 file changed, 291 insertions(+) create mode 100644 keps/sig-network/0015-20180614-SCTP-support.md diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md new file mode 100644 index 00000000..ef3ac678 --- /dev/null +++ b/keps/sig-network/0015-20180614-SCTP-support.md @@ -0,0 +1,291 @@ +--- +kep-number: 15 +title: SCTP support +authors: + - "@janosi" +owning-sig: sig-network +participating-sigs: + - sig-cloud-provider +reviewers: + - "@thockin" +approvers: + - TBD +editor: TBD +creation-date: 2018-06-14 +last-updated: yyyy-mm-dd +status: provisional +see-also: + - PR64973 +replaces: +superseded-by: +--- + +# SCTP support + +## Table of Contents + +* [Table of Contents](#table-of-contents) +* [Summary](#summary) +* [Motivation](#motivation) + * [Goals](#goals) + * [Non-Goals](#non-goals) +* [Proposal](#proposal) + * [User Stories [optional]](#user-stories-optional) + * [Story 1](#story-1) + * [Story 2](#story-2) + * [Implementation Details/Notes/Constraints [optional]](#implementation-detailsnotesconstraints-optional) + * [Risks and Mitigations](#risks-and-mitigations) +* [Graduation Criteria](#graduation-criteria) +* [Implementation History](#implementation-history) +* [Drawbacks [optional]](#drawbacks-optional) +* [Alternatives [optional]](#alternatives-optional) + +[Tools for generating]: https://github.com/ekalinin/github-markdown-toc + +## Summary + +The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes [Service][] and [NetworkPolicy][] as an additional protocol option beside the current TCP and UDP options. +SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks. +Once SCTP support is added as a new protocol option for Service and NetworkPolicy those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters on a more straightforward way. For example they can use the native kube-dns based service discovery, and their communication can be controlled on the native NetworkPolicy way. + + + +[Service]: https://kubernetes.io/docs/concepts/services-networking/service/ +[NetworkPolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/ +[RFC4960]: https://tools.ietf.org/html/rfc4960 + +## Motivation + +SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes Service and NetworkPolicy. + + +### Goals + +Add SCTP support to Kubernetes Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, and their communication can be controlled via the native NetworkPolicy way. + + + +### Non-Goals + +It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. I.e. the Kubernetes user can define Services with type=LoadBalancer and Protocol=SCTP, but if the actual load balancer implementation does not support SCTP then the creation of the Service/load balancer fails. 
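For illustration, a Service of the kind this non-goal talks about would look as follows. The manifest is a sketch only (the name is hypothetical, and the port values simply mirror the examples later in this document); under this proposal the creation of such a Service fails whenever the cloud provider's load balancer implementation cannot handle SCTP:
```
kind: Service
apiVersion: v1
metadata:
  name: my-sctp-lb          # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: MyApp
  ports:
  - protocol: SCTP          # rejected if the load balancer implementation has no SCTP support
    port: 80
    targetPort: 9376
```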
+ + +## Proposal + + +### User Stories [optional] + + +#### Service with SCTP and Virtual IP +As a user of Kubernetes I want to define Services with Virtual IPs for my applications that use SCTP as L4 protocol on their interfaces,so client applications can use the services of my applications on top of SCTP via that Virtual IP. +Example: +``` +kind: Service +apiVersion: v1 +metadata: + name: my-service +spec: + selector: + app: MyApp + ports: + - protocol: SCTP + port: 80 + targetPort: 9376 +``` + +#### Headless Service with SCTP +As a user of Kubernetes I want to define headless Services for my applications that use SCTP as L4 protocol on their interfaces, so client applications can discover my applications in kube-dns, or via any other service discovery method that gets information about endpoints via the Kubernetes API. +Example: +``` +kind: Service +apiVersion: v1 +metadata: + name: my-service +spec: + selector: + app: MyApp + ClusterIP: "None" + ports: + - protocol: TCP + port: 80 + targetPort: 9376 +``` +#### Service with SCTP without selector +As a user of Kubernetes I want to define Services without selector for my applications that use SCTP as L4 protocol on their interfaces,so I can implement my own service controllers if I want to extend the basic functionality of Kubernetes. +Example: +``` +kind: Service +apiVersion: v1 +metadata: + name: my-service +spec: + ClusterIP: "None" + ports: + - protocol: TCP + port: 80 + targetPort: 9376 +``` +#### NetworkPolicy with SCTP +As a user of Kubernetes I want to define NetworPolicies for my applications that use SCTP as L4 protocol on their interfaces, so the network controllers that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too. +Example: +``` +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: myservice-network-policy + namespace: myapp +spec: + podSelector: + matchLabels: + role: myservice + policyTypes: + - Ingress + - Egress + ingress: + - from: + - ipBlock: + cidr: 172.17.0.0/16 + except: + - 172.17.1.0/24 + - namespaceSelector: + matchLabels: + project: myproject + - podSelector: + matchLabels: + role: myclient + ports: + - protocol: SCTP + port: 7777 +``` +#### User space SCTP stack +As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack. + +### Implementation Details/Notes/Constraints [optional] + + +#### SCTP in Services +The Kubernetes API modification for Services is obvious. +The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. +For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully. + +#### SCTP in NetworkPolicy +The Kubernetes API modification for the NetworkPolicy is obvious. +In order to utilize the new protocol value the network controller must support it. + +#### Interworking with applications that use a user space SCTP stack +A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. 
That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks-in on every node, and as a consequence it loads the SCTP kernel module. There are some ideas how to solve this interworking problem: + +1. "-p sctp" is not used in the iptables rules, the processing of requests to the Virtual IP is executed purely based on the destination IP address and port. In case of ipvs the protocol is a mandatory parameter, so ipvs with SCTP rules cannot be used on the node where userspace SCTP applications should run. +2. Fall back to the user space proxy on those specific nodes. The user space proxy shall also use a user space SCTP stack, of course. Also the iptables rules that direct the client traffic to the userspace proxy must be created without the "-p sctp" option. + +In any case we shall be able to dedicate these nodes for those userspace SCTP applications, or at least, we must achieve that "regular" SCTP user applications are not deployed on these nodes. The solution proposal for this node separation: + +- there shall be a new kube-proxy parameter. If the parameter is set, the kube-proxy switches to this new mode of operation (described above) for SCTP services +- if the new kube-proxy parameter is set the node must be tainted with a new taint, so the scheduler places only such SCTP applications on this node that use userspace SCTP stack. We must avoid the deployment of "regular" SCTP users on this node. + +### Risks and Mitigations + + +## Graduation Criteria + + +## Implementation History + + +## Drawbacks [optional] + + +## Alternatives [optional] + + \ No newline at end of file -- cgit v1.2.3 From 106b8e757028428274b6c434354de38f70d45c52 Mon Sep 17 00:00:00 2001 From: Laszlo Janosi Date: Sat, 16 Jun 2018 19:10:47 +0000 Subject: protocol is mandatory in order to use port in iptables rules. That is, we cannot use the port value in iptables rules if the protocol is not defined --- keps/sig-network/0015-20180614-SCTP-support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md index ef3ac678..fb793adf 100644 --- a/keps/sig-network/0015-20180614-SCTP-support.md +++ b/keps/sig-network/0015-20180614-SCTP-support.md @@ -242,7 +242,7 @@ In order to utilize the new protocol value the network controller must support i #### Interworking with applications that use a user space SCTP stack A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks-in on every node, and as a consequence it loads the SCTP kernel module. There are some ideas how to solve this interworking problem: -1. "-p sctp" is not used in the iptables rules, the processing of requests to the Virtual IP is executed purely based on the destination IP address and port. 
In case of ipvs the protocol is a mandatory parameter, so ipvs with SCTP rules cannot be used on the node where userspace SCTP applications should run. +1. "-p sctp" is not used in the iptables rules, the processing of requests to the Virtual IP is executed purely based on the destination IP address. In case of ipvs the protocol is a mandatory parameter, so ipvs with SCTP rules cannot be used on the node where userspace SCTP applications should run. 2. Fall back to the user space proxy on those specific nodes. The user space proxy shall also use a user space SCTP stack, of course. Also the iptables rules that direct the client traffic to the userspace proxy must be created without the "-p sctp" option. In any case we shall be able to dedicate these nodes for those userspace SCTP applications, or at least, we must achieve that "regular" SCTP user applications are not deployed on these nodes. The solution proposal for this node separation: -- cgit v1.2.3 From f368332fe011875d0c6110d0a31549f70f368d7e Mon Sep 17 00:00:00 2001 From: Laszlo Janosi Date: Fri, 22 Jun 2018 13:39:31 +0000 Subject: Modified based on the first comments. Userspace SCTP related part re-worked based on further investigations. --- keps/sig-network/0015-20180614-SCTP-support.md | 141 ++++--------------------- 1 file changed, 21 insertions(+), 120 deletions(-) diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md index fb793adf..b03c917b 100644 --- a/keps/sig-network/0015-20180614-SCTP-support.md +++ b/keps/sig-network/0015-20180614-SCTP-support.md @@ -5,14 +5,14 @@ authors: - "@janosi" owning-sig: sig-network participating-sigs: - - sig-cloud-provider + - sig-network reviewers: - "@thockin" approvers: - - TBD + - "@thockin" editor: TBD creation-date: 2018-06-14 -last-updated: yyyy-mm-dd +last-updated: 2018-06-22 status: provisional see-also: - PR64973 @@ -21,47 +21,9 @@ superseded-by: --- # SCTP support - ## Table of Contents - + * [Table of Contents](#table-of-contents) * [Summary](#summary) * [Motivation](#motivation) @@ -78,22 +40,12 @@ A table of contents is helpful for quickly jumping to sections of a KEP and for * [Drawbacks [optional]](#drawbacks-optional) * [Alternatives [optional]](#alternatives-optional) -[Tools for generating]: https://github.com/ekalinin/github-markdown-toc - ## Summary The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes [Service][] and [NetworkPolicy][] as an additional protocol option beside the current TCP and UDP options. SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks. Once SCTP support is added as a new protocol option for Service and NetworkPolicy those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters on a more straightforward way. For example they can use the native kube-dns based service discovery, and their communication can be controlled on the native NetworkPolicy way. - - [Service]: https://kubernetes.io/docs/concepts/services-networking/service/ [NetworkPolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/ [RFC4960]: https://tools.ietf.org/html/rfc4960 @@ -102,42 +54,19 @@ A good summary is probably at least a paragraph in length. SCTP is a widely used protocol in telecommunications. 
It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes Service and NetworkPolicy. - ### Goals Add SCTP support to Kubernetes Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, and their communication can be controlled via the native NetworkPolicy way. - - ### Non-Goals It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. I.e. the Kubernetes user can define Services with type=LoadBalancer and Protocol=SCTP, but if the actual load balancer implementation does not support SCTP then the creation of the Service/load balancer fails. +It is not a goal to support multi-homed SCTP associations. - ## Proposal - ### User Stories [optional] - #### Service with SCTP and Virtual IP As a user of Kubernetes I want to define Services with Virtual IPs for my applications that use SCTP as L4 protocol on their interfaces,so client applications can use the services of my applications on top of SCTP via that Virtual IP. Example: @@ -168,7 +97,7 @@ spec: app: MyApp ClusterIP: "None" ports: - - protocol: TCP + - protocol: SCTP port: 80 targetPort: 9376 ``` @@ -183,7 +112,7 @@ metadata: spec: ClusterIP: "None" ports: - - protocol: TCP + - protocol: SCTP port: 80 targetPort: 9376 ``` @@ -224,68 +153,40 @@ As a user of Kubernetes I want to deploy and run my applications that use a user ### Implementation Details/Notes/Constraints [optional] - #### SCTP in Services The Kubernetes API modification for Services is obvious. The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully. +DNS shall support SRV records with "_sctp" as "proto" value. #### SCTP in NetworkPolicy The Kubernetes API modification for the NetworkPolicy is obvious. -In order to utilize the new protocol value the network controller must support it. +In order to utilize the new protocol value the network controller must support it. #### Interworking with applications that use a user space SCTP stack -A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks-in on every node, and as a consequence it loads the SCTP kernel module. There are some ideas how to solve this interworking problem: +A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. 
The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks in on every node, and as a consequence it loads the SCTP kernel module.
NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The solution has been to dedicate nodes to userspace SCTP applications, and ensure that on those nodes the SCTP kernel module is not loaded.

For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that k8s does not load the SCTP kernel modules on those dedicated nodes.

As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are for this very purpose.
The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create iptables or ipvs rules on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module, but at the same time those applications that use userspace SCTP stack can still access the just created SCTP based Service via the ClusterIP of that service - assuming that the new Service has ClusterIP allocated. There is no such challenge with regard to headless SCTP Services.

This is how our way of thinking goes:
The first task is to provide a way to dedicate nodes to userspace SCTP applications so that k8s itself is aware of that role of those nodes. It may be achieved with a node level parameter - e.g. in kube-proxy. Based on that parameter the kube-proxy would be aware of the role of the node and it would not apply iptables or ipvs rules for SCTP Services on the node.
If a node is dedicated for userspace SCTP applications then whatever proxy solution is to run on that node, that proxy shall use userspace SCTP as well. That is, on those nodes we need a userspace proxy for the SCTP Services.
Whether this userspace proxy shall be an extension of the current kube-proxy, or rather it shall be a new independent proxy - it is to be discussed. We are aware of the plans whose goal is to remove the userspace part of kube-proxy - however, we think that this situation is different from those where the userspace kube-proxy is used for TCP or UDP traffic. I.e. even if the current TCP/UDP related userspace logic is removed from the kube-proxy, the foundations of that could be re-used for this case.
The userspace proxy would then follow the current high level logic of the kube-proxy: it would listen on an IP address of the local node, and it would establish connections to the application pods that provide the service.
The next task is to ensure that the packets that are sent by applications to the ClusterIP end up in the userspace proxy. It requires the careful setup of iptables or ipvs rules on the node, so those do not trigger the loading of the SCTP kernel module. This means that those rules cannot filter on the actual protocol value (SCTP), i.e. we end up with rules that simply forward the ClusterIP to the local host IP on which the userspace proxy listens. The consequence is that the Service definition can contain only SCTP Ports, TCP or UDP Ports should not be used in that Service definition.

NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those

### Risks and Mitigations

## Graduation Criteria

## Implementation History

## Drawbacks [optional]

## Alternatives [optional]

-- cgit v1.2.3

From fdf2fc20a322bef445720ca85ef067348351bcb1 Mon Sep 17 00:00:00 2001
From: janosi
Date: Sat, 30 Jun 2018 22:40:40 +0200
Subject: Added alternatives to handle the userspace SCTP incompatibility

---
 keps/sig-network/0015-20180614-SCTP-support.md | 40 +++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md
index b03c917b..6c4c5367 100644
--- a/keps/sig-network/0015-20180614-SCTP-support.md
+++ b/keps/sig-network/0015-20180614-SCTP-support.md
@@ -148,28 +148,34 @@ spec:
   - protocol: SCTP
     port: 7777
 ```
-#### User space SCTP stack
+#### Userspace SCTP stack
As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack.

### Implementation Details/Notes/Constraints [optional]

#### SCTP in Services
The Kubernetes API modification for Services is obvious.
-The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package.
+The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.
For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully.
-DNS shall support SRV records with "_sctp" as "proto" value.
+Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name.

#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is obvious.
In order to utilize the new protocol value the network controller must support it.

#### Interworking with applications that use a user space SCTP stack
-A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks in on every node, and as a consequence it loads the SCTP kernel module.
-NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The solution has been to dedicate nodes to userspace SCTP applications, and ensure that on those nodes the SCTP kernel module is not loaded.
+A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks in on every node, and as a consequence it loads the SCTP kernel module on every node. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.
+
+NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The userspace SCTP stack creates raw sockets with IPPROTO_SCTP. As it is clearly highlighted in the [documentation of raw sockets][]:
+>Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s).
+
+I.e. it is the normal function of the [kernel][] that it sends the incoming packet to both sides: the raw socket and the kernel module. In this case the kernel module will handle those packets that are destined to the raw socket as Out of the blue (OOTB) packets according to the rules defined in the [RFC4960][].
+
+The solution has been to dedicate nodes to userspace SCTP applications, and to ensure that on those nodes the SCTP kernel module is not loaded.

For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that k8s does not load the SCTP kernel modules on those dedicated nodes.

-As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are for this very purpose.
+As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are there for this very purpose. The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create iptables or ipvs rules on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module, but at the same time those applications that use userspace SCTP stack can still access the just created SCTP based Service vie the ClusterIP of that service - assuming that the new Service has ClusterIP allocated. There is no such challenge with regard to headless SCTP Services. This is how our way of thinking goes: @@ -178,7 +184,27 @@ If a node is dedicated for userspace SCTP applications then whatever proxy solut The userspace proxy would follow then the current high level logic of the kube-proxy: it would listen on an IP address of the local node, and it would establish connections to the application pods that provide the service. The next task is to ensure that the packets that are sent by applications to the ClusterIP end up in the userspace proxy. It requires the careful setup of iptables or ipvs rules on the node, so those do not trigger the loading of the SCTP kernel module. It means, that those rules cannot use filter on the actual protocol value (SCTP), i.e. we end up with rules that simply forward the ClusterIP to the local host IP on which the userspace proxy listens. The consequence is, that the Service definition can contain only SCTP Ports, TCP or UDP Ports should not be used in that Service definition. -NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those +NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those. + +We propose the following alternatives here for consideration in the community: + +##### Documentation only +In this alternative we describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight, that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed. + +##### There would not be a ClusterIP -> service backends proxy on the dedicated nodes +In this alternative we would implement the option to dedicate nodes for userspace SCTP applications, but we do not implement the userspace proxy. That is: +* there would be a kube-prpxy parameter that indicates to the kube-proxy that it must not create iptables or ipvs rules for SCTP Services on its local node +* there would not be a userspace proxy to direct traffic sent to the SCTP Service's ClusterIP to the actual service backends + +As userspace SCTP applications could not use the benefits of Kubernetes Services before this enhanceent, those anyway had to implement their own service discovery and SCTP traffic handling mechanisms. 
Following this assumption we can say, that if they continue using their current logic, they do not and will not obtain the ClusterIP from the KubeDNS, but instead they use an alternative way to find their peers, and they use some other ways for connecting to their peers - like e.g. connecting to the IP of their peers directly without any ClusterIP-like solution. That is, they will not miss the possibility to use the ClusterIP of their peers, and consequently they do not need a proxy solution on their local nodes. + +##### Dedicated nodes and userspace proxy +In this alternative we would implement all the tasks that we listed above: +* node dedication +* userspace SCTP proxy on the dedicated nodes + +[documentation of raw sockets]: http://man7.org/linux/man-pages/man7/raw.7.html +[kernel]: https://github.com/torvalds/linux/blob/0fbc4aeabc91f2e39e0dffebe8f81a0eb3648d97/net/ipv4/ip_input.c#L191 ### Risks and Mitigations -- cgit v1.2.3 From d0ad13a09e25b0291ed30b879082504fda540f88 Mon Sep 17 00:00:00 2001 From: Janosi Laszlo Date: Tue, 3 Jul 2018 13:51:58 +0200 Subject: Example DNS SRV record added. Userspace SCTP alternatives updated. --- keps/sig-network/0015-20180614-SCTP-support.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md index 6c4c5367..e2c7dde1 100644 --- a/keps/sig-network/0015-20180614-SCTP-support.md +++ b/keps/sig-network/0015-20180614-SCTP-support.md @@ -157,7 +157,11 @@ As a user of Kubernetes I want to deploy and run my applications that use a user The Kubernetes API modification for Services is obvious. The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp. For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully. -Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. +Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. Example: + +``` +_diameter._sctp.my-service.default.svc.cluster.local. 30 IN SRV 10 100 1234 my-service.default.svc.cluster.local. +``` #### SCTP in NetworkPolicy The Kubernetes API modification for the NetworkPolicy is obvious. @@ -186,17 +190,18 @@ The next task is to ensure that the packets that are sent by applications to the NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those. 
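To make the node dedication referred to above concrete, here is a sketch of the usual label/taint mechanism. The taint key, label value, pod name and image are hypothetical illustrations, not something this proposal defines:
```
# Dedicating a node (hypothetical taint and label):
#   kubectl taint nodes node-1 sctp-stack=userspace:NoSchedule
#   kubectl label nodes node-1 sctp-stack=userspace
apiVersion: v1
kind: Pod
metadata:
  name: userspace-sctp-app        # hypothetical pod name
spec:
  nodeSelector:
    sctp-stack: userspace         # keeps the pod on the dedicated nodes
  tolerations:
  - key: "sctp-stack"
    operator: "Equal"
    value: "userspace"
    effect: "NoSchedule"          # lets the pod tolerate the dedication taint
  containers:
  - name: app
    image: my-userspace-sctp-image   # hypothetical image
```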
-We propose the following alternatives here for consideration in the community: +We propose the following alternatives for consideration in the community: ##### Documentation only -In this alternative we describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight, that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed. +In this alternative we would describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight, that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed. We would also highlight, that the usage of headless SCTP Services is allowed because such services do not trigger the creation of iptables/ipvs rules, thus those do not trigger the loading of the SCTP kernel module on every node. -##### There would not be a ClusterIP -> service backends proxy on the dedicated nodes +##### Dedicated nodes without ClusterIP proxy In this alternative we would implement the option to dedicate nodes for userspace SCTP applications, but we do not implement the userspace proxy. That is: * there would be a kube-prpxy parameter that indicates to the kube-proxy that it must not create iptables or ipvs rules for SCTP Services on its local node * there would not be a userspace proxy to direct traffic sent to the SCTP Service's ClusterIP to the actual service backends -As userspace SCTP applications could not use the benefits of Kubernetes Services before this enhanceent, those anyway had to implement their own service discovery and SCTP traffic handling mechanisms. Following this assumption we can say, that if they continue using their current logic, they do not and will not obtain the ClusterIP from the KubeDNS, but instead they use an alternative way to find their peers, and they use some other ways for connecting to their peers - like e.g. connecting to the IP of their peers directly without any ClusterIP-like solution. That is, they will not miss the possibility to use the ClusterIP of their peers, and consequently they do not need a proxy solution on their local nodes. +As userspace SCTP applications could not use the benefits of Kubernetes Services before this enhancement, those anyway had to implement their own service discovery and SCTP traffic handling mechanisms. Following this assumption we can say, that if they continue using their current logic, they do not and will not obtain the ClusterIP from the KubeDNS, but instead they use an alternative way to find their peers, and they use some other ways for connecting to their peers - like e.g. connecting to the IP of their peers directly without any ClusterIP-like solution. That is, they will not miss the possibility to use the ClusterIP of their peers, and consequently they do not need a proxy solution on their local nodes. 
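As a sketch of how such an application could still lean on Kubernetes for peer discovery, a headless Service of the following shape avoids the ClusterIP problem entirely, since no Virtual IP is allocated for it (names and the port below are illustrative):
```
kind: Service
apiVersion: v1
metadata:
  name: sctp-peers              # hypothetical name
spec:
  clusterIP: None               # headless: DNS returns the pod IPs directly
  selector:
    app: userspace-sctp-app     # hypothetical label
  ports:
  - protocol: SCTP
    port: 9376                  # illustrative port
```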
+Also we must note here that even those userspace SCTP applications can enjoy the benefits of having the peer SCTP endpoints in KubeDNS, and the benefits of having the relevant Service/Endpoint information on the Kubernetes API. For example, they can replace their own service discovery mechanisms with a KubeDNS based one, their custom controllers (if any) can use the state reports of SCTP Services/Endpoints via the Kubernetes API.

##### Dedicated nodes and userspace proxy
In this alternative we would implement all the tasks that we listed above:
-- cgit v1.2.3

From 3bb64044f5e57ac1ec3b45e815f282e6bd4ca9ad Mon Sep 17 00:00:00 2001
From: Janosi Laszlo
Date: Wed, 4 Jul 2018 15:49:22 +0200
Subject: It turned out that iptables and ipvs do not trigger the loading of the SCTP kernel module. The proposed solution is updated accordingly, and it became a lot simpler. Also the usage of SCTP as protocol value in the Pod/container descriptor is described in the document now.

---
 keps/sig-network/0015-20180614-SCTP-support.md | 74 +++++++++++++++++---------
 1 file changed, 49 insertions(+), 25 deletions(-)

diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md
index e2c7dde1..11ceffb8 100644
--- a/keps/sig-network/0015-20180614-SCTP-support.md
+++ b/keps/sig-network/0015-20180614-SCTP-support.md
@@ -42,9 +42,9 @@ superseded-by:

## Summary

-The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes [Service][] and [NetworkPolicy][] as an additional protocol option beside the current TCP and UDP options.
+The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes Pod container port, [Service][] and [NetworkPolicy][] as an additional protocol value option beside the current TCP and UDP options.
SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks.
-Once SCTP support is added as a new protocol option for Service and NetworkPolicy those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters on a more straightforward way. For example they can use the native kube-dns based service discovery, and their communication can be controlled on the native NetworkPolicy way.
+Once SCTP support is added as a new protocol option for Service, container port, and NetworkPolicy those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters in a more straightforward way. For example, they can use the native kube-dns based service discovery, and their communication can be controlled in the native NetworkPolicy way.

[Service]: https://kubernetes.io/docs/concepts/services-networking/service/
[NetworkPolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
[RFC4960]: https://tools.ietf.org/html/rfc4960

## Motivation

-SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes Service and NetworkPolicy.
+SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes container port, Service and NetworkPolicy.
### Goals -Add SCTP support to Kubernetes Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, and their communication can be controlled via the native NetworkPolicy way. +Add SCTP support to Kubernetes container port, Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, they can define container ports for their SCTP based interfaces, and their communication can be controlled via the native NetworkPolicy way. ### Non-Goals It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. I.e. the Kubernetes user can define Services with type=LoadBalancer and Protocol=SCTP, but if the actual load balancer implementation does not support SCTP then the creation of the Service/load balancer fails. + It is not a goal to support multi-homed SCTP associations. ## Proposal @@ -116,6 +117,24 @@ spec: port: 80 targetPort: 9376 ``` + +#### SCTP as container port protocol in Pod definition +As a user of Kubernetes I want to define hostPort based port mappings for the SCTP based interfaces of my applications +Example: +``` +apiVersion: v1 +kind: Pod +metadata: + name: mypod +spec: + containers: + - name: container-1 + image: mycontainerimg + ports: + - name: diameter + protocol: SCTP +``` + #### NetworkPolicy with SCTP As a user of Kubernetes I want to define NetworPolicies for my applications that use SCTP as L4 protocol on their interfaces, so the network controllers that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too. Example: @@ -149,26 +168,34 @@ spec: port: 7777 ``` #### Userspace SCTP stack -As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack. +As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack, and at the same time I want to define SCTP Services in the same cluster. ### Implementation Details/Notes/Constraints [optional] #### SCTP in Services The Kubernetes API modification for Services is obvious. -The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp. + +The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp. + For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully. + Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. Example: ``` _diameter._sctp.my-service.default.svc.cluster.local. 30 IN SRV 10 100 1234 my-service.default.svc.cluster.local. 
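; annotation added for illustration: the fields of the record above are
; TTL 30, class IN, priority 10, weight 100, port 1234, and the target
; name my-service.default.svc.cluster.local.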
```
+#### SCTP in the Pod's container port
+The Kubernetes API modification for the Pod is obvious.
+
+The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.

#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is obvious.
+
In order to utilize the new protocol value the network controller must support it.

#### Interworking with applications that use a user space SCTP stack
-A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks in on every node, and as a consequence it loads the SCTP kernel module on every node. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.
+A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant port reservation logic kicks in on every node, it starts listening on the port, and as a consequence it loads the SCTP kernel module on every node. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.

As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are there for this very purpose.
-The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create iptables or ipvs rules on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module, but at the same time those applications that use userspace SCTP stack can still access the just created SCTP based Service via the ClusterIP of that service - assuming that the new Service has ClusterIP allocated. There is no such challenge with regard to headless SCTP Services.
+The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create listening SCTP sockets on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module.
+
+There is no such challenge with regard to headless SCTP Services.

This is how our way of thinking goes:

-The first task is to provide a way to dedicate nodes to userspace SCTP applications so that k8s itself is aware of that role of those nodes. It may be achieved with a node level parameter - e.g. in kube-proxy. Based on that parameter the kube-proxy would be aware of the role of the node and it would not apply iptables or ipvs rules for SCTP Services on the node.
-If a node is dedicated for userspace SCTP applications then whatever proxy solution is to run on that node, that proxy shall use userspace SCTP as well. That is, on those nodes we need a userspace proxy for the SCTP Services. Whether this userspace proxy shall be an extension of the current kube-proxy, or rather it shall be a new independent proxy - it is to be discussed. We are aware of the plans whose goal is to remove the userspace part of kube-proxy - however, we think that this situation is different from those where the userspace kube-proxy is used for TCP or UDP traffic. I.e. even if the current TCP/UDP related userspace logic is removed from the kube-proxy, the foundations of that could be re-used for this case.
-The userspace proxy would then follow the current high level logic of the kube-proxy: it would listen on an IP address of the local node, and it would establish connections to the application pods that provide the service.
-The next task is to ensure that the packets that are sent by applications to the ClusterIP end up in the userspace proxy. It requires the careful setup of iptables or ipvs rules on the node, so those do not trigger the loading of the SCTP kernel module. This means that those rules cannot filter on the actual protocol value (SCTP), i.e. we end up with rules that simply forward the ClusterIP to the local host IP on which the userspace proxy listens. The consequence is that the Service definition can contain only SCTP Ports, TCP or UDP Ports should not be used in that Service definition.
-NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those.

+The first task is to provide a way to dedicate nodes to userspace SCTP applications so that k8s itself is aware of that role of those nodes. It may be achieved with a node level parameter. Based on that parameter the kube-proxy would be aware of the role of the node and it would not create listening SCTP sockets for SCTP Services on the node.
+
+NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes.
+
+NOTE! When the user defines SCTP ports for a container in a Pod definition, that triggers the creation of a listening SCTP socket (and thus the loading of the SCTP kernel module) only on those nodes to which the pod is scheduled - i.e. the regular node selectors and taints can be used to avoid the collision of userspace SCTP stacks with the SCTP kernel module.

We propose the following alternatives for consideration in the community:

##### Documentation only
-In this alternative we would describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight, that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed. We would also highlight, that the usage of headless SCTP Services is allowed because such services do not trigger the creation of iptables/ipvs rules, thus those do not trigger the loading of the SCTP kernel module on every node.
+In this alternative we would describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed. We would also highlight that the usage of headless SCTP Services is allowed because such services do not trigger the creation of listening SCTP sockets, thus those do not trigger the loading of the SCTP kernel module on every node.
+
+We would also describe that SCTP must not be used as protocol value in the Pod/container definition for those applications that use a userspace SCTP stack.

-##### Dedicated nodes without ClusterIP proxy
+##### A node level parameter to dedicate nodes for userspace SCTP applications

-In this alternative we would implement the option to dedicate nodes for userspace SCTP applications, but we do not implement the userspace proxy. That is:
-* there would be a kube-prpxy parameter that indicates to the kube-proxy that it must not create iptables or ipvs rules for SCTP Services on its local node
-* there would not be a userspace proxy to direct traffic sent to the SCTP Service's ClusterIP to the actual service backends

-As userspace SCTP applications could not use the benefits of Kubernetes Services before this enhancement, those anyway had to implement their own service discovery and SCTP traffic handling mechanisms. Following this assumption we can say, that if they continue using their current logic, they do not and will not obtain the ClusterIP from the KubeDNS, but instead they use an alternative way to find their peers, and they use some other ways for connecting to their peers - like e.g. connecting to the IP of their peers directly without any ClusterIP-like solution. That is, they will not miss the possibility to use the ClusterIP of their peers, and consequently they do not need a proxy solution on their local nodes.
-Also we must note here that even those userspace SCTP applications can enjoy the benefits of having the peer SCTP endpoints in KubeDNS, and the benefits of having the relevant Service/Endpoint information on the Kubernetes API. For example, they can replace their own service discovery mechanisms with a KubeDNS based one, their custom controllers (if any) can use the state reports of SCTP Services/Endpoints via the Kubernetes API.
+In this alternative we would implement all the tasks that we listed above, i.e. a node level parameter based on which the kube-proxy logic can skip the creation of listening SCTP sockets on the affected nodes.

-##### Dedicated nodes and userspace proxy
-In this alternative we would implement all the tasks that we listed above:
-* node dedication
-* userspace SCTP proxy on the dedicated nodes

[documentation of raw sockets]: http://man7.org/linux/man-pages/man7/raw.7.html
[kernel]: https://github.com/torvalds/linux/blob/0fbc4aeabc91f2e39e0dffebe8f81a0eb3648d97/net/ipv4/ip_input.c#L191
-- cgit v1.2.3

From 683f9a0192ba0067a7935835319bff5a3028c32d Mon Sep 17 00:00:00 2001
From: Janosi Laszlo
Date: Thu, 12 Jul 2018 08:09:09 +0200
Subject: Handling of Services with type=LoadBalancer changed. Support of HostPort with SCTP is clarified (not supported).

---
 keps/sig-network/0015-20180614-SCTP-support.md | 65 +++++++++++++++-----------
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md
index 11ceffb8..7cbbc712 100644
--- a/keps/sig-network/0015-20180614-SCTP-support.md
+++ b/keps/sig-network/0015-20180614-SCTP-support.md
@@ -42,34 +42,41 @@ superseded-by:

## Summary

-The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes Pod container port, [Service][] and [NetworkPolicy][] as an additional protocol value option beside the current TCP and UDP options.
+The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes [Service][], [NetworkPolicy][], and [ContainerPort][] as an additional protocol value option besides the current TCP and UDP options.
SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks.
-Once SCTP support is added as a new protocol option for Service, container port, and NetworkPolicy those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters in a more straightforward way. For example, they can use the native kube-dns based service discovery, and their communication can be controlled in the native NetworkPolicy way.
+Once SCTP support is added as a new protocol option those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters in a more straightforward way. For example, they can use the native kube-dns based service discovery, and their communication can be controlled in the native NetworkPolicy way.

[Service]: https://kubernetes.io/docs/concepts/services-networking/service/
[NetworkPolicy]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
+[ContainerPort]: https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/#exposing-pods-to-the-cluster
[RFC4960]: https://tools.ietf.org/html/rfc4960
+

## Motivation

-SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes container port, Service and NetworkPolicy.
-SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes container port, Service and NetworkPolicy.
+SCTP is a widely used protocol in telecommunications. It would ease the management and execution of telecommunication applications on Kubernetes if SCTP were added as a protocol option to Kubernetes.

### Goals

-Add SCTP support to Kubernetes container port, Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, they can define container ports for their SCTP based interfaces, and their communication can be controlled via the native NetworkPolicy way.
+Add SCTP support to Kubernetes ContainerPort, Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, and their communication can be controlled via the native NetworkPolicy way.

### Non-Goals

-It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. I.e. the Kubernetes user can define Services with type=LoadBalancer and Protocol=SCTP, but if the actual load balancer implementation does not support SCTP then the creation of the Service/load balancer fails.
+It is not a goal here to add SCTP support to load balancers that are provided by cloud providers.
+
+It is not a goal to support multi-homed SCTP associations. Such support also depends on the ability to manage multiple IP addresses for a pod, and in the case of Services with ClusterIP or NodePort the support of multi-homed associations would also require the support of NAT for multi-homed associations in iptables/ipvs.

-It is not a goal to support multi-homed SCTP associations.
+It is not a goal to support SCTP as protocol value for the container's HostPort. The reason: [the usage of HostPort is not recommended by Kubernetes][], and ensuring proper interworking of HostPort with userspace SCTP stacks (see below) would require an additional kubelet/kubenet configuration option. In order to keep the complexity and impact of the introduction of SCTP at a lower level, we do not plan to support SCTP as a new protocol value for HostPort.
+[the usage of HostPort is not recommended by Kubernetes]: https://kubernetes.io/docs/concepts/configuration/overview/#services

## Proposal

### User Stories [optional]

#### Service with SCTP and Virtual IP
As a user of Kubernetes I want to define Services with Virtual IPs for my applications that use SCTP as L4 protocol on their interfaces, so client applications can use the services of my applications on top of SCTP via that Virtual IP.
+
Example:
```
kind: Service
@@ -87,6 +94,7 @@ spec:

#### Headless Service with SCTP
As a user of Kubernetes I want to define headless Services for my applications that use SCTP as L4 protocol on their interfaces, so client applications can discover my applications in kube-dns, or via any other service discovery method that gets information about endpoints via the Kubernetes API.
+
Example:
```
kind: Service
@@ -103,7 +111,8 @@ spec:
    targetPort: 9376
```
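The hunks above show only fragments of the headless example, so a complete headless Service carrying SCTP is sketched below for reference. All names and ports are illustrative, and note that in the v1 API the field is spelled `clusterIP`; a headless Service allocates no virtual IP, so no listening SCTP socket is needed on the nodes:

```
kind: Service
apiVersion: v1
metadata:
  name: my-sctp-service
spec:
  # Headless: no virtual IP is allocated for this Service
  clusterIP: None
  selector:
    app: MySctpApp
  ports:
  - protocol: SCTP
    port: 7777
    targetPort: 7777
```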
#### Service with SCTP without selector
-As a user of Kubernetes I want to define Services without selector for my applications that use SCTP as L4 protocol on their interfaces,so I can implement my own service controllers if I want to extend the basic functionality of Kubernetes.
+As a user of Kubernetes I want to define Services without selector for my applications that use SCTP as L4 protocol on their interfaces, so I can implement my own service controllers if I want to extend the basic functionality of Kubernetes.
+
Example:
```
kind: Service
@@ -119,7 +128,7 @@ spec:
```

#### SCTP as container port protocol in Pod definition
-As a user of Kubernetes I want to define hostPort based port mappings for the SCTP based interfaces of my applications
+As a user of Kubernetes I want to define containerPorts for the SCTP based interfaces of my applications.
Example:
```
apiVersion: v1
@@ -133,10 +142,12 @@ spec:
    ports:
    - name: diameter
      protocol: SCTP
+     containerPort: 80
```

#### NetworkPolicy with SCTP
-As a user of Kubernetes I want to define NetworPolicies for my applications that use SCTP as L4 protocol on their interfaces, so the network controllers that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too.
+As a user of Kubernetes I want to define NetworkPolicies for my applications that use SCTP as L4 protocol on their interfaces, so the network controllers that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too.
+
Example:
```
apiVersion: networking.k8s.io/v1
@@ -168,26 +179,28 @@ spec:
      port: 7777
```

#### Userspace SCTP stack
-As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack, and at the same time I want to define SCTP Services in the same cluster.
+As a user of Kubernetes I want to deploy and run my applications that use a userspace SCTP stack, and at the same time I want to define SCTP Services in the same cluster.

### Implementation Details/Notes/Constraints [optional]

#### SCTP in Services
The Kubernetes API modification for Services is obvious.
-The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.
+In case of Services with ClusterIP, NodePort, or externalIP the selected port shall be reserved on the respective nodes, just like for TCP and UDP currently. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to reserve those ports via the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.

-For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully.
+For Services with type=LoadBalancer we reject the Service creation request for SCTP services at API validation time.

Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. I.e. there is no need for additional implementation to achieve this goal.
+
+Example:
```
_diameter._sctp.my-service.default.svc.cluster.local. 30 IN SRV 10 100 1234 my-service.default.svc.cluster.local.
```
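For illustration, the SRV record above would correspond to a Service roughly like the following sketch; the service name, namespace, port name, and port number are taken from the example record, while the selector is assumed:

```
kind: Service
apiVersion: v1
metadata:
  name: my-service
  namespace: default
spec:
  selector:
    app: MyApp
  ports:
  # The SRV owner name is derived as _<port name>._<protocol>.<service>.<namespace>.svc.<zone>
  - name: diameter
    protocol: SCTP
    port: 1234
```

With a ClusterIP Service like this, the SRV lookup returns the service port (1234) and the Service's own DNS name as the target, which is exactly what the record above shows.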
-#### SCTP in the Pod's container port
+#### SCTP in the Pod's ContainerPort
The Kubernetes API modification for the Pod is obvious.

-The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.
+We reject the pod creation request for pods that have containers with the combination of a hostPort and SCTP as protocol at API validation time.

#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is obvious.
@@ -195,37 +208,35 @@ The Kubernetes API modification for the NetworkPolicy is obvious.
In order to utilize the new protocol value the network controller must support it.

#### Interworking with applications that use a user space SCTP stack
-A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant port reservation logic kicks-in on every node, it starts listening on the port, and as a consequence it loads the SCTP kernel module on every nodes. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.
+A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where applications that use a userspace SCTP stack are planned to run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP (the "type" of the Service is ClusterIP or NodePort): once such a service is created, the relevant port reservation logic kicks in on every node in the cluster, it starts listening on the port, and as a consequence it loads the SCTP kernel module on every node. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.
+
+The same interworking problem applies to Services with "externalIP" defined.

NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The userspace SCTP stack creates raw sockets with IPPROTO_SCTP. As it is clearly highlighted in the [documentation of raw sockets][]:

>Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s).

-I.e. it is the normal function of the [kernel][], that it sends the incoming packet to both sides: the raw socket and the kernel module. In this case the kernel module will handle those packets that are destined to the raw socket as Out of the blue (OOTB) packets according to the rules defined in the [RFC4960][].
+I.e. it is the normal function of the [kernel][] that it sends the incoming packet to both sides: the raw socket and the relevant kernel module.
In this case the kernel module will handle those packets that are destined to the raw socket as Out of the blue (OOTB) packets according to the rules defined in [RFC4960][].

-The solution has been to dedicate nodes to userspace SCTP applications, and to ensure that on those nodes the SCTP kernel module is not loaded.
+In order to resolve this problem the solution has been to dedicate nodes to userspace SCTP applications, and to ensure that on those nodes the SCTP kernel module is not loaded.

-For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that k8s does not load the SCTP kernel modules on those dedicated nodes.
+For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that the actions performed by Kubernetes do not load the SCTP kernel modules on those dedicated nodes.

As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints, are there for this very purpose.

-The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create listening SCTP sockets on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module.
+The real challenge here is to ensure that when an SCTP Service is created in a Kubernetes cluster the Kubernetes logic does not create listening SCTP sockets on those nodes that are dedicated to the applications that use a userspace SCTP stack - because such an action would trigger the loading of the kernel module.

There is no such challenge with regard to headless SCTP Services.

This is how our way of thinking goes:

-The first task is to provide a way to dedicate nodes to userspae SCTP application so, that k8s itself is aware of that role of those nodes. It may be achieved with a node level parameter. Based on that parameter the kube-proxy would be aware of the role of the node and it would not create listening SCTP sockets for SCTP Services on the node.
+The first task is to provide a way to dedicate nodes to userspace SCTP applications so that Kubernetes itself is aware of that role of those nodes. It may be achieved with a node level parameter. Based on that parameter the kube-proxy would be aware of the role of the node and it would not create listening SCTP sockets for SCTP Services on the node.
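Purely to make the node level parameter idea concrete, such a setting could surface in the kube-proxy configuration along the following lines. The field name below is hypothetical; no such option exists in kube-proxy today, and it is shown only as an illustration of the proposal:

```
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Hypothetical, illustration-only parameter: if set to true, kube-proxy on
# this node would skip creating listening SCTP sockets for SCTP Services,
# so kube-proxy never triggers the loading of the SCTP kernel module here.
disableSCTPPortListeners: true
```

TCP and UDP handling on such a node would stay untouched, in line with the NOTE above.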
-
We propose the following alternatives for consideration in the community:

##### Documentation only
-In this alternative we would describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight, that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed. We would also highlight, that the usage of headless SCTP Services is allowed because such services do not trigger the creation of listening SCTP sockets, thus those do not trigger the loading of the SCTP kernel module on every node.
-
-We would also describe that SCTP must not be used as protocol value in the Pod/container definition for those applications that use a userspace SCTP stack.
+In this alternative we would describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight that the new SCTP Service feature must not be used in clusters where userspace SCTP stack based applications are deployed, and in turn, that userspace SCTP stack based applications cannot be deployed in clusters where kernel space SCTP stack based applications have already been deployed. We would also highlight that the usage of headless SCTP Services is possible, because such services do not trigger the creation of listening SCTP sockets and thus do not trigger the loading of the SCTP kernel module on every node.

##### A node level parameter to dedicate nodes for userspace SCTP applications
-- cgit v1.2.3

From c234ca8c12cb66cbfe1d13a41e065d8329943fe3 Mon Sep 17 00:00:00 2001
From: janosi
Date: Fri, 24 Aug 2018 10:24:50 +0200
Subject: Updated according to the comments on github. Main functional changes:
 HostPort with SCTP shall be supported; type=LoadBalancer with SCTP shall be
 supported

---
 keps/sig-network/0015-20180614-SCTP-support.md | 106 ++++++++++++++++---------
 1 file changed, 70 insertions(+), 36 deletions(-)

diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md
index 7cbbc712..370ac92e 100644
--- a/keps/sig-network/0015-20180614-SCTP-support.md
+++ b/keps/sig-network/0015-20180614-SCTP-support.md
@@ -44,7 +44,7 @@ superseded-by:

The goal of the SCTP support feature is to enable the usage of the SCTP protocol in Kubernetes [Service][], [NetworkPolicy][], and [ContainerPort][] as an additional protocol value option beside the current TCP and UDP options.
SCTP is an IETF protocol specified in [RFC4960][], and it is used widely in telecommunications network stacks.
-Once SCTP support is added as a new protocol option those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters on a more straightforward way. For example they can use the native kube-dns based service discovery, and their communication can be controlled on the native NetworkPolicy way.
+Once SCTP support is added as a new protocol option, those applications that require SCTP as L4 protocol on their interfaces can be deployed on Kubernetes clusters in a more straightforward way. For example, they can use the native kube-dns based service discovery, and their communication can be controlled in the native NetworkPolicy way.
[Service]: https://kubernetes.io/docs/concepts/services-networking/service/
[NetworkPolicy]:
@@ -61,15 +61,14 @@ SCTP is a widely used protocol in telecommunications. It would ease the manageme

### Goals

Add SCTP support to Kubernetes ContainerPort, Service and NetworkPolicy, so applications running in pods can use the native kube-dns based service discovery for SCTP based services, and their communication can be controlled via the native NetworkPolicy way.

-### Non-Goals
+It is also a goal to enable ingress SCTP connections from clients outside the Kubernetes cluster, and to enable egress SCTP connections to servers outside the Kubernetes cluster.

-It is not a goal here to add SCTP support to load balancers that are provided by cloud providers.
+### Non-Goals

-It is not a goal to support multi-homed SCTP associations. Such a support also depends on the ability to manage multiple IP addresses for a pod, and in the case of Services with ClusterIP or NodePort the support of multi-homed assocations would also require the support of NAT for multihomed associations in iptables/ipvs.
+It is not a goal here to add SCTP support to load balancers that are provided by cloud providers. The Kubernetes side implementation will not restrict the usage of SCTP as the protocol for Services with type=LoadBalancer, but we do not implement the support of SCTP in the cloud specific load balancer implementations.

-It is not a goal to support SCTP as protocol value for the container's HostPort. The reason: [the usage of HostPort is not recommended by Kubernetes][], and to ensure proper interworking of HostPort with userspace SCTP stacks (see below) would require an additional kubelet/kubenet configuration option. In order to keep the complexity and impact of the introduction of SCTP on a lower level we do not plan to support SCTP as new protocol value for HostPort.
+It is not a goal to support multi-homed SCTP associations. Such support also depends on the ability to manage multiple IP addresses for a pod, and in the case of Services with ClusterIP or NodePort the support of multi-homed associations would also require the support of NAT for multi-homed associations in the SCTP related NF conntrack modules.

-[the usage of HostPort is not recommended by Kubernetes]:https://kubernetes.io/docs/concepts/configuration/overview/#services

## Proposal

### User Stories [optional]
@@ -93,7 +92,7 @@ spec:
```

#### Headless Service with SCTP
-As a user of Kubernetes I want to define headless Services for my applications that use SCTP as L4 protocol on their interfaces, so client applications can discover my applications in kube-dns, or via any other service discovery method that gets information about endpoints via the Kubernetes API.
+As a user of Kubernetes I want to define headless Services for my applications that use SCTP as L4 protocol on their interfaces, so client applications can discover my applications in kube-dns, or via any other service discovery methods that get information about endpoints via the Kubernetes API.
Example:
```
@@ -128,7 +127,7 @@ spec:
```

#### SCTP as container port protocol in Pod definition
-As a user of Kubernetes I want to define containerPorts for the SCTP based interfaces of my applications
+As a user of Kubernetes I want to define hostPort for the SCTP based interfaces of my applications.
Example:
```
apiVersion: v1
@@ -143,10 +142,48 @@ spec:
    - name: diameter
      protocol: SCTP
      containerPort: 80
+     hostPort: 80
```
+
+#### SCTP port accessible from outside the cluster
+
+As a user of Kubernetes I want to have the option that client applications that reside outside of the cluster can access my SCTP based services that run in the cluster.
+
+Example:
+```
+kind: Service
+apiVersion: v1
+metadata:
+  name: my-service
+spec:
+  type: NodePort
+  selector:
+    app: MyApp
+  ports:
+  - protocol: SCTP
+    port: 80
+    targetPort: 9376
+```
+
+Example:
+```
+kind: Service
+apiVersion: v1
+metadata:
+  name: my-service
+spec:
+  selector:
+    app: MyApp
+  ports:
+  - protocol: SCTP
+    port: 80
+    targetPort: 9376
+  externalIPs:
+  - 80.11.12.10
```

#### NetworkPolicy with SCTP
-As a user of Kubernetes I want to define NetworkPolicies for my applications that use SCTP as L4 protocol on their interfaces, so the network controllers that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too.
+As a user of Kubernetes I want to define NetworkPolicies for my applications that use SCTP as L4 protocol on their interfaces, so the network plugins that support SCTP can control the accessibility of my applications on the SCTP based interfaces, too.

Example:
```
apiVersion: networking.k8s.io/v1
@@ -161,7 +198,6 @@ spec:
      role: myservice
  policyTypes:
  - Ingress
-  - Egress
  ingress:
  - from:
    - ipBlock:
@@ -179,17 +215,23 @@ spec:
      port: 7777
```

#### Userspace SCTP stack
-As a user of Kubernetes I want to deploy and run my applications that use a userspace SCTP stack, and at the same time I want to define SCTP Services in the same cluster.
+As a user of Kubernetes I want to deploy and run my applications that use a userspace SCTP stack, and at the same time I want to define SCTP Services in the same cluster. I use a userspace SCTP stack because of the limitations of the kernel's SCTP support. For example: it's not possible to write an SCTP server that proxies/filters arbitrary SCTP streams using the sockets APIs and kernel SCTP.

### Implementation Details/Notes/Constraints [optional]

#### SCTP in Services

-The Kubernetes API modification for Services is obvious.
+##### Kubernetes API modification
+The Kubernetes API modification for Services to support SCTP is obvious.
+
+##### Services with host level ports

-In case of Servies with ClusterIP or NodePort or externalIP the selected port shall be reserved on the respective nodes, just like for TCP and UDP currently. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to reserving those ports via the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.
+The kube-proxy and the kubelet start listening on the defined TCP or UDP port in case of Services with ClusterIP, NodePort, or externalIP, and in case of containers with HostPort defined. The goal of this is to reserve the port in question so that no other host level process can use it by accident. When it comes to SCTP the agreement is that we do not follow this pattern. That is, Kubernetes will not listen on host level ports with SCTP as protocol. The reason for this decision is that the current TCP and UDP related implementation is not perfect either: it has known gaps in some use cases, and in those cases this listening is not started. Since no one has complained about those gaps, this port reservation via listening logic is most probably not needed at all.
-For Services with type=LoadBalancer we reject the Service creation request for SCTP services at API validation time.
+##### Services with type=LoadBalancer
+For Services with type=LoadBalancer we expect that the cloud provider's load balancer API client in Kubernetes rejects requests with an unsupported protocol.
+
+#### SCTP support in Kube DNS

Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name. I.e. there is no need for additional implementation to achieve this goal.

Example:
@@ -200,47 +242,39 @@ _diameter._sctp.my-service.default.svc.cluster.local. 30 IN SRV 10 100 1234 my-s

#### SCTP in the Pod's ContainerPort
The Kubernetes API modification for the Pod is obvious.

-We reject the pod creation request for pods that have containers with the combination of a hostPort and SCTP as protocol at API validation time.
+We support SCTP as protocol for any combination of containerPort and hostPort.

#### SCTP in NetworkPolicy
The Kubernetes API modification for the NetworkPolicy is obvious.

-In order to utilize the new protocol value the network controller must support it.
+In order to utilize the new protocol value the network plugin must support it.

#### Interworking with applications that use a user space SCTP stack
-A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP (the "type" of the Service is ClusterIP or NodePort): once such a service is created the relevant port reservation logic kicks-in on every node in the cluster, it starts listening on the port, and as a consequence it loads the SCTP kernel module on every nodes. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.
-
-The same interworking problem stands for Services with "externalIP" defined.
-
-NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The userpace SCTP stack creates raw sockets with IPPROTO_SCTP. As it is clearly highlighted in the [documentation of raw sockets][]:
+##### Problem definition
+A userspace SCTP stack usually creates raw sockets with IPPROTO_SCTP. As it is clearly highlighted in the [documentation of raw sockets][]:

>Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s).

-I.e. it is the normal function of the [kernel][], that it sends the incoming packet to both sides: the raw socket and relevant the kernel module.
- -In order to resolve this problem the solution has been to dedicate nodes to userspace SCTP applications, and to ensure that on those nodes the SCTP kernel module is not loaded. +I.e. if both the kernel module (lksctp) and a userspace SCTP stack are active on the same node both receive the incoming SCTP packets according to the current [kernel][] logic. -For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that the actions performed by Kubernetes do not load the SCTP kernel modules on those dedicated nodes. +But in turn the SCTP kernel module will handle those packets that are actually destined to the raw socket as Out of the blue (OOTB) packets according to the rules defined in [RFC4960][]. I.e. the SCTP kernel module sends SCTP ABORT to the sender, and on that way it aborts the connections of the userspace SCTP stack. -As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are there for this very purpose. +As we can see, a userspace SCTP stack cannot co-exist with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The SCTP kernel module loading is triggered when an application starts managing SCTP sockets via the standard socket API or via syscalls. -The real challenge here is to ensure that when an SCTP Service is created in a Kubernetes cluster the Kubernetes logic does not create listening SCTP sockets on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module. +In order to resolve this problem the solution was to dedicate nodes to userspace SCTP applications in the past. Such applications that would trigger the loading of the SCTP kernel module were not deployed on those nodes. -There is no such challenge with regard to headless SCTP Services. +##### The solution in the Kubernetes SCTP support implementation +Our main task here is to provide the same node level isolation possibility that was used in the past: i.e. to provide the option to dedicate some nodes to userspace SCTP applications, and ensure that the actions performed by Kubernetes (kubelet, kube-proxy) do not load the SCTP kernel modules on those dedicated nodes. -This is how our way of thinking goes: +On the Kubernetes side we solve this problem so, that we do not start listening on the SCTP ports defined for Servies with ClusterIP or NodePort or externalIP, neither in the case when containers with SCTP HostPort are defined. On this way we avoid the loading of the kernel module due to Kubernetes actions. -The first task is to provide a way to dedicate nodes to userspae SCTP application so, that Kubernetes itself is aware of that role of those nodes. It may be achieved with a node level parameter. Based on that parameter the kube-proxy would be aware of the role of the node and it would not create listening SCTP sockets for SCTP Services on the node. 
NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes.

We propose the following solution:

We describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight that the required separation of the userspace SCTP stack applications and the kernel module users shall be achieved with the usual nodeselector or taint based mechanisms.

[documentation of raw sockets]: http://man7.org/linux/man-pages/man7/raw.7.html
-- cgit v1.2.3