| author | janosi <laszlo.janosi1@gmail.com> | 2018-06-30 22:40:40 +0200 |
|---|---|---|
| committer | janosi <laszlo.janosi1@gmail.com> | 2018-08-24 10:37:16 +0200 |
| commit | fdf2fc20a322bef445720ca85ef067348351bcb1 | |
| tree | 76b3cd36d4519d9839015d213eb070bff9b5d360 | |
| parent | f368332fe011875d0c6110d0a31549f70f368d7e | |
Added alternatives to handle the userspace SCTP incompatibility
| -rw-r--r-- | keps/sig-network/0015-20180614-SCTP-support.md | 40 |
1 file changed, 33 insertions, 7 deletions
````diff
diff --git a/keps/sig-network/0015-20180614-SCTP-support.md b/keps/sig-network/0015-20180614-SCTP-support.md
index b03c917b..6c4c5367 100644
--- a/keps/sig-network/0015-20180614-SCTP-support.md
+++ b/keps/sig-network/0015-20180614-SCTP-support.md
@@ -148,28 +148,34 @@ spec:
 - protocol: SCTP
   port: 7777
 ```
-#### User space SCTP stack
+#### Userspace SCTP stack
 As a user of Kubernetes I want to deploy and run my applications that use a user space SCTP stack.
 
 ### Implementation Details/Notes/Constraints [optional]
 
 #### SCTP in Services
 The Kubernetes API modification for Services is obvious.
-The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package.
+The selected port shall be reserved on the node, just like for TCP and UDP now. Unfortunately, golang does not have native SCTP support in the "net" package, so in order to access the kernel's SCTP API we have to introduce a new 3rd party package as a new vendor package. We plan to use the go sctp library from github.com/ishidawataru/sctp.
 
 For Services with type=LoadBalancer we have to check how the cloud provider implementations handle new protocols, and we have to make sure that if SCTP is not supported then the request for a new load balancer, firewall rule, etc. with protocol=SCTP is rejected gracefully.
 
-DNS shall support SRV records with "_sctp" as "proto" value.
+Kube DNS shall support SRV records with "_sctp" as "proto" value. According to our investigations, the DNS controller is very flexible from this perspective, and it can create SRV records with any protocol name.
 
 #### SCTP in NetworkPolicy
 The Kubernetes API modification for the NetworkPolicy is obvious. In order to utilize the new protocol value the network controller must support it.
 
 #### Interworking with applications that use a user space SCTP stack
-A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks-in on every node, and as a consequence it loads the SCTP kernel module.
-NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The solution has been to dedicate nodes to userspace SCTP applications, and ensure that on those nodes the SCTP kernel module is not loaded.
+A userspace SCTP stack implementation cannot work together with the SCTP kernel module (lksctp) on the same node. That is, the loading of the SCTP kernel module must be avoided on nodes where such applications that use userspace SCTP stack are planned to be run. The problem comes with the introduction of the SCTP protocol option for Services with Virtual IP: once such a service is created the relevant iptables/ipvs management logic kicks-in on every node, and as a consequence it loads the SCTP kernel module on every node. It immediately ruins the connectivity of the userspace SCTP applications on those nodes.
+
+NOTE! It is not a new interworking problem between the userspace SCTP stack implementations and the SCTP kernel module. It is a known phenomenon. The userspace SCTP stack creates raw sockets with IPPROTO_SCTP. As it is clearly highlighted in the [documentation of raw sockets][]:
+>Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s).
+
+I.e. it is the normal function of the [kernel][] that it sends the incoming packet to both sides: the raw socket and the kernel module. In this case the kernel module will handle those packets that are destined to the raw socket as Out of the blue (OOTB) packets according to the rules defined in the [RFC4960][].
+
+The solution has been to dedicate nodes to userspace SCTP applications, and to ensure that on those nodes the SCTP kernel module is not loaded. For this reason the main task here is to provide the same isolation possibility: i.e. to provide the option to dedicate some nodes to userspace SCTP applications and ensure that k8s does not load the SCTP kernel modules on those dedicated nodes.
 
-As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are for this very purpose.
+As we can easily see, it is pretty easy to separate application pods that use a userspace SCTP stack from those application pods that use the kernel space SCTP stack: the usual nodeselector label based mechanism, or taints are there for this very purpose.
 
 The real challenge here is to ensure that when an SCTP Service is created in a k8s cluster the k8s logic does not create iptables or ipvs rules on those nodes that are dedicated for the applications that use userspace SCTP stack - because such an action would trigger the loading of the kernel module, but at the same time those applications that use userspace SCTP stack can still access the just created SCTP based Service via the ClusterIP of that service - assuming that the new Service has ClusterIP allocated. There is no such challenge with regard to headless SCTP Services.
 
 This is how our way of thinking goes:
````
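For illustration, once the API change described above is in place, a Service and a NetworkPolicy using the new protocol value could look roughly like the sketch below. All names and port numbers here are invented for the example and are not part of the proposal itself:

```yaml
# Hypothetical Service exposing an SCTP port through a ClusterIP,
# assuming SCTP becomes a valid value for spec.ports[].protocol.
apiVersion: v1
kind: Service
metadata:
  name: sctp-demo
spec:
  selector:
    app: sctp-demo
  ports:
  - name: sctpport
    protocol: SCTP
    port: 7777
    targetPort: 7777
---
# Hypothetical NetworkPolicy admitting ingress SCTP traffic on the same
# port, assuming the NetworkPolicy API accepts protocol: SCTP as well.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sctp-7777
spec:
  podSelector:
    matchLabels:
      app: sctp-demo
  ingress:
  - ports:
    - protocol: SCTP
      port: 7777
```

With a named port such as `sctpport`, the SRV record mentioned in the hunk would then be expected to take the usual shape with `_sctp` as the proto label, e.g. `_sctpport._sctp.sctp-demo.<namespace>.svc.cluster.local`.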
````diff
@@ -178,7 +184,27 @@ If a node is dedicated for userspace SCTP applications then whatever proxy solut
 The userspace proxy would then follow the current high level logic of the kube-proxy: it would listen on an IP address of the local node, and it would establish connections to the application pods that provide the service.
 
 The next task is to ensure that the packets that are sent by applications to the ClusterIP end up in the userspace proxy. It requires the careful setup of iptables or ipvs rules on the node, so those do not trigger the loading of the SCTP kernel module. It means that those rules cannot filter on the actual protocol value (SCTP), i.e. we end up with rules that simply forward the ClusterIP to the local host IP on which the userspace proxy listens. The consequence is that the Service definition can contain only SCTP Ports; TCP or UDP Ports should not be used in that Service definition.
 
-NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those
+NOTE! The handling of TCP and UDP Services does not change on those dedicated nodes, i.e. the current iptables/ipvs/etc. mechanisms can be used for those.
````
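As a side note, the node dedication referred to above could reuse the standard label and taint mechanisms. The following is a minimal sketch in which the label `sctp-stack: userspace` and the taint key `userspace-sctp` are made-up names, not something proposed by the KEP:

```yaml
# Hypothetical pod manifest for an application that ships its own userspace
# SCTP stack. The target node is assumed to have been labelled
# sctp-stack=userspace and tainted userspace-sctp=true:NoSchedule by the
# cluster administrator; both names are invented for this illustration.
apiVersion: v1
kind: Pod
metadata:
  name: userspace-sctp-app
spec:
  nodeSelector:
    sctp-stack: userspace
  tolerations:
  - key: "userspace-sctp"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: app
    image: example.com/userspace-sctp-app:latest
```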
````diff
+
+We propose the following alternatives here for consideration in the community:
+
+##### Documentation only
+In this alternative we describe in the Kubernetes documentation the mutually exclusive nature of userspace and kernel space SCTP stacks, and we would highlight that the new SCTP Service feature must not be used in those clusters where userspace SCTP stack based applications are deployed, and in turn, userspace SCTP stack based applications cannot be deployed in such clusters where kernel space SCTP stack based applications have already been deployed.
+
+##### There would not be a ClusterIP -> service backends proxy on the dedicated nodes
+In this alternative we would implement the option to dedicate nodes for userspace SCTP applications, but we do not implement the userspace proxy. That is:
+* there would be a kube-proxy parameter that indicates to the kube-proxy that it must not create iptables or ipvs rules for SCTP Services on its local node
+* there would not be a userspace proxy to direct traffic sent to the SCTP Service's ClusterIP to the actual service backends
+
+As userspace SCTP applications could not use the benefits of Kubernetes Services before this enhancement, those anyway had to implement their own service discovery and SCTP traffic handling mechanisms. Following this assumption we can say that if they continue using their current logic, they do not and will not obtain the ClusterIP from the KubeDNS, but instead they use an alternative way to find their peers, and they use some other ways for connecting to their peers - like e.g. connecting to the IP of their peers directly without any ClusterIP-like solution. That is, they will not miss the possibility to use the ClusterIP of their peers, and consequently they do not need a proxy solution on their local nodes.
+
+##### Dedicated nodes and userspace proxy
+In this alternative we would implement all the tasks that we listed above:
+* node dedication
+* userspace SCTP proxy on the dedicated nodes
+
+[documentation of raw sockets]: http://man7.org/linux/man-pages/man7/raw.7.html
+[kernel]: https://github.com/torvalds/linux/blob/0fbc4aeabc91f2e39e0dffebe8f81a0eb3648d97/net/ipv4/ip_input.c#L191
 
 ### Risks and Mitigations
````
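Finally, the headless case mentioned in the first hunk (no ClusterIP, hence no iptables/ipvs rules to program and no trigger for loading the kernel module) could be illustrated with a sketch like this; again, the names are invented for the example:

```yaml
# Hypothetical headless Service: clusterIP: None means no virtual IP is
# allocated, so kube-proxy programs no rules for it and clients resolve
# the backing pod IPs directly through DNS.
apiVersion: v1
kind: Service
metadata:
  name: sctp-headless
spec:
  clusterIP: None
  selector:
    app: sctp-demo
  ports:
  - name: sctpport
    protocol: SCTP
    port: 7777
```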
