-rw-r--r--  sig-node/archive/meeting-notes-2015.md |  203
-rw-r--r--  sig-node/archive/meeting-notes-2016.md | 1259
-rw-r--r--  sig-node/archive/meeting-notes-2017.md | 1105
-rw-r--r--  sig-node/archive/meeting-notes-2018.md |  702
4 files changed, 3269 insertions, 0 deletions
diff --git a/sig-node/archive/meeting-notes-2015.md b/sig-node/archive/meeting-notes-2015.md new file mode 100644 index 00000000..d8e69d08 --- /dev/null +++ b/sig-node/archive/meeting-notes-2015.md @@ -0,0 +1,203 @@ +# sig-node weekly meeting + +Dec. 9 + + + +* Meeting note with huawei@: + + https://docs.google.com/document/d/1H2FybZUh0qS2jlOGeVE85LOS_7VIxGKiz2piDKlXBro/edit?usp=sharing + +* ImageSpec + * https://github.com/kubernetes/kubernetes/pull/18308#issuecomment-162729102 + * not required for rkt 1.0, but need a path forward + * Dawn will file separate issue for follow up discussion on kubelet moving to full management of images. Unblock the current development progress. +* Logging management discussion (Vishnu) + * https://github.com/kubernetes/kubernetes/issues/17183 + * use cases we need to support: + * on-demand log rotation as pods start to come up against their disk usage limits + * shipping to something like fluentd + * either delegated to runtime or controlled by kubelet + * On disk space management, image accounting should be separated from runtime logging. +* Image accounting: + * Still to define what exactly this means in rkt: https://github.com/coreos/rkt/issues/1814 + * @vishh to provide feedback + * what about arbitrary image types like tarballs? + * jon: container runtime should provide image management +* cAdvisor integration with rkt updates (Dawn) + * blocked by broken scripts due to the integration of CoreOS and refactory + * Will pick up yifan's pr and retry. + * Yifan will bring up a cluster with rkt +* rkt roadmap updates (Jonathan) https://github.com/coreos/rkt/blob/master/ROADMAP.md + * kvm state? + * should have feature parity except for networking and testing + * rkt stage1 overhead WIP https://github.com/dgonyeo/rkt-monitor + +Dec. 2 + + + +* Logging management discussion (Vishnu) + + No time yet. Move to next week. + + + +* rkt roadmap updates (Jonathan) https://github.com/coreos/rkt/blob/master/ROADMAP.md +* benchmark stage1 resource usage https://github.com/coreos/rkt/issues/1788 +* Sumsung's use cases and requirements for node and container runtime (Bob) + + kubelet logging isssue https://github.com/coreos/bugs/issues/990 + +https://github.com/kubernetes/kubernetes/issues/14216 + +rkt prs: + + https://github.com/kubernetes/kubernetes/pull/17968 + + https://github.com/kubernetes/kubernetes/pull/17969 + + rkt fly + +https://github.com/coreos/rkt/pull/1825 + +Disk Accounting Proposal - https://github.com/kubernetes/kubernetes/pull/16889 + +Nov. 18 + + + +* kubelet + systemd? + +- design of cgroup hierarchy ideal (Vishnu) + +- Nalin to send out volumec branch and design doc + +- Discussion on logging (Vishnu to paste link to existing issue) + + + https://github.com/kubernetes/kubernetes/issues/17183 + + - rkt team to measure overhead of running the journal in container + +- Discussion on PID1 being systemd in the infrastructure container or not and different strategies; sounds like measuring needs to happen here https://github.com/coreos/rkt/issues/1788 + +Just to be super clear it seems like we have three divergent paths for running a docker container that are likely to exist in the kubelet: + +1) Docker engine mode + +2) rkt mode + +3) runc+systemd mode + + + +* Yifan side: + * kube-up for coreos/gce/docker, PRs out there (https://github.com/kubernetes/kubernetes/pull/17240 + + https://github.com/kubernetes/kubernetes/pull/17241 + + + https://github.com/kubernetes/kubernetes/pull/17243 ) + +* working on refactoring the rkt/kubelet with rkt api service. 
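As a rough illustration of the "three divergent paths" noted above (Docker engine mode, rkt mode, runc+systemd mode sitting behind one kubelet-facing abstraction, with the rkt api service as one such backend), here is a minimal, hypothetical Go sketch. The names and types are invented for illustration and are not the kubelet's actual Runtime interface.

```go
// Illustrative only: a pared-down runtime abstraction of the kind discussed
// above, so Docker, rkt, and runc+systemd backends can plug in behind one
// interface. Names are hypothetical, not the kubelet's real API.
package main

import (
	"context"
	"fmt"
)

// Pod is a minimal stand-in for the kubelet's pod object.
type Pod struct {
	Name   string
	Images []string
}

// Runtime is what each backend (Docker engine, rkt, runc+systemd) would implement.
type Runtime interface {
	RunPod(ctx context.Context, pod *Pod) error      // create and start the pod's containers
	KillPod(ctx context.Context, podName string) error // stop and remove them
	Status(ctx context.Context) (string, error)       // report backend health
}

// fakeRuntime is a trivial in-memory backend showing that the kubelet side
// of the code stays runtime-agnostic.
type fakeRuntime struct{ name string }

func (f *fakeRuntime) RunPod(_ context.Context, pod *Pod) error {
	fmt.Printf("[%s] running pod %q with images %v\n", f.name, pod.Name, pod.Images)
	return nil
}

func (f *fakeRuntime) KillPod(_ context.Context, podName string) error {
	fmt.Printf("[%s] killing pod %q\n", f.name, podName)
	return nil
}

func (f *fakeRuntime) Status(_ context.Context) (string, error) {
	return f.name + " ready", nil
}

func main() {
	var rt Runtime = &fakeRuntime{name: "rkt-api-service"}
	_ = rt.RunPod(context.Background(), &Pod{Name: "nginx", Images: []string{"nginx:1.9"}})
	status, _ := rt.Status(context.Background())
	fmt.Println(status)
}
```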
+ +Nov. 11 + + + +* Cancelled as people are in the kube-con + +Nov. 4 + + + +* From google: 1.2 releases focuses on testing +* Node conformance test, e.g. including testing the kernel version, kernel config, docker runtime, rkt runtime, systemd version, etc. Such testing makes the node is validate to join the cluster. +* coreos side: yifan working on converting the gce master to use coreos image. also need to work together with node team on the node conformance tests for rkt. + * question on setting up the master node : what's the best practice to maintain those pod templates (e.g. kube-apiserver, logging, dns addons) https://github.com/kubernetes/kubernetes/pull/16760#discussion_r43826882 + * dawn chen said we can run a script that's in the saltstack dir on the node, which evaluate the pod templates. +* vish asked about the status for the rkt after our last discussion about the pod lifecycle + * not much progress, mentioned jonboulle's rkt fly pr https://github.com/coreos/rkt/pull/1416 (what's the status for that? @jonboulle) +* dawn chen asked about what's rkt's currently developing direction? https://github.com/coreos/rkt/blob/master/ROADMAP.md#rkt-roadmap +* Also she would like to know our OKRs in the next quarter. (rkt, etcd, tectonic?) Can we share that with her? @jonboulle +* mics: vish talked briefly about the latest updates in OCI, splitting starting of a container to create and start. yifan mentioned we have recently put efforts on acbuild (an appc image build tool) https://github.com/appc/acbuild + +Sept. 30 + +Date: Sept 30, 2015 + +Attendees: YiFang, Jonnathan from CoreOS, + + stclair@, yjhong@, vishnuk@, dawnchen@ from Google + +Agenda + + + +* oci status update: + + (jonboulle): OCI being blocked, waiting on a lot of issues, very slow progress. + + + +* appc + OCI harmonization effort: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/uo11avcWlQQ +* defining an image format in OCI: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/OqnUp4jOacs +* https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/1T0z1IJWxw8 +* appc status: + * work stalled on OCI announcement, starting to pick up again + * TODO(jonboulle): categorise appc issues and follow up with summary + * example: pod lifecycle https://github.com/appc/spec/pull/500 + * background discussion: https://github.com/appc/spec/issues/276 +* kubernetes restart policy + * applied per-pod + * _jonboulle: a container "restart" is underspecified (e.g., is filesystem persisted?). Currently, it is defined by implementation in Docker. In rkt, we currently just restart the entire pod, but are considering implementing more granular behaviour_ + * https://github.com/kubernetes/kubernetes/blob/ba89c98fc7e892e816751a95ae0ee22f4266ffa5/docs/user-guide/pod-states.md#restartpolicy +* rkt introduction + * background - https://coreos.com/blog/rocket/ +* Dawn: concerned about having systemd manage cgroup; unified hierarchy does not work for Google + * One issue for memory management policy: https://github.com/kubernetes/kubernetes/issues/14532 + * We might have different policy enforced by cgroup hierarchy for cpu, blockio, etc. 
+ * _Should isolator adjustment be part of container runtime API?_ + * related appc issue: https://github.com/appc/spec/issues/54 + * rkt implementation-wise, this can be achieved today at both pod-level and app-level using systemd APIs ([SetUnitProperties](http://www.freedesktop.org/wiki/Software/systemd/dbus/) + [resource settings](http://www.freedesktop.org/software/systemd/man/systemd.cgroup.html#Options)) +* Node architecture discussion + * today there are two runtime integration points: + * kubelet, for creating/running/etc pods + * cAdvisor, for exposing stats + * Vish talked about proposal for cAdvisor to take over all responsibilities, so that container runtimes only need to integrate in one place + * simpler maintainability (one codebase) + * OR, could integrate cAdvisor responsibilities into kubelet + * e.g. moving [Container Runtime interface](https://github.com/kubernetes/kubernetes/blob/bd8ee1c4c49c724e80a8f8d59e732ea7855eba8e/pkg/kubelet/container/runtime.go#L53) into cAdvisor + * Container Runtime Interface is changing from declarative to imperative + * client/server? https://github.com/kubernetes/kubernetes/issues/13768 +* when updating pod spec: + * only thing that can be done today is changing the image version + * updating resources: https://github.com/kubernetes/kubernetes/issues/5774 + * dawn: add rant on https://github.com/appc/spec/issues/276 + * … + * perhaps kubernetes pod really conflates two levels of abstraction: _scheduling _and _resource enforcement/management_ +* rkt e2e + * yifan@ is working on this whenever he has time. + +What is a Pod: + + + +* Shared namespaces excepting mount +* Restart policies will be at pod level +* Per-container restart policies are required for certain volumes git pull? +* Life cycle hooks at the container level are needed. Pre-start hooks at the container level. +* Privileged pods require access to host + +Why pod updates? + + + +* Pod updates are needed for updating image names - misspelled image names and in-place image updates. +* In-place updates are very useful when containers cannot tolerate restarts. For example, applications that load a lot of data from volumes before functioning. +* Auto-scaling requires updates to pods. +* Adding/removing containers - in-place updates using a hot-swap mechanism - start a new container and remove the old one. Adding/Updating side-cars (logging, monitoring, etc) +* Updating volumes to pods are also a very useful feature for users. 
+* logging: yifan to check fluentd, how can it integrate with rkt/journal + * Getting logs after pods exit: https://github.com/coreos/rkt/issues/1528 diff --git a/sig-node/archive/meeting-notes-2016.md b/sig-node/archive/meeting-notes-2016.md new file mode 100644 index 00000000..43b1201d --- /dev/null +++ b/sig-node/archive/meeting-notes-2016.md @@ -0,0 +1,1259 @@ +# sig-node weekly meeting + +# Dec 13 {#dec-13} + + + +* virtlet demo + * Piotr from https://github.com/Mirantis/virtlet + * a Kubernetes runtime server which allows you to run VM workloads + * reschedule to next time +* cri-o demo + * Antonio from redhat + * pod and container lifecycle management works, and image service implementation is done + * start integrating with Kubelet +* CRI rollout and planning + * 1.5 alpha API is ready + * in 1.6 using CRI and docker implementation in production + * validate the API in productiont + * Rollout plan/proposal: https://docs.google.com/document/d/1cvMhah42TvmjANu2rRCFD3JUT0eu83oXapc_6BqE5X8/edit?usp=sharing + * Introducing flag to enable / disable the feature + * Backward compatibility: container name, etc. Requires draining the node. + * Rollback planning. +* Resource management workgroup + * Started from Jan 3rd, and expected is dismissed once the roadmap and planning is done. + * preemption & eviction priority and schema +* Will CRI shim for docker support exclusively CNI for networking? + * The shim currently supports cni and kubenet + * There is a [PR](https://github.com/kubernetes/kubernetes/pull/38430) up for allow the native docker networking + * https://github.com/kubernetes/kubernetes/issues/38639 discusses the support for the exec network plugin + + +# Dec 6 {#dec-6} + + + +* Will CRI support exclusively CNI for networking? + * CRI right now with dockershim's impl only supports CNI + * Next step is file a github issue or slack ping probably since we didn't get a clean answer here +* Additional CNI question: no distinction between networks for workloads + * same with/without CRI: networking is set up once per-node + * better question for sig-networking + * If required, CRI can evolve as well. +* garden team: demo + * Julz + * Garden, Cloud Foundry's Container Runtime + * Given CRI / cri-o / rktlet, seems similar to garden + * Plans to integrate into CRI/K8s effort? + * Hope to share ideas, code, whatever due to similarity; opportunity to probably share ideas and code + * Many links available related to this: + * https://github.com/cloudfoundry/garden + * https://github.com/cloudfoundry/guardian + * https://github.com/cloudfoundry/grootfs +* docker 1.12 liverestore (https://docs.docker.com/engine/admin/live-restore/) should it be enabled by default? 
+ * Related to disruptive updates; if we wish to be less disruptive + * CoreOS does not enable it currently, but @euank doesn't know if there's a specific reason +* Shaya/AppOrbit, Infrantes public cloud CRI demo + * Enables pods to be isolated within independent VMs + * differs from hyper in that the VMs can be separate public cloud instances where nested virt isn't supported + * Enables orchestration of full VM images that aren't running containers but act as Pods to the k8s cluster + * + + +# Nov 29 Agenda {#nov-29-agenda} + + + +* FYI: [Shared pid namespace](https://github.com/kubernetes/kubernetes/issues/1615) + * No discussion needed + * [PR for rollout sent](https://github.com/kubernetes/kubernetes/pull/37404) for review + * brendanburns suggests we consider supporting isolated namespaces indefinitely +* rkt attach demo + * Implemeting the design proposed in [#34376](https://github.com/kubernetes/kubernetes/blame/master/docs/proposals/kubelet-cri-logging.md#L220-L225). + * Addresses a problem of the pre-CRI rkt implementation; support `kubectl attach` and `kubectl run -it` +* Dawn: Some ideas for sig-node next year: + * Node level debugability + * Node management / availability (daemon update, repair, etc) + * Node allocatable rollout (e.g. daemon overhead iiuc?) + * CRI, validation test, beta/stable + * Checkpointing (finally) + * Tackle logging story + * Auth/security between daemons / auth between applications + * Resource management (related to many of the above too, yeah) + * Mostly for reliability, not efficiency + * pod overhead, etc + * Better guarantees for performance / etc node + * Disk management + * Final: kubelet as a standalone thing/"product" + * Checkpointing, node level api and versioning +* virtlet + * https://github.com/Mirantis/virtlet + * As of now it can use kvm/qemu to run images + * CRI implementation + * Demo forthcoming + + +# Nov 22 {#nov-22} + + + +* Announcements & sync-up + * Derek: Putting together a workgroup that reports back to sig-node for resource management. Specifically to allow management of resources outside the kubelet for exploratory work, identify ones that should be managed by kubelet. + * Look for an announcement sometime next week +* Status of CRI on 1.5 + * v1alpha "done" for docker, community can and should try it and give feedback +* Shared pid namespace:([verb@google.com](mailto:verb@google.com)) + * First step is to make infra container reap zombies + * https://github.com/kubernetes/kubernetes/pull/36853 + * But will infra container even be around for all run times in the future? + * Yes please! + * First step is the pause container as an init process + * Other runtimes already handle that (e.g. rkt and cri-o) + * On by default? + * Some containers assume PID 1 (e.g. for exec kill -SHUP 1, or in their /entrypoint.sh messy pseudo-init). + * Some containers also bundle init systems + * Dawn: there was discussion about having our own init for a pod + * rkt, pause container, cri-o all have init processes. infra is a bit of a hack and docker specifc, but we should be able to get rid of those. + * For now, his change makes sense, just go with it, and we can consider long-term unification in parallel/later +* Backwards compatibility concerns: + * disruptive CRI + * disruptive cgroup rollout + * Have we done disruptive rollout before? + * In GKE do that. + * Openshift: drain nodes before said updates do that. + * In the past, maybe docker labeling broke this? No specific memory. 
+ * Currently planning to make both of those disruptive + * Euan: Action item, double check this is sane from CoreOS side +* Hi from Cloud Foundry garden team! (What can we share, how can we help?) + * Next call, maybe demo and talk about garden a little +* rkt roadmap: + * 1.5 continuing to work to get attach and e2e happy (might bleed into 1.6) + * 1.6 alpha release and recommend general "tire-kicking" +* CRI status, api interface. Rollout in 1.5, alpha api, what does that mean? + * Cluster api, we don't suggest prod usage because it might change + * This is internal, so it's different, right? Compatibility is an internal detail, not external/user. +* Will CRI support exclusively CNI for networking? + * Furthermore, is the network config missing from the CRI? + * Maybe? It's alpha + * Come back next week +* 1.6 roadmap planning + * Community meeting talked about reliability. + * Resource management + * disk management + * Lots of "in-flight" features which are not marked stable yet, have not been seen through. + * Use 1.6 release to "finish work" and rollout + * CRI + * Pod Cgroup + * Node allocatable + * ….. + * Part of "nodespec work" + * Focus on reliability and rollout of features. Finish node level testing. + * Let us/Dawn know about other potential items for the roadmap. + * Expected date? presumably before 1.6 :) TBD + + +# Nov 01 {#nov-01} + + + +* Image Exec (verb@google.com) + * Better support for containers built from scratch + * <code>kubectl exec -it -m <em>image_name</em> <em>pod_name</em></code> + * Proposal: https://github.com/kubernetes/kubernetes/pull/35584 + * Usecase primarily dev cluster only or dev+prod? + * Both + * Mount namespace vs filesystem view (e.g. proc and dev and so on might differ) + * No solution offered for this + * Pod lifecycle for this pod + * run a pod + exec into a pod + * Dangling pod problem with `kubectl run --rm`? + * No answer known yet + * Display to user: + * Is it hidden from regular listing? (e.g. get pod) + * Right now there's an idea of separate init container statuses. Will there be a separate debug container construct? + * There's been discussion before of "tainting" pods that have been exec'd into, debugged. + * Resources? This can add additional resource needs. Do we reserve things? Push it onto the user? + * Derekwayncarr: Size pods in advance to be able to be debugged + * Cleanup / reliability: + * Fairly intrusive.. + * Security? + * Image whitelist / blacklist? + * That's being added, but now needs to be added in one more place + * Admission controllers will need to tie in too + * one implication: Userns + host bindmounts, depending on how userns is implemented, could be messy (hostmount implies no userns, but userns relabeling might be enabled for the whole pod) + * No answer to this. + * Does SELinux interact with this? Do we need to relabel container rootfss in a scary way? + * No answer for this. + * Concern about how this interacts with introspection tools RH has that e.g. scan podstatus for image IDs + * Alternative: Allow modifying the podspec to run a new container? 
+ * Dawn: Idealy, but that has a problem of needing more upstream changes, can't just be kubelet +* Kubelet pod API, initial version + * New kubelet API for bootstrapping + * https://docs.google.com/document/d/1tFTq37rSRNSacZeXVTIb4KGLGL8-tMw30fQ2Kkdm7JQ/edit?usp=sharing + * Minimal guarantees in this implementation ("best-effort") + * Differences from static pods: + * These are persisted into the api-server whereas static+mirror are "fake"/read-only + * These give you a "real" r/w copy in api-server + * Other option is bootkube. Temporary control plane with a "real" api-server, client transitions between em + * Complexity, painful to maintain the code + * This implementation adds a new pod source of an API + * Due to security concerns, it would be a default-off api, potentially listening on a unix socket + * Will create pods before the api-server is connected to. Will move them to the api-server when able to + * Api-server pod persistence will result in a pod restart to fixup UIDs effectively + * Derekwayncarr: We want to get rid of the existing ways; what other alternatives are there? + * The bootstrap/run-once kubelet was shot down. And this has some other nice properties as well (e.g. disaster recovery) + * Not a full api-server in the kubelet though. essentially only nodename stored pods + * Derek: Will this be versione[/1B856NU1Ie0Pid4xGV2D9QZUJhUD24QzYsJYqH0yJU8A/edit#gid=0](https://docs.google.com/spreadsheets/d/1B856NU1Ie0Pid4xGV2D9QZUJhUD24QzYsJYqH0yJU8A/edit#gid=0) + * No sig-node weekly meeting on the 8th dd, will it be pretty and hygenic? Do we distribute clients? + * We don't have answers for those + * Maybe just experimental for now and not tackle those? + * Derek: Concerned with experimental features that aren't as well thought out + * +* Demo: rkt with runc (@casey) + * Does runc list +* Kubecon f2f details + * Someone should add any details we do know to https://docs.google.com/spreadsheets/due to kubecon + + +# Oct 25 Agenda {#oct-25-agenda} + +NOTE: This meeting is being held in ZOOM due to troubles with hangouts: http://www.zoom.us/my/sigwindows + + + +* Conflict with DiskPressure and Volume cleanups + * https://github.com/kubernetes/kubernetes/issues/35406#issuecomment-256101016 + * Suggestion: We should be always cleaning up memory-backed volumes, but at minimum we need to start doing that on eviction + * Prioritization? Red Hat is willing to help fix this for 1.5 since it's causing real pain for them with secrets + eviction + * Possible problem: Kubelet falls over, pod is removed, kubelet comes back up, volume manager can't find from the state whether an emptydir is tmpfs trivially. + * Doesn't matter, volume manager should still clean it up, unmount, what have you? Dig into it a bit more and comment on that issue +* Image Exec (verb@google.com) + * Better support for containers built from scratch + * <code>kubectl exec -it -m <em>image_name</em> <em>pod_name</em></code> + * Draft proposal: http://bit.ly/k8s-imageexec + * verb@ to open a PR to main k8s repo + * DONE: https://github.com/kubernetes/kubernetes/pull/35584 + * tl;dr; Run another container image in the same namespaces as the pod; Use the other image to debug the pod. + * Expected to be reviewed this week + * Probably discussed more next week in sig-node as well + * Some questions about how to properly deal with viewing its mount namespace + * Post them on the proposal! 
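For context on the Image Exec idea above (`kubectl exec -it -m image_name pod_name`), the following is a minimal, hypothetical sketch of the API-side shape it implies: attaching a throwaway debug container that joins an existing pod. The types and the DebugContainers field are assumptions for illustration only, not the real Kubernetes PodSpec.

```go
// Illustrative only: a debug container added alongside a pod's containers,
// mirroring `kubectl exec -it -m <image> <pod>` from the notes above.
package main

import "fmt"

type Container struct {
	Name  string
	Image string
}

type Pod struct {
	Name            string
	Containers      []Container
	DebugContainers []Container // hypothetical field, not part of the real PodSpec
}

// withDebugContainer returns a copy of the pod with a debug container that
// would share the pod's namespaces at runtime.
func withDebugContainer(p Pod, image string) Pod {
	p.DebugContainers = append(p.DebugContainers, Container{Name: "debug", Image: image})
	return p
}

func main() {
	pod := Pod{Name: "web", Containers: []Container{{Name: "app", Image: "nginx:1.11"}}}
	pod = withDebugContainer(pod, "busybox")
	fmt.Printf("%+v\n", pod)
}
```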
+* Quick rktlet demo (init containers & adding bindmounts) + * Demo will happen next week :) +* rktlet status + * Vendoring in in-progress (couple dependency issues); to be followed by e2e testing + * Attach & logging +* Kubelet in a chroot https://github.com/kubernetes/kubernetes/pull/35328 + * CoreOS has been shipping kubelet in a chroot for the last half year + * Volumes might require path remapping + * Does not concern most k'let developer +* Kubelet mounts volumes using a container + * Using rkt fly to run a mounter image to setup k'let volumes. + * Stop gap solution until k'let in a chroot lands + + +# Oct 18 {#oct-18} + + + +* CRI exec/attach/streaming + * https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit?ts=5800133e# (?) + * Original issue: https://github.com/kubernetes/kubernetes/issues/29579 + * TTY vs. streams, to be tackled offline +* Per-pod overhead + * exec, logs, supervision + * where to account? kubelet level? pod level? + * + + +# Oct 11 {#oct-11} + + + +* Reducing privileged components on a node + * Can other components (kube-proxy) delegate privileged host operations to kubelet? (e.g. firewalld operations) + * Dawn: Opinion, that makes kubelet more monolithic. It's the main agent, but it should be able to delegate. Preference for moving things to separate plugins where reasonable. + * Euan: Counterpoint, multiple things talking to api-server has some extra auth problem + * Is kube-proxy actually core? Out of scope of this sig :) + * Minhan: Note that kube-proxy is an implementation detail; over time it will potentially differ. + * This discussion is also about more than just kube-proxy +* Pod cgroups (demo update - [decarr@redhat.com](mailto:decarr@redhat.com)) + * q on default: Can't it default to 'detect' and try docker info, and if it doesn't have info fallback to cgroupfs? + * It does do the right thing for docker integration, but document says the wrong default :) + * Note: Only works for systemd for 229+ because of opencontainers slice management stuff + * Upgrade of existing node to this feature? + * Evacuate the node first. We don't support 'in-place', it's not in-scope + * We don't currently tune memory limits over time either + * Some docker-things are not charged per-cgroup (like docker's containerd-shim for example) + * Also not in-scope; upstream changes + * Euan: Will also look into making sure the systemd driver works well for rkt. It should work with effectively `systemd-run --slice=$parent` under systemd driver + * Yuju: We can add the info call to give that info to CRI, just do it :) +* When can we assume pod-cgroups exist, e.g. for eviction and so on? +* CRI logging proposal: https://github.com/kubernetes/kubernetes/pull/34376 + * q: This is only under CRI/experimental, right? Yes, it's part of an experimental flag, default should not differ +* CRI networking: https://github.com/kubernetes/kubernetes/pull/34276 + * Original issue: https://github.com/kubernetes/kubernetes/issues/28667 + * Who should own it? Kubelet or runtime? + * How about the configs (e.g. plugin dirs etc) + * Freehan: Eventually move to part of the runtimes, deprecate kubelet flags + * Will kubenet still exist? Only CNI? + * Eventually it'll be a cni plugin perhaps + * sig-networking discussed this some + * Some considered out-of-band vs passing through all networking stuff + * In the future, higher level "Network" objects might exist. Already, networking exists as a kubernetes construct to a degree. 
+ * CRI will have to grow for this to include a good chunk of that.. or out-of-band + * In the future, the 'UpdateConfig' object might expand and these objects will have to be somewhat runtime-implemented + * CRI will hve to include tc, etc type stuff so that the runtime can apply network shaping + * There's also the implicit assumption that networking metrics aren't core when they're moved out of kubelet maybe + * Conclusion: Let's roll forwards so we can look at more than just a tiny start, reconvine next week. + + +# Oct 04 {#oct-04} + + + +* CRI status updates + * Exec/port-forward/attach shim; can we just implement the grpc bit without it due to time concerns? Should we start implementing the shim? + * a) Implement internally for now is okay as docker did (yes)? The shim-library will provide a fairly similar interface + * Logging + * Try to draw conclusion this week. + * Monitoring + * Use cadvisor for now, punt till post 1.5 + * Docker integration status: + * Passing most of the tests (testgrid link) + * node e2e: https://k8s-testgrid.appspot.com/google-node#kubelet-cri-gce-e2e + * cluster e2e: https://k8s-testgrid.appspot.com/google-gce#gci-gce-cri + * Adding serial and per-PR builders soon + * Lantao (random-liu@) is working on integration over grpc: https://github.com/kubernetes/kubernetes/pull/33988 + * Networking + * Freehan is working on doing the networking plugin work for CRI + * Moving tests to node-e2e, and then will do docker-integration work + * Help wanted: CRI validation test + * https://github.com/kubernetes/kubernetes/issues/34040 +* rktlet demo +* KubeCon sig-node F2F tentaively planned for 11/7. Kindly respond to the [existing thread](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kubernetes-sig-node/t1S77NwujlE/ybK-UtvaAAAJ) if you can make it. Video Conferencing can be setup for remote attendees. +* Node-e2e; should it use kube-up-like environments instead of what it has now? + * Provides benefit in that you're testing a more "production" environment + * Could at least share the startup scripts fairly easily + * If we had a standardized "node-setup" shared piece at the bottom, then we could more easily test/validate by knowing there's this one way. + * Current kube-up setup has duplicated code and burdon and it makes it tricky to claim x distro is supported. Goal is to make it easier to add new distros and maintain better. + * Share more code at the node level for setup. Document the exact steps needed for setup in general. +* + + +# Sept 27 {#sept-27} + + + +* Change the way of expressing capability in API (cri first) + * Currently the default is just "whatever the runtime does" + * This proposes moving it to kubelet setting an explicit default for add/remove and tell the runtime *exactly* what to set, not a set to add and remove from its ad-hoc set. + * This sets the path forwards to potentially exposing it to the external API, but that's out of scope for sanity reasons. + * https://github.com/coreos/rkt/issues/3241 + * Current it's docker specified, with [a default list + add_caps - remove_caps](https://github.com/docker/docker/blob/eabae093f4df2d6cd3f952131ae7109d92480674/daemon/oci_linux.go#L209-L222) + * It's preferred to change to a [white list based interface](https://github.com/yifan-gu/rktlet/blob/2e3a9b77d32e765da5a3e557009607bddeb4e362/rktlet/runtime/helpers.go#L225-L238) at least for the CRI. We are not gonna touch the k8s API for now + * Propose that CRI is provided an explicit set of capabilities, not an add and remove set. 
Normalizes runtimes and behavior + * Take the default from docker, merge in what was changed + * Makes sense probably, "PRs welcome" + * Long term, maybe a cluster policy or pod-security-policy for default set + * **Action**: Yifan (or other) propose / issue to discuss further based on what we have here. +* rkt CRI demos: + * isolators in rkt app add (sur) + * port forwarding (casey) +* CRI e2e testing + * https://github.com/kubernetes/kubernetes/issues/33189 + * https://k8s-testgrid.appspot.com/google-node#kubelet-cri-gce-e2e + * Bypassed CRI to support logs/exec for now + * Support only docker 1.11+ + * Q: If you're writing your own CRI implementation, how can you test it? + * Will we setup a conformance a CRI conformance test? + * No plan yet, for now we can use node-e2e test, and once we have more cycles we can create something specialized/better + * Adding new tests and testing external things is hard and complicated (lots of discussion) + * Talk to sig-testing! +* node-e2e testing discussion (what are they, etc): + * How they're run: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/jenkins/e2e-node-jenkins.sh + * There's also a make target for it https://github.com/kubernetes/kubernetes/blob/master/docs/devel/e2e-node-tests.md#running-tests + * Node conformance is a separate thing, it's a contaienrized test framework target for validating whether a node meets the minimum requirement of a Kubernetes node. + * https://github.com/kubernetes/features/issues/84 (see links in there) + * Docs of current version http://kubernetes.io/docs/admin/node-conformance/ (We've pushed the alpha version image onto the registry, currently several volumes need to be mounted into the container, we'll simplify that in newer version) + * Pending PRs: + * system verification: https://github.com/kubernetes/kubernetes/pull/32427 + * containerize node conformance test: https://github.com/kubernetes/kubernetes/pull/31093 +* CRI logging discussion continuation + * https://github.com/kubernetes/kubernetes/pull/33111 + + +# Sept 20 {#sept-20} + + + +* F2F meeting after KubeCon - Kindly respond to sig-node email thread [here](https://groups.google.com/forum/#!topic/kubernetes-sig-node/t1S77NwujlE) +* Discuss SELinux and pod level vs container level SELinux context + * Should this context be moved to the sandbox level? https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/runtime/api.pb.go#L1238 + * Derek Wayne Carr to defer to Paul Morie + * From Yuju in slack: @philips there are `PodSecurityContext` and per-container `SecurityContext` in k8s API. The per-container security context takes precedence over the pod-level one. SELinux is included in both security contexts. + * +* TTY Attach progress update + * https://github.com/kubernetes-incubator/rktlet/issues/8#issuecomment-248000500 + * Luca still digging into technical details of the forwarding and working on the prototype that he has working (see [WIP code](https://github.com/lucab/rkt/commit/cca63145d01641e3da75d029160725fd47013bff)) + * Need to figure out how we encode and proxy resize, etc. Any ideas? 
+ * Derek to ask Mrunal to review for ocid +* minikube rktnetes demo (sur) + * https://github.com/coreos/minikube-iso/#quickstart + * https://github.com/kubernetes/minikube/issues/571 +* Demo rktlet + new cri interface rkt (yifan) +* Logging in CRI @tmrts [(https://github.com/kubernetes/kubernetes/pull/33111](https://github.com/kubernetes/kubernetes/pull/33111)) + + +# Sept 13 {#sept-13} + + + +* ocid demo / status update + * why does this still have an infra container? + * to support init systems in pods in future (?) +* reminder: cAdvisor release will be cut 9/15 + * https[://github.com/kubernetes/kubernetes/issues/27097#issuecomment-237422351](https://github.com/kubernetes/kubernetes/issues/27097#issuecomment-237422351) +* Discuss "Expose Canonical Image Identifier" proposal path forward + * https://github.com/kubernetes/kubernetes/pull/32159 + * Red Hat has asked for closure on this design this week + * RepoDigest is the preferred ImageID when present is key point +* rkt CRI demo/status update (sur) + * WIP at https://github.com/coreos/rkt/pull/3164 + * kubernetes integration at https://github.com/kubernetes-incubator/rktlet/pulls +* rkt attach status update (lucab) + * https://github.com/kubernetes-incubator/rktlet/issues/8 + + +# Sept 06 {#sept-06} + + + +* Pod Container Status ImageID (derekwaynecarr DirectXMan12) https://github.com/kubernetes/kubernetes/issues/31621 + * Right now we run by tag / sha. When you run by tag, you lose auditability. There's a local image id, but that image id is *not* the content-addressable ID + * ContainerStatus ImageID is thus not really actually useful. At all. + * Should users be able to determine exactly the image running? A) obviously yes, but you can't now + * History: Why do we have ImageID? + * Based on the 'ssh into a node, docker ps' stuff mostly + * Docker does not provide the sha when you pull by tags in a structured way, just a status text (what you see printed when you docker pull foo:tag) + * Docker does not always populate repo digest + * Possible solution: Have kubelet resolve tag -> digest itself + * Downside, kubelet is now relying less on docker and has to do more + * Maybe an admission controller could translate? + * What do we do with the existing field? Do we replace it with hash, or do we have to add a new field? + * Should it be lockstep across all nodes; resolve it before hitting kubelet? + * I don't think we can because of RestartPolicy + ImagePullPolicy interactions. Unrelated. + * This issue is *just informational*; ContainerStatus tells the truth of the current running reality, no more, no less, no other changes. + * Discuss on the issue +* User-namespace + * https://github.com/kubernetes/kubernetes/pull/30684 + * Was not included in 1.4 as a "late exception" + * Discussion about whether this should be done differently, what the process really should be.. + * TODO, we have some idea of what it should be, @derek to writeup / provide a link + * More inclusion of the community would help in terms of transparency as well + * Push for userns in 1.5 :) +* Pre-review of node-team 1.5 features + * https://docs.google.com/spreadsheets/d/1nBneiObfPHZwMrsIuvgCSWDm90INdxSJRUFUf11KLhY/edit?usp=sharing +* Node-e2e flakes a bit more on CoreOS than other images + * Should we be running these tests for all distros? Is there really value in blocking per-pr? 
+ * AI: dawnchen: file the issue to carry the discussion +* Rktnetes sync notes: + * Moving to rktlet repo and re-vendoring to kubelet + * WIP CRI implementation on rkt side + * Encourage creating issues about rktnetes in rktlet repo + * We recognize issues will still be filed against the main repo, and we don't want to proactively move issues because github doesn't support that, but if we can start with them there, that's easier to triage for us. + + +# August 30 {#august-30} + + + +* Node conformance test suite: + * https://docs.google.com/presentation/d/18NbozIL22BM4RI2jD7sdWG5DphYpa6pLvTOrF6ROYJM/edit#slide=id.g165a166f0d_0_0 + * https://docs.google.com/spreadsheets/d/1yib6ypfdWuq8Ikyo-rTcBGHe76Xur7tGqCKD9dkzx0Y/edit#gid=0 + * How much will this validate container-runtime configuration? E.g. userns settings + * Container runtime validation tests, which right now is really docker validation. We don't have CRI to run validation against + * +* Kubelet distribution -- Build + deployment flags for different kubelet varients? No varients? + * Context: runtimes, cadvisor, the like; vendored dependencies + * Concerns (Dawn): + * +* rktlet: In process vs out of process, opinions, reasons? https://github.com/kubernetes/rktlet/pull/3#issuecomment-242180549 + * Philips: Discussion around being consistent for all CRI being in or out + * Vish: I think the discussion is really about the longterm end state of the CRI and deployment model + * Concerns: + * Client-server because we have to support many + * More clear-cut interface + * Better release philosophy maybe + * Dawn: Push discussion to a PR that was just posted +* Discussion: Ideas to improve debugging kubelet in production (@derekwaynecarr) + * Kubelet wedged, etc. Verify difficult to debug + * Proper audit / trace logs for debugging would be great + * ref: https://github.com/kubernetes/kubernetes/issues/31720 + * the life of a pod needs better logging to point out a specific issue + * Create a doc with some of these ideas, options + * Kubelet socket to query debug info from? + * There's an http api for much of this information + * Related: Tool to improve introspecting a node, e.g. "what pods are running, what do they look like" +* Host user namespaces in 1.4 (https://github.com/kubernetes/kubernetes/pull/31169) +* oci integration status update + * Create start stop cycle stuff works in a basic way. Missing tty (coming soon™) + * Demo in a week or two + + +# August 23 -- Cancelled {#august-23-cancelled} + + + +* Node conformance test suite +* Minukube + rktnetes, demo @sur (move to next week as well) + + +# August 16 {#august-16} + + + +* Node performance benchmark (zhoufang) + * Slides link: https://docs.google.com/presentation/d/1pYNnKo7OF-IHOwnSJ1hZvKmEwzEKc2y3IoZIOjYcDK4/edit + * Do we have a repo / issue / PR we can follow for running this on our own? + * Not yet, more stuff needs to be merged and so on first. TODO, make and link a tracking issue (maybe a future sig-node) + * Will this be integrated with the existing testing dashboards / gubernator stuff? + * For now pushed to GCS, in the future talk to infra team + * Is there an option to push this to prometheus? Other sigs have made it possible to support e2e -> prometheus pushes + * Not right now? We don't have a real answer right now + * Long term, we should be alerting when there are regressions shown in this, it will be automatically run + * How does this work? 
+ * Standalone cadvisor + * It should already be mostly runtime agnostic + * Yifan and Zhou to sync on how to run this for rktnetes to get pretty results there :) +* CRI: Is there a way to show a runtime's cgroup? + * Not at this moment afaik; it would make sense to add it as part of the 'Version' api + * Should we add a cgroup path for the runtime's cgroup(s?) to the kubelet? + * Vish: The kubelet cares about pods and the node as a whole, why do we care about this? + * Derek: Open an issue for this, more discussion https://github.com/kubernetes/kubernetes/issues/30702 +* CRI attach/exec/portforward + * https://github.com/kubernetes/kubernetes/issues/29579 + * Milestones / v1 definition for interested parties: + * https://github.com/kubernetes/kubernetes/issues/28789 + * Concrete next step for this issue? + * TODO (yujuhong@): Clearly summarize the possible options and tradeoffs so far to improve coherency of the issue, and hopefully be able to decide to go with one? +* CRI area owners + * Dawn brought up that we should discuss area owners to help drive the progress in individual areas. +* Could we add GPU discovery/assignment capabilities to kubelet in v1.5? + * https://github.com/kubernetes/kubernetes/pull/28216 + * There were still some question I think? But yes, you can start working on 1.5 now, good luck getting reviews :) +* Should we use annotations to expose sorta security related pod features, specifically sysctls + * Sysctls: https://github.com/kubernetes/kubernetes/pull/26057 + * Current proposal is first-class, not annotations. Concerns around annotations, security… + * Vish: Why is validation a concern for an alpha opt-in feature? + * (ref app-armor, also annotations) + * Maybe have a node-whitelist that is enforced on the kubelet layer. Kinda messy UX, but it should resolve these security concerns + * Do we need to have scheduling decisions? + * We can use taints-tolerances + * Derek: What are the next steps for sysctls? We already know what's accounted / not accounting, how do we decide a whitelist. + * Vish: Proposal didn't make this clear enough; it needs the information in a better form. + * In the 1.4 timeline: Start with a node whitelist (default empty list of sysctls), and then on a per kubelet basis people can choose what they're okay with + * **Vish**: Comment something to the above's effect on the sysctl PR +* Pod level cgroups: will they land in 1.4 timeline? + * Dawn has it marked for 1.5 as a p0. + * Vish: Probably won't happen this week + * Action Item (?): Disable flags for 1.4 if it's not making it in +* UsernsMode + * https://github.com/kubernetes/kubernetes/pull/30684 + * + + +# August 9 {#august-9} + + + +* Add a k8s repo for rkt/CRI integration ? (rktlet) + * Conclusion: yes, will be done by Dawn for sig-rktnetes after this meeting + * https://github.com/kubernetes/rktlet +* Add support for UsernsMode? + * Redhat wants it for 1.4 and is willing to do the work for it + * No need to change the default, just make it possible + * Willing to also do podsecuritycontext stuff so administrators can control it correctly + * Dawn: Some concerns about whether we can finish it in time since we have a "freeze" coming up so soon. We do want the feature though (:+1:), but we need to follow the rules. + * Some technical/semantic issues with userNS since it will break some existing kubernetes features (other namespacing, mounts of volumes owned by root, etc, kube tools?) 
+ * Proposal incoming for future discussion with use case +* Docker validation updates + * No manual validation this time. We only have automated docker validation result, it only runs on GCI. + * Current test result: + * Functionality: We are running node e2e in automated docker validation, currently all green against docker 1.12.0. + * Performance: Not yet. We will run kubelet performance test in node e2e against docker 1.12.0 soon. + * Like previous release, support multiple docker versions still, and documented the known issues +* 1.4 feature updates + * https://github.com/kubernetes/features/labels/team%2FSIG-Node + * https://github.com/kubernetes/features/issues/39 + * OOD handling: getting inode support. No concern + * Issues found on devicemapper support + * InitContainer to beta / GA + * Should we move it to beta / GA, considering that not all runtimes support it now (rkt known issue; will be solved by CRI) + * Should there be a procedure for whether a runtime specific feature can be included as GA? Does it need to be expressed in the CRI where it's possible for something to integrate with it? Is there a better answer? Need to have this discussion somewhere + * Does there need to be a threshold for number of runtimes support, or api can show it, or what? + * Dawn will file an issue to discuss this (ping sig-node) + * Node conformance test feature: + * NodeSpec: https://github.com/kubernetes/features/issues/56 + * Node conformance test: https://github.com/kubernetes/kubernetes/issues/30122 +* So far we've improved node-e2e in various ways. + * Next step, maybe not for 1.4, package conformance tests as a container image that can run all the tests against that node + * Substep 1: static binary + * Substep 2: docker image + * AppArmor support https://github.com/kubernetes/features/issues/24 + * On track + * per-pod cgroup / qos enforcement on the node + * need to get existing open PRs merged! + * trying to get a systemd support in next week (once prereqs merge) + * full details: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/pod-resource-management.md#implementation-status + * dynamic configure Kubelet https://github.com/kubernetes/kubernetes/issues/27980 + * On track +* Status on kubelet/Docker CRI integration + * In progress: added a [kuberuntime](https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/kuberuntime/) package on the kubelet side to use CRI. + * Not tied to any release, WIP still.. + * Added a shim for docker to implement CRI. Currently this supports only basic features with little test/validation. This is blocked on the kuberuntime implementation for a more complete validtion. + * Other parts of CRI are still under discussion (e.g.., [exec with terminal resizing](https://github.com/kubernetes/kubernetes/issues/29579), metrics, logging, etc) + * (metrics, comment on [Dawn's summary](https://github.com/kubernetes/kubernetes/issues/27097#issuecomment-237422351) near the end) https://github.com/kubernetes/kubernetes/issues/27097 + * Need to correct issues with stdin/stdout/stderr/etc streams in the proto files for exec +* Follow-up on new repo in Kubernetes org for node feature discovery + * https://github.com/kubernetes/kubernetes/issues/28311 + * Process, how do you get an actual conclusion at the end or ownership of the new repo etc. The decision is still not well defined, and the procedure still needs help + * Should this be part of the k8s-community meeting or k8s-dev mailing list rather than sig-node? 
+* Very short update on sysctls (sttts): + * "the table": kernel code-level analysis of sysctls on proposed whitelist: https://github.com/sttts/kubernetes/blob/bd832c98794bbbdf3618c41e996d74fa091143e5/docs/proposals/sysctl.md#summary + * kmem accounting for ipc works until kernel 4.4 + * broken since 4.5 due to switch to opt-in; probably simple fixes +* [Remove some kubelet dependencies](https://github.com/kubernetes/kubernetes/issues/26093) (dims) (PR's for [pidof](https://github.com/kubernetes/kubernetes/pull/30002), [brctl](https://github.com/kubernetes/kubernetes/pull/30056), [pkill](https://github.com/kubernetes/kubernetes/pull/30087)) - Do we want to do some of the cleanup for 1.4? + + +# August 2 {#august-2} + + + +* Discuss and get feedback on adding [snap](https://github.com/intelsdi-x/snap) as a supported datasource to heapster + * There's been discussion of splitting metrics out into "core" and "monitoring"/additional ones + * Core ones should be consistent, well understood, defined strongly by kubelet probably + * Heapster currently does both core and monitoring. If snap is meant to be in addition to "core" metrics, then that's great, if it's meant to also replace "core" then it needs to be a more involved process. +* Proactively reclaiming node resources, possibly with an option of administrator provided scripts https://github.com/kubernetes/features/issues/39#issuecomment-235069913 +* Discuss sysctl proposal questions and kmem (sttts and derekwaynecarr) https://github.com/kubernetes/kubernetes/pull/26057#issuecomment-236574813 + * We can expose sysctls as knobs so long as they're properly accounted for in the memcg + * Argument for "unlimited" other than the memcg limits for some of them (e.g. tcp buffers) + * Potential issues if applications change its behavior based on specific sysctl values + * Further experiment that the whitelist of sysctls are all "safe" to increase; they're all namespaced, but all they all resource-isolated (accounted). + * Separate discussion: + * Node defaults for sysctls +* Brief follow up on new repo in Kubernetes org for node feature discovery (Connor/Intel, cross-posted to sig-node from dev). +* Reminder: Expecting code/feature freeze in 3 weeks for v1.4. Bug-fixes and additions to 1.4 features will be accepted beyond that cutoff, but not new feature impls. + + +# July 26 {#july-26} + +Agenda: + + + +* Discuss and get feedback on adding [snap](https://github.com/intelsdi-x/snap) as a supported datasource to heapster +* Shoutout to Lantao for his work on improving node e2e tests :) https://github.com/kubernetes/kubernetes/pull/29494 +* https://github.com/kubernetes/features/issues?q=is%3Aissue+is%3Aopen+label%3Ateam%2FSIG-Node +* Meeting called short, go have fun :) + + +# July 19 {#july-19} + + + +* Last week there was a discussion with CoreOS etc about attach and logging features + * [Attach](https://github.com/kubernetes/kubernetes/issues/23335); should we deprecate? Do we have a way at all? + * [Logging](https://github.com/kubernetes/kubernetes/issues/27154); `kubectl logs --since` and other bits make it nice to have timestamps. Still up for discussion for how the stdout actually gets logged / stored. 
General consensus is 1) have timestamps 2) have it consistent across runtimes + * Conclusion tending strongly towards keeping the since feature and having an opinion on the logging format, including timestamps + * User experience / tooling that is "kubernetes native", not runtime-specific + * Brandon: feature/doc talking about what this might look like + * Networking in CRI: + * Push down to the runtime vs hybrid between kubelet owning it; more sig-networking + * Volumes: Kubelet handles plugins pretty much entirely; continue with this as well. +* KubeletConfiguration related refactoring and removal of the old KubeletConfig type (https://github.com/kubernetes/kubernetes/issues/27980, https://github.com/kubernetes/kubernetes/pull/29216) + * Allow the kubelet to consume a configmap from api-server + * Also improve test setups +* Focus for v1.4 for all sig-node members: + * Note: Feature-deadline is this Friday (right?) + * https://docs.google.com/document/d/1Ei7uDriZenhFRJQAzYIwEgQm39kR0jCxTHK4qPXQFDo/edit + * (CoreOS doc has a node section): https://docs.google.com/document/d/1Jq_U46dSsI5rXAfAAOegtKu2YMoWHhZECvoM3I9vGaM/edit?ts=5775c00a#heading=h.q5z0exowt425 + * Clean and efficient pod lifecycle extensions in the new container runtime architecture (https://github.com/kubernetes/kubernetes/issues/28789) + * As an operator (not user) how do I add an additional label? + * Option: configure kubelet to talk to a proxy docker.sock that adds on labels maybe (but how do you get the podspec) + * How do we actually track sig-node features in a discoverable way? Too many issues under label:team/node, not enough in feature repo? + * Should we just put everytihng in the features repo? + * Let's go with that for now + * How do we track node specific issues in features repo? Apply a sig-node label maybe? +* Next week google sig-node will be OOTO + * Euan to drive the meeting, possibly a short one due to lack of the Google folks :) +* Tangent discussion of zoom vs hangouts vs if people can even join without a Googler on the call + * External hangout? (a limit of 10 people) + * Zoom? (Doesn't work with chromebox and chromebook) + * Defer, a Googler (Vish & Tim St. Clair) will join and make sure people joined. +* Pod Cgroups + * https://github.com/kubernetes/kubernetes/pull/26751 + * https://github.com/kubernetes/kubernetes/issues/27204 + * Started as a simple feature, but was surprisingly complex? + * Learnings to share + * https://docs.google.com/presentation/d/13N1ZdCk1Dg_JZGCJhLZTb5j5vtq9RPM31ZIFJ2xu690/edit?usp=sharing + * Still pending work to do with things like qos policies, systemd integration, rkt, etc + * To hit 1.4, will likely need a new owner +* Refactoring kubelet code + * https://github.com/kubernetes/kubernetes/pull/29216 + * Shoutout to reviewing that to improve things a bit + * Broad tracking issue for cleaning up kubelet actions (does one exist already?) +* Is my thing a feature for github.com/kubernetes/feature + * https://github.com/kubernetes/features#is-my-thing-a-feature + +Aside: + + We need some container runtime interface implementation term that is prounancable. 
+ + + +* oclet (awk-let) rklet (rock-let) docklet (dock-let) + +**Action**: + + + +* Feature owners: File a feature in the feature repository by Friday which at least has a good title +* Paul: Tracking issue for kubelet cleaning/refactoring + + +# July 12 {#july-12} + + + +* https://github.com/kubernetes/features/issues/16 + * Extensibility for monitoring and logging will be handled by kubernetes + * But extensibility from a lifecycle sense is handled by the new runtime API. + * https://github.com/kubernetes/kubernetes/issues/28789 + * Once docker runtime integration has been asbtracted out to use the runtime API, that plugin can be forked to meet this use case. +* More of a question rather than an agenda item - Are there any old ideas/pr(s)/issues(s) around Virtual Machines in Kubernetes. (Dims) + * https://github.com/kubernetes/kubernetes/pull/17048 + * https://github.com/kubernetes/frakti + * https://github.com/kubernetes/kubernetes/pull/28396/ + * rkt can also launch pods in lkvm-stage1 today + * http://kubernetes.io/docs/getting-started-guides/rkt/#modular-isolation-with-interchangeable-stage1-images +* Vertical scaling of pods + * Hard for many reasons; in the short, we have to change a large portion of the stack because it's such a significant change + * Scheduling, quota + * node level actually launching + * etc + * Defer solving it for now because it's so complex? + * Maybe at least have a roadmap for it + * Maybe part of sig-autoscaling + * Feature issue opened: https://github.com/kubernetes/features/issues/21 + * We need a feature owner for this. + * Loop in @ncdc (Andy) from RedHat on the feature issue +* OCI runtime naming - https://github.com/kubernetes/kubernetes/pull/26788#issuecomment-231144959 +* Updating max pods per core - issue number? https://github.com/kubernetes/kubernetes/pull/28127 +* Container Runtime API status - https://github.com/kubernetes/kubernetes/issues/28789 +* Docker releases are being tested automatically as of now. We've already tested docker 1.12-rc2 and rc3, the e2e test is all green now. +* Simplying kubelet configuration - [#27980](https://github.com/kubernetes/kubernetes/issues/27980) +* Google main focusses this release: + * Disk isolation / handling of problems + * Runtime interface work + * Node/host validation + * Improving the configuration story? +* https://github.com/kubernetes/kubernetes/issues/24677#issuecomment-232135730 +* Share tests between e2e and node e2e: https://github.com/kubernetes/kubernetes/pull/28807 + * Just a shoutout to that work, no questions / problems :) +* + + +# July 5 {#july-5} + +Cancelled today. + + +# June 28 {#june-28} + +Cancelled today. + + +# June 21 {#june-21} + + + +* Note taker: Michael Taufen +* 10 minute overview of OpenShift System Verification and Test Repo (svt) -- Jeremy Eder + * https://github.com/openshift/svt +* Container runtime interface -- Yuju Hong +* Brainstorming on 1.4 features / roadmap + * https://docs.google.com/document/d/1Ei7uDriZenhFRJQAzYIwEgQm39kR0jCxTHK4qPXQFDo/edit proposed by Dawn Chen + * rktnetes 1.1 roadmap https://docs.google.com/document/d/1_A6NwPlQuV4t4uKe1RD9d-uhgSLImFevoS7tmg2P12M/edit?usp=sharing + * Work items proposed by community: + * Expose sysctl (Redhat) + * Pod checkpointing in kubelet (by CoreOS), + * required for self-hosting. 
+ * Some other items (noted by Red Hat, but not proposed for 1.4) + * Be able to build NUMA + * Standardize GPU support, also any device that can be ref counted + * Additional sysctls + * Third party kernel modules + * Nvidia made their whole CUDA stack container friendly, but not every vendor will. They are hearing a lot of interest in hardware accelerators that require out of tree modules. + + +# June 14 {#june-14} + + + +* Cancelled + + +# June 7 {#june-7} + + + +* ~~10 minute overview of OpenShift System Verification and Test Repo (svt) -- Jeremy Eder~~ -- cant make it this week sorry +* Container Runtime Interface (hopefully final discussion for the initial proposal) +* per-pod cgroup proposal + * https://github.com/kubernetes/kubernetes/pull/26751 +* cAdvisor release cut status +* rktnetes v 1.0 status + * Selinux regression https://github.com/kubernetes/kubernetes/pull/26936 + * https://tests.rktnetes.io + * DNS - (systemd bug) https://github.com/kubernetes/kubernetes/issues/24563#issuecomment-224043285 + * + * hostpath/subpath error - Unknown, need to triage (https://github.com/kubernetes/kubernetes/issues/26986) + * Some issues running master node with rkt under kube-up right now (mostly related to how it's configured) + * Dns domain failure (systemd bug) https://github.com/kubernetes/kubernetes/issues/24563 + * Influxdb isn't running failure (https://github.com/kubernetes/kubernetes/issues/26987) + * Docs and people kicking tires would be welcome/helpful (the first helpful for the second) + * Known issues docs: https://github.com/kubernetes/kubernetes/issues/26201 + * Needed doc: "How to use rktnetes; the rktnetes user guide :)", we have one WIP, but not as a PR yet. Euan in charge of getting that in asap + * Nice-to-have: A doc also explaining the features / differences of rkt and why you might want to switch. + * Move rktnetes 1.0 to p0 (@yifan)and kubernetes 1.3 milestone. (so burndown meetings have visibility / other 1.3 visibility) + * sysctl proposal: + * https://github.com/kubernetes/kubernetes/pull/26057 +* Announcement: + * [NodeProblemDetector](https://github.com/kubernetes/node-problem-detector) demo + * 1.4 node roadmap planning + + +# May 31 {#may-31} + +Docker v1.10 is having performance issues. @timothysc https://github.com/kubernetes/kubernetes/issues/19720 + +Derek Carr is working on updating systemd config. Needs help with update existing CoreOS images. + +Node e2e has ability to pass ginkgo flags. Tests that fail on RHEL can be blacklisted using ginkgo `skip` flags. + +Container Runtime Interface Discussion - How to make progress? Vendors need to be unblocked. - https://github.com/kubernetes/kubernetes/pull/25899 + + + +* How will new features be implemented given the pending refactoring? Dockertools package will be moved to use the new interface. +* New runtimes will not be part of the main kubernetes repo +* Yu-Ju will be working on re-factoring the docker runtime package to work against the new API. +* Euan to review the runtime proposal today and provide his feedback. 
There hasn't been any other concerns from the community as of now + + +# May 24 {#may-24} + + + +* [node e2e test improvements ](https://docs.google.com/document/d/1Q3R-wmPDJesFz-MoLEZxlNZwDgRCenhzwETunIqPNK4/edit#) + * Submit queue looks bad + * e2e flakes are a big problem + * one way to tackle is to test separate components separately + * Makes it easier to debug, run in parallel, isolate issues -> PRs +* **TODOs:** +* integrating with the rest of the conformance testing suites + * how are node tests run when you run conformance tests + * against every node in the cluster? then we know every single node complies instead of just generally "we think the nodes comply" + * also need to run tests against previous releases, so that we run node e2e tests against everything we run the regular e2e tests against +* isolating the node e2e job from other jenkins jobs so we don't exhaust jenkins resources + * moving our things to their own "projects" +* concurrency + * sometimes on same instance, but we can also shard tests. + * as we scale this up, this will be important to keeping run time for tests low (~10 minutes ideal) +* test harness + * could use some additional features, such as supporting different values for the same flag +* Debug logs -> to files instead of just the crappy terminal dump +* Better log file organization +* Test coverage + * Moving more stuff from e2e test suite to node component test suite + * Start removing tests from full e2e suite as we add them to the node test suite + * Helps address flakiness because suites will run faster +* rktnetes updates (https://docs.google.com/document/d/1otDQ2LSubtBUaDfdM8ZcSdWkqRHup4Hqt1VX1jSxh6A/edit# ) + * 2 pending network related PRs for fixing network regression + * https://github.com/kubernetes/kubernetes/pull/26096 + * https://github.com/kubernetes/kubernetes/pull/25902 + * Small PRs waiting to get in: + * Remove systemctl shell out with API calls https://github.com/kubernetes/kubernetes/pull/25214 + * Add pod selinux support https://github.com/kubernetes/kubernetes/pull/24901 + * Need to address paul's comments and rebase. 
+ * Read-only rootfs https://github.com/kubernetes/kubernetes/pull/25719 pending to be merged + * + * rktnetes benchmark https://github.com/kubernetes/kubernetes/issues/25581 + + +# May 17 {#may-17} + + + +* 1.3 status updates +* rktnetes updates: + * shaya's cadvisor PR is getting close to merge (merged) https://github.com/google/cadvisor/pull/1284/commits + * tested it works for the autoscaling tests + * working on benchmark (yifan/shaya) + * Need to run benchmark e2e tests + * CNI/kubenet PR, @freehan will offer a review (reviewed, LGTM with nits) + * https://github.com/kubernetes/kubernetes/pull/25062 + * Experiencing some regression on getting logs from journalctl + * https://github.com/coreos/rkt/issues/2630 + * Conformance tests on baremetal with lkvm: (https://docs.google.com/spreadsheets/d/1S5mswBYpkT2IYAMcuzoTyOE_cdae-2vHUpnghKt6lsQ/edit#gid=0 ) + * the journal log issue: https://github.com/coreos/rkt/issues/2630 + * the flannel plugin doesn't have default routes: https://github.com/containernetworking/cni/issues/184 + * Portforwarding is not implemented for lkvm stage1 + * Following issues are all covered by LGTM PRs: + * Kubectl exec error: https://github.com/kubernetes/kubernetes/issues/25430 + * Pod status error: https://github.com/kubernetes/kubernetes/issues/25242 + * Port forward error: https://github.com/kubernetes/kubernetes/issues/25117 + * Support per pod stage1: https://github.com/kubernetes/kubernetes/issues/23944 + * Selinux support (not really LGTM, needs someone to take another review @pmorie?): https://github.com/kubernetes/kubernetes/issues/23773 + * What new 1.3 features we need to be involved? + * Decision: not to support new 1.3 features for rktnetes 1.0 +* Kubelet bootstrap PR https://github.com/kubernetes/kubernetes/pull/25596 +* Continuation of container runtime interface discussion + + +# May 10 {#may-10} + + + +* Kurma demo from apcera.com (Ken Robertson from apcera) (15mins) +* Talk: rkt architecture and rktnetes (~~30~~15mins) https://docs.google.com/presentation/d/1HFXemzInO4LhZ3pVJtNp0zgLQI0vXEOD4Yzigk9yisA/edit?usp=sharing +* rktnetes v1.0 updates (10mins) + * Run rkt in CNI network created by kubelet https://github.com/kubernetes/kubernetes/pull/25062 + * cAdvisor support using rkt api service: https://github.com/google/cadvisor/pull/1263 + * Other rkt issues: + * https://github.com/coreos/rkt/issues/2580 (bump CNI to fix iptable leak) + * https://github.com/coreos/rkt/pull/2593 (return partial pod in rkt api service) + * https://github.com/coreos/rkt/issues/2487 (read-only rootfs) + * https://github.com/coreos/rkt/issues/2504 (cgroup leak) +* docker version + + +# May 3 {#may-3} + + + +* GPU support +* ContainerRuntime interface continue + * https://docs.google.com/document/d/1ietD5eavK0aTuMQTw6-21r67UU73_vqYSUIPFdA0J5Q/edit# +* Eviction proposal ? any more feedback? 
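On the eviction proposal item above, a rough sketch of the threshold mechanics it describes: an observed node signal such as memory.available compared against a configured cutoff like 100Mi. The types and names below are illustrative, not the kubelet's internal implementation.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// threshold mirrors one entry of an eviction setting such as
// "memory.available<100Mi"; the struct is a stand-in for illustration.
type threshold struct {
	Signal   string
	Operator string // only "<" is sketched here
	Value    resource.Quantity
}

// shouldEvict returns true when the observed value for the signal has
// dropped below the configured cutoff.
func shouldEvict(t threshold, observed resource.Quantity) bool {
	return observed.Cmp(t.Value) < 0
}

func main() {
	t := threshold{Signal: "memory.available", Operator: "<", Value: resource.MustParse("100Mi")}
	observed := resource.MustParse("64Mi")
	fmt.Printf("evict due to %s: %v\n", t.Signal, shouldEvict(t, observed))
}
```

The proposal itself layers soft thresholds, grace periods and minimum-reclaim amounts on top of this basic comparison.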
+* Docker 1.10 and Docker 1.11 discussion + * Docker 1.11 validation issue: https://github.com/kubernetes/kubernetes/issues/23397 + * Action Item (timstclair): email k8s-dev about docker 1.11 preferences +* Reduce kubelet LOC via moves https://github.com/kubernetes/kubernetes/pull/25028 +* rktnetes: + * let rkt run in the network namespace created by kubelet/network plugin: + * https://github.com/kubernetes/kubernetes/pull/25062 + * This could solve a bunch of remaining failing e2e tests (most of them are looking for pod IP in downward API and /etc/hosts ) + * update the cloud config and kubernetes compoments yamls, has some issues + * https://github.com/kubernetes/kubernetes/pull/22663 + * fixing rkt gc problem (bug in rkt and cni) + * https://github.com/coreos/rkt/issues/2565 + * working on letting user to specify different stage1 image in pod annotation + * PR not ready yet + + +# April 26 {#april-26} + + + +* New ContainerRuntime interface discussion + * Difficult integrate at Pod level declarative interface. + * Expect imperative container level interface, and OCI compatible in a long run + * Proposed 2 options: + * 1) introduce OCI compatible interfaces now + * 2) introduce docker like container level interface first + * AI: yuju write a high level design doc and continue discussing next week +* Node on Windows - initial investigation / next steps + * https://docs.google.com/presentation/d/16nYb13oulBoB4d6QZXYm-9sk3TN898qumzY5ur4wirU/edit?usp=sharing + * https://docs.google.com/document/d/1qhbxqkKBF8ycbXQgXlwMJs7QBReiSxp_PdsNNNUPRHs/edit# +* Followup with custom metrics discussion + * AI: agoldste@ from Redhat is going to write a high-level requirement doc to share with us. A separate VC to continue the discussion. +* rktnetes status updates: + * e2e testing: http://rktnetes.io + * work on stableness issue + * support stage1 fly + * work on fixing the race between cgroup discovery and pod discovery by rkt api service + * rkt usage monitoring: + * https://github.com/dgonyeo/rkt/blob/ddd59b8935a3a9ee7c031e753642ea7863526fde/tests/rkt-monitor/README.md + * Need to put more example results here +* NVIDIA GPU support: + * Kubelet: + + The kubelet should have "--device" options, and volumes for GPU, NVIDIA GPU is part of it, and also have the interface to support other GPUs. + + + Cadvisor include the NVML libs to find NVIDIA GPUs on host, then kubelet could send GPU information to kube-scheduler. + +* Kube-scheduler: + * Include GPU information: + + Number: // how many GPUs needed by the container. + + Vendor: // So far, only NVIDIA GPU. + + + Library version: // run the container on the right host, not just a host with GPUs. + + +# April 19 {#april-19} + + + +* InitContainer proposal discussion: ([#23666](https://github.com/kubernetes/kubernetes/pull/23666)) +* Demo: Out-of-resource eviction? +* rktnetes status updates: + * rkt running on master (demo) + * rktnetes jenkins e2e testing infra + * milestones: https://github.com/kubernetes/kubernetes/milestones/rktnetes-v1.0 + * e2e failures: 14-15, with 8ish we know that we are not supporting, or the tests are too specific. e.g. 
DNS tests + * Met some stableness issue with kubelet running rkt, need to be fixed + * Per pod overhead measurement (systemd, journald) + * https://github.com/coreos/rkt/pull/2324 +* cAdvisor roadmap updates: + * punt standalone cAdvisor + * cAdvisor validation and testing +* announcement: [minikube](https://github.com/kubernetes/minikube) on local host + + +# April 12 {#april-12} + + + +* Experimental NVIDIA GPU support ([#24071](https://github.com/kubernetes/kubernetes/pull/24071), [#17035](https://github.com/kubernetes/kubernetes/issues/17035), [#19049](https://github.com/kubernetes/kubernetes/issues/19049), [#21446](https://github.com/kubernetes/kubernetes/pull/21446)): add support in kubelet to map host devices into containers, add new `nvidiaGpu` resource (@therc) + * More constrained initial implementation than in #24071 + * Hardcode kubelet to report one NVIDIA GPU when a special experimental flag is enabled + * When a pod requires a GPU, expose hardcoded list of host devices into container + * Do not use any code from nvidia-docker and its Docker plugin yet. This means that images will have to bundle required NVIDIA libraries + * For next steps, consider an approach halfway between having NVIDIA libraries inside every Docker image vs. having them exposed by the nvidia-docker volume driver: have administrators set up a directory on the host with NVIDIA binaries and libraries + * @therc/@Hui-Zhi will send PRs and look into rkt support as well (more feasible if we don't use the Docker plugin framework) +* rktnetes status updates: + * fixed service ip reachable issue, rkt locking issue. + * working on master node using rkt. + * milestone: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+milestone%3Arktnetes-v1.0 + * things might not be supported: + * hostipc/hostpid mode https://github.com/systemd/systemd/pull/2982#issuecomment-208757763 + * kubectl attach https://github.com/kubernetes/kubernetes/issues/23889 + * read-only rootfs https://github.com/kubernetes/kubernetes/issues/23837 +* Self-hosted kubelet: https://github.com/kubernetes/kubernetes/pull/23343 + +April 4 + + + +* e2e node tests are failing +* cAdvisor with rkt/kvm-stage1? https://github.com/coreos/rkt/issues/2341#issuecomment-205799519 +* rkt container level interface https://github.com/coreos/rkt/issues/2375 +* rktnetes status: https://docs.google.com/document/d/1otDQ2LSubtBUaDfdM8ZcSdWkqRHup4Hqt1VX1jSxh6A/edit?usp=sharing +* Self-hosted kubelet proposal: https://github.com/kubernetes/kubernetes/pull/23343 +* + +Mar 29 + + + +* Kubelet issue with termination - issue with watch: https://github.com/openshift/origin/issues/8176 +* Yifan looking at issues with rkt pod startup latency +* State of cAdvisor refactoring + * Tim St. Clair mentioned that no changes inside kubelet in the near term. +* Systemd driver in Kubelet - Does it need first class support? + * Probably not. We need more information to discuss further +* Kubelet bootstrapping + * GKE will not switch, so there is no immediate changes required for other providers. +* Kubelet CI + * Kubelet Node e2e is now a merge blocker + * Hackathon next week - will be open to the entire SIG. Goal is to increase test coverage + * How do we add a test to the Conformance suite? + * It needs to include only the core kubelet features. But the Kubelet API is expected to be portable. + * Some tests have failed on Systemd in the past like Filesystem accounting + * Some of the kubelet API is distro dependent. + * Why/When write a node e2e test? 
+ * Any node level feature that can be tested on a single host. + * Simpler test infrastructure. Easier to test node conformance + * Smaller scope + * We need multiple deployment configurations for the e2e suite and have the tests be able to discover these configurations. +* Increase maximum pods per node for kube-1.3 release #23349 + * https://github.com/kubernetes/kubernetes/issues/23349 + * Benchmarking is the first step forward + +Mar 22 + + + +* cAdvisor 1.3+ work planning by stclair + * https://docs.google.com/document/d/1aLJ7OBVRO2QKnf8xO0Mbkd_Wyca8J0sr9q12ZObmkO0/edit +* rktnetes: + * https://github.com/kubernetes/kubernetes/issues/23335 + * cadvisor: https://github.com/google/cadvisor/pull/1154 + * Milestone: 85% e2e tests passed. +* Increase max-pods https://github.com/kubernetes/kubernetes/issues/23349 +* + +Mar 15 + + + +* 1.3 planning (proposed by Dawn) + * https://docs.google.com/a/google.com/document/d/12dyUH3HWUMjetWnslnPbY89fyN5I8vLjl6Gi9F2HqOk/edit?usp=sharing + * Not final yet. Waiting for feedback from community, partners, developers and PMs +* rktnetes: + * post-start/pre-stop hooks problem with slow rkt startup + * e2e status: experienced regression on rkt + * kube-proxy issue with kubenet. (doesn't seem to be specific to rktnetes) + * https://github.com/kubernetes/kubernetes/issues/20475#issuecomment-195096662 +* Developer docs: https://github.com/kubernetes/kubernetes/issues/23033 + +Mar 8 + + + +* rktnetes status: + * Working on post-start/pre-stop hooks + * Fixing ENTRYPOINT/CMD issue + * cAdvisor support WIP + * 38 failures out of 178 tests, categorized https://docs.google.com/document/d/1otDQ2LSubtBUaDfdM8ZcSdWkqRHup4Hqt1VX1jSxh6A/edit# +* e2e flake: Downward API should provide pod IP: https://github.com/kubernetes/kubernetes/issues/20403 + +Mar 1 + +[Node e2e tests](https://github.com/kubernetes/kubernetes/blob/master/docs/devel/e2e-node-tests.md) + + + +* Run against PRs using the trigger phrase "`@k8s-bot test node e2e experimental`" +* Run locally using "make test_e2e_node" +* Tests can be built to tar.gz and copied to arbitrary host +* Would like to distribute testing of distros and setup dashboard to publish results + +rktnetes status: + + + +* https://docs.google.com/document/d/1otDQ2LSubtBUaDfdM8ZcSdWkqRHup4Hqt1VX1jSxh6A/edit?usp=sharing + +Feb. 23 + +Docker v1.10 integration status + +- Lantao has fixed all the issues that has been identified. Docker go client had an issue that has been fixed. Docker startup scripts have been updated. A PR fixing -d option is here https://github.com/kubernetes/kubernetes/pull/20281 + +- Prior to upgrade we'll need a daemonset pod to migrate the images. + +- Vishh is working on disabling seccomp profiles by default. + +Yifan and Shaya status updates: + +CNI plugin support for rkt + + + +* kubenet support for rkt https://github.com/kubernetes/kubernetes/pull/21047 +* cadvisor integration + * Basic cgroups stats are available. Patch is out for review. + * Additional metrics will be added soon. + +TODO: vishh to post documentation on patch policy for releases. + +v1.2 will support 100 pods by default. + +Docker v1.9 validation is having issues. We are performing a controlled restart of the docker daemon to try and mitigate the issue. + +Feb. 16 + + + +* Projects and priorities brainstorm + * Refactor kubelet: Too complex for new developer, we should refactor the code. Better separation and architecture is needed. + * Dawn: One important thing is to cleanup container runtime and image management interface. 
Maybe separate pod level and runtime level api. + * Tim works on cleanup cadvisor kubelet interface. + * Should have a sig-meeting soon. + * @Dawn will file an issue about this. + * Better Disk management (disk resource quota mgmt) + * Openshift issue https://github.com/openshift/origin/pull/7206 + * Performance requirement for 1.3 + * Control Plane Networks, network segregation: https://github.com/kubernetes/kubernetes/issues/21331 + * Share related validation test in openshift. + * Kubelet machine health API + * Kubelet should provide api to receive categorized machine problem from "Machine Doctors" such as machine monitor, kernel monitor etc. + * Some existing systems such as Ganglia https://github.com/kubernetes/kubernetes/issues/21333 + * Who should take actions: Kubelet? Application? Operator? + * Use DaemonSet to handle it and mark kubelet as NotReady? + * Determine if we should support cpuset-cpus and cpuset-mem: https://github.com/kubernetes/kubernetes/issues/10570 + * Arbitrary tar file? + +Feb. 10 + + + +* schedule change? + * going to try Tuesday 11-12 PST +* disk monitoring for rkt + * two proposals: all in-kubelet, in-cadvisor + * https://github.com/kubernetes/kubernetes/pull/20887 + +Feb. 3 + + + +* docker 1.10 validation updates + * https://github.com/kubernetes/kubernetes/issues/19720 + * A go-dockerclient bug: https://github.com/fsouza/go-dockerclient/issues/455 +* updates on sig-node (Asia): + * Meeting notes: https://docs.google.com/document/d/1L8s6Nyu5hNJxCOZLqJsuVEFAScWKhbwcis0X-j-upDc/edit?usp=sharing + * Topics: + * Docker container runtime conformance test. + * Rktnetes outstanding issues. + * Hyper integration. +* image management plans + * image manager to support multi runtimes: https://github.com/kubernetes/kubernetes/issues/19519 + * vish to create follow up doc for discussion +* kernel issues (#20096) + * Conformance tests ? (https://github.com/kubernetes/contrib/issues/410) +* sig-scale concern issue [20011](https://github.com/kubernetes/kubernetes/issues/20011) about kubelet changes potentially breaking kubemark. \ + + +Jan. 27 + + + +* What tests exists for validating kubelet performance across docker releases? + * Docker Mirco Benchmark: https://docs.google.com/document/d/1aKfsxRAmOtHkFf4Wcn0qoFJRU6qOzwATxOVKDVKQucY/edit + * Docker Event Stream Benchmark: https://docs.google.com/document/d/1FUBwdWEqdUR7h-dAwDvOWg9RrjkerN48QmMm7yCylJY/edit + * Code: https://github.com/Random-Liu/docker_micro_benchmark + * +* rkt?https://docs.google.com/document/d/132fMB60poMCcejf8l82Vxpynff8_BRYcZeOn24jn8pc/edit + +Jan. 20 + + + +* OCI meeting updates +* rkt / CoreOS integration status updates +* 1.2 features status recap + +Jan. 13 + + + +* Node scalability issue recap. Discussed via issues and PRs +* systemd spec https://github.com/kubernetes/kubernetes/pull/17688 ready for implementation. Minor discussion on the cgroup library problems in libcontainer. +* OCI meeting is going on, will have more updates in next sig-node sync + +Jan. 
6 + + + +* Node scalability and performance (redhat): https://github.com/kubernetes/kubernetes/issues/19341 +* 1.2 scalability goals: https://github.com/kubernetes/kubernetes/issues/16943 +* Node performance testing guide: https://github.com/kubernetes/kubernetes/pull/18779 +* RH goal for 1.2: 160 pods per node diff --git a/sig-node/archive/meeting-notes-2017.md b/sig-node/archive/meeting-notes-2017.md new file mode 100644 index 00000000..e6b5166b --- /dev/null +++ b/sig-node/archive/meeting-notes-2017.md @@ -0,0 +1,1105 @@ +# sig-node weekly meeting + +# Dec 19 Proposed Agenda {#dec-19-proposed-agenda} + + + +* Cloud Providers and Kubelet + * Move cloud provider related code to separate repos + * breaking node controller manager into pieces + * Node lifecycle controller. + * IPAM (IP, CIDR). + * https://github.com/kubernetes/kubernetes/pull/55634 + * https://github.com/kubernetes/kubernetes/pull/50811 +* Logistics + * Jan 2nd meeting? + * Switch to zoom next year +* Next Year: + * Secure Container/Pod + * Node level debuggability + * Windows Container. + * Container Runtime Interface + * Container Monitoring. + * Container Logging. + * Secure Container/Pod. + * More? Come back on January. + + +# Dec 12 {#dec-12} + + + +* Node fencing discussion (speaker: Yaniv Bronheim, Red Hat) + * initial proposal: https://github.com/kubernetes/community/pull/1416 + * Dawn: One concern for Node Fencing is that, the concrete treatment might be vendor specific, can we have a general node fencer to do that? The reason why we haven't worked on remedy system is similar. + * In general, is this useful as a Kubernetes cluster component? + * Need to talk with sig-cluster lifecycle if want this to be a default cluster addon. + * The proposal itself is reasonable. + * Other vendors may have similar solution in their environment, so we can't say whether this should become the default. + * Continue the POC, it is useful, especially for on-prem user. +* [KEP](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/1-kubernetes-enhancement-proposal-process.md) vs community design docs: what is the preferred way to propose features for sig-node? (Connor) +* SecureContainer updates + * Slides: https://docs.google.com/presentation/d/1yIgNcZjNoIMNNbErBArbJq2bEn16FnMROskmym_T1nM/edit?usp=sharing + * Entitlements: https://github.com/moby/libentitlement + * Higher level security abstraction on top of all the existing (linux) security features. + * Q: Are we going to change CRI for this? - A: There are several options: + * Option 1: Kubelet translate entitlements into a list of configurations and pass to CRI, and keep CRI unchanged. + * Option 2: Pass through the entitlements to container runtime, and let container runtime deal with it. (Need CRI change). + * Open Question: + * How to interact with other plugins on the node, e.g. Device Plugin? + * Should entitlement be at pod level or container level? - Seems to be both, e.g. network entitlement at pod level, some others at container level. + * Secure Containers/Pods: + * Katacontainers: https://katacontainers.io/ + * Q: Talking about the secure sidecar example, what's the difference with today's existing security solution? - A: Hard to measure or define a bar for security level, but today's linux container security features are not enough. + * Virtual kubelet https://github.com/virtual-kubelet/virtual-kubelet. Similar with what @spotter proposed before. 
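To make "Option 1" above a bit more concrete, here is a hedged sketch of a kubelet-side translation from a high-level entitlement to low-level settings that could be passed over an unchanged CRI; the entitlement names and their mappings are invented for illustration only.

```go
package main

import "fmt"

// securitySettings is an illustrative stand-in for the low-level knobs the
// kubelet would ultimately pass to the runtime (capabilities, privileged, ...).
type securitySettings struct {
	AddCapabilities  []string
	DropCapabilities []string
	Privileged       bool
}

// expandEntitlement sketches "Option 1": the kubelet translates a high-level
// entitlement into concrete runtime configuration and keeps CRI unchanged.
// The entitlement names and mappings below are made up for illustration.
func expandEntitlement(name string) (securitySettings, error) {
	switch name {
	case "network.admin":
		return securitySettings{AddCapabilities: []string{"NET_ADMIN", "NET_RAW"}}, nil
	case "security.confined":
		return securitySettings{DropCapabilities: []string{"ALL"}}, nil
	default:
		return securitySettings{}, fmt.Errorf("unknown entitlement %q", name)
	}
}

func main() {
	s, err := expandEntitlement("network.admin")
	if err != nil {
		panic(err)
	}
	fmt.Printf("capabilities to add: %v\n", s.AddCapabilities)
}
```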
+* FYI RHEL e2e-node results for regular and containerized kubelet now in TestGrid (Seth) + * Big thanks to Jan Chaloupka for the work on this + * Uses Red Hat Origin CI infra for doing the tests and publishing to GCS + * https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-conformance-aws-e2e-rhel + * https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-containerized-conformance-aws-e2e-rhel +* 1.9 release status updates + + +# Nov 21 {#nov-21} + + + +* Feel free to propose KubeCon signode F2F meeting agenda: https://docs.google.com/document/d/1cwKlEqwBCveFUX0vZgB-Si5M6kvnaedGblpleo1NERA/edit +* Derek: Why are we enabling the device plugin alpha feature by default? + * @vishh is owner, need to follow up on this #55828 +* + + +# Nov 14 {#nov-14} + + + +* Device plugin enhancement by Jiaying + * Feature was release in Alpha in 1.8 + * A couple of enhancements in 1.9: + * Completely support dynamical registeration for plugin lifecycle + * Introduce garbage collection mechanism for device plugin + * export gpu related stats through cAdvisor +* Containerd & cri-containerd status updates + * Containerd F2F Meeting Notes: https://docs.google.com/document/d/1TDp5kEuWLS9cL38dZmpffRowd7CKWbzBQrSGawVCoJ0/edit?usp=sharing + * Docker may have some behavior change with the new GC model. + * In short term, nothing changed in kubelet, kubelet will still do synchronized image garbage collection to reclaim disk resource. + * In long term, we may want to more aggressively remove image. And in that case, we may need an asynchronized image removal or policy based garbage collection. +* Container runtimes release policy + * Should publish test result for each release - node e2e, cluster e2e, performance. + * How to add test result to node perf dashboard? + * Vendor may want to run node performance dashboard in their own environment, because: + * We've packaged node performance dashboard, although may not be quite polished. + * Today's node performance dashboard is running on top of GCE, may not make sense to some vendor. + * CRI-O can publish an image, and picked up by node e2e, but should not be PR blocker. + * CRI-O will publish test result on test-grid. + * We do need to run test on different OS distributions with different container runtime, however we are not sure whether we should run them on GCE. + + +# Nov 7 {#nov-7} + + + +* Pod creation latency regression: https://github.com/kubernetes/kubernetes/issues/54651 + * Detected the issues through density tests + * Caused by cni 0.6.0 updates + * Mitigate the issues through PR #? + * Still under investigation. One pr to disable DAD https://github.com/kubernetes/kubernetes/pull/55247, not merged yet. + * Need to define the SLOs now? Need large audience for the decision. +* Live node capacity change: https://github.com/kubernetes/kubernetes/issues/54281 + * concerns: 1) dynamically scale up and down the capacity, which causes system-overhead being dynamically changed. 2) Too complicated. 3) Difficult to management QoS tree cgroups. + * No strong use cases for this feature request. + * Can workaround by restarting Kubelet, then prototype. +* Node Problem Detector enhancement + * Proposal made by Andy Xie : https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit?usp=sharing + * By default there is no scripts being included in main repo. Only building a framework. 
+ * Not use as the remedy system +* Pod level metrics + * https://docs.google.com/document/d/19bvc8rY5w5ED2yqve8AQdnczumFgrnsbYZk-PV8CyGU/ +* Process improvements: who's interested?/pitch (Lauri, lappleapple in Slack) + * Suggestions from the SIG: + * anthology or newsletter by SIGs with updates of major news + * low-volume communication channel to identify/discover key design changes + * Added context: [Value Delivery Mgmt in the K8s Ecosystem](https://docs.google.com/document/d/113gnr3tKXv79J0IYHyg6VbQwLfb-sCYGphW4D9dLZ_E/edit#) proposal, [Kindness Under Pressure](https://github.com/LappleApple/Teaminess-as-a-Service/blob/master/Kindness-Under-Pressure-Exercise.md) session concept + + +# Oct 31 {#oct-31} + + + +* Shared PID FYI: [#1048](https://github.com/kubernetes/community/pull/1048) now proposes changing the CRI +* Proposal FYI: https://docs.google.com/a/redhat.com/document/d/1FvK_fProM1910m1q0PHoW2nqXMWO8hgrGzkUVNy3KKM/edit?usp=sharing + + +# Oct 24 {#oct-24} + + + +* Cancelled since there is no topic proposed. + + +# Oct 17 {#oct-17} + + + +* Logging issues + * logging vs. CRI: + * Settled with file-based logging for containers. + * Kubelet consumes log files directly, so that Kubelet can have full control on logs + * Issues: + * logging agent to understand the metadata. Can we start making a list? Issue exists somewhere (link?) + * Logging format. Issues with multiline (partial field?). fluentd compatibility? + * Who should rotate logs? + * Used to use logrotate, which used to cause lost log lines. Switched to docker json logging driver, but this cannot be used for all runtimes. + * Kubelet could rotate, but runtime would need to re-open files. CRI-O just implemented this. + * Goal for 1.9: finalize the design + * Next actions: +* PSA: Never too early to start thinking about docs/release notes for 1.9, also [Release Team](https://github.com/kubernetes/features/pull/487) is finalized - 11/22 code freeze, 12/13 release +* Sig specific blog! + + +# Oct 10 {#oct-10} + +][' + + + +* Kubelet bootstrap-checkpoint for self host cluster: + * on track for 1.9 release. P0 + * Owner: Tim + * Pending prs +* GPU metrics updates + * Design doc: https://docs.google.com/document/d/13O4HNrB7QFpKQcLcJm28R-QBH3Xo0VmJ7w_Pkvmsf68/edit#heading=h.mdp6bb226gj7 + * /summary API to expose GPU metrics + * cAdvisor collects GPU metrics through [NVIDIA Management Library (NVML)](https://developer.nvidia.com/nvidia-management-library-nvml). + * cAvisor will be dynamically linked, hence Kubelet too + * Raised question: why not in Device Plugin? Time concerns from accelerator folks. +* CRI-containerd and cAdvisor integration updates + * Slides: https://docs.google.com/presentation/d/1Os3nyMRBlFuiBLCjPgeaPv6jXylrZW5jiDXJejlA3Wg/edit?usp=sharing + * Summary from yesterday's offline discussion + * Core Metrics vs. Monitoring Metrics for containerd integration + * core metrics from runtime through CRI + * To support full /summary API, the rest metrics will be collected through cAdvisor + * From sig-instrumentation (Solly): in the future, /summary is required? Not necessary. It depends on separate monitoring pipeline. + * Monitoring agent on node: how to figure out container and pod relationship. Expecting runtime can expose container related metrics. + * Plug-in resource, like GPU? + * User-defined metrics? 
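A sketch of the "core metrics from runtime through CRI" flow discussed above: the kubelet asks the runtime for per-container CPU, working-set and writable-layer usage instead of parsing cgroups via cAdvisor. The types below are local stand-ins, not the generated CRI structs.

```go
package main

import (
	"context"
	"fmt"
)

// containerStats loosely mirrors the per-container fields a CRI stats call
// exposes for the core-metrics pipeline; these are stand-in types.
type containerStats struct {
	ContainerID             string
	CPUUsageCoreNanoSeconds uint64
	MemoryWorkingSetBytes   uint64
	WritableLayerUsedBytes  uint64
}

// statsProvider is the kubelet-side view of a runtime that reports its own
// container stats, so the kubelet no longer parses cgroups itself.
type statsProvider interface {
	ListContainerStats(ctx context.Context) ([]containerStats, error)
}

// fakeRuntime stands in for a real CRI runtime for this sketch.
type fakeRuntime struct{}

func (fakeRuntime) ListContainerStats(ctx context.Context) ([]containerStats, error) {
	return []containerStats{
		{ContainerID: "abc", MemoryWorkingSetBytes: 32 << 20},
		{ContainerID: "def", MemoryWorkingSetBytes: 64 << 20},
	}, nil
}

func main() {
	stats, err := fakeRuntime{}.ListContainerStats(context.Background())
	if err != nil {
		panic(err)
	}
	var total uint64
	for _, s := range stats {
		total += s.MemoryWorkingSetBytes
	}
	fmt.Printf("%d containers, total working set %d bytes\n", len(stats), total)
}
```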
+* Peak memory usage for pod/container + * Not in /summary API + * But expose this stats through cAdvisor API + * https://github.com/google/cadvisor/pull/1768 +* [#53396](https://github.com/kubernetes/kubernetes/issues/53396) Missing modprobe ([PR](https://github.com/kubernetes/kubernetes/pull/53642), 1.8), Debian Bump [#52744](https://github.com/kubernetes/kubernetes/pull/52744) (master) [rphillips] + + +# Oct 03 {#oct-03} + + + +* Docker Validation for K8S releases: + * Proposal: https://github.com/kubernetes/kubernetes/issues/53221 + * Although each Docker version has a particular API version, but it does support a range of API versions. + * Stephen: How long does the Docker support window need to be to make this not a problem? - Dawn: 9 months? + * Derek: We'll have multiple CRI implementations, and this should be the responsibility of each CRI implementer instead of sig-node putting resources for container runtime validation. + * Dawn: Agree. + * Docker is still a bit different. Docker is the container runtime we're using for a long time. This is a regression to the user, the change to validate based on API version is still required today. + * But sig-node should not sign on validating any other container runtime. We'll provide portable validation tool for different container runtime implementation, e.g. [CRI Validation Test](https://github.com/kubernetes-incubator/cri-tools/blob/master/docs/validation.md). + * Derek: They are going to add node e2e test on centos and fedora to avoid the [distribution specific docker issue](https://github.com/kubernetes/kubernetes/issues/52110) in 1.8 release in the future. + * Stephen: If there is anything Docker could help, feel free to reach out to them. + * Dawn: It seems that there is no objection of validating Docker API version in Q4. + * Derek: When can we stop giving recommendation for docker versions? Dawn: We are already doing this for a long time, and there are still users and sig groups relying on our decision. But other container runtime should carry the validation themselves. +* cAdvisor refactor discuss: [Slides](https://docs.google.com/presentation/d/1eZIKbJ5DibBN_CRVQfhZ7Vfs1ZGnlApOATXrsjnFMyw/edit?usp=sharing) + * Does On-Demand Metrics include pod level metrics? Node-Level is higher priority and more useful to eviction at least for now. + * Possible issues: + * CAdvisor metrics endpoint and api need to change (e.g. no container metrics), this may break existing users. + * There is work going on to collect per-container GPU metrics in cadvisor, which is not part of CRI today. => Derek & Dawn: We may be able to get this from device plugin. => This is not what Accelerator team tackling now, may take more time. + * Some concern about the skew between cadvisor metrics and container runtime metrics. However, we already have skew today, metrics of different containers are collected separately and periodically. + * In the long run, kubelet will make decision based on node/pod level metrics. Today's pod level metrics is derived from container metrics, but may not be the case in the future. + * HPA, VPA need container metrics. + * Q4 we'll work on cadvisor refactoring; Q1 2018 we could start conversation with sig-instruments to talk about separate monitoring pipeline and standalone cadvisor. 
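A small illustration of the API-version-based validation idea from the Docker validation discussion above, written against the present-day Docker Go client (the client libraries in use when these notes were taken were different); the minimum-version constant is only an example.

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

// minAPIVersion is an illustrative floor; the actual supported range would be
// whatever the validation effort settles on.
const minAPIVersion = "1.26"

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	v, err := cli.ServerVersion(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Printf("docker %s, API %s (daemon minimum %s), want >= %s\n",
		v.Version, v.APIVersion, v.MinAPIVersion, minAPIVersion)
	// A kubelet-style check would compare v.APIVersion against the range the
	// release was validated with, rather than pinning exact Docker versions.
}
```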
+* Shared PID: [Are breaking changes for CRI v1alpha1 encouraged?](https://github.com/kubernetes/community/pull/1048#discussion_r141199512) + * Both cri-containerd and cri-o are fine with the CRI change to use enum for pid namespace (shared/host/private) + + +# Sep 26 Proposed Agenda {#sep-26-proposed-agenda} + + + +* Isolated PID namespaces in non-docker runtimes [verb] + * Can we make this the default for all runtimes implementing CRI? + * containerd: either one is fine + * cri-o: either one is fine, but prefer the isolated + * rktlet: ? + * frakti: not relavant anyway + * Windows container support: not relavant +* Containerd integration status update https://github.com/kubernetes-incubator/cri-containerd + * Cut v1.0.0-alpha.0 release this week. + * Missing container metrics (pending PR under review, merging this week). + * Image gc is not supported in containerd yet, will be added later. + * Containerd does provide functions to remove underlying storage of image, but the interface is relatively low level, which is not easy enough to use. + * Containerd is adding garbage collection support, which will be much easier to consume. + * CRI-Containerd image gc will build on top of containerd garbage collection. + * User guide on how to create cluster with containerd is available: https://github.com/kubernetes-incubator/cri-containerd/blob/master/contrib/ansible/getting-started.md + * A running sockshop demo: http://104.197.75.252:30001 + * Cluster e2e tests will be added soon + * 1.9 plan: bug fixes, and set up CI tests. focus on: tools / tests / util +* Windows Container Support - update from peterhornyack@google.com + * [SIG-Windows current status](https://github.com/apprenda/sig-windows), [roadmap](https://docs.google.com/document/d/1LWi9-NZslZM5lTzMYoAKWXPAJwte5Ow3ySqkrGdHoLg/edit#heading=h.fz2smrec3l4) and [meeting notes](https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit) + * Targeting "beta" for 1.9 release + * Windows Server "17.09" release (coming early October) will significantly improve native container networking - see [k8s blog post](http://blog.kubernetes.io/2017/09/windows-networking-at-parity-with-linux.html) + * Cloudbase (OVN/OVS) and Tigera are concurrently working on their own CNI networking plugins for Windows + * Recent PRs: + * [Windows CNI](https://github.com/kubernetes/kubernetes/pull/51063) - active + * [Windows kernel-mode proxying](https://github.com/kubernetes/kubernetes/pull/51064) - merged + * [Windows container stats via kubelet](https://github.com/kubernetes/kubernetes/pull/50396) - merged + * [Plan](https://github.com/kubernetes/kubernetes/pull/50396#issuecomment-324728727) for Windows metrics going forward + * [CRI stats in docker shim](https://github.com/kubernetes/kubernetes/pull/51152) - nearly merged + * Other current tasks + * Validating functionality: storage + volumes, secrets, and metrics + * kubeadm support for bringing up clusters with Windows nodes + * Once this is working, automated testing will follow + * Cluster support + * master on linux node, only windows worker nodes join the cluster. + * Questions / comments? 
Ask on #sig-windows on Slack + + +# Sep 19 {#sep-19} + + + +* rktlet - https://github.com/kubernetes-incubator/rktlet + * Some slides: \ +https://docs.google.com/presentation/d/1SoBtxvs2kSs7aad2GByafov8AnKJZ6Z0vB5gxVpuv1c/edit#slide=id.p + * Started to work on rktlet again since a couple of weeks ago (Kinvolk with Blablacar) + * Demo with Weave Socks with Kubernetes & rktlet (CRI), by Iago (@iaguis) + * TODOs: + * start integration test (conformance, e + * 2e tests) + * log/attach etc. + * In the 1.9, we want to delete the rkt package in the main Kubernetes repo. Rktnetes user should switch to rktlet. Rktlet is still WIP, so they are not sure whether they could remove the in-tree rkt package. + * What version of Kubernetes does rktnetes support? We don't have test result for it for a while. - Probably 1.7, not sure. + * + * Demo.Use kube-spawn to bring up a cluster: https://github.com/kinvolk/kube-spawn + * A demo of sock-shop: https://github.com/microservices-demo/microservices-demo + * What is the particular feature we really want from rkt integration? + * Major advantage. Daemon restart doesn't restart containers. (Docker live-restore, cri-o and cri-containerd all support this) + * Different image formats + * ACI image format + * It also supports docker image format (thus support OCI image format). Conversion is still required. +* Other topics? + + +# Sep 12 Proposed Agenda {#sep-12-proposed-agenda} + + + +* [Debug Containers Proposal](https://github.com/kubernetes/community/pull/649): Updates and process to merge [verb] +* [Shared PID Proposal](https://github.com/kubernetes/community/pull/1048) for 1.9 [verb] + * Full support for isolated namespaces or migration to shared? + * per-pod shared namespace in 1.9 + * Support shared pid namespace in a long run (v2 API)? or support both modes? + * Agreed: CRI changes to support both modes. 
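Roughly what the agreed enum-based CRI change for PID namespaces could look like. The identifiers below approximate the shape the change eventually took, but they are an illustrative sketch rather than the generated API.

```go
package main

import "fmt"

// namespaceMode sketches the enum-based CRI change: instead of a boolean
// "host PID", the sandbox/container config carries an explicit mode so
// runtimes can support both per-pod shared and per-container isolated PID
// namespaces. Names here are approximations.
type namespaceMode int32

const (
	namespacePod       namespaceMode = iota // shared with the other containers in the pod
	namespaceContainer                      // isolated per container
	namespaceNode                           // shared with the host
)

type namespaceOption struct {
	Pid namespaceMode
}

func describe(o namespaceOption) string {
	switch o.Pid {
	case namespacePod:
		return "containers see each other's processes"
	case namespaceContainer:
		return "each container gets its own PID 1"
	default:
		return "containers share the host PID namespace"
	}
}

func main() {
	fmt.Println(describe(namespaceOption{Pid: namespacePod}))
}
```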
+* Triage for 1.8 feature issues + * [Extension to support new compute resources](https://github.com/kubernetes/features/issues/368) + * Related: [Document extended resources and OIR deprecation](https://github.com/kubernetes/kubernetes.github.io/pull/5399) + * Docs PR merged + * [CRI validation test suite](https://github.com/kubernetes/features/issues/292) + * [Containerd integration](https://github.com/kubernetes/features/issues/286) + * updated with status + * [Dynamic Kubelet Configuration](https://github.com/kubernetes/features/issues/281) + * [Containerized Mounts](https://github.com/kubernetes/features/issues/278) + * [CPU Manager](https://github.com/kubernetes/features/issues/375) + * [Static CPU manager policy should release allocated CPUs when container enters "completed" phase.](https://github.com/kubernetes/kubernetes/issues/52351) (BUG) + * [Node e2e tests](https://github.com/kubernetes/kubernetes/pull/51041) + * Docs are merged + * [Add support for pre-allocated hugepages](https://github.com/kubernetes/features/issues/275) + * Doc PR opened https://github.com/kubernetes/kubernetes.github.io/pull/5419 + * Feature issue updated with link + * [Support for Hardware Accelerators](https://github.com/kubernetes/features/issues/192) + * [Further differentiate performance characteristics associated with pod level QoS](https://github.com/kubernetes/features/issues/276) + * this is duplicate with the CPU manager feature issue (but predates) + * will close the QoS feature then [Caleb] +* [1.8 release notes draft](https://github.com/kubernetes/features/blob/master/release-1.8/release_notes_draft.md) + * Please update and sign off on release notes for node components + * Please update "major themes" section by SIG with what the development goals were for the 1.8 release [Derek to do] +* 1.9 reliability release planning + * +* [#50865](https://github.com/kubernetes/kubernetes/issues/50865): Rebuilding pause containers [verb] + + +# Sept 5 {#sept-5} + + +# Aug 29 {#aug-29} + + + +* Shared PID namespace discussion (part 2) [derek, mrunal] + * Appealing reason: not all containers within a pod can share pid namespace, due to security concern. + * Breaking the backward compitility + * Considering introducing v2 Pod API to make shared pid namespace by default. + * With the latest inputs, we decided 1) disable the feature by default in 1.8, 2) discussed pod level api in 1.9. +* cAdvisor native support for cri-o [derek, mrunal] + * any objections? + * how to plugin w/ new refactor (cadvisor vs cri) + * Try to rollout cri-o, and may carry patch for 1.8 to make it work. + * Even container stats come from CRI, but they still run cadvisor to monitor other stuff, for example, filesystem stats. They don't want to run 2 instances of cadvisor, one in kubelet and another in cri-o. + * CRI is introduced to make kubelet runtime agnostic. This breaks the initial goal in some way. + * Based on data from their environment, cgroups parsing is expensive, @derek doesn't want 2 agents parsing cgroups. + * Containerd provides the container metrics itself, so it doesn't have this problem. However, we may still need to take care of the overhead. Containerd collects stats on-demand, it's cri-containerd/kubelet's responsibility to control the polling period and whether to cache the stats or not. + * Cadvisor has introduced many issues each release whenever it's updated, because it's not well maintained in current stage. So we discussed that not to add more complexity into cadvisor. 
+ * Can we make cadvisor metrics collecting logic configurable, so that we could collect different runtime stats based on the passed-in configuration, instead of changing cadvisor code every time. + * Talk about the short term after the sig-node meeting. Reopen the topic about the container metrics through CRI. +* PSA: [Open issues in the 1.8 milestone](https://github.com/kubernetes/kubernetes/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20milestone%3Av1.8%20label%3Asig%2Fnode) ~ need resolution or removal from the milestone or else they are considered release-blocking +* PSA: [Open feature issues in the 1.8 milestone](https://github.com/kubernetes/features/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20label%3Asig%2Fnode%20milestone%3A1.8) - need update +* PSA: Ensure [release notes for Node](https://github.com/kubernetes/features/blob/master/release-1.8/release_notes_draft.md#node-components) are correct + + +# Aug 22 Proposed Agenda {#aug-22-proposed-agenda} + + + +* [#48937](https://github.com/kubernetes/kubernetes/issues/48937): Shared PID namespace with docker monoliths [verb] + * gluster/gluster-centos is a container image that runs systemd so that it can run ntpd, crond, gssproxy, glusterd & sshd inside a single container + * When invoked as /usr/sbin/init, systemd refuses to run if pid != 1 + * The image runs with the following config: \ +` env: \ + - name: SYSTEMD_IGNORE_CHROOT \ + value: "1" \ + command: \ + - /usr/lib/systemd/systemd \ + - --system` + * Should shared pid be enabled for docker in 1.8? + * Do we still agree that in the long term we should try to push forward inner pod shared pid namespace? - Could be default, but should be able to overwrite. + * Overwrite could be done at node or pod level. Should this be node level configuration, or part of pod spec? Pod spec overwrite means API change and long term support, do we what to do that? + * **Kubernetes 1.8 decision:** Default shared pid namespace for Docker 1.13 with node level configurable. For other Docker versions, shared pid namespace will always be disabled. + * **Revisit this in Kubernetes 1.9:** discuss more granulary configuration. + * Other runtime should also share pid namespace. Do we have test coverage for this? We do have a node e2e test for it, but it's docker specific for now because of the docker version check. + * It also implies that the non-docker runtime must support isolated namespaces +* https://github.com/kubernetes/kubernetes/pull/50859 [derekwaynecarr] + * Alpha support for pre-allocated hugepages + * A new feature flag will be introduced to enable the hugepage support; + * Cadvisor will support huge page; + * Could be consumed via EmptyDir. +* https://github.com/kubernetes/kubernetes/pull/49186 [sjenning] + * CPU manager + * We may not want to have more than 2 new features from resource management each release, we are now lack of review bandwidth. + * We could have multi-stage review, first round for technical review, and production review etc. + * We need to grow the reviewer pool in 1.9 for /pkg/kubelet based on updated contribution in community +* https://github.com/kubernetes/kubernetes/pull/50396 [davidporter] + * Windows Container Stats + * Container stats should go through CRI, not done yet. + * Q: How efficient is the newly introduced windows stats? Cadvisor does per-10-second poll. - A: Not sure, just call docker stats API to get stats. + * Node level stats is windows specific, use windows perfcounters now. + * Q: Should the perfcounters stuff in another repo? 
Because there are only 2 files, does it make sense to move it to another repo? - A: It's fine to leave the node level metrics implementation in kubelet as temporary solution. In the future, we may extract core cadvisor out as a library in Kubelet, and run cadvisor as a standalone daemon. + * Q: Do we have a document or any test result about what features are support for K8s on windows? - A: E2E test is still a TODO now. + * Q: Beta in September is not very comfortable to us, we need better document and test. We once deleted the previous windows implementation by mistake, and no test caught it. - A: They don't have e2e test now, and they are working on that. They do have several users now. + * Sig-windows wants to be first citizen, and if there is any feature introduced affect windows, they want to be in the discussion. + * Derek: don't think that Windows can totally hide behind container runtime, there are many other kubelet level features which need corresponding windows support: EmptyDir, Resource Management etc. + * Q: Who's reviewing the windows PR? - A: Currently, mainly Dawn and Yuju. However, code freeze is coming, we may not have enough review bandwidth this release, and there should be a corresponding feature in feature repo. + * **Current decision:** 1) We could add a flag to disable the monitoring feature which causes kubelet crash loop on windows; 2) We are fine with leaving the stats code in Kubelet for now; 3) They need to rewrite the PR based on the feedback, and properly plan and design Kubelet windows support. + * We may not have time to review the windows container stats PR this release, we may have to do that after code freeze. +* https://github.com/kubernetes/kubernetes/pull/50984 [tstclair-offline] + * PSA - Checkpoint POC available for feeback. +* Pod checkpointing [proposal](https://docs.google.com/a/google.com/document/d/1qmK0Iq4fqxnd8COBFZHpip27fT-qSPkOgy1x2QqjYaQ/edit?usp=sharing) +* kubelet pod mount propagation + * https://github.com/kubernetes/kubernetes/pull/46444 + * https://github.com/kubernetes/community/pull/659 + * https://github.com/kubernetes/community/pull/589 + * + + +# Aug 15 {#aug-15} + + + +* Resource Management Workgroup updates + * Crossing several sigs: node, scheduling, etc. + * 1.8 releases, 3-4 projects for better support high performance workloads + * static cpu pining: The guaranteed pod will be assigned the exclusive cpus. Will be alpha feature in 1.8 release. More enhancement, such as dynamic in the future + * device plugin: a separate daemonset to handle device discovery, initiation, and allocation and destroy. Ready for alpha release. + * hugepage support: For stateful set. Alpha feature in plan. +* Mount namespace propagation reviews needed: + * https://github.com/kubernetes/community/pull/659 + * https://github.com/kubernetes/kubernetes/pull/46444 +* Kubelet checkpoint + * Two approaches proposed, need to decide which one should be the final one. + * Scopes for the feature? - Secrets is excluded from the scope for now. +* PSA: Release notes [drafts due](https://groups.google.com/d/msg/kubernetes-dev/sEP1YRWBnEk/tYWnDAKfBwAJ) + + +# Aug 8 Proposed Agenda {#aug-8-proposed-agenda} + + + +* [Kubernetes preemption / eviction priority](https://github.com/dashpole/community/blob/2f68945935155b3071390986aa592cd49ad8e0f3/contributors/design-proposals/priority-eviction.md) + * Preemption vs. Eviction: Preemption is invoked by scheduler, eviction is invoked by Kubelet upon the resource starvation situation. 
+ * Preemption will preempt pod gracefully. + * Should we define SLO for this? + * Q: E.g. guaranteed job could still be preempted, should we have an SLOs for this, how rarely should this happen? - It's cluster admin's responsibility to properly configure the system to meet their SLOs. + * Q: What about SLIs? - Yes. + * Q: Will kube-proxy be preempted? + * By default, all pods have the same priority, so there's no difference from before. However, if new pods are added with higher priority, it's possible to preempt kube-proxy. + * We also have 2 default priority classes `node-critical` and `cluster-critical`. + * Roadmap: + * Eviction will follow the new priority schema in 1.8. + * Try to get preemption in in 1.8. + * Priority in resource quota in 1.9. +* Kubelet checkpoint proposal + * https://docs.google.com/document/d/1hhrCa_nv0Sg4O_zJYOnelE8a5ClieyewEsQM6c7-5-o/edit?usp=sharing + * https://github.com/kubernetes/kubernetes/issues/49236 +* Device plugin proposal updates + * https://github.com/kubernetes/community/pull/695 + * Q: Authentication to kubelet? How do we know whether the device plugin could talk with kubelet? + * Communication between device plugin and kubelet is through a host unix socket, and only privileged pod could create host unix socket. + * The alpha version will rely on this assumption, and it's cluster admin's responsibility to only grant privileged permission to trusted device plugin. + * Q: Should we create a separate repo for the device plugin? + * It's hard to track all kinds of repositories during release process. + + +# Aug 1 {#aug-1} + + + +* Containerd-CRI status updates + * [cri-containerd Q3 plan](https://docs.google.com/document/d/1tx9NUm6UsWBWjB98rm6QGnwDbl7IX3f9PAtk-famrHM/edit#) + * `Current status:` + * `Features: cri-containerd` v0.1.0 supports all basic functionalities including: + * Sandbox/container lifecycle management; + * Image management; + * Sandbox networking; + * Container logging; + * Run command synchronously in container; + * … + * Test: + * CRI validation test: 30/36 (pre-submit test). + * Node conformance test: 116/122. + * In-Use Containerd Version: [v0.2.3-1098-g8ed1e24](https://github.com/containerd/containerd/commit/8ed1e24ae925b5c6d8195858ee89dddb0507d65f) + * Newest Containerd Version: v1.0.0-alpha2 (weekly alpha release) + * Q&A: + * Containerd release schedule after 1.0: Probably monthly at first, and then focus on bug fixes and security patches. + * Package solution: Only release binaries for convenience of testing. No plan to package for now. + * Swarm containerd integration: Working version merged. + * Docker containerd integration plan: Happen in a moby branch now. + * API change review process after 1.0: Deprecation policy, support policy etc. Don't have one, and haven't thought about this yet. Need a proposal, we can submit one and discuss. +* PSA: Feature freeze is today ~ [Current feature list](https://github.com/kubernetes/features/issues?q=is%3Aissue+is%3Aopen+label%3Asig%2Fnode) + + +# July 25 {#july-25} + + + +* summarize mount propagation conclusions (@tallclair) + * Alpha feature in 1.8: non-priority container with hostpath will get slave mode; while priority container with hostpath will be opt-in with shared-propagation mode, thus it is visible to the outside. + * There is no security concerns. + * API changes: annotation for alpha. + * Using gRPC over the socket, similar to CRI model. + * Owner: Jan +* discuss problems surrounding pod mounts supported by userspace processes (i.e. 
fuse) (@sjenning) + * tl;dr, Some filesystems require user processes to maintain mount points. These processes are children of the kubelet and are started in the kubelet.service cgroup when running in systemd. Restarting kubelet.service kills them. Setting KillMode=process prevents that but lets other node children leak (i.e. find/du/journalctl exec'ed by cadvisor, for example) + * https://github.com/kubernetes/kubernetes/pull/23491 + * https://github.com/kubernetes/kubernetes/issues/34965 + * Owner: Seth +* Present: system spec and image validation (@ygg) +* Sig-windows updates (@Shaya) + * The immediate concerns are getting it into a usable state, which includes finishing the network story (using ovn/ovs), volumes and metrics. + * Suggested them to go with CRI approach, but not in 1.8 timeline. +* Initial alpha proposed checkpointing https://github.com/kubernetes/kubernetes/issues/49236 + + +# July 18 {#july-18} + + + +* set up mount propagation meeting + * jsafrane@redhat.com can't be on sig-node today + * offers: + * 10am PDT this Wednesday or Thursday + * 9am PDT this Friday + * 9am PDT next Monday (24th) + + +# July 11 {#july-11} + + + +* Q3 planning + * https://docs.google.com/document/d/1Sq3Cr0_udLtksBaogQWhAhwjoG5jj8TfHzftw4Kzg0I/edit +* mount propagation (@jsafrane) + * https://github.com/kubernetes/kubernetes/pull/46444 + * https://docs.google.com/document/d/1XXKdvFKnbV8MWNchjuLdLLCEs0KdWelpo3tpS6Wr18M +* Debug Container API compromise (@verb) + * [kubernetes/community#649](https://github.com/kubernetes/community/pull/649) + * [Google docs version of Debug Containers proposal](https://docs.google.com/document/d/1tds_D3aoUtMjlKpuVdr88oDrRzUS4OuShlen53RSAo8/edit?ts=595cb285#heading=h.vf1xpupyfm12) + + +# July 4 {#july-4} + +No meeting + + +# Jun 27 {#jun-27} + + + +* Rktlet status (@alban) + * https://github.com/kubernetes-incubator/rktlet/issues + * Need rktlet integration and test status update. + * Dawn: Why does @alban care about the underlying runtime? + * They are using existing rktnetes integration, so they want to make sure rkt keep working with Kubernetes, rktlet seems to be the way to go. + * They are using ACI image. (Dawn: Why ACI instead of Docker image?) Not quite clear, one is that they want dependency between images. They use [dgr](https://github.com/blablacar/dgr) to build container images. +* 1.7 Docker version information? (@calebamiles) + * Support 1.12.? - Yeah, already supported in K8s 1.6. + * Feature wanted from new docker version (>= docker 1.12): + * Overlay2: Need validation. Cadvisor already supports overlay2 now. + * Live restore: Help node reliability. Need validation. + * Shared pid namespace: Shared pid namespace bug in docker 1.12 is fixed in docker 1.13. + * Docker version supported in K8s 1.7 is 1.10, 1.11 and 1.12. We plan to validate docker 1.13 in K8s 1.8 and deprecate docker 1.10. + * Where is the docker version related information now? - Mostly tracked by different docker related Kubernetes issues. We always include the docker version supported in release notes, and link to known issues. + * [1.12](https://github.com/kubernetes/kubernetes/issues/28698) + * [1.13](https://github.com/kubernetes/kubernetes/issues/42926) + * @Dawn: We don't enforce docker version or container runtime for vendors. We've provided the portability (CRI), and also portable validation test ([CRI validation test](https://github.com/kubernetes-incubator/cri-tools)). 
Each vendor should validate and choose container runtime and version based on their own business/technical interests. + * @timothysc: An integration testbed is required for this issue. + * @Dawn: Node e2e and conformance test are also built for vendors and users to validate their node setup including the docker version. + * @Lantao: Kubeadm also enforce docker version in the pre-flight check. + * We already run many tests for docker validation, but we do need to organize and expose the information better. + * Node e2e test: https://k8s-testgrid.appspot.com/sig-node#kubelet + * Docker validation test: https://k8s-testgrid.appspot.com/google-docker (e2e-cos suite is not working properly because of some legacy reason, our GCI team is working on that) + * @michaelcrosby: Containerd will run integration test to avoid PRs breaking swarm, Kubernetes etc. + * @Dawn: Both cluster e2e and node e2e are too heavy for containerd to carry. That's why we build the [CRI validation test](https://github.com/kubernetes-incubator/cri-tools), which test against CRI directly which is much more lightweight. +* mount propagation (@jsafrane) + * https://github.com/kubernetes/kubernetes/pull/46444 + * https://docs.google.com/document/d/1XXKdvFKnbV8MWNcchjuLdLLCEs0KdWelpo3tpS6Wr18M + * Schedule design meeting, 9:00 am PDT + + +# Jun 20 {#jun-20} + + +# Jun 13 {#jun-13} + + + +* 1.7 release updares + * DynamicKubeletConfig alpha was dropped from 1.7. But will merge it once the code freeze is lifted. +* CRI-O presentation and demo (Samuel Ortiz [samuel.ortiz@intel.com](mailto:samuel.ortiz@intel.com)/Mrunal Patel/Antonio Murdaca) + * Slides https://docs.google.com/presentation/d/1lqH3VNhYUmp0WbBZ6iNzbNPPkHSPVHc2XH74qUjyNDI/edit?usp=sharing + * Package solution: working on it for all os distr after 1.0.0 + * Plan to cut 1.0.0 alpha: week of June 13 +* kubelet event spam in large clusters or pods in constant backoff + * budget per pod per event type (Warning vs Normal) per interval + * budget per namespace + * do we really need to report at such granular levels + * https://github.com/kubernetes/kubernetes/pull/47367#issuecomment-308002427 +* D + + +# Jun 6 {#jun-6} + + + +* Virtlet demo (cloud-init support, running Kubernetes in Kubernetes, etc.) \ +(Ivan Shvedunov -- [ishvedunov@mirantis.com](mailto:ishvedunov@mirantis.com)) + * The primary difference between virlet and frakti: can run any vm images: windows, any legacy applications which cannot be containerized + * Integrated with Kubernetes through CRI + * Demo with VM with a stateful set and service + * Using cloud-init to mount Kubernetes required volumes + * Potential use cases: any applications, including windows apps, hybrid environmen, malware detection(?) + * Can orchestrate vms the same way as containers. + * Using libvirt, and introduced a cri-proxy which can decide if launching a container-based pod or virt-based pod. +* Discussion about the entitlements (not enough time on last meeting) by Nassim + * link to the doc? 
+ * Image publisher would define the security profiles + * github/docker/entitlement + * contacts: + * timstclair - node security, proposal process + * liggitt - security / sig-auth TL + * yujuhong - CRI / kubelet / runtime +* CRI-O presentation and demo (Samuel Ortiz [samuel.ortiz@intel.com](mailto:samuel.ortiz@intel.com)) + * CRI-O overall progress (Antonio Murdaca [runcom@redhat.com](mailto:runcom@redhat.com)/Mrunal Patel [mpatel@redhat.com](mailto:mpatel@redhat.com)) + * CRI-O cluster demo (Antonio Murdaca [runcom@redhat.com](mailto:runcom@redhat.com)) + * pass full node-e2e and e2e tests + * support docker v2schema1 images out of the box + * can run Clear Containers +* Quick status update on SIG Node features for 1.7 (Caleb Miles caleb.miles@coreos.com) + * [scratch summary](https://gist.github.com/calebamiles/adfc84814d503501bcebd7cd2551d10c) + * [CRI validation test suite](https://github.com/kubernetes/features/issues/292) + * Done with alpha, and had a demo. + * The engineers switched to help containerd integration + * [Enhance the Container Runtime Interface](https://github.com/kubernetes/features/issues/290) + * API changes for monitoring stats, but not done with the implementation. + * [Containerd CRI Integration](https://github.com/kubernetes/features/issues/286) + * Achieved the basic goals for this item + * Haven't switched to the containerd new API, and new containerd client and binary yet + * Containerd plan to introduced v2 scheme1 support + * Already cut a release for today's integration + * [Dynamic Kubelet Configuration](https://github.com/kubernetes/features/issues/281) + * Filed the exception request + * Wrote the test plan and started the manual tests + +# May 30 + + + +* LinuxKit: build secure and minimal host OSes for Kubernetes - Riyaz Faizullabhoy @riyazdf + * Overview of project security goals - [slides here](https://drive.google.com/file/d/0BzuxnKD0WQ4wbGlpdXk4QUdRcWs/view?usp=sharing) + * Toolkit for building secure, portable, and lean operating systems for containers + * Immutable infrastructure + * Leverages buildchain and userspace tools from alpine linux + * Securely configure modern kernel (Kernel config) && collaborate with KSPP + * Incubating projects: Landlock LSM, Type-safe system daemons && mirageSDK, okernel: Separate kernel into outer-kernel and inner-kernel + * Demo of building and running Kubernetes on customized Linuxkit based OS + * Next steps for making it easy to use Kubernetes + LinuxKit + * list of Kubernetes dependencies document? + * other security features that k8s has been thinking about in host OS? +* Security profile configuration: Entitlements on Moby and Kubernetes? + * A user-friendly high-level interface for security configuration based on different categories (network, host devices, resources..) and levels of usage for each one + * Quick overview of the proposal by @nass https://github.com/moby/moby/issues/32801 + * How can this fit in Kubernetes ? What are the requirements ? + * https://github.com/docker/libentitlement + * Slides https://drive.google.com/file/d/0B2C3Ji-3avH8bUI1eF9zS2d4aGM/view?usp=sharing +* Minimizing tail latency for applications through cache partitioning + * https://docs.google.com/document/d/1M843iO76DiPCGkKU3NNsiuwmL-aYC8HK_5973hfUaH4/edit + + +# May 23 Agenda {#may-23-agenda} + + + +* Running unikernels with Kubernetes. 
We've been experimenting with Virtlet to deploy light OSv-based unikernels; each of these unikernels is comprised of the library OS, nodejs runtime and a single service + * The demo briefly explains the rationale, unikernel concepts, caveats, etc. + * [Virtlet repository](https://github.com/Mirantis/virtlet) + * [Demo app repo](https://github.com/mikelangelo-project/osv-microservice-demo) + * [slides](https://docs.google.com/presentation/d/1hMh-kb-zsKpbAtP87Vv4cGODGIwXdkOsvZFt33TWtwE/edit?usp=sharing) +* Containerized mount utilities [jsafrane] + * GCI, CoreOS and AtomicHost don't want to distribute gluster, ceph and nfs utilities on the host, we need a way how to distribute them in a container. @jsafrane (sig-storage) will present https://github.com/kubernetes/community/pull/589 that includes some kubelet changes. +* PTAL Alpha Dynamic Configuration PR: https://github.com/kubernetes/kubernetes/pull/46254 + + +# May 16 Proposed Agenda {#may-16-proposed-agenda} + + + +* Frakti demo + * [repository](https://github.com/kubernetes/frakti) + * demo + * Support mixed container runtime: [hyper](http://hypercontainer.io/), docker (privileged container). + * Frakti deploy with kubeadm for alpha support. + * Support OCI runtime spec, but different with Docker, e.g. only supports hard resource limit because of Frakti is using VM. + * Questions: + * Run runtime daemon (docker/containerd) inside the VM? - No, run container directly inside the VM with namespace cgroup etc. No security features like seccomp, apparmor inside the VM. + * [slides](https://drive.google.com/file/d/0B6uGv-NC7DxDSmREaUhEdXl4NGM/view) + * Questions: + * Why does hyper support high density, is there security problem here? + * No security problem; + * Only share the readonly part (Which part?) + * Does hyper use VM as a pod or container? - As pod. + * How does hyper run multiple containers in the VM? - Simple init system to manage containers inside the VM, called <code><em>[hyperstart](https://github.com/hyperhq/hyperstart)</em></code>. + * Will hyperstart monitor and collect the container metrics? - Yeah. + * Secure Container Runtime + * What kind of api? What are the semantic requirements for a "secure pod"? + * Two use cases: + * Public cloud, multi-tenancy. + * Runtime selectivity (sounds like less important) + * Is the current pod concept, security context, and security policy model enough? Or should we have a new high level overall design to support secure pod? + * An insecure pod running on the host may make the other secure pods on the host "insecure". We may want to make sure running a "secure pod" in a "secure environment". + * Definition: hard multi-tenancy vs. soft multi-tenancy? + * What is the user requirement? Currently the security features in Kubernetes are mostly added for engineering reason, or derived from Docker. Can we abstract and categorize these security features based on user requirement? + * Action Items: + * Proposal: what is the definition, what features are required, what is the security model. 
+* Lessons learned from https://github.com/kubernetes/kubernetes/pull/45747 + * we upgrade clusters and were stuck with 1000s of pods not deleting + * it appears that future kubelets must do patch rather than update status + + +# May 9 Proposed Agenda {#may-9-proposed-agenda} + + + +* CRI util / test demo + * [repository](https://github.com/kubernetes-incubator/cri-tools) + * [slides](https://docs.google.com/presentation/d/1VM4Tx85ffYZl_VOnBuxN1VQ-RhiM8PSg0gt_D1mP2-0/edit) +* Resource Management F2F summit updates + * meeting notes: https://docs.google.com/document/d/13_nk75eItkpbgZOt62In3jj0YuPbGPC_NnvSCHpgvUM/edit + * Resource class to enable support for resources require additional metadata to management, for example: GPU + * Extensible device support + * CPU enhancement including cpuset handling + * Hugepage will be supported as the first class resource + * NUMA support + * Performance benchmark + * Only agreed upon the list, but not prioritized yet. +* Pod Troubleshooting alpha features, merge [proposal](https://github.com/kubernetes/kubernetes/pull/35584) + * Any volunteers to review PRs? +* SIG PM would like demos for 1.7 features + * Possible demos + * Dynamic Kublet config + * CRI validation test suite + * Containerd CRI implementation + * Enhanced GPU support + * Pod troubleshooting + * [Schedule](https://docs.google.com/document/d/1YqIpyjz4mV1jjvzhLx9JYy8LAduedzaoBMjpUKGUJQo/edit#heading=h.micig3b7ro3z) +* Should [features/#192](https://github.com/kubernetes/features/issues/192) be moved into the 1.7 milestone + * answer no + + +# May 2 {#may-2} + + + +* Defer container: https://github.com/kubernetes/community/pull/483 + * similar to prestophook, but try to solve the issues + * For stateful application, so that they can terminated in sequence. + * Can we improve upon the existing prestophook? + * More challenges at termination stage: +* Non-CRI integration status: + * Plan to clean up non-cri integration with Kubelet + * Decide to remove docker management via old API + * Keep rkt as is until rkt integration move to CRI + + +# April 25 {#april-25} + + + +* Defer container: https://github.com/kubernetes/community/pull/483 + * Postpone to next week +* Demo: [Pod Troubleshooting](https://github.com/kubernetes/kubernetes/pull/35584) (verb@) + * Improve troubleshooting support in K8s + * Debug containers regardless of the debugging tools available in a given container image, and regardless of the state of the container, without direct access to the node. + * Adds a `kc debug` command that can run a debug container with debugging tools in the Pod namespace (shared PID namespace). + * Does not add a new container to the Pod spec, instead this creates a streaming API call similar to exec, and the Kubelet executes this container in the Pod - the state for debug containers is maintained entirely within the Kubelet and the runtime. + * Permissions - do debug containers have more permissions than the Pod's other containers? + * You can access other containers' filesystems via /proc/{PID}/root + * How does this behave with crash looping Pods? Does this pause some of the Kubelet's restart behavior? Turns out we can create a container in the Pod namespace and the Kubelet won't touch it when syncing the Pod. + * State of the debug pod is maintained within CRI just like any other container in the Pod. Debug pod has an additional label. 
+ * **Pod status will include these debug containers, even though Pod spec will not.** + * Would be useful to report the command that was run in the Pod status as well + * Could be useful to be able to specify a default debug container or set of debug actions for a Pod, that would be relevant to that specific Pod + * Still need to think about the disk side of the debug container + * If we want to make this an alpha feature for 1.7, file feature request in the features repo. + + +# April 18 {#april-18} + + + +* containerd F2F meeting + * Meeting notes: https://docs.google.com/a/google.com/document/d/1lsz8pd4aIcdAFrsxi0C1vB2ceRCyq36bESoncEYullY/edit?usp=sharing (need permission to access, feel free to request access) + * cri-containerd repo: https://github.com/kubernetes-incubator/cri-containerd + + +# April 11 {#april-11} + + + +* containerd update + * runtime-side: almost feature complete, work on checkpoint and restore. + * image-side: centralized storage. + * When does Docker start to rebase on new containerd. + * Rebase on execution part first when the execution part reaches feature complete; + * Rebase on image part will be later. Backward compatibility may need some effort. Snapshotter ←→ Graph Driver + + +# April 4 {#april-4} + + + +* 1.7 roadmap + * https://docs.google.com/a/google.com/spreadsheets/d/1-hADEbGEUrW04QP4bVk7xf1CWuqbFU6ulMAH1jqZQrU/edit?usp=sharing + * Any opensource contribution is welcome. + * Needs to set the expectation and milestone for debug pod. +* Containerd and CRI integration updates + * POC: https://github.com/kubernetes/kubernetes/pull/43655 + * Proposal WIP, plan to send out this week, will create a new project in kubernetes-incubator + * Alpha release plan: Basic container and image lifecycle management [P0]; container streaming/logging [P1]; metrics [P2]. +* Open floor for additional dynamic config questions at end of meeting + + +# Mar 28 {#mar-28} + + + +* Cancelled + + +# Mar 21 {#mar-21} + +**Discuss dynamic kubelet configuration proposal (mtaufen):** https://github.com/kubernetes/kubernetes/pull/29459 + +I'd like the purpose of the meeting today to be to communicate the status of the proposal, communicate some of the pros/cons of ideas that have been floating around, get feedback from the community on the proposed solution, and come out with action items for improvements to the proposal. I would like the proposal to be approved and merged by the end of this quarter, and I would like the implementation to be complete for 1.7. + +There are a number of items in this proposal, and any of them are fair game for discussion today. That said, there are a couple I want to make sure we talk about. So I'd like to discuss each of those first, one at a time. I'll provide a brief intro to each topic, and then open up for questions on that topic. Then once we're through those we can open up for arbitrary questions. + +**Key Topics:** + +**cluster-level representation of the configuration** + +Are there any sensitive fields that should be in secrets instead? + +Config map could still tell you where to find these secrets, etc. + +We have way too many manually set flags right now, we should be looking at peeling things out of the configurations. + +Configuration should be orthogonal to provisioning and node topologies. Software vs. hardware. 
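Purely as an illustration (not from the original discussion): a minimal sketch of the cluster-level representation idea, where a component reads its blob out of one key of a ConfigMap and decodes it into its own config type. The `ExampleKubeletConfig` type, its fields, and the `kubelet` key name below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExampleKubeletConfig is a hypothetical, trimmed-down stand-in for a
// component's versioned config type; the real field set would come from
// the component's own config API group.
type ExampleKubeletConfig struct {
	CgroupsPerQOS bool   `json:"cgroupsPerQOS"`
	EvictionHard  string `json:"evictionHard"`
	MaxPods       int32  `json:"maxPods"`
}

func main() {
	// Pretend this is the Data of the ConfigMap assigned to the node,
	// with one blob per component under its own key.
	configMapData := map[string]string{
		"kubelet": `{"cgroupsPerQOS": true, "evictionHard": "memory.available<100Mi", "maxPods": 110}`,
	}

	var cfg ExampleKubeletConfig
	if err := json.Unmarshal([]byte(configMapData["kubelet"]), &cfg); err != nil {
		// Failing to deserialize is a clear signal of bad config: the
		// component should reject the payload rather than limp along.
		fmt.Println("rejecting config:", err)
		return
	}
	fmt.Printf("accepted config: %+v\n", cfg)
}
```

This also hints at why composability is attractive: a component only needs to find and decode the objects it cares about, regardless of what else the ConfigMap carries.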
+ +given that we are getting rid of componentconfig and moving to each-component-exposes-its-own, we should have a policy for the "right" way to do this, and talk about how to make these per-component "API" types discoverable - there has been some discussion about discussing this more (how meta…), the likely correct home is sig-cluster-lifecycle + +**Kubelet config checkpointing and recovery mechanism** + +Note: config uptake is defined as a restart + +configmap use cases feel like a nodeclass - you _could_ build nodeclass config on top of the proposed model + +kubelet could post a minimal status very early before potential crashes - need to clarify precisely what this looks like + +how should we define kubelet health/bad config? Any perfect indicator (couldn't deserialize, failed validation, failed checksum, etc.) should be used. The only imperfect indicator at the kubelet level we've converged on thus far is that the kubelet crashloops before the configuration is out of it's probationary period. + +**MEETING ENDED HERE, TBC ~~NEXT WEEK~~ next week conflicts with KubeCon, please continue discussion on GitHub** + +**~~nodeConfig volume idea~~ - ended up pulling back from this after GitHub discussion** + +~~The biggest reason behind nodeConfig was to allow easy coordination of configuration rollouts between multiple components. In recent discussions, we've tended toward the idea that interdependence between component configurations which would require such coordination is an antipattern. This greatly lessens the need for nodeConfig.~~ + +~~There has been discussion around how to plumb configuration into the pods that run on the node. It recently became apparent that we may be conflating two needs:~~ + + + +* ~~getting knob values to things in daemonish Pods~~ +* ~~informing Pods about node-specific settings, e.g. the node CIDR~~ + +~~I had originally proposed this idea with the former intention, though it seems that others have interpreted it as being useful for the latter intention. The latter case still seems useful, but in my mind these are separate functions and shouldn't be served by the same mechanism. Thoughts?~~ + +**composable configuration** + +Thus far, we have been discussing config in terms of a specific object type associated with a specific key name in the ConfigMap. In my last meeting with Brian Grant about this feature, we briefly touched on the idea to, instead, have the ConfigMap contain an arbitrary set of keys, each associated with an arbitrary blob representing an arbitrary set of configuration objects. The Kubelet could slurp all of these blobs to compose a set of objects, and search for the objects that it needs in this set. This may make the configuration more composable and reduce the need to adhere to a specific schema in the ConfigMap. I'd like to see what everyone thinks of this idea. + +**Open to arbitrary questions** + + +# Mar 14 {#mar-14} + + + +* Logging integration: https://github.com/kubernetes/kubernetes/issues/42718 + * Log rotation: + * We may want to revisit the logging SLOs. In reality, dropping logs is unavoidable sometimes. Should we define log missing rate instead? + * Current copy-truncate loses logs. + * If log rotation is handled by the same daemon which writes the log, it is mostly fine. However, the problem is when log rotation and log writing is handled by different daemons, some coordination between them is needed. We may add api in CRI to signal runtime to reopen log file, although Docker won't react to the signal now. 
+ * Log enrichment: + * Getting log metadata from apiserver may introduce extra overhead to apiserver. Is it possible to add corresponding kubelet api? - Same problem with standalone cadvisor. This is in sig-node's queue, just need prioritize. + * dawn@: Discuss with the api machinery team about the api library. +* Please continue to move non blocking issues out of the 1.6 milestone and into the 1.6.1, 1.7 or next-candidate milestones **please don't **simply remove a milestone from an issue. Please ensure all non release blocking issues are out of the 1.6 milestone by the end of today (14 March 2017) + * [flakes](https://github.com/kubernetes/kubernetes/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20label%3Akind%2Fflake%20milestone%3Av1.6%20label%3Asig%2Fnode) + * [all issues in milestone](https://github.com/kubernetes/kubernetes/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20milestone%3Av1.6) + + +# Mar 07 {#mar-07} + + + +* 1.6 release discussion: + * CRI logging with journald: PR was merged + * Is someone looking at [kublet serial GC failues](https://k8s-testgrid.appspot.com/google-node#kubelet-serial-gce-e2e)? + * Thanks for closing all those flakes :) + * GPU support: accepted + * Release notes + + +# Feb 28 {#feb-28} + + + +* containerd summit updates + * containerd summit videos: + * [containerd deep dive](https://www.youtube.com/watch?v=UUDDCetB7_A) + * [containerd and CRI](https://www.youtube.com/watch?v=cudJotS97zE) + * [driving containerd oprations with gRPC](https://www.youtube.com/watch?v=sG9hxz4-hIA) + * [containerd goverhttps://www.youtube.com/watch?v=cudJotS97zEnance and integration with other systems](https://www.youtube.com/watch?v=PGEXMuBeo1A) + * [Discuss sesson notes](https://github.com/docker/containerd/blob/master/reports/2017-02-24.md) + * Proposed deadline for 1.0 containerd: end of Q2 + * Containerd contains: runtime + image + * Containerd-shim manages the lifecycle of container. + * No network support in containerd + * Image distribution management is new, but by design it is very flexible: split image handling into 3 phases + * Align with CRI from the high level + * Not release any binary. Every vendor builds its own binary. + * Multi-tendency: create different namespaces. +* Please fill out feature repo checklists [Caleb] + * Container runtime interface + * By defautl enabled for 1.6 for 2 weeks + * Found several small issues, such as kubectl log+journald support + * The fixes are in or pending + * Discussions about requirements to deprecate support for journald: https://github.com/kubernetes/kubernetes/issues/42188 + * Non-CRI test coverage:https://k8s-testgrid.appspot.com/google-non-cri + * Evictions + * Delete pod objects only after all pod level resources are reclaimed + * TODO: + * Avoid evicting multiple pods unnecessarily - This is a bug in eviction manager which is causing frequent eviction node e2e test flakes. https://github.com/kubernetes/kubernetes/issues/31362 + * Preemption & eviction for critical pods handling on node + * All prs are merged required for 1.6 + * Alpha support for multiple GPUs + * PR #42116 was merged on 2/28 + * Node allocatable, QoS cgroups and pod level cgroups. + * PR to enforce the node allocatable was merged, but not enforced by default yet. Gated on pod level cgroups. + * Qos level update PRs are ready. Waiting to be merged. Delayed due to dependency on node allocatable. 
+ * TODO: + * Merge QoS cgroup PRs + * Enable QoS and Node allocatable cgroups by default - set `--cgroups-per-qos=true` + * Enable Node Allocatable enforcement on pods - set `--enforce-node-allocatable=pods` + * Evict based on Node Allocatable (more of a bug fix) + * PR: https://github.com/kubernetes/kubernetes/pull/42204 + * Node problem detector + * Announce it as beta for OSS k8s + * Support journaltd + * extended to support arbitrary logs + * standalone NPD mode + * Enable it for GKE GCI nodes by default for + + +# Feb 21 {#feb-21} + + + +* Cancelled due to no meeting week + + +# Feb 14 {#feb-14} + + + +* Vish: node allocatable phase 2 for v1.6 + * https://github.com/kubernetes/community/pull/348 +* Euan: shared mount propagation + * https://github.com/kubernetes/community/pull/193 +* Position on Docker CVE + * https://github.com/kubernetes/kubernetes/issues/40061 + * Docker 1.12.6 is tested in node e2e against 1.6 w/o CRI. + * Plan to have docker 1.12.6 test included in 1.5 branch + * Not validate overlay2 graph driver + * Some tests at node-e2e +* Explicit Service Links [proposal](https://github.com/kubernetes/community/pull/176) + + +# Feb7 {#feb7} + + + +* [virtlet](https://github.com/Mirantis/virtlet) & CRI proxy demo (with CNI and out-of-process docker-shim) + * Uses CRI to run docker shim and virtlet to allow both docker images and virtlet images to be run. + * Started using daemonset + * Allows "plain" pods to communicate with VMs, VMs can access services + + +# Jan31 {#jan31} + + + +* k8s 1.6 - docker version requirement + * Derek put here since he cannot make 1/24, but I am fine with 1.12+ + * Redhat validates overlay. Thinking about switch to overlayfs from devicemapper later this year, so validation may not happen right away. + * node-e2e has docker 1.12 tests for both CRI and non-CRI implementations. + * deprecating docker 1.10.x? support policy? + * Manual triage the issue with docker 1.13.x. Who? +* pod level cgroup rollout plan + * See: https://github.com/kubernetes/community/pull/314 +* Containerd presentation by Patrick Chanezon & Stephen Day + * [slides](http://www.slideshare.net/chanezon/docker-containerd-kubernetes-sig-node) + * [containerd summit Feb 23](https://docs.google.com/forms/d/e/1FAIpQLSeYK9_DaFJvF8PtyykUzm3awV3e1xHwuonxbKvak9UYS8VnqQ/viewform?c=0&w=1) at Docker office in San Francisco + * [containerd livestream recap](https://blog.docker.com/2017/01/containerd-livestream-recap/) + * [containerd dev reports](https://github.com/docker/containerd/tree/master/reports) +* Update critical pods handling + * 1.4 & 1.5 changes: + * critical pods annotation can only apply to kube-system pods, and feature gate flag is added and default is off + * critical pods won't be evicted. + * 1.6 improvements: + * Introduce Node-level preemption +* Shared host volumes (again) + * [related proposal](https://github.com/kubernetes/community/pull/193) +* Explicit Service Links [proposal](https://github.com/kubernetes/community/pull/176) +* [Local storage proposal](https://github.com/kubernetes/community/pull/306) + * please look at high level PR and comment on use cases + + +# Jan24 {#jan24} + + + +* Shared PID Rollout Plan (@verb) + * [Proposed](https://github.com/kubernetes/community/pull/207) implementing by default in CRI + * Everyone ok with kubelet flag for rollback? + * Agreed. + * Proposal is approved. 
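For context only, a minimal sketch of the pod-level opt-in that later landed as `shareProcessNamespace` in the v1 PodSpec; it illustrates the eventual user-facing surface rather than the CRI-level default debated above, and the pod below is just an example.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	share := true
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "shared-pid-demo"},
		Spec: corev1.PodSpec{
			// All containers in the pod see each other's processes,
			// which is the behavior the shared PID rollout targets.
			ShareProcessNamespace: &share,
			Containers: []corev1.Container{
				{Name: "app", Image: "nginx"},
				{Name: "sidecar", Image: "busybox", Command: []string{"sleep", "3600"}},
			},
		},
	}
	fmt.Println("shareProcessNamespace:", *pod.Spec.ShareProcessNamespace)
}
```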
+* Pod Troubleshooting update (@verb) + * [Updated Proposal PR](https://github.com/kubernetes/kubernetes/pull/35584) with items discussed in Jan10 meeting + * Open Questions: pod cgroup, security policy, admission control +* k8s 1.6 - docker version requirement +* rktlet status: approaching the point where it can be merged back as rkt-shim and replace current rktnetes + * Ref: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-rkt-runtime.md#design + * Reasons: according to initial design proposal, circular dependency of grpc proto (in kubelet) -> rktlet -> kubelet, node-e2e testing setup and pr-blockers almost impossible to do out-of-repo + * Alternate solution to the originally accepted proposal of vendoring rktlet into kubelet due to above reasons + * Shoutout to Vish for being on point with his concerns in the original proposal (link missing) +* community/proposal: change default propagation for host volumes \ +https://github.com/kubernetes/community/pull/151 \ +https://github.com/kubernetes/community/pull/193 + * Vish, Dawn added as reviewers + * Euan: will add a summary comment for basically the current status/thoughts +* Benchmarking protobuf3 / grpc / without + * protobuf3 did reduce mem usage by a small amount + * Further results still pending + + +# Jan17 {#jan17} + + + +* CRI rollout plan review + * https://docs.google.com/document/d/1cvMhah42TvmjANu2rRCFD3JUT0eu83oXapc_6BqE5X8/edit?usp=sharing + * The newer, more succinct [rollout plan](https://docs.google.com/a/google.com/document/d/1b6MLZWUjKV7uJ8BxK-4ciFJl3IPYpsPPfiaHVKV4j-0/edit?usp=sharing) + * CRI Integration Performance Benchmark Result: https://docs.google.com/a/google.com/document/d/1srQe6i4XowcykJQCXUs5fFkNRADl_nR4ffDKEVXmRX8/edit?usp=sharing +* CRI: upgrade to proto3? + * https://github.com/kubernetes/kubernetes/issues/38854 +* Plan to revert recent changes for eviction/qos + * https://github.com/kubernetes/kubernetes/issues/40024 + * + + +# Jan10 {#jan10} + + + +* Update on shared-pid namespace for docker (@verb) + * Question from PR: Do we want the CRI to require a shared pid namespace for all run times? + * Dawn: Agreed. The only concern is docker 1.11 usage, docker being the only alpha impl right now + * Euan: we should just mandate docker 1.12 to make docker compliant somewhere in the timeline then + * Outcome: PR to make it mandatory in CRI api, update on the proposal the plan a bit +* Update on "pod troubleshooting" proposal (verb) + * Update to a running pod with a run-once container + * https://github.com/kubernetes/kubernetes/pull/35584 + * Timeline? + * Not part of the 1.6 roadmap, this is longer term + * This also isn't wholly sig-node; if there's a kubectl-only one, it's not really a node-thing as much (api-machinery? apps?) +* [virtlet](https://github.com/Mirantis/virtlet) demo +* "core metrics" proposal: https://github.com/kubernetes/community/pull/252 + * Per-pod usage and container are included + * Pod overhead: whether that's shown or how is an open question + * Derek: no overall aggregate usage (e.g. all BestEffort) + * Just aggregate / cross-reference with the podspec for that usecase + * This should be minimal/low-level + * Comment on the proposal! 
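As background for the metrics discussion, a small sketch of reading per-pod usage from the kubelet's existing Summary API, the kind of data a minimal core-metrics set would be distilled from. The read-only port and the few JSON fields shown are assumptions and only a tiny fraction of the real response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// A tiny subset of the kubelet stats summary, just enough to show the
// per-pod usage data the core-metrics discussion is about; this is not
// the full schema.
type summary struct {
	Pods []struct {
		PodRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"podRef"`
		Memory struct {
			WorkingSetBytes uint64 `json:"workingSetBytes"`
		} `json:"memory"`
	} `json:"pods"`
}

func main() {
	// Assumes the kubelet read-only port is reachable on this node.
	resp, err := http.Get("http://127.0.0.1:10255/stats/summary")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var s summary
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		panic(err)
	}
	for _, p := range s.Pods {
		fmt.Printf("%s/%s workingSet=%d bytes\n", p.PodRef.Namespace, p.PodRef.Name, p.Memory.WorkingSetBytes)
	}
}
```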
+* CRI rollout proposal + * https://docs.google.com/document/d/1cvMhah42TvmjANu2rRCFD3JUT0eu83oXapc_6BqE5X8/edit?usp=sharing + * We need to get proper feedback on this for 1.6 to make sure the timeline is sane and start the rollout +* Euan - Going to be less involved in sig-node (so long and thanks for all the fish!) + * I will finish up existing open issues / prs I have in flight + * Luca (@lucab) is the best point-of-contact for rktlet/CRI work + * Dan (@ethernetdan) will be getting more involved in sig-node as well (including helping with the 1.6 release and helping keep track of features we're involved in, etc). +* Caleb: PM-rep for a sig + * A new role responsible for tracking features for a given SIG, work + * https://docs.google.com/document/d/1ZElyRqNsGebvMpikEBhSuf2Q6WrEYHB2PgWIlGgwt78/edit# (perms to be updated by Aparna) + + +# Jan 03 {#jan-03} + + + +* [Q1, 2017 node planing](https://docs.google.com/spreadsheets/d/1-hADEbGEUrW04QP4bVk7xf1CWuqbFU6ulMAH1jqZQrU/edit?ts=585c5905#gid=1369358219&vpid=A1) +* diff --git a/sig-node/archive/meeting-notes-2018.md b/sig-node/archive/meeting-notes-2018.md new file mode 100644 index 00000000..4cae4e36 --- /dev/null +++ b/sig-node/archive/meeting-notes-2018.md @@ -0,0 +1,702 @@ +# SIG Node Meeting Notes + +# Future {#future} + + + +* Monitoring pipeline proposal +* [Virtual Kubelet (@rbitia)](https://docs.google.com/document/d/1MAn_HMZScni89hDwI4nQMk_SWTx9oi16PuOREhpVJJI/edit?usp=sharing) +* PR Review Request - [#63170](https://github.com/kubernetes/kubernetes/pull/63170) - @micahhausler +* Image name inconsistency in node status - [#64413](https://github.com/kubernetes/kubernetes/issues/64413) - @resouer +* [Jess Frazelle / Kent Rancourt]: Proposal for kubelet feature to freeze pods placed in a hypothetical "frozen" state by the replication controller in response to a scale-down event. Enable pods to be thawed by socket activation. + + +# Dec 18 + + + +* Graduate SupportPodPidsLimit to beta (Derek Carr) + * https://github.com/kubernetes/kubernetes/pull/72076 +* UserNS remapping: [updated proposal](https://github.com/kubernetes/community/pull/2595),[ implementation PR](https://github.com/kubernetes/kubernetes/pull/64005) (vikasc): + + Question raised at Zoom Chat: + +* From Mike Danese to Everyone: (10:42 AM) + +
I would like to reconcile the "Motivation" section with how we want people to use users and groups in general even without remapping. We want tenants to run workloads as different users and groups to segment their disk access to improve security, and we want userns to support compatibility with images that expect to be running as a given uid/gid. The current proposal uses a file in /etc/... which makes getting both hard.
Any path towards both these goals?
yup
exactly
+ +* From Patrick Lang to Everyone: (10:44 AM) + +
https://github.com/kubernetes/community/blob/bf48175c42fb71141f83071ce42f178d475b0bad/contributors/design-proposals/node/node-usernamespace-remapping.md#sandbox-type-runtimes - this proposes using nodeselectors to select nodes that can support this. Should this be designed to work with RuntimeClass instead to avoid using node selectors? + +* From Mike Danese to Everyone: (10:45 AM) + +
Single mapping on a node seems problematic IMO.
Namespace would be better.
Per pod would be best.
I'd like to see that explored
thanks
+ +* From 刘澜涛 to Everyone: (10:46 AM) + +
Just some background, it is possible to support per pod user namespace mapping in containerd.
It is already supported, we just need to add corresponding CRI implementation and kubelet support. + + + +# Dec 11 + +Cancelled due to KubeCon Seattle + + +# Dec 4 + + + +* Proposal: node shut down handling (Jing Xu) + * https://docs.google.com/document/d/1V7y9Boagev2LwQTSI1SaatMOdRC2gSQ9PBcESaITVEA +* Mechanism to apply/monitor quotas for ephemeral storage: https://docs.google.com/presentation/d/1I9yYACVSBOO0SGB0ohvpTImJSCc2AaOTuO6Kgsg4IIU/edit?usp=sharing + + +# Nov 27 + + + +* **Adding Termination Reason (OOMKilled/etc) Event or Counter (Brian)** + * Following discussion in sig-instrumentation, discuss either + * A. generating a logline / event for Pod termination reason or + * B. adding a termination reason counter in cadvisor/kubelet that can export to Prometheus + * Want to get better metrics when a container is killed. + * Want to get a count of all the termination reasons. + * Pod has last state, which has a termination reason + * kube-state-metrics has a termination reason, but as soon as the pod restarts it's gone + * would be nice to have a counter for how many containers have been killed for each possible reason (OOM, etc.) + * at minimum would be nice to have an event with the termination reason + * on top of this would be nice if cadvisor or kubelet could export a counter (e.g. for these events) + * Relevant Issue is: _"Log something about OOMKilled containers"_ https://github.com/kubernetes/kubernetes/issues/69676 +* **Node-Problem-Detector updates (wangzhen127)** + * adding a few new plugins: + * kubelet, docker, containerd crashlooping issues + * more checks on filesystem and docker overlay2 issues + * plans on making NPD easier to configure within k8s clusters +* **wg-lts a new working group. [**Dhawal Yogesh Bhanushali dbhanushali@vmware.com**] \ +**Mailing list: https://groups.google.com/forum/#!forum/kubernetes-wg-lts \ +PR: https://github.com/kubernetes/community/pull/2911 \ +Survey: https://docs.google.com/document/d/13lOGSSY7rz3yHMjrPq6f1kkRnzONxQeqUEbLOOOG-f4/edit + + slack: #wg-lts + +* + + +# Nov 20 + +Meeting canceled due to USA holiday (Thanksgiving) + + +# Nov 13 + +no agenda + + +# Nov 6 + + + +* Windows progress update [@patricklang] + * Tracking remaining v1.13 [stable] work at https://github.com/PatrickLang/k8s-project-management/projects/1 + + As of 11/6, down to test & docs + + * Release criteria doc, includes test case list: https://docs.google.com/document/d/1YkLZIYYLMQhxdI2esN5PuTkhQHhO0joNvnbHpW68yg8/edit# +* Change of node label name(space) of NFD labels (@marquiz) \ +https://github.com/kubernetes-incubator/node-feature-discovery/issues/176#issuecomment-436166692 \ +(related to NFD repo migration: https://github.com/kubernetes-incubator/node-feature-discovery/issues/175) + + +# Oct 30 + + + +* Node GPU Monitoring Demo ([dashpole@google.com](mailto:dashpole@google.com)) https://github.com/dashpole/example-gpu-monitor#example-gpu-monitor + * Feedback from Swati from Nvidia: the initial proposal with gRPC is very difficult to integrate. The new version with socket is doble. + * Next step: dashpole: Open PR to add and test the new socket endpoint in 1.13 \ +Swati from Nvidia: work on integrating [DCGM exporter](https://github.com/NVIDIA/gpu-monitoring-tools/blob/master/exporters/prometheus-dcgm/dcgm-exporter/dcgm-exporter) with the endpoint. 
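To make the "socket endpoint" concrete, here is a rough sketch of a monitoring agent reading device assignments from the kubelet's pod-resources socket. The socket path and the Go package path are assumptions based on the upstream pod-resources implementation and may differ from the final shape.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"
	podresources "k8s.io/kubernetes/pkg/kubelet/apis/podresources/v1alpha1"
)

func main() {
	// Assumed default location of the kubelet pod-resources socket.
	const socket = "/var/lib/kubelet/pod-resources/kubelet.sock"

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, socket,
		grpc.WithInsecure(), grpc.WithBlock(),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// List which devices (e.g. GPUs) are assigned to which pods and
	// containers, so the agent can label device metrics accordingly.
	client := podresources.NewPodResourcesListerClient(conn)
	resp, err := client.List(ctx, &podresources.ListPodResourcesRequest{})
	if err != nil {
		panic(err)
	}
	for _, pod := range resp.GetPodResources() {
		fmt.Println(pod.GetNamespace(), pod.GetName())
	}
}
```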
+* Topology Manager (formerly known as NUMA Manager) proposal [cdoyle] \ +https://github.com/kubernetes/community/pull/1680 + * _TL;DR: Align topology-dependent resource binding within Kubelet._ + * Agreed to move forward with the proposal. + + +# Oct 9 + + + +* Move Node Feature Discovery to kubernetes-sigs (Markus Lehtonen) \ +https://github.com/kubernetes-incubator/node-feature-discovery/issues/175 +* [RuntimeClass scheduling](https://docs.google.com/document/d/1W51yBNTvp0taeEss56GTk8jczqFJ2d6jBeN6sCSlYZU/edit#) (tallclair) + + +# Oct 02 + + + +* 1.13 release +* Q4 planning discussion: + + https://docs.google.com/document/d/1HU6Ytm378IIw_ES3_6oQBoKxzYlt02DvOoOcQFkBcy4/edit?usp=sharing + + + +# \ +Sept 18 + + + +* Discuss release notes for 1.12 (Derek Carr) + * https://docs.google.com/document/d/1ZZkcIqDwUZiC77rhjA_XUGGsmAa9hyQ6nc5PXSX3LGI/edit +* NUMA Manager Proposal Demo & Update (Louise Daly, Connor Doyle, Balaji Subramaniam) + * https://github.com/kubernetes/community/blob/4793277c981e7c7a5d9cbf1b2ab1003fc68384d3/contributors/design-proposals/node/numa-manager.md + * https://github.com/lmdaly/kubernetes/tree/dev/numa_manager - Demo Code + + +# Sept 11 + + + +* Kubelet Devices Endpoint (device monitoring v2) + * [Slides](https://docs.google.com/presentation/d/1xz-iHs8Ec6PqtZGzsmG1e68aLGCX576j_WRptd2114g/edit?usp=sharing) + * [KEP](https://github.com/kubernetes/community/pull/2454) +* Fix port-forward fo non-namespaced Pods (@Xu) + * https://docs.google.com/presentation/d/1x1B2_DFZ9VI2E_-pB2pzeYKtUxrK4XLXd7I0JWD1Z40/edit#slide=id.g4189217af3_0_58 + * Related to: containerd shimv2 + kata etc + * some comments in the meeting: + * Kubelet doesn't want to see any containers/images visible to but not managed by kubelet. So if we want a solution like method 3, it should not visible to kubelet at all. And method 1 looks good to kubelet. + * There is a debug-container under discussion, which works quite similar to the method 3. + * Sig-node likes to revisit the port-forward method itself from the architecture level(?), however, the feature is required by many use cases, such as OpenShift, and it is essential. +* RFC Improve node health tracking - https://github.com/kubernetes/community/pull/2640 + * No discussion necessary at the moment + + +# Sept 4 + + + +* Ephemeral storage (part 2) + * Discussion of moving ephemeral storage & volume management to the CRI: \ +https://groups.google.com/d/topic/kubernetes-sig-storage/v2DKu8kNIgo/discussion +* Interested mentor/mentees post 1.12? + + +# Aug 28 + + + +* Discuss ephemeral storage quota enhancement (Robert Krawitz @Red Hat) https://docs.google.com/document/d/1ETuraEnA4UcMezNxSaEvxc_ZNF3ow-WyWlosAsqGsW0/edit?usp=sharing_eil&ts=5b7effb9 + + +# Aug 21 proposed agenda + + + +* Windows GA updates (Patrick Lang @Microsoft) + * Just finished sig-windows , and meeting notes is at https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit# + * Discussed with sig-network and sig-storage as suggested at sig-node several weeks ago + * Based on the current quality and testing, plan to have GA for 1.13. 
+* Sidecar-container proposal: https://github.com/kubernetes/community/pull/2148 (Joseph) +* Kata & Container shim v2 Integration updates ([Xu@hyper.sh](mailto:Xu@hyper.sh), https://docs.google.com/presentation/d/1icEJ77idnXrRSj-mSpAkmy9cCpD1rIefPFrbKQnyB3Q/edit?usp=sharing) + * Shim V2 Proposal: https://github.com/containerd/containerd/issues/2426 +* Device Assignment Proposal: https://github.com/kubernetes/community/pull/2454 + + +# Aug 14 + + + +* Summary on the [New Resource API](https://github.com/kubernetes/community/pull/2265) [Follow-up](https://docs.google.com/document/d/1iWlfyYG781UVXCLEnpcFGNecutD4kALVcUJ0jqY7-_k) offline discussions (@vikaschoudhary16, @jiayingz, @connor, @renaud): + * We will start with multiple-matching model without priority field. + * We will start with allowing ResourceClass mutation. +* Summary on the [New Resource API](https://github.com/kubernetes/community/pull/2265) [user feedbacks](https://docs.google.com/document/d/1syxE8dwsUde5BuHuVibrJIxH6GfhTp2JYOM_cgNVrAI/) (@vikaschoudhary16, @jiayingz): + * tldr: we received feedbacks from multiple HW vendors, admins who manage large enterprise clusters, and k8s providers that represent large customer sets that the new Resource API a useful feature that will help unblock some of their important use cases + + +# Aug 07 + + + +* [Follow-up](https://docs.google.com/document/d/1iWlfyYG781UVXCLEnpcFGNecutD4kALVcUJ0jqY7-_k) [New Resource API ](https://github.com/kubernetes/community/pull/2265)KEP proposal (@vikaschoudhary16 and @jiayingz): + * Using `Priority` field for handling overlapping res classes + * Should res classes with same priority be allowed? + * non-mutable or mutable +* Device plugin 1.12 enhancement plan: (@jiayingz) + * https://docs.google.com/document/d/1evJDu6H-LowS5saVmwGKOpEWA8DtDn0QpbSmzX3iihY +* Add list of CSI drivers to Node.Status (@jsafrane) + * https://github.com/kubernetes/community/pull/2487 + * sig-node will review + * Major concern right away: node object is already too big. Jan will get API approval + sig-architecture approval to remove the old annotation (to save space). + * Will continue next week. + + +# July 31 + + + +* [tstclair & jsb] CRI versions and validation + * Docker 18.03-ce is the only installable version for Ubuntu 18.04 + * Validate against a docker API version: https://github.com/kubernetes/kubernetes/issues/53221 + * Sig-cluster-lifecycle requirements: + * Test dashboard to show the container runtime status: + * Docker: https://k8s-testgrid.appspot.com/sig-node-kubelet, https://k8s-testgrid.appspot.com/sig-node-cri + * Containerd: https://k8s-testgrid.appspot.com/sig-node-containerd + * CRI-O: https://k8s-testgrid.appspot.com/sig-node-cri-o + * Pending work to move to the newly defined test jobs for CRI + * A central place of document to tell users how to configure each container runtime. 
+ * Follow up here - https://github.com/kubernetes/website/issues/9692 +* 1.12 Feature Freeze Review (Dawn/Derek) + * Planning doc for discussion + * https://docs.google.com/document/d/1m4Jzcd2p364kt2aMLSHZys7_02_oI3uwNkObrRXBO3k/edit + * List of feature issues opened + * https://github.com/kubernetes/features/issues?q=is%3Aissue+is%3Aopen+label%3Asig%2Fnode + + +# July 24 + + + +* Agenda Topic (owner) +* [tstclair & jsb] CRI versions and validation + * Docker 18.03-ce is the only installable version for Ubuntu 18.04 + + +# July 17 + + + +* [Device "Scheduling](https://docs.google.com/document/d/1Gad4s8BaFmUQ0JeYdJcXr6sek_q-1mjoKkFPaRQL6oA/edit?usp=sharing)" proposal (@dashpole) [slides](https://docs.google.com/presentation/d/1xz-iHs8Ec6PqtZGzsmG1e68aLGCX576j_WRptd2114g/edit?usp=sharing) +* [New Resource API ](https://github.com/kubernetes/community/pull/2265)KEP proposal (@vikaschoudhary16 and @jiayingz) + * Discussed the user stories we would like to enable through this proposal + * Discussed goals and non-goals. In particular, discussed why we proposed to use ResourceClass to express resource property matching constraints instead of directly expressing such constraints in Pod/Container spec. These reasons are documented in the non-goal section + * Question on whether it is common for people to create a cluster with multiple types of GPU. Or do people usually create multiple clusters, each maps to one GPU type. + * Answer: Yes. We have seen such customers from Openshift, GKE, and on-prem. + * Bobby: When we start to support group resources, and ComputeResource to ResourceClass matching behavior becomes many-to-many, that would bring quite a lot complexity and scaling concerns on scheduler. People have already seen scaling problems on scheduler when they use very big number of extended resources. + * Answer: scaling is definitely a concern. We don't plan to support group resource in the initial phase. We need to collect scaling and performance numbers and carefully evaluate them after the initial phase before moving forward. We should publish performance numbers and best-practice guidelines to advise people from mis-using the building blocks we put in. We should also think about the necessary mechanism to prevent people from getting into "bad" situation. + + +# July 10 + + + +* Follow-up from June 26 + * Move cri-tools to kubernetes-sigs. https://github.com/kubernetes-incubator/cri-tools/issues/331 + * Move cri-o to kubernetes-sigs (Mrunal/Antonio). \ +https://github.com/kubernetes-incubator/cri-o/issues/1639 + * Follow-up + * no disagreement in sig to move out of incubator org into sigs org + * dawn/derek to ack on issues and initiate transfer +* Future of Node feature discovery (Markus): \ +https://docs.google.com/document/d/1TXvveLiA_ByQoHTlFWtWCxz_kwlXD0EO9TqbRLTiOSQ/edit# +* [RuntimeClass](https://github.com/kubernetes/community/pull/2290) follow up: s/Parameters/RuntimeHandler/ (tallclair) +* Float [command refactor](https://github.com/kubernetes/kubernetes/issues/26093#issuecomment-403581393) idea (tallclair) +* [New Resource API ](https://github.com/kubernetes/community/pull/2265)KEP proposal (@vikaschoudhary16 and @jiayingz) +* [Device "Scheduling](https://docs.google.com/document/d/1Gad4s8BaFmUQ0JeYdJcXr6sek_q-1mjoKkFPaRQL6oA/edit?usp=sharing)" proposal (@dashpole) + + +# July 3 + +Cancelled + + +# June 26 {#june-26} + + + +* Move cri-tools to kubernetes-sigs. https://github.com/kubernetes-incubator/cri-tools/issues/331 +* Move cri-o to kubernetes-sigs (Mrunal/Antonio). 
\ +https://github.com/kubernetes-incubator/cri-o/issues/1639 +* RuntimeClass KEP: https://github.com/kubernetes/community/pull/2290 + * Derek: How will RuntimeClass will be used? What is the scope? + * Derek: Can we have a concrete example to prove the api works? + * Derek: Should unqualified image name normalization problem solved by RuntimeClass? - If yes, it is not only container runtime need to handle it, both Kubelet and scheduler need to normalize image name based on the RuntimeClass. + * How is storage and networking going to work in this sandbox model? +* Future of Node feature discovery (Markus): \ +https://docs.google.com/document/d/1TXvveLiA_ByQoHTlFWtWCxz_kwlXD0EO9TqbRLTiOSQ/edit# + + +# June 19 {#june-19} + + + +* [User Capabilities](https://github.com/filbranden/kubernetes-community/blob/usercap1/contributors/design-proposals/node/user-capabilities.md) (using ambient capabilities for non-root in container) [@filbranden](https://github.com/filbranden) \ +PR [kubernetes/community#2285](https://github.com/kubernetes/community/pull/2285) + + +# June 12 {#june-12} + + + +* Sandboxes API Decision (@tallclair) + * Proposed to move forward with RuntimeClass, instead of sandbox boolean + * Some rationale behind the proposal + * Unblocks runtime extensions beyond sandboxes like kata vs. gVisor, for example, windows + * Provides a clean way of specifying pod overhead and sandbox configuration + * +* APIsnoop e2e test to API mapping (@hh and @rohfle) + * https://apisnoop.cncf.io + * https://github.com/cncf/apisnoop/issues/17 + * https://github.com/cncf/apisnoop/tree/master/dev/e2e-audit-correlate/results/20180605-e2e-kubetest/ + * [slides](https://docs.google.com/presentation/d/1wrdBlLtHb_z5qmNwDDPrc9DRDs3Klpac83v8h5iAqjE/edit?usp=sharing) + + +# June 5 {#june-5} + +Cancelled + + +# May 29 {#may-29} + + + +* [Sandboxes API follow-up discussion](https://groups.google.com/d/topic/kubernetes-sig-node/rHLECuaXGJs/discussion) + + +# May 22 {#may-22} + + + +* [Sandboxes API Proposal](https://docs.google.com/document/d/1WzO_QjJFfedhsiBtfcVB2QzTWRXHEPX1xOyqDGXxO-0/edit#) (@tallclair) + * Q: Do we expect cluster admin to label nodes which support sandbox? + * It's not clear whether we should require all container runtimes to support sandbox, or this is an optional feature. + * This doesn't block alpha. + * Ideally each node will only support one sandbox technology, but if the user want to use different sandbox technology, they may need to have different node setup and label the node. + * Q: Will this be included in conformance test? + * Q: Can the sandbox be a runc implementation which meets the conformance? + * It is hard to define a conformance test suite to validate whether the sandbox meets the requirement. + * We can only provide guideline for now. + * We are still working on the definition, but one probably definition: idependent kernel, and 2 layers of security? + * If the sandbox definition is not clear, derek is worried that: + * User may never or always set Sandbox=true. + * If sandbox is only a boolean, when `sandbox=true`, a workload may work for some sandbox technology, but not work for another sandbox technology. + * Q: Why not just expose OCI compatible runtime? So that we can get similar thing with sandbox boolean, and more flexibility. + * For example, we can have an admission controller to do the validation and set defaults for an OCI compatible runtime, which is not necessarily to be in core Kubernetes. + * Q: What is the criteria to graduate this to beta/GA? 
+ * The alpha version is mainly to get user feedback, it is not finalized api. + * Q: Is this a Kubernetes 1.13 evolution? Or a longer term project? + * A: 1.13 would be good if there is strong signal that this works, but expect this to take longer. Would like to see this gets into alpha in 1.12. + * Q: Is Selinux blocked when sandbox=true? + * AppArmor, Selinux are not blocked, guestOS can support to provide better isolation between containers inside the sandbox. + * Q: Instead of changing all "Sandbox" in CRI, can we rename to a more specific name "Kernel-Isolated" or something, which also makes the api less confusing? "Sandbox" is too vague. + + +# May 15 {#may-15} + + + +* [Monitoring proposal update](https://docs.google.com/document/d/1NYnqw-HDQ6Y3L_mk85Q3wkxDtGNWTxpsedsgw4NgWpg/edit?usp=sharing) (@dashpole) +* CFS quota change discussion : https://github.com/kubernetes/kubernetes/pull/63437 + * Dawn to sync with Vish, we seemed good with a knob at node level to tune period on call. +* Cheaper heartbeats: https://github.com/kubernetes/community/pull/2090 +* Update on 1.11 development feature progress (reviews, etc.) + * https://docs.google.com/document/d/1rtdcp4n3dTTxjplkNvPDgAW4bbB5Ve4ZsMXBGOYKiP0/edit + * sysctls to Beta fields need reviews (Jan) + * https://github.com/kubernetes/kubernetes/pull/63717 +* CRI container log path: + * Only add pod name and namespace into the log path to avoid regression: https://github.com/kubernetes/kubernetes/issues/58638 + * For more metadata support e.g. namespace uid, need more thoughts, let's do it in the future. + * e.g. We can potentially use fluentd output filter plugin (e.g. https://github.com/bwalex/fluent-plugin-docker-format) + node metadata endpoint proposed by @dashpole. + + +# May 8 {#may-8} + + + +* Cancelled due to KubeCon and OpenShift + + +# May 1 {#may-1} + + + +* Sysctl move to beta (@ingvagabund) (Seth) + * https://github.com/kubernetes/community/pull/2093 + * [Discuss KEP for user namespace (proposal, @mrunal, @vikas, @adelton)](https://github.com/kubernetes/community/pull/2067/files) + * Add probe based mechanism for kubelet plugin discovery: (@vikasc) + + https://github.com/kubernetes/kubernetes/pull/63328 + + + +# April 24 {#april-24} + + + +* [Discuss KEP for user namespace (proposal, @mrunal, @vikas, @adelton)](https://github.com/kubernetes/community/pull/2067/files) +* Online volume resizing ([proposal](https://github.com/kubernetes/community/pull/1535), @gnufied) +* Maximum volume limit ([proposal](https://github.com/kubernetes/community/pull/2051), gnufied) +* Sig-Node Initial Charter: (https://github.com/kubernetes/community/pull/2065, @derekwaynecarr) +* Add probe based mechanism for kubelet plugin discovery: (@vikasc) + + https://github.com/kubernetes/kubernetes/pull/59963 + + + +# April 17 {#april-17} + + + +* Re-categorize/tag Node E2E tests ([proposal](https://docs.google.com/document/d/1BdNVUGtYO6NDx10x_fueRh_DLT-SVdlPC_SsXjYCHOE/edit?usp=sharing), @yujuhong) + * We need to consider whether to include test like eviction test into conformance suite. They are essential features, but the test itself has some special requirement 1) it needs to be run in serial; 2) it does make some assumptions such as the node size. 
+* [Virtual Kubelet (@rbitia)](https://docs.google.com/document/d/1MAn_HMZScni89hDwI4nQMk_SWTx9oi16PuOREhpVJJI/edit?usp=sharing) Updates +* Add probe based mechanism for kubelet plugin discovery: https://github.com/kubernetes/kubernetes/pull/59963 + + +# April 10 {#april-10} + + + +* Windows Service Accounts in Kubernetes (@patricklang) + * https://github.com/kubernetes/kubernetes/issues/62038 + * How to support Windows Group Managed Service Account (gMSA) + * WindowsCredentialSpec is windows specific identity + * Option 1: WindowsCredentialSpec references a secret + * Option 2: WindowsCredentialSpec contains the JSON passed as-is in the OCI spec + * No password information inside, only some information about the host. +* Sandbox resource management - [discussion questions](https://docs.google.com/document/d/1FtPFc8hhAVBWwTvU0BWrn34Salh2zZk0GXJz8ilN8bw/edit#) + * Pod resources design proposal: https://docs.google.com/document/d/1EJKT4gyl58-kzt2bnwkv08MIUZ6lkDpXcxkHqCvvAp4/edit + * If user set the overhead, how should runtime controller deal with it? + * Reject the pod, may cause problem on upgrade. + * Overwrite it. + * Derek: Can the node advertise/register the overhead to the control plane? Can node report per pod overhead in the node status? And let scheduler make decision based on that? + * Derek: Can we make kata as a node resource? And pod wants to run on kata needs to request that resource? + * Derek: Pod level limit is easier to set by user. - Dawn: It may not be true in the future based on the trend of Kubernetes today, e.g. Sidecar containers injected by Istio. - Derek: We can resolve the istio issue by dynamically resizing the limit. + * Dawn: Do we expect mixed sandbox and non-sandbox containers on a node? - Tim: Yes. The reason is that we do have system containers can't run in sandbox, e.g. fluentd, kube-proxy etc. + * Derek: Do we expect multiple runtimes on a single node? e.g. containerd, cri-o etc. - Dawn: No. Each node should have one container runtime, but it supports running container with different OCI runtimes, e.g. runc, kata etc. + * Derek: What happens to the best effort pod? - Dawn: This is an issue for frakti today, so they still set fixed limit. 
+* Container runtime stream server authentication https://github.com/kubernetes/kubernetes/issues/36666#issuecomment-378440458 + * +* Add probe based mechanism for kubelet plugin discovery: https://github.com/kubernetes/kubernetes/pull/59963 +* Node-level checkpointing: + +https://github.com/kubernetes/kubernetes/pull/56040 + + + +* sig-node charter discussion + * wip: https://github.com/derekwaynecarr/community/blob/sig-node-charter/sig-node/governance.md + + +# April 3 {#april-3} + + + +* Resource Management F2F update + * [Meeting notes](https://docs.google.com/document/d/1Df6uGCzGleAhRQYZ20v55U1YCBLfxV4PJh0CIWcMHkk/edit#heading=h.lc5c2ccmeh4i) +* Q2 planning draft (@dawnchen) + * https://docs.google.com/document/d/1rtdcp4n3dTTxjplkNvPDgAW4bbB5Ve4ZsMXBGOYKiP0/edit?usp=sharing +* Node Controller question - https://github.com/kubernetes/kubernetes/issues/61921 + * Node deduplicates multiple InternalIP addresses on a node + * ask Walter Fender (@cheftako) + + +# March 27 {#march-27} + + + +* [reduce scope of node on taints/labels (@mikedanese)](https://github.com/kubernetes/community/pull/911) +* [Node TLS bootstrap via TPM](https://docs.google.com/document/d/12UgErB_46iHBOi0YEKbaFbEXv9E5O6XWQihwPkwB_7s) (@awly) + + +# March 20 {#march-20} + + + +* Secure Container Use Cases updates (@tallclair) + * Strongly suggesting pod as the security boundary, @tallclair to write up decision (welcoming feedback and/or disagreement) + * Extra overhead of sandbox. Charge to who? pod-level resource request / limits? See updates on the [solution space doc](https://docs.google.com/document/d/1QQ5u1RBDLXWvC8K3pscTtTRThsOeBSts_imYEoRyw8A/edit#heading=h.6xmg7pt28mrd) + * Need to think more about volume model in the sandboxes to enforce 2 security boundaries (and prevent issues like CVE-2017-100210[12] + * sysctl: need to review the support list. + * device plugins - needs more consideration, [contributions welcome](https://docs.google.com/document/d/1QQ5u1RBDLXWvC8K3pscTtTRThsOeBSts_imYEoRyw8A/edit#heading=h.uqm3zexmm967) +* [Proposal: Pod Ready++](https://docs.google.com/document/d/1VFZbc_IqPf_Msd-jul7LKTmGjvQ5qRldYOFV0lGqxf8/edit?ts=5aa1d7b4#heading=h.wvpldvtvh8u4) (@freehan) +* [containerd f2f meeting update (@lantaol)](https://docs.google.com/document/d/1MrgDYOSTjysMPcc6D7OnaeIDEc48lMjQAB10EIKQ5Go/edit?usp=sharing) +* [Virtual Kubelet (@rbitia)](https://docs.google.com/document/d/1MAn_HMZScni89hDwI4nQMk_SWTx9oi16PuOREhpVJJI/edit?usp=sharing) + * Doc posted during the meeting. Please read for next week. + * https://docs.google.com/document/d/1vhrbB6vytFJxU6CrlMqlH6wC3coQOwtsIj-aFzBbPXI/edit#heading=h.dshaptx6acc from Ben VMWare. + * Review the design next week +* [RawProc options for CRI (@jessfraz)](https://github.com/kubernetes/community/pull/1934) + * Add use cases and how this will be incorporated in the kubernetes API + + +# March 13 {#march-13} + + + +* Containerd F2F meeting notes: https://docs.google.com/document/d/1MrgDYOSTjysMPcc6D7OnaeIDEc48lMjQAB10EIKQ5Go/edit?usp=sharing + + +# Mar 6 {#mar-6} + + + +* CRI-O status updates (@mrunal) + * slide update: https://docs.google.com/presentation/d/1TQ6sBo63AXt6QF3LxA3jtnNzT330MpPDODORhLVcDH0/edit?ts=5a9ed52a#slide=id.g3134b94d16_0_0 + * pr help needed for dashboard: https://github.com/kubernetes/test-infra/pull/5943 + * +* Official Contributor Experience Reach-Out (@jberkus) + * Process for label and PR workflow changes + * [Issue Triage](https://github.com/kubernetes/community/blob/master/contributors/guide/issue-triage.md) + * Mentoring! 
+ * [Meet our contributors](https://github.com/kubernetes/community/blob/master/mentoring/meet-our-contributors.md) + * [Group mentoring](https://github.com/kubernetes/community/blob/master/mentoring/group-mentoring.md) + * why: https://k8s.devstats.cncf.io/d/QQN85o3zk/pr-workload-table?orgId=1 + * https://k8s.devstats.cncf.io/d/46/user-reviews?orgId=1 + * [Contributor Guide](https://github.com/kubernetes/community/tree/master/contributors/guide) + * [Governance and charters](https://github.com/kubernetes/community/tree/master/committee-steering/governance) + * What can Contribex do for you? +* + + +# Feb 27 {#feb-27} + + + +* Virtual Kubelet implementation doc: (@rbitia) https://docs.google.com/document/d/1tu27_BquhUAmYLaJznjbbdLJKYrH5-stm1i0OcxRgE8/edit?usp=sharing + + +# Feb 20 {#feb-20} + + + +* Secure isolation discussion continue (@tallclair) + * https://docs.google.com/document/d/1QQ5u1RBDLXWvC8K3pscTtTRThsOeBSts_imYEoRyw8A/edit?usp=sharing + + +# Feb 13 {#feb-13} + + + +* Secure isolation update + * Owner: @tallclair + * https://docs.google.com/document/d/1QQ5u1RBDLXWvC8K3pscTtTRThsOeBSts_imYEoRyw8A/edit?usp=sharing + * Introduce solution space doc, goals & high-level overview + * Discuss in a future meeting, after folks have time to digest the document +* Collection of options to integrate kata with CRI + * Owner: @resouer, ~20 mins + * Doc: https://docs.google.com/document/d/1PgXJpzSfhR_1idkNtcZNuncfUYV-U-syO-HSGxwQrVw/edit?usp=sharing + * Introduce the existing & proposed ways of integrating Kata with CRI, a brief evaluation will also be included. + + +# Feb 6 {#feb-6} + + + +* CRI: testing policy + * Owner: yujuhong@ + * https://github.com/kubernetes/community/pull/1743 +* CRI: container runtime cgroup detection (or not) + * Owner: yujuhong@, lantaol@ + * https://github.com/kubernetes/kubernetes/issues/30702 + * The runtime stats is only used for monitoring today. + * Do not change CRI for this. Instead, the runtime cgroup can be passed to the kubelet through the existing flag `runtime-cgroups`. +* CRI: Image Filesystem Storage Identifier + * Owner: yujuhong@, lantaol@ + * https://github.com/kubernetes/kubernetes/issues/57356 + * Slides: https://docs.google.com/presentation/d/1BbXgmEVhH0p2cgoojN36Q6SMHZ8tl00kfjITmYEf9vM/edit?usp=sharing + * Kubelet can use `statfs` to get fs stats. It is simple enough. + * Kubelet get image filesystem path through CRI vs. Configure image filesystem path through kubelet flag? The latter seems to be preferred. +* Release reminder: sig-node's v1.10 marked issues and features need scrubbed. We're officially past feature freeze and code freeze is coming soon. [More details in email on kubernetes-dev today](https://groups.google.com/forum/#!topic/kubernetes-dev/Jt4hbwzZrbA) and [new issue list](https://docs.google.com/document/d/11u91ypj8Gt8PlTincWuQ3iB2X3tITBxqn6JMkTduEZw/edit#heading=h.30996wkvo4bo) this week is largely SIG-Node's. +* Reminder: We are close (hopefully this week) to graduating the Kubelet's componentconfig API to beta status, as our existing TODO list is complete. Please take a look at https://github.com/kubernetes/kubernetes/pull/53833. +* Reminder: NamespaceOption is changing in the CRI with [#58973](https://github.com/kubernetes/kubernetes/pull/58973), runtimes will break. 


# Jan 30 {#jan-30}

* Revised Windows container resource management (CPU & memory):
    * **Owner**: Jiangtian Li (jiangtli@microsoft.com), Patrick Lang (plang@microsoft.com)
    * https://github.com/kubernetes/community/pull/1510
    * What about containers without a resource request/limit? - A zero value is passed to Docker, and Docker will set a value based on current node usage.
    * CPU assignment? - Not supported on the Windows platform now.
    * Is in-place resource update supported on Windows? - Immutable today.
* Virtual Kubelet introduction
    * **Owner:** Ria Bhatia, [ria.bhatia@microsoft.com](mailto:ria.bhatia@microsoft.com)
    * **Context:** [Virtual Kubelet](https://github.com/Virtual-Kubelet/Virtual-Kubelet) is an interface for plugging _anything_ into a Kubernetes cluster to replicate the lifecycle of a Kubernetes pod. I'd like to kick off discussion of the concepts of VK, the design of the project itself, and networking scenarios, and also come up with a formal provider definition.
* Node e2e conformance tests: https://github.com/kubernetes/kubernetes/issues/59001
    * [Spreadsheet](https://docs.google.com/spreadsheets/d/1Af7zSLDEDM3i5g-8MZi0oZ7zPt85rWI1kHMQybhS9H0/edit?usp=sharing) to re-categorize tests: please comment if you think a test should be added to or removed from the conformance suite. Please also suggest feature tags for tests.


# Jan 23 {#jan-23}

* [Invoke Allocate RPC call at container creation time #58282](https://github.com/kubernetes/kubernetes/pull/58282)
    * **Owner:** @RenaudWasTaken <rgaubert@nvidia.com>
    * **Context:** Last week the resource-management workgroup tried to tackle a design issue related to the device plugin. After much discussion we agreed that we wanted more opinions on the different approaches we currently have. I've created a [document that captures the different approaches as well as the pros and cons](https://docs.google.com/document/d/1xfmakZZ_0Pq6OpLTXZTqOD20tj_hQuQAXUuVujxL1Rw/edit?usp=sharing).
    * We should define a separate per-container operation in the device plugin.
    * https://github.com/kubernetes/kubernetes/pull/58172 -- Yu-Ju to review
* Logging improvements
    * Derek Carr, Peter Portante
    * https://github.com/kubernetes/kubernetes/issues/58638
    * Namespace names can be reused over time; we may need the namespace UUID in the container log path.
* CRI container log stats
    * Slides: https://docs.google.com/presentation/d/1BKbTa7RBVMTjZlk_6CV5SZfV4fen3bzk5PoqiXYIUK4/edit#slide=id.p
    * Issue: https://github.com/kubernetes/kubernetes/issues/55905
* Container log rotation (a rough rotation sketch follows these Jan 23 notes)
    * yujuhong@
    * https://docs.google.com/document/d/1oQe8dFiLln7cGyrRdholMsgogliOtpAzq6-K3068Ncg/edit?usp=sharing
* Node auto repair repository discussion
    * Derek Carr
    * The desire is to have an open-source node remedy system that watches node conditions and/or taints to coordinate a remedy response across multiple clouds (e.g., reboot the node)
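
As a rough illustration of the size-triggered rotation discussed in the container log rotation item above (not the kubelet's actual design, which is described in the linked doc), a copy-then-truncate rotation in Go might look like the following. The 10 MiB threshold, the `.1` suffix, and the `/tmp` path are arbitrary choices for the sketch.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// rotateIfTooBig copies path to path+".1" and truncates the original once it
// grows past maxBytes. Truncating (rather than renaming) means the process
// that holds the log file open can keep writing to the same descriptor.
func rotateIfTooBig(path string, maxBytes int64) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if info.Size() < maxBytes {
		return nil // nothing to do yet
	}

	src, err := os.Open(path)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := os.Create(path + ".1") // overwrite any previous rotation
	if err != nil {
		return err
	}
	defer dst.Close()

	if _, err := io.Copy(dst, src); err != nil {
		return err
	}
	return os.Truncate(path, 0)
}

func main() {
	// Illustrative path only; real container log paths are owned by the runtime.
	if err := rotateIfTooBig("/tmp/demo-container.log", 10*1024*1024); err != nil {
		fmt.Println("rotate:", err)
	}
}
```

Copy-truncate can lose lines written between the copy and the truncate, which is one reason a production design coordinates with the process that writes the log (here, the container runtime) to reopen the file instead.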


# Jan 16 {#jan-16}

* Review Pending v1.Pod API Changes [@verb]
    * Debug Containers: [Command in PodStatus](https://github.com/verb/community/blob/ff62a6a15f094f00acf5e5dfe972b73197f04042/contributors/design-proposals/node/troubleshoot-running-pods.md#debug-container-status)
        * Requires keeping track of v1.Container in the kubelet
    * Shared PID: [ShareProcessNamespace in PodSpec](https://github.com/verb/community/blob/080397194824d19d72b8c50f92667e93e7a244af/contributors/design-proposals/node/pod-pid-namespace.md#kubernetes-api-changes) (a minimal sketch of the proposed field is appended at the end of these notes)
* Windows container roadmap and Windows configuration in CRI (Patrick Lang, jiangtli@microsoft.com)
    * https://trello.com/b/rjTqrwjl/windows-k8s-roadmap
    * Windows container configuration in CRI:
        * https://github.com/kubernetes/community/pull/1510
    * Questions:
        * Question: Is there a pod-level sandbox isolation mechanism on Windows?
        * Answer: Currently only kernel-level isolation per container, with no hierarchical isolation like cgroups. They will work with containerd and cri-containerd to add hypervisor-level isolation at the pod level.
        * Dawn: Can we add e2e tests for the changes we make?
        * Dawn & Yu-Ju: Can we have an overview of how resource management will work on Windows?
        * Dawn: Let's start with CPU and memory; the current proposal does not define any storage resources yet.
* Review request: Node-level Checkpointing Manager (Vikas, vichoudh@redhat.com)
    * https://github.com/kubernetes/kubernetes/pull/56040
    * Just requires approval
    * It's good to have a common library instead of separate implementations in different parts. However, we should make it clear that we prefer not to add checkpoints.
    * Derek: Maybe we can have a document to track all the checkpoints done by the kubelet, so whenever people add a new checkpoint they need to update the document.
* Short discussion about the node-fencing progress (bronhaim@, Red Hat)
    * https://github.com/kubernetes/community/pull/1416
    * https://github.com/rootfs/node-fencing


# Jan 9 {#jan-9}

* crictl v1.0.0-alpha.0 demo (@lantaol)
    * Slides: https://docs.google.com/presentation/d/1jnInpUccKCRxCuXWC141gjipRs0Y5_0XSELDpUEXz5E/edit?usp=sharing
* [kube-spawn](https://github.com/kinvolk/kube-spawn/) local multi-node cluster tool
    * Using kubeadm and systemd-nspawn
    * Useful for testing CRI or other Kubernetes patches
    * Demo: https://asciinema.org/a/152314
* Deprecating rktnetes in k8s v1.10 (tracking issue: https://github.com/kubernetes/kubernetes/issues/53601)
* sig-node Q1 2018 plan
    * https://docs.google.com/document/d/15F3nWPPG3keP0pzxgucPjA7UBj3C31VsFElO7KkDU04/edit
    * Much of the work carried over from last quarter
    * Draft includes priority and possible owners
    * Please review and suggest changes before sharing with SIG PM


# Jan 2 {#jan-2}

Cancelled.
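
Addendum to the Shared PID item in the Jan 16 notes above: a minimal sketch, assuming the proposal lands as written with a `ShareProcessNamespace *bool` field on `v1.PodSpec`, of what opting a pod into a shared PID namespace could look like from Go. The pod name, container names, and images are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	share := true // opt in per pod; the default keeps per-container PID namespaces
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "shared-pid-demo"},
		Spec: corev1.PodSpec{
			ShareProcessNamespace: &share,
			Containers: []corev1.Container{
				{Name: "app", Image: "nginx"},
				// A sidecar in the same PID namespace can see (and signal) "app".
				{Name: "debug", Image: "busybox", Command: []string{"sleep", "3600"}},
			},
		},
	}

	out, err := json.MarshalIndent(pod, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```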
