author Aldo Culquicondor <acondor@google.com> 2021-12-15 16:41:05 -0500
committer Aldo Culquicondor <acondor@google.com> 2022-02-09 12:28:47 -0500
commit 0cf4239bae398fdb055a115511772f6a7b059427 (patch)
tree 47b46adf269d16fe8c38759552aeb2543b414727 /wg-batch
parent 879546644f6fc5c8c6206683c12afdb6d5e54c0d (diff)
Add WG Batch with charter
Diffstat (limited to 'wg-batch')
-rw-r--r-- wg-batch/README.md 42
-rw-r--r-- wg-batch/charter.md 96
2 files changed, 138 insertions, 0 deletions
diff --git a/wg-batch/README.md b/wg-batch/README.md
new file mode 100644
index 00000000..9af539ec
--- /dev/null
+++ b/wg-batch/README.md
@@ -0,0 +1,42 @@
+<!---
+This is an autogenerated file!
+
+Please do not edit this file directly, but instead make changes to the
+sigs.yaml file in the project root.
+
+To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
+--->
+# Batch Working Group
+
+Discuss and enhance the support for Batch (e.g. HPC, AI/ML, data analytics, CI) workloads in core Kubernetes. We want to unify the way users deploy batch workloads to improve portability and to simplify supportability for Kubernetes providers.
+
+The [charter](charter.md) defines the scope and governance of the Batch Working Group.
+
+## Stakeholder SIGs
+* [SIG Apps](/sig-apps)
+* [SIG Autoscaling](/sig-autoscaling)
+* [SIG Node](/sig-node)
+* [SIG Scheduling](/sig-scheduling)
+
+## Meetings
+*Joining the [mailing list](TBD) for the group will typically add invites for the following meetings to your calendar.*
+* Regular Meeting: [TBDs at TBD UTC](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=TBD&tz=UTC).
+ * [Meeting notes and Agenda](TBD).
+ * [Meeting recordings](TBD).
+
+## Organizers
+
+* Wei Huang (**[@Huang-Wei](https://github.com/Huang-Wei)**), Apple
+* Abdullah Gharaibeh (**[@ahg-g](https://github.com/ahg-g)**), Google
+* Danielle Lancashire (**[@endocrimes](https://github.com/endocrimes)**), VMware
+* Maciej Szulik (**[@soltysh](https://github.com/soltysh)**), Red Hat
+* Swati Sehgal (**[@swatisehgal](https://github.com/swatisehgal)**), Intel
+
+## Contact
+- Slack: [#wg-batch](https://kubernetes.slack.com/messages/wg-batch)
+- [Mailing list](TBD)
+- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fbatch)
+- Steering Committee Liaison: Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**)
+<!-- BEGIN CUSTOM CONTENT -->
+
+<!-- END CUSTOM CONTENT -->
diff --git a/wg-batch/charter.md b/wg-batch/charter.md
new file mode 100644
index 00000000..924d6bb8
--- /dev/null
+++ b/wg-batch/charter.md
@@ -0,0 +1,96 @@
+# WG Batch Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
+the Roles and Organization Management outlined in [wg-governance].
+
+[Kubernetes Charter README]: /committee-steering/governance/README.md
+
+## Scope
+
+Discuss and enhance the support for Batch (e.g. HPC, AI/ML, data analytics, CI)
+workloads in core Kubernetes. We want to unify the way users deploy batch
+workloads to improve portability and to simplify supportability for Kubernetes
+providers.
+
+### In scope
+
+- To reduce fragmentation in the k8s batch ecosystem: congregate leads and users from
+ different external and internal projects and user groups (CNCF TAGs, k8s sub-projects
+ focused on batch-related features such as topology-aware scheduling) to
+ gather requirements, validate designs and encourage reuse of core Kubernetes APIs.
+- Recommendations for the following enhancements:
+ - Additions to the batch API group, currently including Job and CronJob resources
+ that benefit batch use cases such as HPC, AI/ML, data analytics and CI.
+ - Primitives for job-level queueing, not limited to the k8s Job resource. Long-term,
+ this could include multi-cluster support.
+ - Primitives to control and maximize utilization of resources in fixed-size clusters
+ (on-prem) and elastic clusters (cloud).
+ - Runtime and scheduling support for specialized hardware (GPUs, NUMA, RDMA, etc.)
+
+### Out of scope
+
+- Addition of new API kinds that serve a specialized type of workload. The focus
+ should be on general APIs that specialized controllers can build on top of.
+- Uses of the batch APIs as support for serving workloads (e.g. backups,
+ upgrades, migrations). These can be served by existing SIGs.
+- Proposals that duplicate the functionality of core kubernetes components
+ (job-controller, kube-scheduler, cluster-autoscaler).
+- Job workflows or pipelines. Mature third-party frameworks serve these
+ use cases with the current Kubernetes primitives. However, additional primitives
+ to support these frameworks could be in scope.
+
+## Stakeholders
+
+Stakeholders in this working group span multiple SIGs that own parts of the
+code in core kubernetes components and addons.
+
+- Apps
+- Autoscaling
+- Node
+- Scheduling
+
+## Deliverables
+
+The deliverables include the following high-level features:
+
+- To SIG Apps:
+ - Updated Job API that fulfills the needs of a wider range of batch applications.
+ - A performant job controller that can scale to thousands of pods per minute.
+- To SIG Scheduling and SIG Autoscaling:
+ - A set of APIs to support job queueing, a framework to support different
+ queueing policies and a ready-to-use implementation as a subproject.
+ - Scheduling plugin(s) to support different batch needs.
+- To SIG Autoscaling:
+ - Capabilities for job-level provisioning.
+- To SIG Node:
+ - Runtime support for specialized hardware.
+
+## Roles and Organization Management
+
+This wg adheres to the Roles and Organization Management outlined in [wg-governance]
+and opts in to updates and modifications to [wg-governance].
+
+[wg-governance]: /committee-steering/governance/wg-governance.md
+
+Additionally, the wg commits to:
+
+- maintain a solid communication line between the Kubernetes groups and the wider CNCF community;
+- submit a proposal to the KubeCon/CloudNativeCon maintainers track; if not selected, a video update will be recorded and listed below.
+
+## Timelines and Disbanding
+
+As a first mandate, the wg will define a roadmap in the first quarter
+of operation. We envision three timelines for the exit criteria. The focus will
+be on early exit; whether or not to go beyond it will be determined
+once we reach that milestone.
+
+1. Early exit: define "recommendations" for the deliverables mentioned above. Those
+ recommendations would be left to the respective SIGs to implement. The WG could
+ start implementing those recommendations in the context of the owning SIG to generate
+ some momentum.
+2. Milestone 2, late exit: The WG continues the implementation of the recommendations until they reach GA,
+ and then disbands.
+3. Convert to SIG: If the WG observes a constant influx of requirements for the artifacts and there
+ is a risk that the SIGs don't have enough capacity to maintain them,
+ the WG will propose graduation into a SIG, taking ownership of the
+ APIs, controllers and scheduling plugins.