| author | Aldo Culquicondor <acondor@google.com> | 2021-12-15 16:41:05 -0500 |
|---|---|---|
| committer | Aldo Culquicondor <acondor@google.com> | 2022-02-09 12:28:47 -0500 |
| commit | 0cf4239bae398fdb055a115511772f6a7b059427 (patch) | |
| tree | 47b46adf269d16fe8c38759552aeb2543b414727 /wg-batch | |
| parent | 879546644f6fc5c8c6206683c12afdb6d5e54c0d (diff) | |
Add WG Batch with charter
Diffstat (limited to 'wg-batch')
| -rw-r--r-- | wg-batch/README.md | 42 |
| -rw-r--r-- | wg-batch/charter.md | 96 |
2 files changed, 138 insertions, 0 deletions
diff --git a/wg-batch/README.md b/wg-batch/README.md
new file mode 100644
index 00000000..9af539ec
--- /dev/null
+++ b/wg-batch/README.md
@@ -0,0 +1,42 @@
+<!---
+This is an autogenerated file!
+
+Please do not edit this file directly, but instead make changes to the
+sigs.yaml file in the project root.
+
+To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
+--->
+# Batch Working Group
+
+Discuss and enhance the support for Batch (eg. HPC, AI/ML, data analytics, CI) workloads in core Kubernetes. We want to unify the way users deploy batch workloads to improve portability and to simplify supportability for Kubernetes providers.
+
+The [charter](charter.md) defines the scope and governance of the Batch Working Group.
+
+## Stakeholder SIGs
+* [SIG Apps](/sig-apps)
+* [SIG Autoscaling](/sig-autoscaling)
+* [SIG Node](/sig-node)
+* [SIG Scheduling](/sig-scheduling)
+
+## Meetings
+*Joining the [mailing list](TBD) for the group will typically add invites for the following meetings to your calendar.*
+* Regular Meeting: [TBDs at TBD UTC](TBD) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=TBD&tz=UTC).
+  * [Meeting notes and Agenda](TBD).
+  * [Meeting recordings](TBD).
+
+## Organizers
+
+* Wei Huang (**[@Huang-Wei](https://github.com/Huang-Wei)**), Apple
+* Abdullah Gharaibeh (**[@ahg-g](https://github.com/ahg-g)**), Google
+* Danielle Lancashire (**[@endocrimes](https://github.com/endocrimes)**), VMware
+* Maciej Szulik (**[@soltysh](https://github.com/soltysh)**), Red Hat
+* Swati Sehgal (**[@swatisehgal](https://github.com/swatisehgal)**), Intel
+
+## Contact
+- Slack: [#wg-batch](https://kubernetes.slack.com/messages/wg-batch)
+- [Mailing list](TBD)
+- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fbatch)
+- Steering Committee Liaison: Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**)
+<!-- BEGIN CUSTOM CONTENT -->
+
+<!-- END CUSTOM CONTENT -->
diff --git a/wg-batch/charter.md b/wg-batch/charter.md
new file mode 100644
index 00000000..924d6bb8
--- /dev/null
+++ b/wg-batch/charter.md
@@ -0,0 +1,96 @@
+# WG Batch Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
+the Roles and Organization Management outlined in [wg-governance].
+
+[Kubernetes Charter README]: /committee-steering/governance/README.md
+
+## Scope
+
+Discuss and enhance the support for Batch (eg. HPC, AI/ML, data analytics, CI)
+workloads in core Kubernetes. We want to unify the way users deploy batch
+workloads to improve portability and to simplify supportability for Kubernetes
+providers.
+
+### In scope
+
+- To reduce fragmentation in the k8s batch ecosystem: congregate leads and users from
+  different external and internal projects and user groups (CNCF TAGs, k8s sub-projects
+  focused on batch-related features such as topology-aware scheduling) in the batch ecosystem to
+  gather requirements, validate designs and encourage reutilization of core kubernetes APIs.
+- The following recommendations for enhancements:
+  - Additions to the batch API group, currently including Job and CronJob resources
+    that benefit batch use cases such as HPC, AI/ML, data analytics and CI.
+  - Primitives for job-level queueing, not limited to the k8s Job resource. Long-term,
+    this could include multi-cluster support.
+  - Primitives to control and maximize utilization of resources in fixed-size clusters
+    (on-prem) and elastic clusters (cloud).
+  - Runtime and scheduling support for specialized hardware (GPUs, NUMA, RDMA, etc.)
+
+### Out of scope
+
+- Addition of new API kinds that serve a specialized type of workload. The focus
+  should be on general APIs that specialized controllers can build on top of.
+- Uses of the batch APIs as support for serving workloads (eg. backups,
+  upgrades, migrations). These can be served by existing SIGs.
+- Proposals that duplicate the functionality of core kubernetes components
+  (job-controller, kube-scheduler, cluster-autoscaler).
+- Job workflows or pipelines. Mature third party frameworks serve these
+  use cases with the current kubernetes primitives. But additional primitives
+  to support these frameworks could be in scope.
+
+## Stakeholders
+
+Stakeholders in this working group span multiple SIGs that own parts of the
+code in core kubernetes components and addons.
+
+- Apps
+- Autoscaling
+- Node
+- Scheduling
+
+## Deliverables
+
+The list of deliverables include the following high level features:
+
+- To SIG Apps:
+  - Updated Job API that fulfills the needs of a wider range of batch applications.
+  - A performant job controller that can scale to thousands of pods per minute.
+- To SIG Scheduling and Autoscaling
+  - A set of APIs to support job queueing, a framework to support different
+    queueing policies and a ready-to-use implementation as a subproject.
+  - Scheduling plugin(s) to support different batch needs.
+- To SIG Autoscaling:
+  - Capabilities for job-level provisioning.
+- To SIG Node:
+  - Runtime support for specialized hardware.
+
+## Roles and Organization Management
+
+This wg adheres to the Roles and Organization Management outlined in [wg-governance]
+and opts-in to updates and modifications to [wg-governance].
+
+[wg-governance]: /committee-steering/governance/wg-governance.md
+
+Additionally, the wg commits to:
+
+- maintain a solid communication line between the Kubernetes groups and the wider CNCF community;
+- submit a proposal to the KubeCon/CloudNativeCon maintainers track; if not selected, a video update will be recorded and listed below.
+
+## Timelines and Disbanding
+
+As a first mandate, the wg will define a roadmap in the first quarter
+of operation. We envision three timelines for the exit criteria, the focus will
+be on early exit, but a determination on whether or not to go beyond
+that is left until we reach that milestone.
+
+1. Early exit: define "recommendations" for the deliverables mentioned above, those
+   recommendations would be left to the respective sigs to implement. The WG could
+   start implementing those recommendations in the context of the owning sig to generate
+   some momentum.
+2. Mileston 2, Late exit: The WG continues the implementation of the recommendations until they reach GA,
+   and then disband.
+2. Convert to SIG: The WG observes a constant influx of requirements for the artifacts and there
+   is the risk that the SIGs don't have enough capacity to maintain them.
+   Then, the wg will propose the graduation into a SIG, taking ownership of the
+   APIs, controllers and scheduling plugins.
