author Niklas Q. Nielsen <niklas.nielsen@intel.com> 2018-06-01 13:11:21 -0700
committer Niklas Q. Nielsen <nik@qni.dk> 2018-06-01 13:11:51 -0700
commit 151d810e84b00307b9db07f0c1e935ebac42dc90
tree 7ecd8b29475fa0dc25395c5a6805fec630d9b221
parent d4bd6d7b9886da9c5dee183b5a5084e4d03a6d67

Adding a bit more detail to the mlwg charter
-rwxr-xr-x wg-machine-learning/README.md 16
1 files changed, 12 insertions, 4 deletions
diff --git a/wg-machine-learning/README.md b/wg-machine-learning/README.md
index eaeaf835..b9c6efbd 100755
--- a/wg-machine-learning/README.md
+++ b/wg-machine-learning/README.md
@@ -30,9 +30,17 @@ A working group dedicated towards making Kubernetes work best for Machine Learni
The charter for this working group as [proposed](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kubernetes-dev/lOeMjOLilxI/wuQayFDvCQAJ) is as follows:
- * Assess the state of the art for ML workloads on K8s
- * Identify pain points users currently have with ML on k8s
- * Identify, prioritize and execute on improving k8s to better support ML workloads in the near, medium, and long term.
+ - Assess the state of the art for ML workloads on K8s
+ - Identify pain points users currently have with ML on k8s
+ - Identify, prioritize and execute on improving k8s to better support ML workloads in the near, medium, and long term.
+
+## Goals:
+
+Topics include, but are not limited to:
+
+ - Ease source changes to execution workflows, as they are a common barrier to entry.
+ - Scheduler enhancements such as improved bin packing for accelerators, job queueing, fair sharing and gang scheduling.
+ - Runtime enhancements such as job data loading (common data set sizes in the tens of gigabytes to terabytes), accelerator support, persisting job output (ML workloads can run for days and rely heavily on checkpointing) and multi-tenancy and job isolation (dealing with potential sensitive data sets).
+ - Job management such as experiment tracking (including enabling hyperparameter tuning systems) and scaling and deployment aspects of inference workloads.
-TODO: Finalize and update the charter after the initial kick off meeting on 3/1/2018.
<!-- END CUSTOM CONTENT -->