From 122985bc00bf83d0fa4ef2b4f8cc85245eb17998 Mon Sep 17 00:00:00 2001
From: Erik Erlandson
Date: Mon, 25 Feb 2019 14:59:00 -0700
Subject: Port the Big Data SIG to new User Group format

---
 OWNERS_ALIASES            |  8 ++---
 sig-big-data/OWNERS       |  8 -----
 sig-big-data/README.md    | 56 --------------------------------
 sig-big-data/resources.md | 61 -----------------------------------
 sig-list.md               |  7 +++-
 sigs.yaml                 | 82 +++++++++++++++++++++--------------------------
 ug-big-data/OWNERS        |  8 +++++
 ug-big-data/README.md     | 53 ++++++++++++++++++++++++++++++
 ug-big-data/resources.md  | 61 +++++++++++++++++++++++++++++++++++
 9 files changed, 168 insertions(+), 176 deletions(-)
 delete mode 100644 sig-big-data/OWNERS
 delete mode 100644 sig-big-data/README.md
 delete mode 100644 sig-big-data/resources.md
 create mode 100644 ug-big-data/OWNERS
 create mode 100644 ug-big-data/README.md
 create mode 100644 ug-big-data/resources.md

diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES
index a598428f..4cd6a126 100644
--- a/OWNERS_ALIASES
+++ b/OWNERS_ALIASES
@@ -30,10 +30,6 @@ aliases:
     - dstrebel
     - khenidak
     - feiskyer
-  sig-big-data-leads:
-    - foxish
-    - erikerlandson
-    - liyinan926
   sig-cli-leads:
     - soltysh
     - seans3
@@ -158,6 +154,10 @@ aliases:
     - joelsmith
     - cji
     - jaybeale
+  ug-big-data-leads:
+    - foxish
+    - erikerlandson
+    - liyinan926
   committee-code-of-conduct:
     - jdumars
     - parispittman
diff --git a/sig-big-data/OWNERS b/sig-big-data/OWNERS
deleted file mode 100644
index 045d7c13..00000000
--- a/sig-big-data/OWNERS
+++ /dev/null
@@ -1,8 +0,0 @@
-# See the OWNERS docs at https://go.k8s.io/owners
-
-reviewers:
-  - sig-big-data-leads
-approvers:
-  - sig-big-data-leads
-labels:
-  - sig/big-data
diff --git a/sig-big-data/README.md b/sig-big-data/README.md
deleted file mode 100644
index 762af01a..00000000
--- a/sig-big-data/README.md
+++ /dev/null
@@ -1,56 +0,0 @@
-
-# Big Data Special Interest Group
-
-Covers deploying and operating big data applications (Spark, Kafka, Hadoop, Flink, Storm, etc) on Kubernetes. We focus on integrations with big data applications and architecting the best ways to run them on Kubernetes.
-
-## Meetings
-* Regular SIG Meeting: [Wednesdays at 17:00 UTC](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=17:00&tz=UTC).
-  * [Meeting notes and Agenda](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
-  * [Meeting recordings](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
-
-## Leadership
-
-### Chairs
-The Chairs of the SIG run operations and processes governing the SIG.
-
-* Anirudh Ramanathan (**[@foxish](https://github.com/foxish)**), Rockset
-* Erik Erlandson (**[@erikerlandson](https://github.com/erikerlandson)**), Red Hat
-* Yinan Li (**[@liyinan926](https://github.com/liyinan926)**), Google
-
-## Contact
-* [Slack](https://kubernetes.slack.com/messages/sig-big-data)
-* [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data)
-* [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fbig-data)
-
-## GitHub Teams
-
-The below teams can be mentioned on issues and PRs in order to get attention from the right people.
-Note that the links to display team membership will only work if you are a member of the org.
-
-| Team Name | Details | Description |
-| --------- |:-------:| ----------- |
-| @kubernetes/sig-big-data-api-reviews | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-api-reviews) | API Changes and Reviews |
-| @kubernetes/sig-big-data-bugs | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-bugs) | Bug Triage and Troubleshooting |
-| @kubernetes/sig-big-data-feature-requests | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-feature-requests) | Feature Requests |
-| @kubernetes/sig-big-data-misc | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-misc) | General Discussion |
-| @kubernetes/sig-big-data-pr-reviews | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-pr-reviews) | PR Reviews |
-| @kubernetes/sig-big-data-proposals | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-proposals) | Design Proposals |
-| @kubernetes/sig-big-data-test-failures | [link](https://github.com/orgs/kubernetes/teams/sig-big-data-test-failures) | Test Failures and Triage |
-
-
-## Goals
-* Design and architect ways to run big data applications effectively on Kubernetes
-* Discuss ongoing implementation efforts
-* Discuss resource sharing and multi-tenancy (in the context of big data applications)
-* Suggest Kubernetes features where we see a need
-
-## Non-goals
-* Endorsing any particular tool/framework
-
diff --git a/sig-big-data/resources.md b/sig-big-data/resources.md
deleted file mode 100644
index c7b3b2d0..00000000
--- a/sig-big-data/resources.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Resources
-
-## Kubernetes integration status by big data product
-
-### Spark
-
-[Apache Spark](https://spark.apache.org) is a distributed data processing framework.
-
-##### Status
-
-Kubernetes is supported as a mainline Spark scheduler since [release 2.3](https://spark.apache.org/releases/spark-release-2-3-0.html), see [the detailed documentation](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
-That work was done after the [Spark on Kubernetes original Design Proposal](https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#)
-in the [apache-spark-on-k8s git repo](https://github.com/apache-spark-on-k8s/spark).
-
-##### Activities
-
-Enhancements are under development, with a good overview given [in this blog post](https://databricks.com/blog/2018/09/26/whats-new-for-apache-spark-on-kubernetes-in-the-upcoming-apache-spark-2-4-release.html).
-
-* Work is underway for Spark 2.4 to improve support and integration with HDFS.
-  * Design Document: [How Spark on Kubernetes will access Secure HDFS](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd)
-* Shuffle service design
-  * Design Document [Improving Spark Shuffle Reliability](https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit)
-  * JIRA issue [SPARK-25299: Use remote storage for persisting shuffle data](https://issues.apache.org/jira/browse/SPARK-25299)
-
-### HDFS
-
-[Apache Hadoop HDFS](https://hadoop.apache.org/hdfs) is a distributed file system, the persistence layer for Hadoop.
-
-##### Status
-
-TODO, e.g. "No release yet."
-
-##### Activities
-
-* [Data Locality Doc](https://docs.google.com/document/d/1TAC6UQDS3M2sin2msFcZ9UBBQFyyz4jFKWw5BM54cQo/edit)
-* ["HDFS on Kubernetes" git repository including Helm charts](https://github.com/apache-spark-on-k8s/kubernetes-HDFS)
-
-### Airflow
-
-[Apache Airflow](https://airflow.apache.org) is a platform to programmatically author, schedule and monitor workflows.
-
-##### Status
-
-The [Kubernetes executor](https://airflow.apache.org/kubernetes.html) has been introduced with Airflow [release 1.10.0](https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt) with support of Kubernetes 1.10.
-
-##### Activities
-
-* [Airflow roadmap](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013666)
-
-### Flink
-
-[Apache Flink](https://flink.apache.org) is a distributed data processing framework.
-
-##### Status
-
-Flink 1.6 supports [running a session or job cluster on Kubernetes](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html).
-
-##### Activities
-
-* [Native support for Kubernetes as a Flink runtime](https://issues.apache.org/jira/browse/FLINK-9953)
-* [Lyft is working on an operator](https://lists.apache.org/thread.html/aa941030440c1d9e34c35c0caf5ddd2456755337fc34a4edebb32929@%3Cdev.flink.apache.org%3E)
diff --git a/sig-list.md b/sig-list.md
index 7cd64747..6a7c50f8 100644
--- a/sig-list.md
+++ b/sig-list.md
@@ -29,7 +29,6 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
 |[Autoscaling](sig-autoscaling/README.md)|autoscaling|* [Marcin Wielgus](https://github.com/mwielgus), Google
|* [Slack](https://kubernetes.slack.com/messages/sig-autoscaling)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-autoscaling)|* Regular SIG Meeting: [Mondays at 14:00 UTC (biweekly/triweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
|[AWS](sig-aws/README.md)|aws|* [Justin Santa Barbara](https://github.com/justinsb)
* [Kris Nova](https://github.com/kris-nova), VMware
* [Nishi Davidson](https://github.com/d-nishi), AWS
|* [Slack](https://kubernetes.slack.com/messages/sig-aws)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-aws)|* Regular SIG Meeting: [Fridays at 9:00 PT (Pacific Time) (biweekly 2019 start date: Jan. 11th)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
|[Azure](sig-azure/README.md)|azure|* [Stephen Augustus](https://github.com/justaugustus), VMware
* [Dave Strebel](https://github.com/dstrebel), Microsoft
|* [Slack](https://kubernetes.slack.com/messages/sig-azure)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-azure)|* Regular SIG Meeting: [Wednesdays at 16:00 UTC (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
-|[Big Data](sig-big-data/README.md)|big-data|* [Anirudh Ramanathan](https://github.com/foxish), Rockset
* [Erik Erlandson](https://github.com/erikerlandson), Red Hat
* [Yinan Li](https://github.com/liyinan926), Google
|* [Slack](https://kubernetes.slack.com/messages/sig-big-data)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-big-data)|* Regular SIG Meeting: [Wednesdays at 17:00 UTC (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
|[CLI](sig-cli/README.md)|cli|* [Maciej Szulik](https://github.com/soltysh), Red Hat
* [Sean Sullivan](https://github.com/seans3), Google
|* [Slack](https://kubernetes.slack.com/messages/sig-cli)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-cli)|* Regular SIG Meeting: [Wednesdays at 09:00 PT (Pacific Time) (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
|[Cloud Provider](sig-cloud-provider/README.md)|cloud-provider|* [Andrew Sy Kim](https://github.com/andrewsykim), VMware
* [Chris Hoge](https://github.com/hogepodge), OpenStack Foundation
* [Jago Macleod](https://github.com/jagosan), Google
|* [Slack](https://kubernetes.slack.com/messages/sig-cloud-provider)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-cloud-provider)|* Regular SIG Meeting: [Wednesdays at 1:00 PT (Pacific Time) (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* (cloud-provider-extraction-migration) Weekly Sync removing the in-tree cloud providers led by @cheftako and @andrewsykim: [Thursdays at 13:30 PT (Pacific Time) (weekly)](https://docs.google.com/document/d/1KLsGGzNXQbsPeELCeF_q-f0h0CEGSe20xiwvcR2NlYM/edit)
|[Cluster Lifecycle](sig-cluster-lifecycle/README.md)|cluster-lifecycle|* [Robert Bailey](https://github.com/roberthbailey), Google
* [Lucas Käldström](https://github.com/luxas), Luxas Labs (occasionally contracting for Weaveworks)
* [Timothy St. Clair](https://github.com/timothysc), VMware
|* [Slack](https://kubernetes.slack.com/messages/sig-cluster-lifecycle)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-cluster-lifecycle)|* Regular SIG Meeting: [Tuesdays at 09:00 PT (Pacific Time) (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* kubeadm Office Hours: [Wednesdays at 09:00 PT (Pacific Time) (weekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* Cluster API office hours: [Wednesdays at 10:00 PT (Pacific Time) (weekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* Cluster API Provider Implementers' office hours (EMEA): [Wednesdays at 15:00 CEST (Central European Summer Time) (weekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* Cluster API Provider Implementers' office hours (US West Coast): [Tuesdays at 12:00 PT (Pacific Time) (weekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* Cluster API (AWS implementation) office hours: [Mondays at 10:00 PT (Pacific Time) (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* kops Office Hours: [Fridays at 09:00 PT (Pacific Time) (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
* Kubespray Office Hours: [Wednesdays at 08:00 PT (Pacific Time) (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
@@ -68,6 +67,12 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
 |[Resource Management](wg-resource-management/README.md)||* [Vishnu Kannan](https://github.com/vishh), Google
* [Derek Carr](https://github.com/derekwaynecarr), Red Hat
|* [Slack](https://kubernetes.slack.com/messages/wg-resource-mgmt)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-resource-management)|* Regular WG Meeting: [Wednesdays at 11:00 PT (Pacific Time) (biweekly (On demand))](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
|[Security Audit](wg-security-audit/README.md)||* [Aaron Small](https://github.com/aasmall), Google
* [Joel Smith](https://github.com/joelsmith), Red Hat
* [Craig Ingram](https://github.com/cji), Salesforce
* [Jay Beale](https://github.com/jaybeale), InGuardians
|* [Slack](https://kubernetes.slack.com/messages/wg-security-audit)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-wg-security-audit)|* Regular WG Meeting: [Mondays at 13:00 PT (Pacific Time) (weekly)](https://docs.google.com/document/d/1RbC4SBZBlKth7IjYv_NaEpnmLGwMJ0ElpUOmsG-bdRA/edit)
+### Master User Group List
+
+| Name | Organizers | Contact | Meetings |
+|------|------------|---------|----------|
+|[Big Data](ug-big-data/README.md)|* [Anirudh Ramanathan](https://github.com/foxish), Rockset
* [Erik Erlandson](https://github.com/erikerlandson), Red Hat
* [Yinan Li](https://github.com/liyinan926), Google
|* [Slack](https://kubernetes.slack.com/messages/ug-big-data)
* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data)|* Regular User Group Meeting: [Wednesdays at 18:00 UTC (biweekly)](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit)
+
 ### Master Committee List
 
 | Name | Label | Members | Contact |
diff --git a/sigs.yaml b/sigs.yaml
index 7f743a29..28627e8d 100644
--- a/sigs.yaml
+++ b/sigs.yaml
@@ -606,52 +606,6 @@ sigs:
         owners:
           - https://raw.githubusercontent.com/kubernetes-sigs/azuredisk-csi-driver/master/OWNERS
           - https://raw.githubusercontent.com/kubernetes-sigs/azurefile-csi-driver/master/OWNERS
-  - name: Big Data
-    dir: sig-big-data
-    mission_statement: >
-      Covers deploying and operating big data applications (Spark, Kafka,
-      Hadoop, Flink, Storm, etc) on Kubernetes. We focus on integrations with
-      big data applications and architecting the best ways to run them on Kubernetes.
-    charter_link:
-    label: big-data
-    leadership:
-      chairs:
-        - name: Anirudh Ramanathan
-          github: foxish
-          company: Rockset
-        - name: Erik Erlandson
-          github: erikerlandson
-          company: Red Hat
-        - name: Yinan Li
-          github: liyinan926
-          company: Google
-    meetings:
-      - description: Regular SIG Meeting
-        day: Wednesday
-        time: "17:00"
-        tz: "UTC"
-        frequency: biweekly
-        url: https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit
-        archive_url: https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit
-        recordings_url: https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit
-    contact:
-      slack: sig-big-data
-      mailing_list: https://groups.google.com/forum/#!forum/kubernetes-sig-big-data
-      teams:
-        - name: sig-big-data-api-reviews
-          description: API Changes and Reviews
-        - name: sig-big-data-bugs
-          description: Bug Triage and Troubleshooting
-        - name: sig-big-data-feature-requests
-          description: Feature Requests
-        - name: sig-big-data-misc
-          description: General Discussion
-        - name: sig-big-data-pr-reviews
-          description: PR Reviews
-        - name: sig-big-data-proposals
-          description: Design Proposals
-        - name: sig-big-data-test-failures
-          description: Test Failures and Triage
   - name: CLI
     dir: sig-cli
     mission_statement: >
@@ -2514,6 +2468,42 @@ workinggroups:
     contact:
       slack: wg-k8s-infra
       mailing_list: https://groups.google.com/forum/#!forum/kubernetes-wg-k8s-infra
+usergroups:
+  - name: Big Data
+    dir: ug-big-data
+    mission_statement: >
+      Serve as a community resource for advising big data and data science related software projects
+      on techniques and best practices for integrating with Kubernetes.
+      Represents the concerns of users from big data communities to Kubernetes for the purposes of
+      driving new features and other enhancements, based on big data use cases.
+    charter_link:
+    label: big-data
+    leadership:
+      chairs:
+        - name: Anirudh Ramanathan
+          github: foxish
+          company: Rockset
+        - name: Erik Erlandson
+          github: erikerlandson
+          company: Red Hat
+        - name: Yinan Li
+          github: liyinan926
+          company: Google
+    meetings:
+      - description: Regular User Group Meeting
+        day: Wednesday
+        time: "18:00"
+        tz: "UTC"
+        frequency: biweekly
+        url: https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit
+        archive_url: https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit
+        recordings_url: https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit
+    contact:
+      slack: ug-big-data
+      mailing_list: https://groups.google.com/forum/#!forum/kubernetes-ug-big-data
+      teams:
+        - name: ug-big-data
+          description: General Discussion
 committees:
   - name: Steering
     dir: committee-steering
diff --git a/ug-big-data/OWNERS b/ug-big-data/OWNERS
new file mode 100644
index 00000000..cef3c5d3
--- /dev/null
+++ b/ug-big-data/OWNERS
@@ -0,0 +1,8 @@
+# See the OWNERS docs at https://go.k8s.io/owners
+
+reviewers:
+  - ug-big-data-leads
+approvers:
+  - ug-big-data-leads
+labels:
+  - ug/big-data
diff --git a/ug-big-data/README.md b/ug-big-data/README.md
new file mode 100644
index 00000000..f00b3421
--- /dev/null
+++ b/ug-big-data/README.md
@@ -0,0 +1,53 @@
+
+# Big Data User Group
+
+Serve as a community resource for advising big data and data science related software projects on techniques and best practices for integrating with Kubernetes. Represents the concerns of users from big data communities to Kubernetes for the purposes of driving new features and other enhancements, based on big data use cases.
+
+## Meetings
+* Regular User Group Meeting: [Wednesdays at 18:00 UTC](https://docs.google.com/document/d/1FQx0BPlkkl1Bn0c9ocVBxYIKojpmrS1CFP5h0DI68AE/edit) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=18:00&tz=UTC).
+  * [Meeting notes and Agenda](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
+  * [Meeting recordings](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
+
+## Organizers
+
+* Anirudh Ramanathan (**[@foxish](https://github.com/foxish)**), Rockset
+* Erik Erlandson (**[@erikerlandson](https://github.com/erikerlandson)**), Red Hat
+* Yinan Li (**[@liyinan926](https://github.com/liyinan926)**), Google
+
+## Contact
+* [Slack](https://kubernetes.slack.com/messages/ug-big-data)
+* [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data)
+* [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/ug%2Fbig-data)
+
+
+## GitHub Teams
+
+The below teams can be mentioned on issues and PRs in order to get attention from the right people.
+Note that the links to display team membership will only work if you are a member of the org.
+
+| Team Name | Details | Description |
+| --------- |:-------:| ----------- |
+| @kubernetes/ug-big-data | [link](https://github.com/orgs/kubernetes/teams/ug-big-data) | General Discussion |
+
+
+
+### Goals
+
+- Promoting best practices for Kubernetes integrations
+- Advising big data communities on Kubernetes features
+- Shepherding issues and pull requests from community members
+- Hosting demos and discussions of big data integrations for Kubernetes
+
+### Non Goals
+
+- Promoting or otherwise advocating for any specific big data project
+- Software and tooling communities that have no intersection with data science or big data
+
+
diff --git a/ug-big-data/resources.md b/ug-big-data/resources.md
new file mode 100644
index 00000000..c7b3b2d0
--- /dev/null
+++ b/ug-big-data/resources.md
@@ -0,0 +1,61 @@
+# Resources
+
+## Kubernetes integration status by big data product
+
+### Spark
+
+[Apache Spark](https://spark.apache.org) is a distributed data processing framework.
+
+##### Status
+
+Kubernetes is supported as a mainline Spark scheduler since [release 2.3](https://spark.apache.org/releases/spark-release-2-3-0.html), see [the detailed documentation](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
+That work was done after the [Spark on Kubernetes original Design Proposal](https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#)
+in the [apache-spark-on-k8s git repo](https://github.com/apache-spark-on-k8s/spark).
+
+##### Activities
+
+Enhancements are under development, with a good overview given [in this blog post](https://databricks.com/blog/2018/09/26/whats-new-for-apache-spark-on-kubernetes-in-the-upcoming-apache-spark-2-4-release.html).
+
+* Work is underway for Spark 2.4 to improve support and integration with HDFS.
+  * Design Document: [How Spark on Kubernetes will access Secure HDFS](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd)
+* Shuffle service design
+  * Design Document [Improving Spark Shuffle Reliability](https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit)
+  * JIRA issue [SPARK-25299: Use remote storage for persisting shuffle data](https://issues.apache.org/jira/browse/SPARK-25299)
+
+### HDFS
+
+[Apache Hadoop HDFS](https://hadoop.apache.org/hdfs) is a distributed file system, the persistence layer for Hadoop.
+
+##### Status
+
+TODO, e.g. "No release yet."
+
+##### Activities
+
+* [Data Locality Doc](https://docs.google.com/document/d/1TAC6UQDS3M2sin2msFcZ9UBBQFyyz4jFKWw5BM54cQo/edit)
+* ["HDFS on Kubernetes" git repository including Helm charts](https://github.com/apache-spark-on-k8s/kubernetes-HDFS)
+
+### Airflow
+
+[Apache Airflow](https://airflow.apache.org) is a platform to programmatically author, schedule and monitor workflows.
+
+##### Status
+
+The [Kubernetes executor](https://airflow.apache.org/kubernetes.html) has been introduced with Airflow [release 1.10.0](https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt) with support of Kubernetes 1.10.
+
+##### Activities
+
+* [Airflow roadmap](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013666)
+
+### Flink
+
+[Apache Flink](https://flink.apache.org) is a distributed data processing framework.
+
+##### Status
+
+Flink 1.6 supports [running a session or job cluster on Kubernetes](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html).
+
+##### Activities
+
+* [Native support for Kubernetes as a Flink runtime](https://issues.apache.org/jira/browse/FLINK-9953)
+* [Lyft is working on an operator](https://lists.apache.org/thread.html/aa941030440c1d9e34c35c0caf5ddd2456755337fc34a4edebb32929@%3Cdev.flink.apache.org%3E)
--
cgit v1.2.3