From f7ae48beadc4c215d6b7a16a215c6948bd76e49d Mon Sep 17 00:00:00 2001
From: Bob Killen
Date: Mon, 16 Jan 2023 13:55:40 -0600
Subject: Archive big data user group

---
 OWNERS_ALIASES                           |  4 ---
 archive/ug-big-data/OWNERS               |  8 +++++
 archive/ug-big-data/README.md            | 46 ++++++++++++++++++++++++
 archive/ug-big-data/resources.md         | 61 ++++++++++++++++++++++++++++++++
 communication/slack-config/channels.yaml |  1 +
 liaisons.md                              |  1 -
 sig-list.md                              |  1 -
 sigs.yaml                                | 38 --------------------
 ug-big-data/OWNERS                       |  8 -----
 ug-big-data/README.md                    | 46 ------------------------
 ug-big-data/resources.md                 | 61 --------------------------------
 11 files changed, 116 insertions(+), 159 deletions(-)
 create mode 100644 archive/ug-big-data/OWNERS
 create mode 100644 archive/ug-big-data/README.md
 create mode 100644 archive/ug-big-data/resources.md
 delete mode 100644 ug-big-data/OWNERS
 delete mode 100644 ug-big-data/README.md
 delete mode 100644 ug-big-data/resources.md

diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES
index 480d77da..47493ff4 100644
--- a/OWNERS_ALIASES
+++ b/OWNERS_ALIASES
@@ -142,10 +142,6 @@ aliases:
   wg-structured-logging-leads:
     - pohly
     - serathius
-  ug-big-data-leads:
-    - erikerlandson
-    - foxish
-    - liyinan926
   ug-vmware-users-leads:
     - brysonshepherd
     - cantbewong
diff --git a/archive/ug-big-data/OWNERS b/archive/ug-big-data/OWNERS
new file mode 100644
index 00000000..cef3c5d3
--- /dev/null
+++ b/archive/ug-big-data/OWNERS
@@ -0,0 +1,8 @@
+# See the OWNERS docs at https://go.k8s.io/owners
+
+reviewers:
+  - ug-big-data-leads
+approvers:
+  - ug-big-data-leads
+labels:
+  - ug/big-data
diff --git a/archive/ug-big-data/README.md b/archive/ug-big-data/README.md
new file mode 100644
index 00000000..9fd546e8
--- /dev/null
+++ b/archive/ug-big-data/README.md
@@ -0,0 +1,46 @@
+
+
+# Big Data User Group
+
+Serve as a community resource for advising big data and data science related software projects on techniques and best practices for integrating with Kubernetes. Represents the concerns of users from big data communities to Kubernetes for the purposes of driving new features and other enhancements, based on big data use cases.
+
+## Meetings
+*Joining the [mailing list](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data) for the group will typically add invites for the following meetings to your calendar.*
+* Regular User Group Meeting: [Wednesdays at 18:00 UTC](https://zoom.us/my/ug.big.data) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=18:00&tz=UTC).
+  * [Meeting notes and Agenda](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
+  * [Meeting recordings](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
+
+## Organizers
+
+* Erik Erlandson (**[@erikerlandson](https://github.com/erikerlandson)**), Red Hat
+* Anirudh Ramanathan (**[@foxish](https://github.com/foxish)**), Rockset
+* Yinan Li (**[@liyinan926](https://github.com/liyinan926)**), Google
+
+## Contact
+- Slack: [#ug-big-data](https://kubernetes.slack.com/messages/ug-big-data)
+- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data)
+- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/ug%2Fbig-data)
+- GitHub Teams:
+  - [@kubernetes/ug-big-data](https://github.com/orgs/kubernetes/teams/ug-big-data) - General Discussion
+- Steering Committee Liaison: Tim Pepper (**[@tpepper](https://github.com/tpepper)**)
+
+
+### Goals
+
+- Promoting best practices for Kubernetes integrations
+- Advising big data communities on Kubernetes features
+- Shepherding issues and pull requests from community members
+- Hosting demos and discussions of big data integrations for Kubernetes
+
+### Non Goals
+
+- Promoting or otherwise advocating for any specific big data project
+- Software and tooling communities that have no intersection with data science or big data
+
+
diff --git a/archive/ug-big-data/resources.md b/archive/ug-big-data/resources.md
new file mode 100644
index 00000000..c7b3b2d0
--- /dev/null
+++ b/archive/ug-big-data/resources.md
@@ -0,0 +1,61 @@
+# Resources
+
+## Kubernetes integration status by big data product
+
+### Spark
+
+[Apache Spark](https://spark.apache.org) is a distributed data processing framework.
+
+##### Status
+
+Kubernetes is supported as a mainline Spark scheduler since [release 2.3](https://spark.apache.org/releases/spark-release-2-3-0.html), see [the detailed documentation](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
+That work was done after the [Spark on Kubernetes original Design Proposal](https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#)
+in the [apache-spark-on-k8s git repo](https://github.com/apache-spark-on-k8s/spark).
+
+##### Activities
+
+Enhancements are under development, with a good overview given [in this blog post](https://databricks.com/blog/2018/09/26/whats-new-for-apache-spark-on-kubernetes-in-the-upcoming-apache-spark-2-4-release.html).
+
+* Work is underway for Spark 2.4 to improve support and integration with HDFS.
+  * Design Document: [How Spark on Kubernetes will access Secure HDFS](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd)
+* Shuffle service design
+  * Design Document [Improving Spark Shuffle Reliability](https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit)
+  * JIRA issue [SPARK-25299: Use remote storage for persisting shuffle data](https://issues.apache.org/jira/browse/SPARK-25299)
+
+### HDFS
+
+[Apache Hadoop HDFS](https://hadoop.apache.org/hdfs) is a distributed file system, the persistence layer for Hadoop.
+
+##### Status
+
+TODO, e.g. "No release yet."
+
+##### Activities
+
+* [Data Locality Doc](https://docs.google.com/document/d/1TAC6UQDS3M2sin2msFcZ9UBBQFyyz4jFKWw5BM54cQo/edit)
+* ["HDFS on Kubernetes" git repository including Helm charts](https://github.com/apache-spark-on-k8s/kubernetes-HDFS)
+
+### Airflow
+
+[Apache Airflow](https://airflow.apache.org) is a platform to programmatically author, schedule and monitor workflows.
+
+##### Status
+
+The [Kubernetes executor](https://airflow.apache.org/kubernetes.html) has been introduced with Airflow [release 1.10.0](https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt) with support of Kubernetes 1.10.
+
+##### Activities
+
+* [Airflow roadmap](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013666)
+
+### Flink
+
+[Apache Flink](https://flink.apache.org) is a distributed data processing framework.
+
+##### Status
+
+Flink 1.6 supports [running a session or job cluster on Kubernetes](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html).
+
+##### Activities
+
+* [Native support for Kubernetes as a Flink runtime](https://issues.apache.org/jira/browse/FLINK-9953)
+* [Lyft is working on an operator](https://lists.apache.org/thread.html/aa941030440c1d9e34c35c0caf5ddd2456755337fc34a4edebb32929@%3Cdev.flink.apache.org%3E)
diff --git a/communication/slack-config/channels.yaml b/communication/slack-config/channels.yaml
index 20e3881f..f67561d7 100644
--- a/communication/slack-config/channels.yaml
+++ b/communication/slack-config/channels.yaml
@@ -462,6 +462,7 @@ channels:
   - name: tw-users
   - name: ug-big-data
     id: C0ELB338T
+    archived: true
   - name: ug-vmware
   - name: undistro
   - name: vcluster
diff --git a/liaisons.md b/liaisons.md
index 65558da9..f621a9b2 100644
--- a/liaisons.md
+++ b/liaisons.md
@@ -61,7 +61,6 @@ members will assume one of the departing members groups.
 | [WG Multitenancy](wg-multitenancy/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
 | [WG Policy](wg-policy/README.md) | Christoph Blecker (**[@cblecker](https://github.com/cblecker)**) |
 | [WG Structured Logging](wg-structured-logging/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |
-| [UG Big Data](ug-big-data/README.md) | Tim Pepper (**[@tpepper](https://github.com/tpepper)**) |
 | [UG VMware Users](ug-vmware-users/README.md) | Tim Pepper (**[@tpepper](https://github.com/tpepper)**) |
 | [Committee Code of Conduct](committee-code-of-conduct/README.md) | Tim Pepper (**[@tpepper](https://github.com/tpepper)**) |
 | [Committee Security Response](committee-security-response/README.md) | Stephen Augustus (**[@justaugustus](https://github.com/justaugustus)**) |
diff --git a/sig-list.md b/sig-list.md
index d2293520..27ca0c7e 100644
--- a/sig-list.md
+++ b/sig-list.md
@@ -74,7 +74,6 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
 
 | Name | Label |Organizers | Contact | Meetings |
 |------|-------|------------|--------|----------|
-|[Big Data](ug-big-data/README.md)|[big-data](https://github.com/kubernetes/kubernetes/labels/ug%2Fbig-data)|* [Erik Erlandson](https://github.com/erikerlandson), Red Hat<br>* [Anirudh Ramanathan](https://github.com/foxish), Rockset<br>* [Yinan Li](https://github.com/liyinan926), Google<br>|* [Slack](https://kubernetes.slack.com/messages/ug-big-data)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data)|* Regular User Group Meeting: [Wednesdays at 18:00 UTC (biweekly)](https://zoom.us/my/ug.big.data)<br>
 |[VMware Users](ug-vmware-users/README.md)|[vmware-users](https://github.com/kubernetes/kubernetes/labels/ug%2Fvmware-users)|* [Steve Wong](https://github.com/cantbewong), VMware<br>* [Myles Gray](https://github.com/mylesagray), VMware<br>|* [Slack](https://kubernetes.slack.com/messages/ug-vmware)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-ug-vmware)|* Regular User Group Meeting: [Thursdays at 11:00 PT (Pacific Time) (monthly)](https://docs.google.com/document/d/1ujpqj4hhcIBrSCK2qn6J1r--3QyD96rfDjXTZQ7n7Mw/edit)<br>
 
 ### Committees
diff --git a/sigs.yaml b/sigs.yaml
index b173b2a7..539752f8 100644
--- a/sigs.yaml
+++ b/sigs.yaml
@@ -3314,44 +3314,6 @@ workinggroups:
       github: palnabarun
       name: Nabarun Pal
 usergroups:
-- dir: ug-big-data
-  name: Big Data
-  mission_statement: >
-    Serve as a community resource for advising big data and data science related software
-    projects on techniques and best practices for integrating with Kubernetes. Represents
-    the concerns of users from big data communities to Kubernetes for the purposes
-    of driving new features and other enhancements, based on big data use cases.
-
-  label: big-data
-  leadership:
-    chairs:
-    - github: erikerlandson
-      name: Erik Erlandson
-      company: Red Hat
-    - github: foxish
-      name: Anirudh Ramanathan
-      company: Rockset
-    - github: liyinan926
-      name: Yinan Li
-      company: Google
-  meetings:
-  - description: Regular User Group Meeting
-    day: Wednesday
-    time: "18:00"
-    tz: UTC
-    frequency: biweekly
-    url: https://zoom.us/my/ug.big.data
-    archive_url: https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit
-    recordings_url: https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit
-  contact:
-    slack: ug-big-data
-    mailing_list: https://groups.google.com/forum/#!forum/kubernetes-ug-big-data
-    teams:
-    - name: ug-big-data
-      description: General Discussion
-    liaison:
-      github: tpepper
-      name: Tim Pepper
 - dir: ug-vmware-users
   name: VMware Users
   mission_statement: >
diff --git a/ug-big-data/OWNERS b/ug-big-data/OWNERS
deleted file mode 100644
index cef3c5d3..00000000
--- a/ug-big-data/OWNERS
+++ /dev/null
@@ -1,8 +0,0 @@
-# See the OWNERS docs at https://go.k8s.io/owners
-
-reviewers:
-  - ug-big-data-leads
-approvers:
-  - ug-big-data-leads
-labels:
-  - ug/big-data
diff --git a/ug-big-data/README.md b/ug-big-data/README.md
deleted file mode 100644
index 9fd546e8..00000000
--- a/ug-big-data/README.md
+++ /dev/null
@@ -1,46 +0,0 @@
-
-
-# Big Data User Group
-
-Serve as a community resource for advising big data and data science related software projects on techniques and best practices for integrating with Kubernetes. Represents the concerns of users from big data communities to Kubernetes for the purposes of driving new features and other enhancements, based on big data use cases.
-
-## Meetings
-*Joining the [mailing list](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data) for the group will typically add invites for the following meetings to your calendar.*
-* Regular User Group Meeting: [Wednesdays at 18:00 UTC](https://zoom.us/my/ug.big.data) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=18:00&tz=UTC).
-  * [Meeting notes and Agenda](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
-  * [Meeting recordings](https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit).
-
-## Organizers
-
-* Erik Erlandson (**[@erikerlandson](https://github.com/erikerlandson)**), Red Hat
-* Anirudh Ramanathan (**[@foxish](https://github.com/foxish)**), Rockset
-* Yinan Li (**[@liyinan926](https://github.com/liyinan926)**), Google
-
-## Contact
-- Slack: [#ug-big-data](https://kubernetes.slack.com/messages/ug-big-data)
-- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-ug-big-data)
-- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/ug%2Fbig-data)
-- GitHub Teams:
-  - [@kubernetes/ug-big-data](https://github.com/orgs/kubernetes/teams/ug-big-data) - General Discussion
-- Steering Committee Liaison: Tim Pepper (**[@tpepper](https://github.com/tpepper)**)
-
-
-### Goals
-
-- Promoting best practices for Kubernetes integrations
-- Advising big data communities on Kubernetes features
-- Shepherding issues and pull requests from community members
-- Hosting demos and discussions of big data integrations for Kubernetes
-
-### Non Goals
-
-- Promoting or otherwise advocating for any specific big data project
-- Software and tooling communities that have no intersection with data science or big data
-
-
diff --git a/ug-big-data/resources.md b/ug-big-data/resources.md
deleted file mode 100644
index c7b3b2d0..00000000
--- a/ug-big-data/resources.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Resources
-
-## Kubernetes integration status by big data product
-
-### Spark
-
-[Apache Spark](https://spark.apache.org) is a distributed data processing framework.
-
-##### Status
-
-Kubernetes is supported as a mainline Spark scheduler since [release 2.3](https://spark.apache.org/releases/spark-release-2-3-0.html), see [the detailed documentation](https://spark.apache.org/docs/latest/running-on-kubernetes.html).
-That work was done after the [Spark on Kubernetes original Design Proposal](https://docs.google.com/document/d/1_bBzOZ8rKiOSjQg78DXOA3ZBIo_KkDJjqxVuq0yXdew/edit#)
-in the [apache-spark-on-k8s git repo](https://github.com/apache-spark-on-k8s/spark).
-
-##### Activities
-
-Enhancements are under development, with a good overview given [in this blog post](https://databricks.com/blog/2018/09/26/whats-new-for-apache-spark-on-kubernetes-in-the-upcoming-apache-spark-2-4-release.html).
-
-* Work is underway for Spark 2.4 to improve support and integration with HDFS.
-  * Design Document: [How Spark on Kubernetes will access Secure HDFS](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd)
-* Shuffle service design
-  * Design Document [Improving Spark Shuffle Reliability](https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit)
-  * JIRA issue [SPARK-25299: Use remote storage for persisting shuffle data](https://issues.apache.org/jira/browse/SPARK-25299)
-
-### HDFS
-
-[Apache Hadoop HDFS](https://hadoop.apache.org/hdfs) is a distributed file system, the persistence layer for Hadoop.
-
-##### Status
-
-TODO, e.g. "No release yet."
-
-##### Activities
-
-* [Data Locality Doc](https://docs.google.com/document/d/1TAC6UQDS3M2sin2msFcZ9UBBQFyyz4jFKWw5BM54cQo/edit)
-* ["HDFS on Kubernetes" git repository including Helm charts](https://github.com/apache-spark-on-k8s/kubernetes-HDFS)
-
-### Airflow
-
-[Apache Airflow](https://airflow.apache.org) is a platform to programmatically author, schedule and monitor workflows.
-
-##### Status
-
-The [Kubernetes executor](https://airflow.apache.org/kubernetes.html) has been introduced with Airflow [release 1.10.0](https://github.com/apache/incubator-airflow/blob/master/CHANGELOG.txt) with support of Kubernetes 1.10.
-
-##### Activities
-
-* [Airflow roadmap](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013666)
-
-### Flink
-
-[Apache Flink](https://flink.apache.org) is a distributed data processing framework.
-
-##### Status
-
-Flink 1.6 supports [running a session or job cluster on Kubernetes](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html).
-
-##### Activities
-
-* [Native support for Kubernetes as a Flink runtime](https://issues.apache.org/jira/browse/FLINK-9953)
-* [Lyft is working on an operator](https://lists.apache.org/thread.html/aa941030440c1d9e34c35c0caf5ddd2456755337fc34a4edebb32929@%3Cdev.flink.apache.org%3E)
-- 
cgit v1.2.3