Apache Spark
Apache Spark is an open-source, general-purpose, distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
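To make "implicit data parallelism" concrete, here is a minimal plain-Python sketch of Spark's classic word-count data flow. This is an analogue, not actual Spark code: the list comprehensions stand in for Spark's `flatMap`, `map`, and `reduceByKey` transformations, which Spark would instead run in parallel across cluster partitions.

```python
# Plain-Python analogue of Spark's word-count pipeline.
# The real PySpark version would be roughly:
#   sc.textFile(path).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
from collections import defaultdict

def word_count(lines):
    # flatMap: split each line into individual words
    words = [w for line in lines for w in line.split()]
    # map: pair each word with an initial count of 1
    pairs = [(w, 1) for w in words]
    # reduceByKey: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(word_count(["spark is fast", "spark is distributed"]))
# → {'spark': 2, 'is': 2, 'fast': 1, 'distributed': 1}
```

In Spark, each of these stages operates on a partitioned dataset (an RDD or DataFrame), so the same three-step pipeline scales across a cluster without the programmer writing any explicit parallel code.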
Here are 328 public repositories matching this topic...
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
Updated Oct 8, 2024 - Shell
[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Updated Feb 21, 2022 - Shell
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Updated May 26, 2019 - Shell
Large tech knowledge base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP, etc. - gradually porting my large private knowledge base to public
Updated Oct 13, 2024 - Shell
A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Updated Feb 2, 2020 - Shell
🪐 1-click Kubeflow using ArgoCD
Updated Aug 8, 2024 - Shell
The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.
Updated Feb 27, 2023 - Shell
Created by Matei Zaharia
Released May 26, 2014
- Followers: 427
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia