
Setting up a Spark cluster in standalone mode

Objective:

This project helps you set up a Spark cluster in standalone mode on macOS or Windows using Docker. The idea is to have a playground for learning Spark with PySpark or other interpreters that can be set up on this image.

Here is the reference article that was used to create the functional version of this Docker setup:

Before starting with the commands below, make sure the BuildKit property in Docker Desktop is set to false.
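In Docker Desktop this setting lives under Settings > Docker Engine. A minimal sketch of the daemon configuration JSON with BuildKit disabled; merge this into your existing configuration rather than replacing it:

    {
      "features": {
        "buildkit": false
      }
    }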

Step-by-step process:

Clone the repo.

  1. Run build.bat or build.sh, depending on whether you are on Windows or Linux.

    build.bat 
    

    Depending on the size of the images and the speed of your connection, it may take some time to download all of them the first time.

  2. Once step 1 is complete, start the containers:

    docker compose up

Once the above steps are done, you can access the Spark cluster using the following links.

URLs for accessing the UIs of the Spark nodes:

  1. JupyterLab at localhost:8888
  2. Spark master at localhost:8080
  3. Spark worker I at localhost:8081
  4. Spark worker II at localhost:8082

Optionally, you can add entries to the /etc/hosts file so that the node names resolve in place of localhost, as shown in the sketch below.
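A sketch of such /etc/hosts entries, assuming the container names spark-master, spark-worker-1, and spark-worker-2; adjust these to the service names actually used in the compose file:

    # /etc/hosts additions (names assumed; match your compose service names)
    127.0.0.1 spark-master
    127.0.0.1 spark-worker-1
    127.0.0.1 spark-worker-2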

PySpark notebook

Open JupyterLab at localhost:8888, create a new notebook, and paste the code below to see Spark in action.

from pyspark.sql import SparkSession

# Connect to the standalone cluster through the master node
spark = (SparkSession.builder
         .appName("pyspark-notebook")
         .master("spark://spark-master:7077")
         .config("spark.executor.memory", "512m")
         .getOrCreate())

import wget

# Download the iris dataset (a headerless CSV) next to the notebook
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
wget.download(url)

data = spark.read.csv("iris.data")
data.show(n=5)
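Since iris.data has no header row, Spark names the columns _c0 through _c4. As a small follow-up, you can assign friendlier names explicitly; the names below are the conventional iris attributes, chosen here for illustration:

    # Rename the auto-generated columns to the usual iris attribute names
    cols = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
    data = spark.read.csv("iris.data").toDF(*cols)
    data.show(n=5)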

Areas to explore

  1. Working with Delta Lake (see the sketch after the spark-submit example below).

  2. Submitting jobs: a job can be submitted from the master node using the command below. Here the job is submitted in client mode.

spark-submit \
 --class com.sparkTutorial.input \
 --deploy-mode client \
 --master "spark://master-node:7077" \
 target/scala-2.12/sql-mongo-validation-assembly-0.1.jar
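For the Delta Lake item above, here is a minimal sketch of writing and reading a Delta table from the notebook. The artifact coordinates are an assumption and must match your Spark and Scala versions (io.delta:delta-core_2.12:2.4.0 pairs with Spark 3.4); the /tmp path is only for illustration:

    from pyspark.sql import SparkSession

    # Pull the Delta Lake artifact and enable its SQL extension and catalog
    spark = (SparkSession.builder
             .appName("delta-demo")
             .master("spark://spark-master:7077")
             .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0")
             .config("spark.sql.extensions",
                     "io.delta.sql.DeltaSparkSessionExtension")
             .config("spark.sql.catalog.spark_catalog",
                     "org.apache.spark.sql.delta.catalog.DeltaCatalog")
             .getOrCreate())

    # Write a tiny DataFrame as a Delta table, then read it back
    df = spark.range(0, 5)
    df.write.format("delta").mode("overwrite").save("/tmp/delta/numbers")
    spark.read.format("delta").load("/tmp/delta/numbers").show()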

About

Trying to run a join between a MongoDB collection and a tabular structure generated by a SQL Server-side table join.
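A hedged sketch of what such a join could look like in PySpark, reusing the SparkSession from the notebook example and assuming the MongoDB Spark connector (v10+, format "mongodb") and the Microsoft SQL Server JDBC driver are available to the cluster; every hostname, database, credential, and join key below is a placeholder, not this repo's actual configuration:

    # Read the MongoDB collection (connector v10+ uses the "mongodb" format)
    mongo_df = (spark.read.format("mongodb")
                .option("connection.uri", "mongodb://mongo:27017")
                .option("database", "testdb")
                .option("collection", "orders")
                .load())

    # Read the tabular result of a join executed inside SQL Server
    sql_df = (spark.read.format("jdbc")
              .option("url", "jdbc:sqlserver://sqlserver:1433;databaseName=testdb")
              .option("query", "SELECT o.order_id, c.name "
                               "FROM orders o JOIN customers c "
                               "ON o.customer_id = c.customer_id")
              .option("user", "sa")
              .option("password", "...")
              .load())

    # Join the two sources in Spark on a shared key
    joined = mongo_df.join(sql_df, on="order_id", how="inner")
    joined.show()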
