PySpark is the Python API for Apache Spark, a unified analytics engine. For documentation, see the Spark and PySpark docs.
This image is meant to be a platform for development and testing.
PySpark version 2.4.3
docker run -ti 5thempire/pyspark:latest spark-submit /opt/pyspark/pi.py
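The pi.py sample estimates π by Monte Carlo: draw random points in the unit square and count the fraction that land inside the quarter-circle. Stripped of Spark, the core idea looks roughly like the sketch below (in the actual Spark example the sampling is distributed across workers, e.g. with `parallelize`; the function and parameter names here are illustrative, not taken from pi.py itself):

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi.

    The fraction of uniform points (x, y) in [0, 1)^2 that satisfy
    x^2 + y^2 <= 1 approaches pi/4, so four times that fraction
    approaches pi.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # close to 3.14 for this many samples
```

More samples tighten the estimate; Spark's value here is spreading those samples over many partitions.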
Example docker-compose.yml for PySpark:
version: '3.4'
services:
  spark:
    image: 5thempire/pyspark:latest
    container_name: spark
    stdin_open: true
    tty: true
    volumes:
      - ./code:/opt/pyspark
    ports:
      - "8080:8080"
      - "8888:8888"
The Makefile automates the typical tasks in the project.
To set it up, run
make setup
For a pi sample, run
make pi-sample
To step through the sample with pdb, run
make pi-debug
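One way to explore pdb with the sample is to set a breakpoint inside the sampling function. A minimal sketch of that idea follows; the `inside` helper name is hypothetical, not necessarily the name used in pi.py:

```python
import pdb
import random

def inside(_):
    """Return 1 if a random point lands inside the unit quarter-circle."""
    x, y = random.random(), random.random()
    # Uncomment the next line to drop into the debugger on every sample:
    # pdb.set_trace()
    return 1 if x * x + y * y <= 1 else 0

# Alternatively, run the whole script under the debugger from the shell:
#   python -m pdb pi.py
print(inside(0))
```

Breakpoints inside a tight sampling loop fire on every iteration, so in practice it is often more convenient to start the script under `python -m pdb` and set a conditional breakpoint there.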
All the samples are based upon pi.py.