PySpark 2.4.3 Docker environment for development and testing

5thempire/pyspark

PySpark

Apache Spark is a unified analytics engine, and PySpark is its Python API. For documentation, see the Spark and PySpark docs.

This image is meant to be a platform for development and testing.

PySpark version 2.4.3

How to use this image

Start a pyspark instance

docker run -ti 5thempire/pyspark:latest spark-submit /opt/pyspark/pi.py   

...via docker-compose

Example docker-compose.yml for pyspark

version: '3.4'

services:
  spark:
    image: 5thempire/pyspark:latest
    container_name: spark
    stdin_open: true
    tty: true
    volumes:
      - ./code:/opt/pyspark
    ports:
      - "8080:8080"
      - "8888:8888"

How to use the Makefile

The Makefile automates the typical tasks in the project.

To set it up, run

make setup

To run the pi sample, run

make pi-sample

To explore the sample under pdb, run

make pi-debug
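
For readers new to pdb, here is a minimal sketch of what a breakpoint in a Python script looks like. What the `pi-debug` target actually does is not shown in this README, so this is an assumption. The `PI_DEBUG` environment variable and the `scale_hits` function are hypothetical names used only for illustration.

```python
import os
import pdb


def scale_hits(hits, samples):
    """Turn a hit count into a pi estimate: 4 * hits / samples."""
    # Hypothetical switch (not from the repository): break only when PI_DEBUG=1.
    if os.environ.get("PI_DEBUG") == "1":
        # Execution pauses here. Use `p hits` to print a value,
        # `n` to step to the next line, and `c` to continue.
        pdb.set_trace()
    return 4.0 * hits / samples
```

pdb reads commands from standard input, so the container needs an interactive terminal, which is what the `-ti` flags and the `stdin_open` / `tty` settings above provide.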

Samples

All the samples are based on pi.py.
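
The contents of pi.py are not reproduced here. As a rough idea of what such a sample usually does, the sketch below estimates pi by Monte Carlo sampling, in the style of the pi example that ships with Spark. It is an assumption about the approach, not the repository's actual file. The names `hit`, `pi_from_counts`, and `estimate_pi_with_spark` and the default sample sizes are illustrative.

```python
import random
from operator import add


def hit(_):
    """Return 1 if a random point in the square [-1, 1]^2 falls inside the unit circle."""
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def pi_from_counts(hits, samples):
    """The circle-to-square area ratio is pi/4, so pi is roughly 4 * hits / samples."""
    return 4.0 * hits / samples


def estimate_pi_with_spark(partitions=2, samples_per_partition=100000):
    """Distribute the sampling over Spark partitions and return the estimate."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PiSketch").getOrCreate()
    try:
        n = partitions * samples_per_partition
        hits = (
            spark.sparkContext
            .parallelize(range(n), partitions)  # one task per partition
            .map(hit)                           # sample one point per element
            .reduce(add)                        # sum the hits on the driver
        )
        return pi_from_counts(hits, n)
    finally:
        spark.stop()
```

To try it with spark-submit, add a final line such as `print(estimate_pi_with_spark())` and place the file in the `./code` directory that the docker-compose example mounts at `/opt/pyspark`.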