- bin - helper utilities written in bash
  - bootstrap.sh - creates the bootstrap cf stack based on the stack_bootstrap folder
  - create_job.sh - initial aws glue job cf stack creation (see the boto3 sketch after this list)
  - delete_job.sh - deletes the aws glue job cf stack
  - push_code.sh - pushes the aws glue pyspark code to the s3 bucket for later use
  - update_job.sh - redeploys the cf stack if there are any changes to the stack_job folder template or parameters
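The stack scripts wrap the aws cli around cloudformation. As a rough illustration only (not the actual scripts), here is a boto3 equivalent of what create_job.sh / update_job.sh do for the stack_job stack; the stack name and the assumption that paramas.json uses the standard cloudformation parameter-file format are mine, not taken from the repo.

```python
# Hypothetical boto3 equivalent of create_job.sh / update_job.sh (illustration only).
import json

import boto3

cf = boto3.client("cloudformation")

with open("stack_job/stack.yaml") as f:
    template_body = f.read()
with open("stack_job/paramas.json") as f:
    # assumed format: [{"ParameterKey": "...", "ParameterValue": "..."}, ...]
    parameters = json.load(f)

try:
    # create the stack on first run
    cf.create_stack(
        StackName="glue-job-stack",              # assumed stack name
        TemplateBody=template_body,
        Parameters=parameters,
        Capabilities=["CAPABILITY_NAMED_IAM"],   # glue jobs usually need IAM resources
    )
except cf.exceptions.AlreadyExistsException:
    # redeploy the stack if the template or parameters changed
    cf.update_stack(
        StackName="glue-job-stack",
        TemplateBody=template_body,
        Parameters=parameters,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
```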
- job_code - aws glue pyspark code which is uploaded to the s3 bucket
  - main.py - pyspark source code which reads from a glue data catalog table and writes output to s3 in parquet file format (a sketch follows this list)
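A minimal sketch of the kind of job main.py implements: read a table from the glue data catalog and write it back to s3 as parquet. The database, table, and output path are placeholders, not the values the actual job uses.

```python
# Sketch of a glue pyspark job: catalog table in, parquet on s3 out.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# read the source table registered in the glue data catalog
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sample_database",   # placeholder
    table_name="sample_table",    # placeholder
)

# write the data to s3 in parquet format
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-output-bucket/output/"},  # placeholder
    format="parquet",
)

job.commit()
```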
- loader - test data upload python tool. writes csv to s3 with aws glue data catalog metadata using aws wrangler
  - sample_data - contains the sample dataset
  - main.py - source code that writes test data to s3 and the glue data catalog (a sketch follows this list)
  - poetry and pyenv files - python dependency manager and python version manager
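A minimal sketch of what the loader does, assuming awswrangler (aws wrangler): write a csv dataset to s3 and register its metadata in the glue data catalog. The bucket, database, table, and sample file names are placeholders.

```python
# Sketch: upload test data as csv and register it in the glue data catalog.
import awswrangler as wr
import pandas as pd

# hypothetical sample file name inside the sample_data folder
df = pd.read_csv("sample_data/sample.csv")

wr.s3.to_csv(
    df=df,
    path="s3://example-data-bucket/raw/",  # placeholder bucket/prefix
    dataset=True,                          # enables glue catalog integration
    database="sample_database",            # placeholder catalog database
    table="sample_table",                  # placeholder catalog table
    index=False,
)
```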
- stack_bootstrap - cf stack template with parameters to bootstrap the main stack located in the stack_job folder. it creates the glue code s3 bucket and the aws glue database for the table.
  - paramas.json - parameter file used when creating the cf stack
  - stack.yaml - cf template file
- stack_job - main cf template which creates the glue job. can be updated if the config changes.
  - paramas.json - parameter file used when creating or updating the cf stack
  - stack.yaml - cf template file
- tester - testing utility that checks the parquet output of the test data set. it reads the whole data set and all partitions, so it is only usable for small datasets that fit in local device memory.
  - main.py - source code which reads the test parquet data from s3 (a sketch follows this list)
  - poetry and pyenv files - python dependency manager and python version manager
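A minimal sketch of what the tester does, assuming awswrangler: read the whole parquet dataset (all partitions) back from s3 into a local dataframe for inspection. The output path is a placeholder.

```python
# Sketch: read the full parquet output back into memory and inspect it.
import awswrangler as wr

df = wr.s3.read_parquet(
    path="s3://example-output-bucket/output/",  # placeholder output prefix
    dataset=True,                               # read all partitions under the prefix
)

print(df.shape)
print(df.head())
```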
- aws cli
- jq
- python 3.10
- pip
- poetry
./bin/bootstrap.sh
cd loader; poetry run python main.py
./bin/push_code.sh
./bin/create_job.sh
./bin/update_job.sh
- run the glue job in the console or with the aws cli (a boto3 sketch for starting the job follows these steps)
- use the tester with
cd tester; poetry run python main.py
./bin/delete_job.sh
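For the run-the-glue-job step above, a hedged boto3 alternative to the console or aws cli: start a run and poll until it reaches a terminal state. The job name is a placeholder, not the name created by the stack.

```python
# Sketch: start the glue job and wait for it to finish.
import time

import boto3

glue = boto3.client("glue")

run = glue.start_job_run(JobName="sample-glue-job")  # placeholder job name
run_id = run["JobRunId"]

while True:
    state = glue.get_job_run(JobName="sample-glue-job", RunId=run_id)["JobRun"]["JobRunState"]
    print(state)
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
        break
    time.sleep(30)
```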