data.world-nasa

This is an example Spark job that runs on AWS EMR. It uses the following technologies:

Spark
Scala
Gradle
AWS EMR
Terraform

There are two main sections in this project: the EMR infrastructure as code and the Spark job jar itself.

Spark Job Jar

To build the jar, go to the data.world-nasa root directory and run gradle build.

The jar will then be located under app/build/libs/app.jar.

To run this job in EMR:

Upload app.jar to an S3 bucket. Let's assume it is called s3-bucket-name.
SSH into the primary (master) node and run `spark-submit --class nasa.App s3a://s3-bucket-name/app.jar`
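
The build, upload, and submit steps above can be sketched as a small shell script. This is only a sketch: it assumes the AWS CLI is installed and configured, s3-bucket-name is a placeholder, and the run helper and DRY_RUN variable are hypothetical conveniences that are not part of this project.

```shell
#!/usr/bin/env bash
# Sketch of the build, upload, and submit steps.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
set -eu

BUCKET="${BUCKET:-s3-bucket-name}"   # placeholder bucket name (assumption)
JAR_PATH="app/build/libs/app.jar"    # where the build places the jar

# Hypothetical helper: print the command in dry-run mode, run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_jar()  { run gradle build; }
upload_jar() { run aws s3 cp "$JAR_PATH" "s3://$BUCKET/app.jar"; }

# Meant to be run on the primary node after SSH-ing in, not locally.
submit_job() { run spark-submit --class nasa.App "s3a://$BUCKET/app.jar"; }

build_jar
upload_jar
```

Setting DRY_RUN=0 executes the commands instead of printing them. submit_job is defined but not called here, since it belongs on the cluster rather than on the machine that builds the jar.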

Terraform for AWS EMR

To bring up the infrastructure, go into the aws/terraform directory and run:

terraform apply

When you want to tear the infrastructure down, run terraform destroy.
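
A typical Terraform cycle for this directory might look like the sketch below. It assumes Terraform 0.14 or later (for the -chdir option) and that AWS credentials are already available in the environment; the tf helper and DRY_RUN variable are hypothetical and not part of this project. A fresh checkout generally needs terraform init before the first apply.

```shell
#!/usr/bin/env bash
# Sketch of the Terraform lifecycle for the EMR infrastructure.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
set -eu

TF_DIR="aws/terraform"   # Terraform directory of this project

# Hypothetical helper: print the terraform command in dry-run mode, run it otherwise.
tf() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "terraform -chdir=$TF_DIR $*"
  else
    terraform -chdir="$TF_DIR" "$@"
  fi
}

tf init      # download providers; needed once before the first apply
tf plan      # preview the resources that would be created
tf apply     # create the infrastructure, including the EMR cluster
# tf destroy # tear everything down when finished
```

Running plan before apply gives a chance to review what will be created, which matters here because an EMR cluster incurs cost for as long as it is up.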

The Terraform configuration sets up the VPC, internet gateway (IGW), subnets, SSH keys, and EMR cluster, and opens the EMR cluster to SSH from my IP and to Spark History UI access.
