- This repository contains batch-processing code that generates a customer complaints report. The code is written in Scala 2.12.10 and runs on Spark 3.0.1.
Author: Tushar Kesarwani (tushar.kesarwani2@gmail.com)
Develop locally. The recommended workflow is to create mock data in tests and run the code against it. Developing against a Hadoop cluster is possible, but it is generally unnecessary and slows development down.
- Use an IntelliJ run configuration to run the program locally.
- Use the following CLI input in the IntelliJ run configuration as is, one argument per line in this order:
ComplaintReport
local[1]
file:///absoluteFilePath/complaints.jsonl
file:///absoluteFilePath/category_names.json
file:///absoluteFilePath/service_names.json
file:///absoluteFilePath/output
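
The six positional arguments above (job name, Spark master, three input paths, one output path) could be read in the entry point roughly as follows. This is a minimal sketch, not the repository's actual code: `JobArgs`, `JobArgsParser`, and every field name are hypothetical and chosen only to mirror the argument order shown above.

```scala
// Hypothetical sketch: these names are assumptions, not taken from the repository.
final case class JobArgs(
    jobName: String,           // e.g. ComplaintReport
    master: String,            // e.g. local[1]
    complaintsPath: String,    // complaints.jsonl
    categoryNamesPath: String, // category_names.json
    serviceNamesPath: String,  // service_names.json
    outputPath: String         // output directory
)

object JobArgsParser {
  // Accepts exactly six positional arguments, in the order listed above.
  // Returns Left with an error message for any other argument count.
  def parse(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(job, master, complaints, categories, services, output) =>
      Right(JobArgs(job, master, complaints, categories, services, output))
    case other =>
      Left(s"Expected 6 arguments but got ${other.length}")
  }
}
```

Returning an `Either` rather than throwing lets the caller print a usage message and exit cleanly when an argument is missing from the run configuration.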
To run the tests:
mvn test
To build the project:
mvn clean compile install
Use the following Unix script to run the code on a cluster:
sh deploy/run.sh ComplaintReport local[1] file:///absoluteFilePath/complaints.jsonl file:///absoluteFilePath/category_names.json file:///absoluteFilePath/service_names.json file:///absoluteFilePath/output