This repo consists of basic scripts that convert CSV files to Parquet and vice versa. I have used the fastparquet Python API for this. There is also a script that uses Spark to create Parquet files and then query them using Spark SQL. The roles folder contains a tasks folder with a main.yml. This is an Ansible playbook which installs pandas and fastparquet on a Spark cluster you have created. It downloads adult.data from the UCI website, creates a Parquet file named adult.parquet, and then uses Spark SQL to query that file.
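The CSV-to-Parquet round trip described above could look roughly like the sketch below. It is not the repo's actual script: the function names (`swap_suffix`, `csv_to_parquet`, `parquet_to_csv`) are hypothetical, and it assumes pandas and fastparquet are installed, as the playbook arranges.

```python
from pathlib import Path


def swap_suffix(path, new_suffix):
    """Return `path` with its extension replaced, e.g. adult.csv -> adult.parquet."""
    return str(Path(path).with_suffix(new_suffix))


def csv_to_parquet(csv_path, parquet_path=None, **read_csv_kwargs):
    """Read a CSV file with pandas and write it out as a Parquet file."""
    # Imported here so the helpers above stay usable without the heavy dependencies.
    import pandas as pd
    import fastparquet

    parquet_path = parquet_path or swap_suffix(csv_path, ".parquet")
    df = pd.read_csv(csv_path, **read_csv_kwargs)
    fastparquet.write(parquet_path, df)
    return parquet_path


def parquet_to_csv(parquet_path, csv_path=None):
    """Load a Parquet file into a DataFrame and write it back out as CSV."""
    import fastparquet

    csv_path = csv_path or swap_suffix(parquet_path, ".csv")
    df = fastparquet.ParquetFile(parquet_path).to_pandas()
    df.to_csv(csv_path, index=False)
    return csv_path
```

Extra keyword arguments are passed straight to `pandas.read_csv`, so a file without a header row can be given column names with `names=[...]` before it is written to Parquet.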
SalilShenoy/Spark-App