Final Project for de-zoomcamp 2022 (1st cohort)
This project will aggregate historical Divvy Bike Data from the City of Chicago.
- GCP VM Instance (Processing)
- Terraform (Infrastructure as Code)
- Airflow (Data Pipeline - ETL)
- GCP Storage Bucket (Data Lake)
- BigQuery (Data Warehouse)
- dbt (Creating Analytical Views)
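The City of Chicago publishes the trip data as monthly files, so an Airflow pipeline of this kind is usually organized around one month per run. As a minimal sketch, the months to process and the name each month's file could take in the storage bucket might be generated like this. The `raw/divvy/<year>/<YYYYMM>-divvy-tripdata.csv` layout and the helper names `month_keys` and `object_name` are hypothetical and are not taken from this project's DAG.

```python
from datetime import date


def month_keys(start: date, end: date) -> list[str]:
    """Return 'YYYYMM' keys for every month from start to end, inclusive."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}{month:02d}")
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def object_name(key: str, prefix: str = "raw/divvy") -> str:
    """Hypothetical data-lake path for one month's CSV file."""
    year = key[:4]
    return f"{prefix}/{year}/{key}-divvy-tripdata.csv"


if __name__ == "__main__":
    for key in month_keys(date(2021, 11, 1), date(2022, 2, 1)):
        print(object_name(key))
```

Organizing paths by year keeps each month's file separately addressable, so a failed month can be re-run without touching the others.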
While this data is freely available from the City of Chicago, it is split into monthly files in CSV format.
- Combining the monthly files may reveal trends that would be missed when looking at only a smaller subset of the data.
- By building a resilient data pipeline to import and aggregate the data, this project should be useful to anyone who wants to perform the same task without repeating the data cleaning and importing.
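The core idea of combining the monthly files can be sketched in plain Python. This is only an illustration of the concept, not the project's actual Airflow/BigQuery implementation: the rows of several monthly CSV files are streamed as one dataset, each file's header is checked against the first so that a schema change between months fails loudly, and a simple aggregate (rides per `member_casual` value) is computed across all months. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def combine_monthly_csvs(sources: Iterable[TextIO]) -> Iterator[dict[str, str]]:
    """Yield rows from several monthly CSV files as a single stream.

    Raises ValueError if a file's header differs from the first file's,
    so schema drift between months is caught instead of silently mixed.
    """
    expected_header = None
    for source in sources:
        reader = csv.DictReader(source)
        if expected_header is None:
            expected_header = reader.fieldnames
        elif reader.fieldnames != expected_header:
            raise ValueError(f"header mismatch: {reader.fieldnames} != {expected_header}")
        yield from reader


def rides_by_rider_type(rows: Iterable[dict[str, str]]) -> Counter:
    """Count rides per member_casual value across all combined months."""
    return Counter(row["member_casual"] for row in rows)


if __name__ == "__main__":
    # In-memory stand-ins for two monthly files (real files have more columns).
    jan = io.StringIO("ride_id,member_casual\nA1,member\nA2,casual\n")
    feb = io.StringIO("ride_id,member_casual\nB1,member\n")
    print(rides_by_rider_type(combine_monthly_csvs([jan, feb])))
```

Because `combine_monthly_csvs` is a generator, a header mismatch is raised when the offending file is reached during iteration, and no month has to be held in memory all at once.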
The data used for this project can be found here: Divvy Bike Data
Each trip record in the data contains the following fields:
- ride_id - Unique ID Assigned to Each Divvy Trip
- rideable_type - Type of Vehicle Used
- started_at - Start of Trip Date and Time
- ended_at - End of Trip Date and Time
- start_station_name - Name Assigned to Station the Trip Started at
- start_station_id - Unique Identification Number of Station the Trip Started at
- end_station_name - Name Assigned to Station the Trip Ended at
- end_station_id - Unique Identification Number of Station the Trip Ended at
- start_lat - Latitude of the Start Station
- start_lng - Longitude of the Start Station
- end_lat - Latitude of the End Station
- end_lng - Longitude of the End Station
- member_casual - Field with Two Values Indicating Whether the Rider has a Divvy Membership or Paid with Credit Card
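The fields above map naturally onto a typed record. The sketch below parses one CSV row into a Python dataclass and derives the trip duration from `started_at` and `ended_at`. It rests on a few assumptions that are not stated in this README: timestamps are written like `2022-01-13 11:59:47` (a form `datetime.fromisoformat` accepts), station and coordinate cells may be empty, and station IDs are kept as strings so a non-numeric ID would not break parsing. `Trip` and `parse_trip` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def _blank_to_none(value: str) -> Optional[str]:
    """Treat empty CSV cells as missing values."""
    return value.strip() or None


def _float_or_none(value: str) -> Optional[float]:
    """Parse a coordinate, returning None for an empty cell."""
    cleaned = _blank_to_none(value)
    return float(cleaned) if cleaned is not None else None


@dataclass(frozen=True)
class Trip:
    ride_id: str
    rideable_type: str
    started_at: datetime
    ended_at: datetime
    start_station_name: Optional[str]
    start_station_id: Optional[str]
    end_station_name: Optional[str]
    end_station_id: Optional[str]
    start_lat: Optional[float]
    start_lng: Optional[float]
    end_lat: Optional[float]
    end_lng: Optional[float]
    member_casual: str

    @property
    def duration_minutes(self) -> float:
        """Trip length in minutes, derived from the start and end timestamps."""
        return (self.ended_at - self.started_at).total_seconds() / 60


def parse_trip(row: dict[str, str]) -> Trip:
    """Convert one CSV row (as produced by csv.DictReader) into a Trip."""
    return Trip(
        ride_id=row["ride_id"],
        rideable_type=row["rideable_type"],
        # Assumed timestamp format "YYYY-MM-DD HH:MM:SS".
        started_at=datetime.fromisoformat(row["started_at"]),
        ended_at=datetime.fromisoformat(row["ended_at"]),
        start_station_name=_blank_to_none(row["start_station_name"]),
        start_station_id=_blank_to_none(row["start_station_id"]),
        end_station_name=_blank_to_none(row["end_station_name"]),
        end_station_id=_blank_to_none(row["end_station_id"]),
        start_lat=_float_or_none(row["start_lat"]),
        start_lng=_float_or_none(row["start_lng"]),
        end_lat=_float_or_none(row["end_lat"]),
        end_lng=_float_or_none(row["end_lng"]),
        member_casual=row["member_casual"],
    )
```

Normalizing empty cells to `None` at parse time means later aggregations can distinguish a missing station from a station whose name happens to be blank text.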
Data visualizations for this project can be found here: https://datastudio.google.com/reporting/ea3f603a-f8f5-4d0c-9664-7608835b8ddb
A video walkthrough of the finished project
Follow the instructions here