This repository analyzes NYC Citi Bike trip data, focusing on trip durations, the distribution of trip distances, and visualization of the results.
It uses PySpark for data processing and Matplotlib and Folium for visualization.
Download Citi Bike trip data from:
🔗 Citi Bike System Data
Files to Download:
Place the extracted CSV files into the correct folder (202501-citibike-tripdata/).
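Before starting the notebook, it can help to confirm that the CSV files landed where expected. The following is a minimal sketch, not part of the repository: the helper name find_trip_csvs is hypothetical, and it assumes the CSV files sit directly inside the folder named above rather than in nested subfolders.

```python
from pathlib import Path

# Folder name taken from this README; adjust it if you downloaded another month.
DATA_DIR = Path("202501-citibike-tripdata")


def find_trip_csvs(folder):
    """Return the CSV files directly inside `folder`, sorted by path.

    Hypothetical helper for illustration. Returns an empty list when the
    folder does not exist, so the caller can print a friendly message.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
```

Calling find_trip_csvs(DATA_DIR) from the repository root should return a non-empty list once the files are in place. An empty list suggests the archive was extracted somewhere else.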
brew install openjdk@11
echo 'export PATH="/opt/homebrew/opt/openjdk@11/bin:$PATH"' >> ~/.zshrc
brew install apache-spark
python -m venv venv
source venv/bin/activate # Mac/Linux
pip install -r requirements.txt
jupyter notebook
- Open the Jupyter Notebook and execute the cells.
- The notebook will:
  - Load and clean the data.
  - Aggregate trip distances.
  - Generate interactive maps and bar charts for visualization.
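To illustrate the distance aggregation step, here is a minimal plain-Python sketch. It is not the notebook's PySpark code. It assumes each trip is described by start and end coordinates (latitude and longitude in degrees) and that distance is the straight-line great-circle distance from the haversine formula; the notebook may compute distance differently. The function names and the 1 km bin width are illustrative choices.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_histogram(trips, bin_km=1.0):
    """Count trips per distance bin.

    `trips` is an iterable of (start_lat, start_lng, end_lat, end_lng) tuples
    (assumed layout). Returns {bin_index: count}, where bin_index k covers
    distances in [k * bin_km, (k + 1) * bin_km).
    """
    counts = Counter()
    for lat1, lng1, lat2, lng2 in trips:
        d = haversine_km(lat1, lng1, lat2, lng2)
        counts[int(d // bin_km)] += 1
    return dict(sorted(counts.items()))
```

The resulting dictionary maps directly onto a bar chart, with one bar per distance bin. With PySpark the same idea would typically be expressed as a column expression followed by a group-by and count, which avoids pulling all rows into the driver.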
To export the executed notebook as an HTML report:
jupyter nbconvert --execute analysis.ipynb --to html
This generates an analysis.html file containing all visualizations and results.