Paper: Large scale traffic forecasting with gradient boosting
Please refer to competition homepage for downloading the data.
This repo should roughly replicate the solution of the Bolt team (2nd place in core competition). Code is (slightly) refactored compared to the messy notebooks that were used during competition.
Python version: any new-ish Python 3.8+ should work, we used 3.10.4
Change data_dir
in conf.py
to where all the competition data (train
, test
, road_graph
etc) is.
pip install -r requirements.txt
bash preprocess_all.sh melbourne
An alternative to running this and task specific preprocessing is unarchiving the traffic.zip
file and moving resulting traffic
directory
to data_dir
.
Preprocessing:
Run the notebook core_create_target_encodings.ipynb
Now you're ready to run the notebook core_final.ipynb
! Note that it raises an intentional error after training and
before creating submissions, but can be resumed manually after choosing the desired number of iterations to use.
To only generate a submission from a trained model, run:
python core_generate_submission.py -c melbourne -p model_path -m model_name
This will create a submission in data_dir
(more
specifically, data_dir / "submissions" / model_name / city_name / "labels" / "cc_labels_test.parquet"
)
Preprocessing:
Run the notebook extended_generate_supersegment_speed_feats.ipynb
Now run the notebook extended_final.ipynb
.
We store the trained model artifacts together with submission parquet files in S3:
- preprocessed
traffic
folder - core submissions
- core models
- extended submissions
- extended models