We use Cucumber specifications to describe and execute modelling scenarios, and systematically produce corresponding causal graphs that can be used to test causal relationships.
This repository is currently in an experimental phase.
- Clone this repository: `git clone https://github.com/CITCOM-project/causcumber.git`
- Change to the folder containing the repository: `cd causcumber`
- Create a virtual environment, e.g.:
  - In `./causcumber`, run `python3 -m venv causcumber_venv`
  - To activate the virtual environment, run `source causcumber_venv/bin/activate`
- Install GraphViz
- Install `causcumber` using the command `pip install -e .`
Due to the experimental nature of this work, contributions are currently limited to the core CITCOM team. Once key architectural decisions are finalised, we will open the project to a broader community. The current process for making changes to the code in this repository (e.g. adding new features or fixing bugs) is:
- Install as above
- Make a branch and check it out
- Make your changes
- Make a pull request against the `main` branch and request a review from one of the CITCOM team
- On an approving review, merge your changes into `main`
The `scenarios` directory contains different example scenarios implemented in the Covasim model. For each scenario in the directory, a separate sub-directory should be created that contains the simulation and a Cucumber specification. Within each scenario sub-directory, three directories should be created:
- `dags/`: this directory should contain any causal graphs as `.dot` files. This is where CauseCumber will place causal graphs too.
- `features/`: this directory should contain all of the elements for behave, including `.feature` files, an `environment.py` file, and a directory `steps/` containing Python scripts to implement step definitions for each `.feature` file.
- `observational_data/`: this directory should contain any observational data that you wish to use instead of running the model. This is optional.
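As a rough illustration of the layout described above, the following sketch creates an empty skeleton for a new scenario. It is not part of the repository: the function name `create_scenario_skeleton` and the scenario name passed to it are hypothetical, and only the directory and file names come from this README.

```python
from pathlib import Path

# Sub-directories named in this README; "features/steps" holds step definitions.
SCENARIO_SUBDIRS = ("dags", "features/steps", "observational_data")


def create_scenario_skeleton(root: Path, scenario_name: str) -> Path:
    """Create the expected (empty) directory layout for one scenario.

    `scenario_name` is a hypothetical name chosen by the caller.
    """
    scenario_dir = root / scenario_name
    for sub in SCENARIO_SUBDIRS:
        (scenario_dir / sub).mkdir(parents=True, exist_ok=True)
    # behave looks for environment.py inside the features directory.
    (scenario_dir / "features" / "environment.py").touch()
    return scenario_dir
```

For example, `create_scenario_skeleton(Path("scenarios"), "my_scenario")` would create `scenarios/my_scenario/` with the three sub-directories; the `.feature` files and simulation code still have to be added by hand.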
- Create a `.feature` file specifying desired causal properties as scenarios in Gherkin language.
- Specify a `Background` scenario that lists the inputs and outputs of interest.
- Transform each scenario into a causal question.
- Infer a fully-connected causal DAG from the `Background` and prune manually.
- Run the system to get data for each causal question or, alternatively, select previous execution data to achieve the same.
- Write step definitions (AKA hooks) into the data with Cucumber and use DoWhy to calculate causal estimates for each scenario and check that these match the specified behaviour in the `Then` clauses.
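The DAG inference and pruning step can be sketched as follows. This is a minimal illustration, not the code used by this repository: it assumes that "fully-connected" means one edge from every input to every output listed in the `Background`, and the helper names `fully_connected_dot` and `prune` are hypothetical. It uses only the standard library and emits plain DOT text that could be saved under `dags/`.

```python
from itertools import product


def fully_connected_dot(inputs, outputs, name="causal_dag"):
    """Return DOT text with one edge from every input to every output."""
    lines = [f"digraph {name} {{"]
    for cause, effect in product(inputs, outputs):
        lines.append(f"    {cause} -> {effect};")
    lines.append("}")
    return "\n".join(lines)


def prune(dot_text, edges_to_remove):
    """Drop the listed (cause, effect) edges from DOT text built above."""
    unwanted = {f"    {cause} -> {effect};" for cause, effect in edges_to_remove}
    return "\n".join(
        line for line in dot_text.splitlines() if line not in unwanted
    )


if __name__ == "__main__":
    # "quar_period" is a hypothetical input name; the outputs are CSV columns.
    dag = fully_connected_dot(["quar_period"], ["cum_infections", "cum_deaths"])
    print(prune(dag, [("quar_period", "cum_deaths")]))
```

Manual pruning is represented here as an explicit list of edges to remove, which keeps the domain knowledge behind each removed edge visible and reviewable.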
We work with CSV files produced by Covasim simulations. These have 164 columns, the headings of which are as follows:
- `t` (time step)
- `date`
- Cumulative (`cum_`) and new (`new_`)
  - `infections`
  - `reinfections`
  - `infectious`
  - `symptomatic`
  - `severe`
  - `critical`
  - `recoveries`
  - `deaths`
  - `tests`
  - `diagnoses`
  - `known_deaths`
  - `quarantined`
  - `vaccinations`
  - `vaccinated`
- `n_susceptible`
- `n_exposed`
- `n_infectious`
- `n_symptomatic`
- `n_severe`
- `n_critical`
- `n_recovered`
- `n_dead`
- `n_diagnosed`
- `n_known_dead`
- `n_quarantined`
- `n_vaccinated`
- `n_alive`
- `n_naive`
- `n_preinfectious`
- `n_removed`
- `prevalence`
- `incidence`
- `r_eff`
- `doubling_time`
- `test_yield`
- `rel_test_yield`
- `frac_vaccinated`
- `pop_nabs`
- `pop_protection`
- `pop_symp_protection`
Each row in the CSV represents a single time step (day) in the model. The outputs are stored in `compare_interventions/results`, which is ignored by Git during the development process. We will make our results publicly available via ORDA when it is appropriate to do so.
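To make the column naming concrete, here is a small sketch that builds the prefixed column names and reads one column from a results file. It assumes the `cum_` and `new_` prefixes are simply prepended to the quantity names listed above (e.g. `cum_infections`); the helper names `prefixed_columns` and `read_column` are hypothetical, and the sketch uses the standard `csv` module rather than any tooling from this repository.

```python
import csv

# Quantities reported both cumulatively (cum_) and per time step (new_).
PREFIXED_QUANTITIES = [
    "infections", "reinfections", "infectious", "symptomatic", "severe",
    "critical", "recoveries", "deaths", "tests", "diagnoses",
    "known_deaths", "quarantined", "vaccinations", "vaccinated",
]


def prefixed_columns():
    """Return the cum_* and new_* column names for the quantities above."""
    return [
        f"{prefix}{quantity}"
        for prefix in ("cum_", "new_")
        for quantity in PREFIXED_QUANTITIES
    ]


def read_column(csv_file, column):
    """Return one column as floats, one value per time step (row)."""
    reader = csv.DictReader(csv_file)
    return [float(row[column]) for row in reader]
```

With an open results file, `read_column(f, "cum_infections")` would return the cumulative infection count for each simulated day, in row order.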