Code for the paper titled "Inferring Sensitive Attributes from Model Explanations" published in ACM CIKM 2022.
You need conda. Create a virtual environment and install the requirements:
conda env create -f environment.yml
To activate the environment:
conda activate attinf-explanations
To update the environment:
conda env update --name attinf-explanations --file environment.yml
or
conda activate attinf-explanations
conda env update --file environment.yml
Link to datasets: https://drive.google.com/drive/folders/1bUH02Y9I6_NVrfo5_8PwWtdklk15rXPJ
python -m src.attribute_inference --dataset {LAW,MEPS,CENSUS,CREDIT,COMPAS} --explanations {IntegratedGradients,smoothgrad,DeepLift,GradientShap} --attfeature {both,expl}
The --attfeature flag selects whether the attack uses only explanations (expl) or both predictions and explanations (both) as attack features.
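For example, to attack the LAW dataset using IntegratedGradients explanations, with both predictions and explanations as attack features:
python -m src.attribute_inference --dataset LAW --explanations IntegratedGradients --attfeature both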
python -m src.attribute_inference --dataset {LAW,MEPS,CENSUS,CREDIT,COMPAS} --explanations {IntegratedGradients,smoothgrad,DeepLift,GradientShap} --attfeature expl --with_sattr True
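This variant (--with_sattr True) appears to correspond to the setting where the sensitive attribute is included; for example, on the CENSUS dataset with DeepLift explanations:
python -m src.attribute_inference --dataset CENSUS --explanations DeepLift --attfeature expl --with_sattr True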
python -m src.infer_s_from_phis --dataset {LAW,MEPS,CENSUS,CREDIT,COMPAS} --explanations {IntegratedGradients,smoothgrad,DeepLift,GradientShap}
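For example, to infer the sensitive attribute from GradientShap explanations on the COMPAS dataset:
python -m src.infer_s_from_phis --dataset COMPAS --explanations GradientShap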
There was a bug in one of the parameters used for generating explanations: "target" was initially set to 0, but it has to be set to the class of the input. This has been fixed. The attack accuracies differ from those reported in the paper and are in some cases better, since the gradients are now computed with respect to the correct class. The paper's conclusion that model explanations leak sensitive attributes remains valid.
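A minimal sketch of the corrected attribution call, assuming the explanations are generated with Captum (the explanation names above match Captum's attribution classes); explain, model, and inputs are illustrative placeholders:

import torch
from captum.attr import IntegratedGradients

def explain(model, inputs):
    ig = IntegratedGradients(model)
    # Bug: the attribution target was hard-coded to class 0 for every input.
    # Fix: attribute with respect to the class of each input
    # (shown here using the model's predicted class).
    with torch.no_grad():
        targets = model(inputs).argmax(dim=1)
    return ig.attribute(inputs, target=targets)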