This repository contains the code and models necessary to replicate the results of our recent paper:
Provably robust classification of adversarial examples with detection, Fatemeh Sheikholeslami, Ali Lotfi Rezaabad, Zico Kolter, ICLR 2021
In this work, we jointly train a classification-detection model, which adds robustness by adaptively flagging adversarial inputs in order to prevent misclassification.
In a system with this detection capability, a natural image must be marked as clean by the detector and assigned its correct class.
An adversarial image, on the other hand, can either (a) be flagged as adversarial and hence rejected, or (b) be correctly classified if the detector marks it as clean.
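The accept/reject behavior described above can be sketched in a few lines of Python. This is only an illustration, not the repository's implementation: it assumes the detection class is stored as the last entry of the logit vector, and the names predict_with_detection and outcome are hypothetical.

```python
# Hypothetical decision rule for a classifier with K real classes plus one
# extra "detection" class, assumed here to sit at the last logit index.

def predict_with_detection(logits):
    """Return the predicted class index, or None if the input is rejected."""
    detect_index = len(logits) - 1  # assumed position of the detection class
    top = max(range(len(logits)), key=lambda i: logits[i])
    return None if top == detect_index else top


def outcome(logits, true_label):
    """Label one prediction as 'correct', 'rejected', or 'misclassified'."""
    pred = predict_with_detection(logits)
    if pred is None:
        return "rejected"
    return "correct" if pred == true_label else "misclassified"
```

Under this sketch, a natural image counts as a success only when the outcome is "correct", while an adversarial image is handled safely when the outcome is either "correct" or "rejected".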
By leveraging interval bound propagation techniques, we train a provably robust augmented classifier with a dedicated detection class and test it on benchmark datasets, demonstrating that the added detection capability yields a better trade-off between natural and verified robust accuracy.
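For readers unfamiliar with interval bound propagation, the core idea can be sketched as follows: an L-infinity ball around the input is represented by elementwise lower and upper bounds, which are pushed through each layer. The code below is a minimal pure-Python illustration for one linear layer followed by a ReLU. It is not the training code in this repository, which operates on PyTorch tensors, and the function names are hypothetical.

```python
def input_box(x, eps):
    """Elementwise bounds of the L-infinity ball of radius eps around x."""
    return [xi - eps for xi in x], [xi + eps for xi in x]


def linear_bounds(W, b, lower, upper):
    """Propagate elementwise bounds through y = W x + b."""
    out_lower, out_upper = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight maps lower to lower and upper to upper;
            # a negative weight swaps the two.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Small example with made-up weights.
lower, upper = input_box([1.0, 2.0], eps=0.5)
z_lower, z_upper = linear_bounds([[1.0, -1.0], [2.0, 0.0]], [0.0, -4.0], lower, upper)
a_lower, a_upper = relu_bounds(z_lower, z_upper)
```

Applying the same two steps layer by layer gives bounds on the output logits, which is what a verified robust accuracy is computed from.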
Our code is based on the open source code of Zhang et al. available at
This code has been tested with Python 3.8.5 and PyTorch 1.6.0.
Training parameters are set up through JSON files, which can be found under the config folder.
To train a joint classifier-detector on the CIFAR dataset with epsilon=8/255, run the following:
python "training_params:method=robust_natural" "training_params:method_params:bound_type=interval" --config config/cifar_dm-large_8_255.json
The joint classification-detection is only tested with the Interval Bound Propagation (IBP) method; other propagation techniques (including CROWN-IBP) are not tested. Thus, the code will not produce results for any bound_type parameter other than interval; that is, bound_type=interval must be set as in the command above.
Other config files to reproduce the results in the paper can be found in the config folder.
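The quoted arguments in the training command, such as training_params:method_params:bound_type=interval, read as colon-separated paths into the nested JSON config. The sketch below shows one way such an override could be applied to a loaded config. It is an assumption about the mechanism, not the parser used in this repository, and apply_override and the example dictionary are hypothetical.

```python
import json


def apply_override(config, override):
    """Apply one 'a:b:c=value' override to a nested dict, in place."""
    path, _, raw = override.partition("=")
    keys = path.split(":")
    try:
        value = json.loads(raw)  # numbers, booleans, quoted strings
    except json.JSONDecodeError:
        value = raw  # bare words such as interval stay plain strings
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create missing levels as needed
    node[keys[-1]] = value
    return config


# Made-up starting values, used only to show the effect of the overrides.
example = {"training_params": {"method": "placeholder", "method_params": {}}}
apply_override(example, "training_params:method=robust_natural")
apply_override(example, "training_params:method_params:bound_type=interval")
```

After the two calls, example["training_params"]["method"] is "robust_natural" and example["training_params"]["method_params"]["bound_type"] is "interval".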
