Deep Learning Project ETHZ HS23 by Jannek Ulm, Leander Diaz-Bone, Alexander Bayer, and Dennis Jüni. The final report can be found here.
We provide two Python environments for training the models and running the experiments. "environment_cuda" works for us on a Linux machine with CUDA 12; "environment_mps" works on recent Apple Silicon MacBooks and uses Apple's MPS GPU acceleration. For other device/CUDA combinations, the environment files may need to be adapted. When the training file is run, all pre-trained models required for the experiments are automatically saved in the models folder.
Most experiments require models pretrained on subsets of CIFAR-100. Because of their size, the pretrained models are not included in this repository, but they can easily be recomputed with this training file (both for the original ResNet-18 model and for the custom ResNet-18 model). All models were trained on 10 classes, chosen from 2 superclasses with indices between 0 and 9.
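CIFAR-100 groups its 100 fine classes into 20 superclasses of 5 classes each, so choosing 2 superclasses yields the 10 training classes mentioned above. A minimal sketch of that selection step (the mapping shown covers only two superclasses and follows the standard CIFAR-100 coarse-to-fine grouping; in practice the mapping would come from the dataset's coarse labels, and this is not the repository's actual code):

```python
def fine_classes_for_superclasses(superclass_to_fine, superclasses):
    """Collect the fine-class indices belonging to the chosen superclasses."""
    classes = []
    for sc in superclasses:
        classes.extend(superclass_to_fine[sc])
    return sorted(classes)

# Illustrative mapping for two CIFAR-100 superclasses
# (0: "aquatic mammals", 1: "fish"); each superclass holds 5 fine classes.
superclass_to_fine = {
    0: [4, 30, 55, 72, 95],
    1: [1, 32, 67, 73, 91],
}

# Two superclasses -> 10 fine classes to train on.
print(fine_classes_for_superclasses(superclass_to_fine, [0, 1]))
```

The resulting list of fine-class indices can then be used to filter the CIFAR-100 training and test splits down to the 10-class subset.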
All experiments from the paper are shown in final_experiments.ipynb.
- Testing the accuracy of a randomly initialized model on random tuples of superclasses with indices between 10 and 19.
- Testing the accuracy of a model pretrained on some superclasses with indices between 0 and 9 and then re-trained on a random tuple of superclasses with indices between 10 and 19.
- Testing the initialization with Gabor filters with 1, 2, 6, 10, and 17 layers being initialized.
- Testing the number of pretrained models used for clustering (with 10 clusters and 17 layers being initialized) with Euclidean and Fourier distance.
- Testing the number of clusters used for clustering (with 10 models and 17 layers being initialized) with Euclidean and Fourier distance.
- Testing the number of layers initialized for clustering (with 10 models and 10 clusters) with Euclidean and Fourier distance.
- Testing a randomly initialized custom ResNet-18.
- Testing a custom ResNet-18 which was clustered and permuted according to the alignment algorithm.
- Testing a random ResNet-18 model on the Tiny ImageNet dataset.
- Testing a pretrained ResNet-18 (pretrained on a subset of 10 CIFAR-100 superclasses) on the Tiny ImageNet dataset.
- Testing a clustered ResNet-18 model on the Tiny ImageNet dataset.
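One of the experiments above initializes the first convolutional layers with Gabor filters. A minimal sketch of generating one such filter using the standard real-valued Gabor formula (parameter names and defaults here are illustrative, not the repository's actual settings):

```python
import math

def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Build a size x size real Gabor kernel: a Gaussian envelope
    modulated by a cosine wave oriented at angle theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Example: a 7x7 horizontal-orientation Gabor filter.
k = gabor_kernel(7, theta=0.0)
```

To initialize a convolutional layer, one kernel per output channel would be generated with varying orientations and scales and copied into the layer's weight tensor.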
All the validation and training accuracies during the run of these experiments were saved in tracked_params.
All plots from the paper were generated using parameters in tracked_params. The specific code used can be found in final_plotting.ipynb.
| Model | Epoch 5 | Epoch 15 |
|---|---|---|
| Random initialization | 28.246 | 30.924 |
| Pre-trained on CIFAR-100 | 28.338 | 30.268 |
| Clustered initialization | 31.1 | 35.374 |