This is the code for the paper "Vision Transformer for Contrastive Clustering".
The models were trained on Ubuntu 18.04 with the following environment:
- python==3.7
- pytorch==1.7.0
- torchvision==0.8.0
- CUDA==11.0
- timm==0.5.4
- scikit-learn==1.0.1
- opencv-python==4.5.1
- pyyaml==6.0
- numpy==1.21.2
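To compare an existing environment against the versions listed above, a small script along these lines may help. It is only a sketch and not part of the repository: the helper names (`installed_version`, `matches`, `report`) are made up for illustration, and it compares only the leading version components, so a build such as `1.7.0+cu110` counts as `1.7.0`. CUDA is not a Python package and is not checked here.

```python
# Pinned versions copied from the list above, keyed by pip distribution name.
EXPECTED = {
    "torch": "1.7.0",
    "torchvision": "0.8.0",
    "timm": "0.5.4",
    "scikit-learn": "1.0.1",
    "opencv-python": "4.5.1",
    "PyYAML": "6.0",  # distribution name of the pyyaml package
    "numpy": "1.21.2",
}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.7, provided by setuptools
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


def matches(found, expected):
    """True if `found` starts with the same dot-separated components as `expected`."""
    if found is None:
        return False
    found_parts = found.split("+")[0].split(".")  # drop local tags like "+cu110"
    expected_parts = expected.split(".")
    return found_parts[:len(expected_parts)] == expected_parts


def report(expected=EXPECTED, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, want in expected.items():
        have = lookup(name)
        if not matches(have, want):
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in report().items():
        print("{}: expected {}, found {}".format(name, want, have))
```

Running the script prints one line per package whose installed version differs from the list, and nothing if everything matches.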
[Optional but recommended] Create a new conda environment:
conda create -n VTCC python=3.7
Then activate the environment:
conda activate VTCC
Clone this repository:
git clone https://github.com/JackKoLing/VTCC.git
Install the necessary packages (install any other common packages as needed):
pip install torch==1.7.0 torchvision==0.8.0 opencv-python==4.5.1 timm==0.5.4 scikit-learn==1.0.1 numpy pyyaml
The eight datasets can be downloaded from the URLs provided in their corresponding papers or on their official websites.
Make sure to put the files in the following structure:
|-- datasets
| |-- RSOD
| |-- UC-Merced
| |-- ...
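Before training, it can be worth confirming that the layout above is in place. The following is a minimal sketch under the assumption that each dataset lives in its own directory directly under "datasets"; the function name `missing_datasets` is hypothetical, and only the two dataset names shown in the tree are listed, so the remaining ones need to be added by hand.

```python
import os


def missing_datasets(root, names):
    """Return the names in `names` that have no directory under `root`."""
    return [name for name in names
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    # Only the names visible in the tree above; extend with the other datasets.
    missing = missing_datasets("datasets", ["RSOD", "UC-Merced"])
    if missing:
        print("Missing dataset directories: " + ", ".join(missing))
    else:
        print("All listed dataset directories were found.")
```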
The configuration file "config/config.yaml" holds both the training and test options.
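To review the current settings without opening the file in an editor, the configuration can be loaded with pyyaml and printed. This is a sketch that assumes the file parses to a flat top-level mapping and that "model_path" is one of its top-level keys; the actual layout of the file may differ, and the helpers `load_config` and `describe` are illustrative names, not part of the repository.

```python
import os


def load_config(path="config/config.yaml"):
    """Parse the YAML configuration file into a Python object."""
    import yaml  # pyyaml, from the requirements list
    with open(path, "r") as f:
        return yaml.safe_load(f)


def describe(config):
    """Return 'key = value' lines for a parsed mapping, sorted by key."""
    return ["{} = {!r}".format(key, config[key]) for key in sorted(config)]


if __name__ == "__main__":
    path = "config/config.yaml"
    if os.path.exists(path):
        config = load_config(path)
        for line in describe(config):
            print(line)
        # Assumption: "model_path" is a top-level key of the mapping.
        if "model_path" not in config:
            print("Note: no top-level 'model_path' key was found.")
    else:
        print("Configuration file not found: " + path)
```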
After setting the configuration, start training by running:
python train.py
Once training is complete, the model is saved to the "model_path" specified in the configuration file. To test the trained model, run:
python cluster.py
If you find VTCC useful in your research, please consider citing:
@article{ling2022vision,
title={Vision Transformer for Contrastive Clustering},
author={Ling, Hua-Bao and Zhu, Bowen and Huang, Dong and Chen, Ding-Hua and Wang, Chang-Dong and Lai, Jian-Huang},
journal={arXiv preprint arXiv:2206.12925},
year={2022}
}
The code is developed based on the architectures of CC and MoCoV3. We sincerely thank the authors for their excellent work!