This folder contains the implementation of ConvMAE finetuning for image classification.
Models | #Params(M) | Supervision | Encoder Ratio | Pretrain Epochs | FT acc@1(%) | FT logs/weights |
---|---|---|---|---|---|---|
ConvMAE-B | 88 | RGB | 25% | 1600 | 85.0 | log/weight |
- Clone this repo:
git clone https://github.com/Alpha-VL/ConvMAE
cd ConvMAE
- Create a conda environment and activate it:
conda create -n convmae python=3.7
conda activate convmae
- Install `PyTorch==1.8.0` and `torchvision==0.9.0` with `CUDA==11.1`:
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
- Install `timm==0.3.2`:
pip install timm==0.3.2
You can download ImageNet-1K here and arrange it in the following format:
imagenet
├── train
│ ├── class1
│ │ ├── img1.jpeg
│ │ ├── img2.jpeg
│ │ └── ...
│ ├── class2
│ │ ├── img3.jpeg
│ │ └── ...
│ └── ...
└── val
  ├── class1
  │ ├── img4.jpeg
  │ ├── img5.jpeg
  │ └── ...
  ├── class2
  │ ├── img6.jpeg
  │ └── ...
  └── ...
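The layout above is a class-per-subfolder arrangement of the kind read by `torchvision.datasets.ImageFolder`. As a quick sanity check before launching a job, a sketch like the following verifies that `train` and `val` both exist and contain the same class folders. It uses only the standard library; the function name `check_imagenet_layout` is hypothetical and not part of this repo.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return the sorted class names if root/train and root/val share the same class folders."""
    root = Path(root)
    splits = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each immediate subdirectory is treated as one class.
        splits[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    if splits["train"] != splits["val"]:
        mismatch = sorted(set(splits["train"]) ^ set(splits["val"]))
        raise ValueError(f"class folders differ between train and val: {mismatch}")
    return splits["train"]
```

Calling `check_imagenet_layout("/path/to/imagenet")` on a complete ImageNet-1K copy should return a list of 1000 class names.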
Download the finetuned model from here.
Evaluate ConvViT-Base by running:
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py --batch_size 128 --model convvit_base_patch16 --resume ${FINETUNE_CHKPT} --dist_eval --data_path ${IMAGENET_DIR} --eval
This should give:
* Acc@1 84.982 Acc@5 97.152 loss 0.695
Accuracy of the network on the 50000 test images: 85.0%
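`Acc@1` and `Acc@5` above are top-1 and top-5 accuracy: the percentage of validation images whose true label is among the 1 or 5 highest-scoring classes. The last line reports `Acc@1` rounded to one decimal place, so 84.982 becomes 85.0. The following is a minimal pure-Python sketch of the metric, not the evaluation code used by `main_finetune.py`; `topk_accuracy` and the toy scores are hypothetical.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy example: 3 samples, 4 classes.
scores = [
    [0.1, 0.7, 0.1, 0.1],  # highest score: class 1
    [0.4, 0.3, 0.2, 0.1],  # highest score: class 0, second: class 1
    [0.1, 0.2, 0.3, 0.4],  # highest score: class 3, second: class 2
]
labels = [1, 1, 0]

print(f"Acc@1 {topk_accuracy(scores, labels, k=1):.3f}")  # Acc@1 33.333
print(f"Acc@2 {topk_accuracy(scores, labels, k=2):.3f}")  # Acc@2 66.667
print(f"{84.982:.1f}%")  # 85.0%
```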
Download the pretrained model from here.
To finetune with multi-node distributed training, run the following on 4 nodes with 8 GPUs each:
python submitit_finetune.py \
--job_dir ${JOB_DIR} \
--nodes 4 \
--batch_size 32 \
--model convvit_base_patch16 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 100 \
--blr 5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${IMAGENET_DIR}
To finetune with single-node training, run the following on a single node with 8 GPUs:
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
--batch_size 128 \
--model convvit_base_patch16 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 100 \
--blr 5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
--dist_eval --data_path ${IMAGENET_DIR}
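The multi-node and single-node commands train with the same effective batch size: 32 per GPU × 8 GPUs × 4 nodes and 128 per GPU × 8 GPUs × 1 node both give 1024. The sketch below makes that arithmetic explicit. It also assumes an MAE-style linear scaling rule, `lr = blr * effective_batch_size / 256`, which this README does not state; treat that as an assumption and check `main_finetune.py` before relying on it. The helper names are hypothetical.

```python
def effective_batch_size(batch_size_per_gpu, gpus_per_node, nodes):
    """Total number of samples per optimizer step across all GPUs."""
    return batch_size_per_gpu * gpus_per_node * nodes


def scaled_lr(blr, eff_batch_size, base_batch_size=256):
    # ASSUMPTION: linear learning-rate scaling relative to a batch of 256,
    # as in the MAE codebase; not confirmed by this README.
    return blr * eff_batch_size / base_batch_size


multi_node = effective_batch_size(32, gpus_per_node=8, nodes=4)
single_node = effective_batch_size(128, gpus_per_node=8, nodes=1)

print(multi_node, single_node)       # 1024 1024
print(scaled_lr(5e-4, multi_node))   # 0.002
```

When moving to a different GPU count, changing `--batch_size` so that the product stays at 1024 keeps the setup closest to the commands above.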
- The loss may become NaN during finetuning. If so, delete the line to switch to fp32, then resume finetuning from where it broke down.
- How to resume: add `--resume` to the scripts above, as in:
--resume ${CHKPT_RESUME}
- We are still working on the possible vanishing gradients caused by fp16 mixed-precision finetuning. Feel free to contact us if you have any suggestions.
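To notice a NaN loss as soon as it appears, a training loop can test each loss value and stop, so that the run can be resumed from the last checkpoint with `--resume`. Below is a minimal, framework-free sketch of such a guard. The function name `check_loss` and the sample loss values are hypothetical, and this is not claimed to be the check used in this repo.

```python
import math


def check_loss(loss_value, step):
    """Raise if the loss is NaN or infinite so training stops at the first bad step."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"loss is {loss_value} at step {step}; stop and resume in fp32"
        )
    return loss_value


# Made-up loss values for illustration only.
losses = [0.92, 0.87, float("nan"), 0.80]
for step, loss in enumerate(losses):
    try:
        check_loss(loss, step)
    except FloatingPointError as err:
        print(err)  # loss is nan at step 2; stop and resume in fp32
        break
```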
Download the pretrained model from here.
To run linear probing with multi-node distributed training, use the following on 4 nodes with 8 GPUs each:
python submitit_linprobe.py \
--job_dir ${JOB_DIR} \
--nodes 4 \
--batch_size 128 \
--model convvit_base_patch16 \
--global_pool \
--finetune ${PRETRAIN_CHKPT} \
--epochs 90 \
--blr 0.1 --weight_decay 0.0 \
--dist_eval --data_path ${IMAGENET_DIR}
To run linear probing with single-node training, use the following on a single node with 8 GPUs:
python -m torch.distributed.launch --nproc_per_node=8 main_linprobe.py \
--batch_size 512 \
--model convvit_base_patch16 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 90 \
--blr 0.1 --weight_decay 0.0 \
--dist_eval --data_path ${IMAGENET_DIR}
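Both linear probing commands use an effective batch size of 4096 (128 × 8 × 4 and 512 × 8 × 1). When running on a different number of GPUs, the per-GPU `--batch_size` that preserves this total can be computed as sketched below. The helper name `per_gpu_batch_size` is hypothetical, and whether a given per-GPU batch fits in memory depends on the hardware.

```python
def per_gpu_batch_size(target_effective, gpus_per_node, nodes):
    """Per-GPU batch size that yields the target effective batch size."""
    total_gpus = gpus_per_node * nodes
    if target_effective % total_gpus != 0:
        raise ValueError(
            f"{target_effective} is not divisible by {total_gpus} GPUs"
        )
    return target_effective // total_gpus


print(per_gpu_batch_size(4096, gpus_per_node=8, nodes=4))  # 128
print(per_gpu_batch_size(4096, gpus_per_node=8, nodes=1))  # 512
print(per_gpu_batch_size(4096, gpus_per_node=8, nodes=2))  # 256
```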