Zhen Zhu · Yiming Gong · Derek Hoiem
We propose an approach for anytime continual learning (AnytimeCL) for open vocabulary image classification. The AnytimeCL problem aims to break away from batch training and rigid models by requiring that a system can predict any set of labels at any time and efficiently update and improve when receiving one or more training samples at any time. Despite the challenging goal, we achieve substantial improvements over recent methods. We propose a dynamic weighting between predictions of a partially fine-tuned model and a fixed open vocabulary model that enables continual improvement when training samples are available for a subset of a task's labels. We also propose an attention-weighted PCA compression of training features that reduces storage and computation with little impact on model accuracy. Our methods are validated with experiments that test flexibility of learning and inference.
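To make the compression idea above more concrete, here is a minimal sketch of a generic weighted PCA in NumPy. It is not the implementation from this repository: the function names `weighted_pca_compress` and `reconstruct` are hypothetical, and the per-token `weights` merely stand in for attention scores. The exact weighting and storage format used by AnytimeCL may differ.

```python
import numpy as np


def weighted_pca_compress(features, weights, n_components):
    """Generic weighted PCA (illustrative; not the repository's code).

    features: (n_tokens, dim) array of feature vectors.
    weights:  (n_tokens,) non-negative importance scores, assumed here
              to play the role of attention weights.
    Returns the weighted mean, the top principal directions, and the
    per-token coefficients in that basis.
    """
    features = np.asarray(features, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                      # normalize weights to sum to 1

    mean = w @ features                  # weighted mean, shape (dim,)
    centered = features - mean

    # Scaling each row by sqrt(w) makes the SVD of the scaled matrix
    # diagonalize the weighted covariance sum_i w_i * x_i x_i^T.
    scaled = centered * np.sqrt(w)[:, None]
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)

    components = vt[:n_components]       # (n_components, dim)
    coeffs = centered @ components.T     # (n_tokens, n_components)
    return mean, components, coeffs


def reconstruct(mean, components, coeffs):
    """Approximate the original features from the compressed form."""
    return coeffs @ components + mean
```

Storing `mean`, `components`, and `coeffs` instead of the full feature matrix saves space when `n_components` is much smaller than the feature dimension; tokens with larger weights influence the retained directions more strongly.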
We test our code on a single NVIDIA RTX 3090Ti GPU.
- Anaconda or Miniconda
- Git
- Clone the repository:

    git clone https://github.com/jessemelpolio/AnytimeCL.git
    cd AnytimeCL

- Create and activate the Conda environment:

    conda env create -f environment.yml
    conda activate AnytimeCL

- Clone the DINOv2 repository:

    git clone https://github.com/facebookresearch/dinov2.git

- data/: Dataset handling and preprocessing
- encode_features/: Scripts for encoding features using CLIP and DINO
- engines/: Engine implementations for training and evaluation
- models/: Model architectures and components
- options/: Command-line argument parsing
- scripts/: Utility scripts
- main.py: Main entry point for running experiments

- Prepare datasets: Our project uses various datasets for target tasks and zero-shot tasks.

  Target Tasks: CIFAR100, SUN397, EuroSAT, OxfordIIITPet, Flowers102, FGVCAircraft, StanfordCars, Food101

  Zero-shot Tasks: ImageNet, UCF101, DTD

  Note: SUN397, EuroSAT, UCF101, and ImageNet require manual downloading from their original sources. Please follow the instructions in tutorials/download_data.md to obtain these datasets. Other datasets can be easily downloaded using the torchvision.datasets package. We also provide additional datasets in the data/ folder for your convenience, but be aware that they are not tested rigorously and may not work with the codebase.

  To encode the intermediate image representations of these datasets to speed up training, check the script in scripts/encode_features.sh. After setting the correct data root in the script, you can run it with:

    bash scripts/encode_features.sh

- Train: Example scripts for task, data, and class-incremental learning:

    bash scripts/task_incremental.sh
    bash scripts/data_incremental.sh
    bash scripts/class_incremental.sh

- (Optional) Compress: To compress the features, run the script in scripts/compress_features.sh:

    bash scripts/compress_features.sh

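For the datasets that the data preparation step says can be fetched through torchvision.datasets, a download helper might look like the sketch below. The helper name `download_dataset`, the `TORCHVISION_DATASETS` tuple, and the `./datasets` root are assumptions made for illustration; the directory layout this codebase expects is set in its own scripts and may differ. StanfordCars is left out because its automatic download may not work in recent torchvision releases.

```python
# Illustrative helper; names and default paths are hypothetical.
TORCHVISION_DATASETS = (
    "CIFAR100",
    "OxfordIIITPet",
    "Flowers102",
    "FGVCAircraft",
    "Food101",
    "DTD",
)


def download_dataset(name, root="./datasets"):
    """Download one dataset with its default split via torchvision.

    `root` is an assumed location; point it at the data root that the
    repository's scripts are configured to use.
    """
    if name not in TORCHVISION_DATASETS:
        raise ValueError(
            f"{name!r} is not in the auto-download list; "
            "see tutorials/download_data.md for manual instructions."
        )
    # Imported lazily so the check above works without torchvision installed.
    import torchvision.datasets as tvd

    dataset_cls = getattr(tvd, name)
    return dataset_cls(root=root, download=True)
```

Calling `download_dataset("CIFAR100")` would fetch the default split into `./datasets`, while a name such as `"ImageNet"` raises a `ValueError` pointing back to the manual download instructions.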
This codebase has only been tested on a single GPU. To use multiple GPUs, you will need to modify the code.
We'd appreciate it if you could report any issues you encounter.
Our approach offers various customization options to create different experimental settings. Refer to tutorials/configuration_options.md for more details.
If you use this code for your research, please consider citing:
@inproceedings{zhu2024anytimecl,
title={Anytime Continual Learning for Open Vocabulary Classification},
author={Zhu, Zhen and Gong, Yiming and Hoiem, Derek},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2024}
}