RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models



Homepage | arXiv | PDF





English | 简体中文

Introduction

This repository contains the code implementation for the paper RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models, developed based on the MMSegmentation project.

The current branch has been tested on Linux with PyTorch 2.x and CUDA 12.1. It supports Python 3.10+ and should be compatible with most CUDA versions.

If you find this project helpful, please give us a star ⭐️. Your support is our greatest motivation.

Main Features
  • Consistent API and usage with MMSegmentation
  • Open-sources the different RSRefSeg model variants described in the paper
  • Supports training and testing on multiple datasets

Changelog

🌟 2025.01.12 Released the RSRefSeg project with APIs consistent with MMSegmentation.

Contents

  • Installation
  • Dataset Preparation
  • Model Training
  • Model Testing
  • Image Prediction
  • FAQ
  • Acknowledgments
  • Citation
  • License
  • Contact

Installation

Dependencies

  • Linux (Windows is also supported)
  • Python 3.10+ (3.11 recommended)
  • PyTorch 2.0 or higher (2.3 recommended)
  • CUDA 11.7 or higher (12.1 recommended)
  • MMCV 2.0 or higher (2.2 recommended)

Environment Setup

We recommend setting up the environment using Miniconda. The following commands will create a virtual environment named rsrefseg and install PyTorch and MMCV. The default installation steps assume CUDA version 12.1. If your CUDA version is different, please adjust accordingly.

Note: If you are experienced with PyTorch and already have it installed, you can skip to the next section. Otherwise, follow the steps below for preparation.

Step 0: Install Miniconda.

Step 1: Create a virtual environment named rsrefseg and activate it.

conda create -n rsrefseg python=3.11 -y
conda activate rsrefseg

Step 2: Install PyTorch 2.3.x.

Linux/Windows:

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

or

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

Step 3: Install MMCV 2.2.x.

pip install -U openmim
mim install mmcv==2.2.0
# or
pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.3/index.html

Step 4: Install other dependencies.

pip install modelindex ipdb ms-swift transformers peft modelscope accelerate qwen_vl_utils pycocotools ftfy prettytable -U
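After installation, a quick sanity check can confirm that PyTorch sees CUDA and that MMCV imports cleanly (a minimal sketch; the versions in the comments reflect the recommendations above, not hard requirements):

# Verify the environment: PyTorch version, CUDA availability, and MMCV version.
import torch
import mmcv

print('torch:', torch.__version__)            # expect 2.3.x
print('cuda available:', torch.cuda.is_available())
print('cuda build:', torch.version.cuda)      # expect 12.1 with the cu121 wheels
print('mmcv:', mmcv.__version__)              # expect 2.2.0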

Step 5: [Optional] Install DeepSpeed.

If you want to train models with DeepSpeed, install it and uncomment the DeepSpeed training config in the config file. Installation details can be found in the DeepSpeed official documentation.

pip install deepspeed

Note: DeepSpeed support on Windows is incomplete, so we recommend using DeepSpeed on Linux. On Windows, only AMP training is available; uncomment the AMP training config in the config file instead.
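For reference, the AMP and DeepSpeed blocks typically follow MMEngine's conventions. The authoritative blocks live in the configs under configs_RSRefSeg; the sketch below is illustrative, not the project's actual settings:

# --- AMP training (uncomment in the config when not using DeepSpeed) ---
optim_wrapper = dict(
    type='AmpOptimWrapper',                  # MMEngine mixed-precision wrapper
    optimizer=dict(type='AdamW', lr=1e-4),   # illustrative optimizer settings
)

# --- DeepSpeed training (Linux only; comment out the AMP block above) ---
# DeepSpeed requires MMEngine's flexible runner and also changes the
# optimizer wrapper (see the shipped configs for the exact fields).
runner_type = 'FlexibleRunner'
strategy = dict(
    type='DeepSpeedStrategy',
    zero_optimization=dict(stage=2),         # illustrative ZeRO stage
)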

Install RSRefSeg

Download or clone the RSRefSeg repository.

git clone git@github.com:KyanChen/RSRefSeg.git
cd RSRefSeg

Dataset Preparation

Referring Remote Sensing Image Segmentation Datasets

We provide methods to prepare the Referring Remote Sensing Image Segmentation datasets used in the paper.

RRSIS-D Dataset

Organization Format

You can download datasets from other sources as well but need to organize them in the following format:

${DATASET_ROOT} # Dataset root directory, e.g., /home/username/data
├── rrsisd
│   ├── refs(unc).p
│   └── instances.json
└── images
    └── rrsisd
        ├── JPEGImages
        └── ann_split
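Before training, a quick way to verify the layout is to check each path in the tree above (a minimal sketch; replace the root with your own ${DATASET_ROOT}):

from pathlib import Path

root = Path('/home/username/data')  # your ${DATASET_ROOT}

# Check that every expected file and directory exists.
for rel in ('rrsisd/refs(unc).p',
            'rrsisd/instances.json',
            'images/rrsisd/JPEGImages',
            'images/rrsisd/ann_split'):
    status = 'ok' if (root / rel).exists() else 'MISSING'
    print(f'{status:8} {rel}')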

Dataset Conversion

We provide a Python script to convert the dataset into the required format and generate JSONL files.

Note: The converted JSONL files are already provided in the datainfo folder and can be used directly.
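The JSONL schema is defined by the conversion script; to see it, you can inspect the first record of a converted file (the filename below is illustrative, substitute an actual file from the datainfo folder):

import json

# Illustrative filename; each line of the file is one sample.
with open('datainfo/rrsisd_train.jsonl') as f:
    record = json.loads(f.readline())

print(json.dumps(record, indent=2, ensure_ascii=False))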

Other Datasets

If you wish to use other datasets, refer to this Python script for dataset preparation.

Model Training

RSRefSeg Model

Config Files and Key Parameter Explanations

We provide configuration files for RSRefSeg models of different parameter sizes as mentioned in the paper, which you can find in the configs_RSRefSeg folder. Config files maintain consistency with the MMSegmentation API and usage. Below are some key parameter explanations. For more detailed descriptions of parameters, refer to the MMSegmentation documentation.

Parameter explanations (a minimal config sketch follows the list):

  • work_dir: Output path for model training, usually does not need modification.
  • data_root: Dataset root directory, modify to the absolute path of the dataset root directory.
  • batch_size: Batch size per GPU, needs adjustment depending on VRAM size.
  • max_epochs: Maximum number of training epochs, usually does not need modification.
  • val_interval: Interval for validation set, usually does not need modification.
  • vis_backends/WandbVisBackend: Configuration for web-based visualization tools; uncomment it if needed. Viewing training visualizations in the browser requires registering an account on the wandb website.
  • resume: Whether to resume from checkpoint, usually does not need modification.
  • load_from: Pre-trained checkpoint path for the model, usually does not need modification.
  • init_from: Pre-trained checkpoint path used to initialize the model; usually keep as None unless resuming from a checkpoint, in which case set it accordingly.
  • default_hooks/CheckpointHook: Configuration for checkpoint saving during model training, usually does not need modification.
  • model/lora_cfg: Configuration for efficient model tuning, usually does not need modification.
  • model/backbone: Visual backbone of SAM model, adjust according to actual needs, base corresponds to sam-vit-base, large corresponds to sam-vit-large, huge corresponds to sam-vit-huge.
  • model/clip_vision_encoder: Visual encoder of the CLIP model, usually does not need modification.
  • model/clip_text_encoder: Text encoder of the CLIP model, usually does not need modification.
  • model/sam_prompt_encoder: Prompt encoder of the SAM model, usually does not need modification.
  • model/sam_mask_decoder: Decoder of the SAM model, usually does not need modification.
  • model/decode_head: Pseudo decode head of the RSRefSeg model, usually does not need modification.
  • AMP training config: Configuration for mixed precision training. If not using DeepSpeed training, uncomment this section. Usually does not need modification.
  • DeepSpeed training config: Configuration for DeepSpeed training. If using DeepSpeed training, uncomment this section and comment out AMP training config. Note that DeepSpeed training is not supported on Windows.
  • dataset_type: Dataset type, usually does not need modification.
  • data_preprocessor/mean/std: Mean and standard deviation for data preprocessing, usually does not need modification.
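As a minimal sketch of the parameters most users adjust (a hypothetical excerpt: names follow the list above, values are placeholders, and the authoritative definitions are in the configs_RSRefSeg files):

work_dir = 'work_dirs/rsrefseg'     # illustrative output path for training
data_root = '/home/username/data'   # absolute path to your dataset root
batch_size = 4                      # per-GPU batch size; lower it if VRAM is tight
max_epochs = 50                     # placeholder value
val_interval = 1                    # placeholder: validate every epoch
resume = False                      # set True to resume from a checkpoint
load_from = None                    # path to a pre-trained checkpoint, if any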

Single-GPU Training

python tools/train.py configs_RSRefSeg/name_to_config.py  # Replace 'name_to_config.py' with the desired config file

Multi-GPU Training

sh tools/dist_train.sh configs_RSRefSeg/name_to_config.py ${GPU_NUM}  # Replace 'name_to_config.py' with the desired config file, GPU_NUM with the number of GPUs to use

Model Testing

Single-GPU Testing

python tools/test.py configs_RSRefSeg/name_to_config.py ${CHECKPOINT_FILE}  # Replace 'name_to_config.py' with the desired config file, 'CHECKPOINT_FILE' with the desired checkpoint file

Multi-GPU Testing

sh tools/dist_test.sh configs_RSRefSeg/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM}  # Replace 'name_to_config.py' with the desired config file, 'CHECKPOINT_FILE' with the desired checkpoint file, and GPU_NUM with the number of GPUs to use

Image Prediction

Predict a Single Image

python demo/image_demo.py ${IMAGE_FILE}  configs_RSRefSeg/name_to_config.py --checkpoint ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}  # Replace 'IMAGE_FILE' with the image file to predict, 'name_to_config.py' with the desired config file, 'CHECKPOINT_FILE' with the desired checkpoint file, and 'OUTPUT_DIR' with the output directory for prediction results

Predict Multiple Images

python demo/image_demo.py ${IMAGE_DIR}  configs_RSRefSeg/name_to_config.py --checkpoint ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}  # Replace 'IMAGE_DIR' with the image directory to predict, 'name_to_config.py' with the desired config file, 'CHECKPOINT_FILE' with the desired checkpoint file, and 'OUTPUT_DIR' with the output directory for prediction results

FAQ

We have listed some common problems and their solutions here. If you find an issue missing, feel free to open a PR to enrich this list. If you cannot find help here, please open an issue; filling in all required information in the template helps us locate the problem faster.

1. Do I need to install MMSegmentation?

We recommend not installing MMSegmentation, since we have modified parts of its code and an installed copy may cause errors. If you encounter a 'Module not registered' error, check the following (a registration sketch follows the list):

  • If the module is a package that needs to be installed, install it if necessary.
  • Whether MMSegmentation is installed; if so, uninstall it.
  • Whether @MODELS.register_module() is added before class names; add it if not.
  • Whether from .xxx import xxx is added in __init__.py; add it if not.
  • Whether custom_imports = dict(imports=['rsris'], allow_failed_imports=False) is added to the config file; add it if not.
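A minimal sketch of the registration pattern the checklist refers to (class and file names here are illustrative):

# e.g., in rsris/models/my_head.py (illustrative path):
from mmengine.model import BaseModule
from mmseg.registry import MODELS

@MODELS.register_module()           # makes the class discoverable by name
class MyCustomHead(BaseModule):
    pass

# in rsris/models/__init__.py, re-export it so the registration side effect runs:
# from .my_head import MyCustomHead

# in the config file, so the package is imported when the config loads:
custom_imports = dict(imports=['rsris'], allow_failed_imports=False)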

2. Solution for 'Bad substitution' error in dist_train.sh

If you encounter a 'Bad substitution' error while running dist_train.sh, execute the script with bash instead, i.e., bash tools/dist_train.sh.

Acknowledgments

This project was developed based on MMSegmentation. Thanks to the developers of the MMSegmentation project.

Citation

If you use the code or benchmarks from this project in your research, please cite RSRefSeg with the following BibTeX entry.

@article{chen2025rsrefseg,
  title={RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models},
  author={Chen, Keyan and Zhang, Jiafan and Liu, Chenyang and Zou, Zhengxia and Shi, Zhenwei},
  journal={arXiv preprint arXiv:2501.06809},
  year={2025}
}

License

This project is released under the Apache 2.0 open source license.

Contact

For further questions❓, feel free to contact us 👬
