
[WACV 2025 (Oral)] PTQ4VM: Post-training Quantization for Visual Mamba

This is the official code for the PTQ4VM paper.

PTQ4VM can be applied to various Visual Mamba backbones, converting the pretrained model to a quantized format in under 15 minutes without notable quality degradation.

Install

  1. Set up the conda environment
conda create -n ptq4vm python=3.10 -y
conda activate ptq4vm
  2. Clone the PTQ4VM repository
git clone https://github.com/YoungHyun197/ptq4vm
cd ptq4vm
  3. Install the dependencies
pip install -r requirements.txt
pip install causal-conv1d==1.1.1
pip install mamba-ssm==1.2.0.post1
  4. Replace the core Mamba implementation (the destination below assumes a default conda layout; see the snippet after this list if your site-packages path differs)
cp -rf mamba-1p1p1/mamba_ssm /opt/conda/lib/python3.10/site-packages
  5. Install the CUDA kernel
python ./cuda_measure/setup_vim_GEMM.py install
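
If your Python environment is not the default conda image assumed in step 4, locate the active site-packages directory before copying. A minimal check (mamba-ssm must already be installed via pip):

# Locate where pip installed mamba_ssm, so the patched sources in
# mamba-1p1p1/mamba_ssm can be copied over the installed package.
import site
import mamba_ssm

print(site.getsitepackages())   # candidate site-packages directories
print(mamba_ssm.__path__[0])    # exact install location of the mamba_ssm package

Copy mamba-1p1p1/mamba_ssm into the directory printed on the last line.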

How to use PTQ4VM

Here we use the Vision Mamba (Vim) model as an example. Before applying PTQ4VM, prepare a pre-trained model; you can download one from this url.

Generate activation smoothing scale

torchrun --nproc_per_node 1 generate_act_scale.py --resume [model-path] --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path [imagenet path] --batch-size 256
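
Conceptually, this step runs calibration batches through the pretrained model and records a per-input-channel activation statistic (e.g. the absolute maximum) for each linear projection; these statistics are later used to derive the smoothing scales. The sketch below illustrates such a collection pass under our own assumptions (hook placement, statistic, and names are illustrative); it is not the contents of generate_act_scale.py.

import torch
import torch.nn as nn

# Sketch: collect per-input-channel abs-max activation statistics with forward hooks.
@torch.no_grad()
def collect_act_scales(model, calib_loader, device="cuda"):
    act_scales = {}  # module name -> per-channel abs-max over all calibration batches

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0].detach().reshape(-1, inputs[0].shape[-1]).abs().amax(dim=0)
            act_scales[name] = torch.maximum(act_scales[name], x) if name in act_scales else x
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.Linear)]
    model.eval().to(device)
    for images, _ in calib_loader:
        model(images.to(device))
    for h in handles:
        h.remove()
    return act_scales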

Joint Learning of Smoothing Scale and Step size (JLSS)

torchrun --nproc_per_node 1 quant.py --eval --resume [model-path] --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path [imagenet-path] --act_scales [smoothing-path] --batch-size 256 --qmode ptq4vm --train-batch 256 --n-lva 16 --n-lvw 16 --alpha 0.5 --epochs 100 --lr-a 5e-4 --lr-w 5e-4 --lr-s 1e-2

For experimental details and hyper-parameters, please refer to the paper and the quant.py file.
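
The command above jointly optimizes three groups of parameters on calibration data: the activation smoothing scale (trained with --lr-s), the weight quantizer step size (--lr-w), and the activation quantizer step size (--lr-a), using --n-lva/--n-lvw quantization levels and a SmoothQuant-style migration strength --alpha. The sketch below shows one plausible form of such a layer: smoothing combined with learnable step sizes and straight-through rounding. Class names, initializations, and the quantizer form are our assumptions, not the repository's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(x, step, n_levels):
    # Uniform symmetric quantizer with straight-through rounding (LSQ-style).
    qmax = n_levels // 2 - 1
    q = torch.clamp(x / step, -qmax - 1, qmax)
    q = q + (torch.round(q) - q).detach()  # STE: round in forward, identity gradient
    return q * step

class SmoothedQuantLinear(nn.Module):
    # Illustrative sketch of the JLSS idea, not the repo's code.
    def __init__(self, linear, act_scale, n_lva=16, n_lvw=16, alpha=0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = linear.bias
        # Smoothing scale (trained with --lr-s), SmoothQuant-style initialization.
        w_scale = self.weight.abs().amax(dim=0).clamp(min=1e-5)
        self.smooth = nn.Parameter(act_scale.pow(alpha) / w_scale.pow(1 - alpha))
        # Learnable quantizer step sizes (trained with --lr-a and --lr-w).
        self.step_a = nn.Parameter(act_scale.max() / (n_lva // 2))
        self.step_w = nn.Parameter(2 * self.weight.abs().mean() / (n_lvw // 2))
        self.n_lva, self.n_lvw = n_lva, n_lvw

    def forward(self, x):
        # Smoothing shifts quantization difficulty from activations to weights,
        # then both sides are fake-quantized with their learnable step sizes.
        x_q = fake_quant(x / self.smooth, self.step_a, self.n_lva)
        w_q = fake_quant(self.weight * self.smooth, self.step_w, self.n_lvw)
        return F.linear(x_q, w_q, self.bias)

In practice the three parameter groups would be trained with their separate learning rates against a reconstruction or task loss on the calibration set; see quant.py for the actual procedure.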

Speedup using CUDA kernel

  1. Check the layer-wise acceleration
python cuda_sandbox.py
  2. Check the end-to-end acceleration (a generic kernel-timing sketch follows this list)
torchrun --nproc_per_node 1 quant.py --eval --time_compare --resume [model-path] --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path [imagenet-path] --act_scales [smoothing-path] --batch-size 256 --qmode ptq4vm --train-batch 256 --n-lva 16 --n-lvw 16 --alpha 0.5 --epochs 100 --lr-a 5e-4 --lr-w 5e-4 --lr-s 1e-2
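
If you want to sanity-check latencies outside the provided scripts, remember that CUDA kernel launches are asynchronous, so timings need explicit synchronization. A generic latency-measurement sketch (not the repository's cuda_sandbox.py) using torch.cuda.Event:

import torch

def measure_latency_ms(fn, warmup=20, iters=100):
    # Warm up to exclude compilation/caching effects, then time with CUDA events.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per call

For example, measure_latency_ms(lambda: layer(x)) returns the average per-call latency in milliseconds, which can be compared between an FP16 layer and its quantized counterpart.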

Reference

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

This example code is based on Vim.

Cite

If you find our code or PTQ4VM paper useful for your research, please consider citing:

@article{cho2024ptq4vm,
  title={PTQ4VM: Post-Training Quantization for Visual Mamba},
  author={Cho, Younghyun and Lee, Changhun and Kim, Seonggon and Park, Eunhyeok},
  journal={arXiv preprint arXiv:2412.20386},
  year={2024}
}
