
[WACV 2025 (Oral)] PTQ4VM: Post-training Quantization for Visual Mamba

This is the official code for the PTQ4VM paper.

PTQ4VM can be applied to various Visual Mamba backbones, converting the pretrained model to a quantized format in under 15 minutes without notable quality degradation.

Install

  1. Set up the conda environment
conda create -n ptq4vm python=3.10 -y
conda activate ptq4vm
  2. Clone the PTQ4VM repository
git clone https://github.com/YoungHyun197/ptq4vm
cd ptq4vm
  3. Install the dependencies
pip install -r requirements.txt
pip install causal-conv1d==1.1.1
pip install mamba-ssm==1.2.0.post1
  4. Replace the core Mamba implementation (the destination below assumes a default conda layout; see the snippet after this list if your site-packages path differs)
cp -rf mamba-1p1p1/mamba_ssm /opt/conda/lib/python3.10/site-packages
  5. Install the CUDA kernel
python ./cuda_measure/setup_vim_GEMM.py install
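
If your Python environment is not the default conda image assumed in step 4, locate the active site-packages directory before copying. A minimal check (mamba-ssm must already be installed via pip):

# Locate where pip installed mamba_ssm, so the patched sources in
# mamba-1p1p1/mamba_ssm can be copied over the installed package.
import site
import mamba_ssm

print(site.getsitepackages())   # candidate site-packages directories
print(mamba_ssm.__path__[0])    # exact install location of the mamba_ssm package

Copy mamba-1p1p1/mamba_ssm into the directory printed on the last line.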

How to use PTQ4VM

Here we use the Vision Mamba (Vim) model as an example. Before applying PTQ4VM, prepare a pre-trained model; you can download one from this url.

Generate activation smoothing scale

torchrun --nproc_per_node 1 generate_act_scale.py --resume [model-path] --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path [imagenet path] --batch-size 256
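
Conceptually, this step runs calibration batches through the pretrained model and records a per-input-channel activation statistic (e.g. the absolute maximum) for each linear projection; these statistics are later used to derive the smoothing scales. The sketch below illustrates such a collection pass under our own assumptions (hook placement, statistic, and names are illustrative); it is not the contents of generate_act_scale.py.

import torch
import torch.nn as nn

# Sketch: collect per-input-channel abs-max activation statistics with forward hooks.
@torch.no_grad()
def collect_act_scales(model, calib_loader, device="cuda"):
    act_scales = {}  # module name -> per-channel abs-max over all calibration batches

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0].detach().reshape(-1, inputs[0].shape[-1]).abs().amax(dim=0)
            act_scales[name] = torch.maximum(act_scales[name], x) if name in act_scales else x
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.Linear)]
    model.eval().to(device)
    for images, _ in calib_loader:
        model(images.to(device))
    for h in handles:
        h.remove()
    return act_scales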

Joint Learning of Smoothing Scale and Step size (JLSS)

torchrun --nproc_per_node 1 quant.py --eval --resume [model-path] --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path [imagenet-path] --act_scales [smoothing-path] --batch-size 256 --qmode ptq4vm --train-batch 256 --n-lva 16 --n-lvw 16 --alpha 0.5 --epochs 100 --lr-a 5e-4 --lr-w 5e-4 --lr-s 1e-2

For experimental details and hyper-parameters, please refer to the paper and the quant.py file.
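
The command above jointly optimizes three groups of parameters on calibration data: the activation smoothing scale (trained with --lr-s), the weight quantizer step size (--lr-w), and the activation quantizer step size (--lr-a), using --n-lva/--n-lvw quantization levels and a SmoothQuant-style migration strength --alpha. The sketch below shows one plausible form of such a layer: smoothing combined with learnable step sizes and straight-through rounding. Class names, initializations, and the quantizer form are our assumptions, not the repository's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(x, step, n_levels):
    # Uniform symmetric quantizer with straight-through rounding (LSQ-style).
    qmax = n_levels // 2 - 1
    q = torch.clamp(x / step, -qmax - 1, qmax)
    q = q + (torch.round(q) - q).detach()  # STE: round in forward, identity gradient
    return q * step

class SmoothedQuantLinear(nn.Module):
    # Illustrative sketch of the JLSS idea, not the repo's code.
    def __init__(self, linear, act_scale, n_lva=16, n_lvw=16, alpha=0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = linear.bias
        # Smoothing scale (trained with --lr-s), SmoothQuant-style initialization.
        w_scale = self.weight.abs().amax(dim=0).clamp(min=1e-5)
        self.smooth = nn.Parameter(act_scale.pow(alpha) / w_scale.pow(1 - alpha))
        # Learnable quantizer step sizes (trained with --lr-a and --lr-w).
        self.step_a = nn.Parameter(act_scale.max() / (n_lva // 2))
        self.step_w = nn.Parameter(2 * self.weight.abs().mean() / (n_lvw // 2))
        self.n_lva, self.n_lvw = n_lva, n_lvw

    def forward(self, x):
        # Smoothing shifts quantization difficulty from activations to weights,
        # then both sides are fake-quantized with their learnable step sizes.
        x_q = fake_quant(x / self.smooth, self.step_a, self.n_lva)
        w_q = fake_quant(self.weight * self.smooth, self.step_w, self.n_lvw)
        return F.linear(x_q, w_q, self.bias)

In practice the three parameter groups would be trained with their separate learning rates against a reconstruction or task loss on the calibration set; see quant.py for the actual procedure.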

Speedup using CUDA kernel

  1. Check the layer-wise acceleration
python cuda_sandbox.py
  2. Check the end-to-end acceleration (a generic kernel-timing sketch follows this list)
torchrun --nproc_per_node 1 quant.py --eval --time_compare --resume [model-path] --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path [imagenet-path] --act_scales [smoothing-path] --batch-size 256 --qmode ptq4vm --train-batch 256 --n-lva 16 --n-lvw 16 --alpha 0.5 --epochs 100 --lr-a 5e-4 --lr-w 5e-4 --lr-s 1e-2
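
If you want to sanity-check latencies outside the provided scripts, remember that CUDA kernel launches are asynchronous, so timings need explicit synchronization. A generic latency-measurement sketch (not the repository's cuda_sandbox.py) using torch.cuda.Event:

import torch

def measure_latency_ms(fn, warmup=20, iters=100):
    # Warm up to exclude compilation/caching effects, then time with CUDA events.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per call

For example, measure_latency_ms(lambda: layer(x)) returns the average per-call latency in milliseconds, which can be compared between an FP16 layer and its quantized counterpart.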

Reference

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

This example code is based on Vim.

Cite

If you find our code or PTQ4VM paper useful for your research, please consider citing:

@article{cho2024ptq4vm,
  title={PTQ4VM: Post-Training Quantization for Visual Mamba},
  author={Cho, Younghyun and Lee, Changhun and Kim, Seonggon and Park, Eunhyeok},
  journal={arXiv preprint arXiv:2412.20386},
  year={2024}
}
