A collection of model quantization related papers, data, and repositories.
[ICLR 2021]
Degree-Quant: Quantization-Aware Training for Graph Neural Networks, Shyam A. Tailor. Arxiv | Github
During training, a random subset of nodes in the graph neural network is protected from quantization; nodes with higher in-degrees are more likely to be protected, since they are more strongly affected by the loss of precision.
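A minimal sketch of this stochastic protection, assuming a simple symmetric fake-quantizer; the function names and the linear probability schedule are illustrative, not the authors' implementation:

```python
import torch

def fake_quant(x, num_bits=8):
    # symmetric uniform fake quantization (illustrative)
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def degree_quant_features(x, in_degree, num_bits=8, p_min=0.0, p_max=0.2):
    # protection probability grows with normalized in-degree (hypothetical schedule)
    deg = in_degree.float()
    p_protect = p_min + (p_max - p_min) * deg / deg.max().clamp(min=1)
    protect = torch.bernoulli(p_protect).bool()          # [num_nodes]
    x_q = fake_quant(x, num_bits)
    # protected nodes keep full-precision features, the rest are quantized
    return torch.where(protect.unsqueeze(-1), x, x_q)

# usage: node features [num_nodes, dim] and per-node in-degrees
x = torch.randn(5, 16)
in_degree = torch.tensor([1, 4, 2, 8, 0])
out = degree_quant_features(x, in_degree)
```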
The paper proposes a method that introduces quantization noise (Quant-Noise) during training so that networks adapt to extreme compression schemes, such as Product Quantization, which typically cause severe approximation errors. Only a random subset of weights is quantized during each forward pass, so the remaining weights receive unbiased gradients. By controlling the amount and form of the noise, extreme compression is achieved while largely preserving the performance of the original model.
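A minimal sketch of the random-subset idea with a simple uniform quantizer and a straight-through estimator; the masking granularity and quantizer here are assumptions, not the paper's exact implementation:

```python
import torch

def quant_noise(w, p=0.5, num_bits=4):
    # symmetric uniform fake quantization of the weights
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Bernoulli mask: 1 -> this weight sees quantization noise this pass
    mask = torch.bernoulli(torch.full_like(w, p))
    # straight-through estimator applied only to the masked subset;
    # unmasked weights pass gradients without bias
    return w + mask * (w_q - w).detach()

w = torch.nn.Parameter(torch.randn(64, 64))
w_noisy = quant_noise(w, p=0.5)     # use w_noisy in the forward pass
```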
[ICLR 2022]
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization, Xiuying Wei. Arxiv | Github
This paper introduces a post-training quantization method called QDrop, aimed at improving the efficiency and accuracy of neural networks under extremely low-bit settings. QDrop achieves this by randomly dropping activation quantization during post-training quantization (PTQ) reconstruction. The study shows that properly incorporating activation quantization into PTQ reconstruction improves the final model accuracy.
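A minimal sketch of randomly dropping activation quantization, assuming element-wise dropping and a simple asymmetric fake-quantizer; names and the drop granularity are illustrative:

```python
import torch

def fake_quant_act(x, num_bits=4):
    # asymmetric uniform fake quantization for activations
    qmax = 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zero = torch.round(-x.min() / scale)
    return (torch.clamp(torch.round(x / scale) + zero, 0, qmax) - zero) * scale

def qdrop_activation(x, drop_prob=0.5, num_bits=4):
    # each element keeps its full-precision value with probability drop_prob,
    # otherwise the quantized value is used
    x_q = fake_quant_act(x, num_bits)
    keep_fp = torch.bernoulli(torch.full_like(x, drop_prob))
    return keep_fp * x + (1.0 - keep_fp) * x_q

x = torch.randn(8, 128)
x_mixed = qdrop_activation(x)       # fed into block-wise reconstruction
```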
The paper treats the discrete weights of a quantized neural network as searchable variables and uses a differentiable method to search for them precisely. Specifically, each weight is represented as a probability distribution over a set of discrete values; these probabilities are optimized during training, and the value with the highest probability is selected to form the final quantized network.
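A minimal sketch of parameterizing each weight as a categorical distribution over fixed levels; the level set, softmax relaxation, and argmax discretization are illustrative assumptions, not the paper's exact formulation:

```python
import torch

# candidate discrete levels and one categorical distribution per weight
levels = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])
logits = torch.zeros(128, len(levels), requires_grad=True)

def soft_weights(logits, levels, temperature=1.0):
    # differentiable "expected" weight used during the search
    probs = torch.softmax(logits / temperature, dim=-1)
    return probs @ levels

def hard_weights(logits, levels):
    # final quantized weights: pick the most probable level per weight
    return levels[logits.argmax(dim=-1)]

w_train = soft_weights(logits, levels)   # optimized jointly with the task loss
w_final = hard_weights(logits, levels)   # discrete network after the search
```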
[NeurIPS 2022]
Leveraging Inter-Layer Dependency for Post-Training Quantization, Changbao Wang. OpenReview
To alleviate overfitting, NWQ employs an Activation Regularization (AR) technique to better control the distribution of activations. To optimize discrete variables, NWQ introduces Annealing Softmax (ASoftmax) and Annealing Mixup (AMixup), which gradually transition the quantized weights and activations from a continuous state to a discrete one.
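A minimal sketch of an annealed softmax over discrete levels in the spirit of ASoftmax: as the temperature decays, the soft assignment approaches a one-hot (discrete) choice. The geometric decay schedule and level set are assumptions:

```python
import torch

levels = torch.tensor([-1.0, 0.0, 1.0])

def annealed_softmax_weight(logits, step, total_steps, t_start=1.0, t_end=0.01):
    # temperature decays geometrically; low temperature pushes the softmax
    # toward a one-hot (discrete) assignment
    t = t_start * (t_end / t_start) ** (step / total_steps)
    probs = torch.softmax(logits / t, dim=-1)
    return probs @ levels

logits = torch.randn(10, 3)
w_early = annealed_softmax_weight(logits, step=0, total_steps=1000)    # ~continuous
w_late = annealed_softmax_weight(logits, step=1000, total_steps=1000)  # ~discrete
```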
[CVPR 2023]
Bit-Shrinking: Limiting Instantaneous Sharpness for Improving Post-Training Quantization, Chen Lin. CVF
To smooth the rough loss surface, the paper limits a sharpness term in the loss that reflects the impact of quantization noise. Instead of directly optimizing the network at the target bit-width, an adaptive bit-width reduction scheduler is designed: it starts from a higher bit-width and continuously shrinks it until the target bit-width is reached, keeping the added sharpness within an appropriate range.
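A minimal sketch of a bit-shrinking schedule: calibration starts at a high bit-width and shrinks toward the target so the quantization-induced sharpness grows gradually. The linear schedule and fractional bit handling are assumptions, not the paper's adaptive scheduler:

```python
import torch

def fake_quant(x, num_bits):
    # num_bits may be fractional while shrinking; it only enters via the scale
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def shrink_bits(step, total_steps, start_bits=8.0, target_bits=4.0):
    # linear shrinking schedule (an assumption; the paper adapts the pace
    # to keep the sharpness increase bounded)
    frac = min(step / total_steps, 1.0)
    return start_bits - (start_bits - target_bits) * frac

x = torch.randn(256)
for step in range(0, 1001, 250):
    bits = shrink_bits(step, total_steps=1000)
    x_q = fake_quant(x, bits)       # calibrate/optimize at the current bit-width
```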
[CVPR 2024]
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models, Huang et al. Arxiv | Github
TIAR optimizes the quantization of the Temporal Information Block, minimizing loss of temporal features. FSC is a calibration strategy that uses different quantization parameters for activations at different time steps, adapting to their range variations.
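A minimal sketch of timestep-aware activation calibration in the spirit of FSC: a separate (scale, zero-point) is kept per denoising time step and selected at inference. The class name and the simple min/max calibration rule are assumptions for illustration:

```python
import torch

class TimestepActQuant:
    """Keeps an independent (scale, zero-point) per denoising time step."""
    def __init__(self, num_timesteps, num_bits=8):
        self.num_bits = num_bits
        self.scale = torch.ones(num_timesteps)
        self.zero = torch.zeros(num_timesteps)

    def calibrate(self, t, x):
        # simple min/max calibration for the activations observed at step t
        qmax = 2 ** self.num_bits - 1
        self.scale[t] = (x.max() - x.min()).clamp(min=1e-8) / qmax
        self.zero[t] = torch.round(-x.min() / self.scale[t])

    def quantize(self, t, x):
        qmax = 2 ** self.num_bits - 1
        q = torch.clamp(torch.round(x / self.scale[t]) + self.zero[t], 0, qmax)
        return (q - self.zero[t]) * self.scale[t]

quant = TimestepActQuant(num_timesteps=1000)
x_t = torch.randn(4, 64)
quant.calibrate(t=500, x=x_t)
x_q = quant.quantize(t=500, x=x_t)
```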
[CVPR 2024]
Towards Accurate Post-training Quantization for Diffusion Models, Wang et al. Arxiv | Github
Distribution-aware Quantization adapts quantization to accommodate the significant variations in activation distributions. Differentiable Search uses a differentiable search algorithm to optimize the importance weights of quantization functions across timesteps. SRM Principle selects optimal timesteps for informative calibration image generation.
NDTC is a calibration method that samples a set of time steps from a skewed normal distribution and generates calibration samples through the denoising process, enhancing the diversity of time steps in the calibration set.
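A minimal sketch of drawing calibration time steps from a skewed normal distribution; the skew, location, and scale values below are placeholder assumptions, and the denoiser interface is omitted:

```python
import numpy as np
from scipy.stats import skewnorm

def sample_calibration_timesteps(n_samples, num_timesteps, skew=4.0,
                                 loc=None, scale=None):
    # draw time steps from a skewed normal, then round/clip to valid indices;
    # skew/loc/scale here are placeholders, not the paper's settings
    loc = 0.3 * num_timesteps if loc is None else loc
    scale = 0.25 * num_timesteps if scale is None else scale
    t = skewnorm.rvs(a=skew, loc=loc, scale=scale, size=n_samples)
    return np.clip(np.round(t), 0, num_timesteps - 1).astype(int)

timesteps = sample_calibration_timesteps(n_samples=256, num_timesteps=1000)
# each sampled t is then paired with the intermediate latent produced at that
# step of the full-precision denoising process to form a calibration sample
```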
[NeurIPS 2023]
PTQD: Accurate Post-Training Quantization for Diffusion Models, He et al. Arxiv | Github
Correlation Disentanglement separates quantization noise into correlated and uncorrelated parts, allowing for targeted corrections to reduce mean deviation and variance mismatch. Quantization Noise Correction employs methods to correct both the correlated and uncorrelated parts of the quantization noise, improving the SNR and sample quality.
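A minimal sketch of disentangling quantization noise into a component correlated with the full-precision output plus an uncorrelated residual, using a simple least-squares fit; this is a simplified assumption, not PTQD's full correction pipeline:

```python
import torch

def disentangle_quant_noise(x_fp, x_q):
    # fit err = k * x_fp + residual by least squares: k captures the part of
    # the quantization noise correlated with the full-precision output
    err = x_q - x_fp
    k = (err * x_fp).sum() / (x_fp * x_fp).sum()
    residual = err - k * x_fp
    return k, residual

x_fp = torch.randn(10_000)
x_q = 1.05 * x_fp + 0.02 * torch.randn(10_000)   # toy "quantized" output
k, residual = disentangle_quant_noise(x_fp, x_q)
x_corrected = x_q / (1 + k)   # removes the correlated (mean-deviation) part;
                              # the residual's variance is handled separately
```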
[NeurIPS 2023]
Q-DM: An Efficient Low-bit Quantized Diffusion Model, Li et al. NeurIPS
TaQ addresses the oscillation of activation distributions during training by smoothing the fluctuations and introducing precise scaling factors. NeM tackles the accumulation of quantization errors during multi-step denoising by mimicking the noise estimation behavior of the full-precision model.
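A minimal sketch of smoothing oscillating activation statistics with an exponential moving average before deriving the quantization scale, in the spirit of the fluctuation smoothing above; the EMA rule and class name are assumptions:

```python
import torch

class EMAActQuant:
    """Smooths oscillating activation ranges with an EMA before scaling."""
    def __init__(self, num_bits=4, momentum=0.99):
        self.qmax = 2 ** (num_bits - 1) - 1
        self.momentum = momentum
        self.running_max = None

    def __call__(self, x):
        cur = x.abs().max()
        if self.running_max is None:
            self.running_max = cur
        else:
            self.running_max = self.momentum * self.running_max \
                               + (1 - self.momentum) * cur
        scale = self.running_max.clamp(min=1e-8) / self.qmax
        return torch.clamp(torch.round(x / scale), -self.qmax - 1, self.qmax) * scale

quantizer = EMAActQuant()
for _ in range(10):                      # activations fluctuate step to step
    x = torch.randn(32, 64) * (1 + torch.rand(1))
    x_q = quantizer(x)
```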
Shortcut-splitting quantization addresses abnormal activation and weight distributions in shortcut layers by performing split quantization on activations and weights before concatenation.
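A minimal sketch of split quantization before a concatenation: each branch gets its own quantization parameters so one branch's outliers do not dominate the shared grid. The quantizer and tensor shapes are illustrative assumptions:

```python
import torch

def fake_quant(x, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def split_quant_concat(x_main, x_shortcut, num_bits=8):
    # quantize each branch with its own scale before concatenating, so one
    # branch's wide range does not dominate the other's quantization grid
    return torch.cat([fake_quant(x_main, num_bits),
                      fake_quant(x_shortcut, num_bits)], dim=1)

x_main = torch.randn(1, 64, 32, 32)
x_short = 10.0 * torch.randn(1, 64, 32, 32)   # much larger dynamic range
y = split_quant_concat(x_main, x_short)
```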
[CVPR 2020]
ZeroQ: A Novel Zero Shot Quantization Framework, Cai et al. Arxiv | Github
[CVPR 2021]
Diversifying Sample Generation for Accurate Data-Free Quantization, Zhang et al. Arxiv
[CVPR 2021]
Zero-shot Adversarial Quantization, Zhang et al. Arxiv | Github
[CVPR 2022]
Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization, Chikin et al. CVF
[CVPR 2023]
Hard Sample Matters a Lot in Zero-Shot Quantization, Li et al. Arxiv | Github
[CVPR 2023]
Adaptive Data-Free Quantization, Qian et al. Arxiv | Github
[CVPR 2023]
GENIE: Show Me the Data for Quantization, Jeon et al. Arxiv | Github
[ECCV 2022]
Patch Similarity Aware Data-Free Quantization for Vision Transformers, Li et al. Arxiv | Github
[NeurIPS 2023]
REx: Data-Free Residual Quantization Error Expansion, Yvinec et al. Arxiv
[NeurIPS 2023]
TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration, Chen et al. OpenReview | Github
[NeurIPS 2022]
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers, Yao et al. Arxiv | Github
[AAAI 2024]
Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models, Li et al. Arxiv | Github
[IJCAI 2022]
MultiQuant: Training Once for Multi-bit Quantization of Neural Networks, Xu et al. IJCAI | Github
[ICLR 2023]
PowerQuant: Automorphism Search for Non-Uniform Quantization, Yvinec et al. OpenReview